摘 要:随着数字图书馆的发展,数字资源逐渐成为高校图书馆馆藏不可缺少的一部分。元数据一直是图书馆实现文献有序化的主要工具。在数字图书馆的建设中,元数据也同样起到重要的作用。传统的元数据提取方法通常采用手工录入或者复制粘贴的方法,效率低下,费时费工,错误率高。文章探讨利用八爪鱼网络爬虫技术自动采集元数据的方法,该方法可提高元数据的提取效率,并且具有较强的适应性。数字资源元数据的建设对于图书馆来说,还是一个需要不断研究、不断实践、不断发展的新兴领域。如何基于高校图书馆数字资源元数据的特点,实现元数据的自动采集是本文研究的重点。
CLC number: TP391.1; G250.73 Document code: A Article ID: 2096-4706(2019)04-0004-03
Acquisition of Metadata Efficiently by Using Octopus Web Crawler Technology in University Libraries
ZHANG Zhiyong
(Guangdong Peizheng College, Guangzhou 510830, China)
Abstract: With the development of digital libraries, digital resources have gradually become an indispensable part of university library collections. Metadata has always been the main tool libraries use to organize documents, and it plays an equally important role in building digital libraries. Traditional metadata extraction relies on manual entry or copy and paste, which is inefficient, time-consuming, labor-intensive, and error-prone. This paper discusses a method for collecting metadata automatically with the Octopus web crawler; the method can improve the efficiency of metadata extraction and is highly adaptable. For libraries, building digital resource metadata remains an emerging field that requires continued research, practice, and development. The focus of this paper is how to automate metadata collection based on the characteristics of digital resource metadata in university libraries.
Keywords: octopus; web crawler; metadata; university library
