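Efficient Metadata Collection in University Libraries Using Octopus (八爪鱼) Web Crawler Technology - Modern Information Technology (现代信息科技)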


Information Technology, 2019, Issue 4

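Efficient Metadata Collection in University Libraries Using Octopus (八爪鱼) Web Crawler Technology

ZHANG Zhiyong

(Guangdong Peizheng College, Guangzhou, Guangdong 510830, China)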

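Abstract: As digital libraries have developed, digital resources have become an indispensable part of university library collections. Metadata has long been the primary tool libraries use to organize documents, and it plays an equally important role in building digital libraries. Traditional metadata extraction relies on manual entry or copy and paste, which is inefficient, time-consuming, labor-intensive, and error-prone. This paper explores a method for automatically collecting metadata with the Octopus (八爪鱼) web crawler; the method improves extraction efficiency and is highly adaptable. For libraries, building digital resource metadata remains an emerging field that calls for continued research, practice, and development. The focus of this paper is how to automate metadata collection based on the characteristics of digital resource metadata in university libraries.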

Keywords: Octopus (八爪鱼); web crawler; metadata; university library

CLC number: TP391.1; G250.73  Document code: A  Article ID: 2096-4706(2019)04-0004-03

Acquisition of Metadata Efficiently by Using Octopus Web Crawler Technology in University Libraries
ZHANG Zhiyong
(Guangdong Peizheng College, Guangzhou 510830, China)

Abstract: With the development of digital libraries, digital resources have gradually become an indispensable part of university library collections. Metadata has always been the main tool libraries use to organize documents, and it plays an equally important role in the construction of digital libraries. Traditional metadata extraction usually relies on manual input or copy and paste, which is inefficient, time-consuming, labor-intensive, and error-prone. This paper discusses a method for automatically collecting metadata using Octopus web crawler technology. The method improves the efficiency of metadata extraction and is highly adaptable. For libraries, the construction of digital resource metadata is still a new field that needs continuous study, practice, and development. How to automate metadata collection based on the characteristics of digital resource metadata in university libraries is the focus of this paper.

Keywords: Octopus; web crawler; metadata; university library


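About the author: ZHANG Zhiyong (born September 1977), male, Han ethnicity, from Wuhua, Guangdong; librarian; bachelor's degree; research interest: digital resource management in libraries.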
