Design and Implementation of Sina News Crawler System Based on Python
CHEN Meng
(School of Computer Science & Technology, Soochow University, Suzhou 215006, China)
Abstract: A web crawler is a type of web robot, also known as a web spider. As science and technology become more deeply embedded in daily life and people rely increasingly on computers, search engines have grown in importance, yet traditional search engines can no longer meet current needs. This paper develops a new type of web crawler based on Python that overcomes the shortcomings of traditional search engines and provides users with more, and more comprehensive, search content. Taking Sina News as an example, the paper analyzes the design and implementation of a Python crawler system.
Keywords: Python; Sina News; crawler system
CLC number: TP391.1; TP393.092  Document code: A  Article ID: 2096-4706(2018)07-0111-02
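About the author: CHEN Meng (born March 1991), male, Han ethnicity, from Yangzhou, Jiangsu; master's student; research interest: computer technology.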