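Implementation and Performance Comparison of Crawler Web Page Automatic Processing Based on BeautifulSoup + requests and selenium - Modern Information Technology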

Information Technology, 2021, Issue 16

Implementation and Performance Comparison of Crawler Web Page Automatic Processing Based on BeautifulSoup + requests and selenium

LI Chenhao

(Wuhan Branch of China Mobile Hubei Co., Ltd., Wuhan 430000, China)

Abstract: A web crawler is a program or script that automatically fetches web page information according to defined rules, so a purpose-built crawler can automate the processing of web pages and improve work efficiency. For a single task list system, this paper implements web page automation with two different crawler methods: BeautifulSoup + requests and selenium. The two methods are then compared by analyzing their implementation principles and their run results.

Keywords: crawler; web page automation; BeautifulSoup+requests; selenium

DOI: 10.19850/j.cnki.2096-4706.2021.16.003

CLC number: TP391  Document code: A  Article ID: 2096-4706(2021)16-0010-04
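
As a rough illustration of the two approaches named in the abstract, the sketch below shows how each might process a task list page. It is not the paper's implementation: the page markup, the `/tasks` URLs, the `complete` endpoint and button, and all function names are hypothetical, chosen only to make the contrast concrete. The parsing step is shared; the two methods differ in how the page is obtained and how an action is submitted.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for a task list page; the real system's HTML is not given in the source.
SAMPLE_HTML = """
<table id="tasks">
  <tr class="task" data-id="101"><td class="title">Task A</td><td class="status">pending</td></tr>
  <tr class="task" data-id="102"><td class="title">Task B</td><td class="status">done</td></tr>
  <tr class="task" data-id="103"><td class="title">Task C</td><td class="status">pending</td></tr>
</table>
"""


def pending_task_ids(html):
    """Parse the task rows and return the ids of tasks still marked pending."""
    soup = BeautifulSoup(html, "html.parser")
    ids = []
    for row in soup.select("tr.task"):
        status = row.select_one("td.status").get_text(strip=True)
        if status == "pending":
            ids.append(row["data-id"])
    return ids


def process_with_requests(base_url):
    """Static approach: fetch the HTML over HTTP, parse it, send one request per task."""
    import requests  # imported lazily so the parsing demo runs without it

    with requests.Session() as session:
        page = session.get(f"{base_url}/tasks", timeout=10)
        page.raise_for_status()
        for task_id in pending_task_ids(page.text):
            # Hypothetical endpoint that marks a task as handled.
            reply = session.post(f"{base_url}/tasks/{task_id}/complete", timeout=10)
            reply.raise_for_status()


def process_with_selenium(base_url):
    """Browser approach: drive a real browser and click each pending task's button."""
    from selenium import webdriver  # imported lazily; needs a local browser driver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(f"{base_url}/tasks")
        for task_id in pending_task_ids(driver.page_source):
            # Hypothetical button inside each task row.
            selector = f'tr[data-id="{task_id}"] button.complete'
            driver.find_element(By.CSS_SELECTOR, selector).click()
    finally:
        driver.quit()


if __name__ == "__main__":
    # Only the shared parsing step is exercised here; no network or browser is needed.
    print(pending_task_ids(SAMPLE_HTML))
```

The requests version talks to the server directly, so it generally avoids the cost of launching and rendering in a browser, while the selenium version works on the rendered page and can therefore handle content produced by JavaScript. Which trade-off matters more depends on the target system.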

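About the author: LI Chenhao (born June 1990), male, Han ethnicity, from Wuhan, Hubei; intermediate-level communications engineer; master's degree; research interest: computer science.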
