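Abstract: Taking the widely used MicroBlog platform as the data source, this study collects user data through the API provided by Sina MicroBlog and Web crawler technology, and uses the support vector machine (SVM) algorithm to divide MicroBlog users into two classes: "water army" users (paid posters) and non-water-army users. An improved SVM algorithm is then used, with feature values extracted from the large volume of user data, to build a multi-class SVM model that assigns users to four categories: normal users, hype-type water army, marketing-type water army, and rumor-type water army. The results show that the constructed model identifies user types fairly accurately, with a low recognition error rate.
Keywords: Sina MicroBlog; feature extraction; Web crawler; support vector machine algorithm; recognition error rate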
DOI:10.19850/j.cnki.2096-4706.2022.16.028
Funding: Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ205702)
CLC number: TP391; Document code: A; Article ID: 2096-4706(2022)16-0107-03
Research on MicroBlog User Recognition and Classification Based on SVM Algorithm
LI Xinhuan, HUANG Weili
(Jiangxi Engineering Vocational College of Jiangxi Open University, Nanchang 330046, China)
Abstract: Using the popular MicroBlog platform as the data source, user data are extracted through the API provided by Sina MicroBlog and Web crawler technology, and MicroBlog users are divided into two classes, "water army" users (paid posters) and non-water-army users, by a support vector machine (SVM) algorithm. An improved SVM algorithm is then used, with feature values extracted from the large volume of user data, to build a multi-class SVM model. Users are assigned to four categories: normal users, hype-type water army, marketing-type water army, and rumor-type water army. The results show that the constructed model identifies user types fairly accurately and that the recognition error rate is low.
Keywords: Sina MicroBlog; feature extraction; Web crawler; SVM algorithm; recognition error rate
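The two-stage scheme summarized in the abstract (a binary SVM that separates water army users from normal users, followed by a multi-class SVM over the water army types) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn's standard `SVC` rather than the paper's improved SVM, and the three features, the cluster centers, and all function names are hypothetical, with synthetic data standing in for crawled MicroBlog data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Class indices follow the four categories named in the abstract.
LABELS = ["normal", "hype", "marketing", "rumor"]


def make_synthetic_users(n_per_class=60, seed=0):
    """Toy data. Assumed features (not from the paper):
    [follower/following ratio, posts per day, share of posts with links]."""
    rng = np.random.default_rng(seed)
    centers = np.array([
        [1.0, 2.0, 0.1],    # normal
        [0.1, 30.0, 0.2],   # hype
        [0.1, 15.0, 0.9],   # marketing
        [0.5, 8.0, 0.5],    # rumor
    ])
    spread = np.array([0.05, 1.0, 0.05])  # per-feature standard deviation
    X = np.vstack([rng.normal(c, spread, size=(n_per_class, 3)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y


def fit_two_stage(X, y):
    """Stage 1: water army vs. normal. Stage 2: which water army type."""
    is_water_army = (y != 0).astype(int)
    stage1 = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    stage1.fit(X, is_water_army)
    stage2 = make_pipeline(
        StandardScaler(), SVC(kernel="rbf", decision_function_shape="ovr")
    )
    stage2.fit(X[y != 0], y[y != 0])  # trained on water army samples only
    return stage1, stage2


def predict_two_stage(stage1, stage2, X):
    """Label everyone 'normal' first, then refine the flagged accounts."""
    pred = np.zeros(len(X), dtype=int)
    flagged = stage1.predict(X) == 1
    if flagged.any():
        pred[flagged] = stage2.predict(X[flagged])
    return pred


def evaluate(seed=0):
    """Return the recognition error rate on a held-out split."""
    X, y = make_synthetic_users(seed=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    stage1, stage2 = fit_two_stage(X_tr, y_tr)
    y_hat = predict_two_stage(stage1, stage2, X_te)
    return float(np.mean(y_hat != y_te))


if __name__ == "__main__":
    print(f"recognition error rate: {evaluate():.3f}")
```

Because the synthetic clusters are deliberately well separated, the error rate here says nothing about performance on real accounts; the sketch only shows how the binary and multi-class stages fit together and how a recognition error rate might be computed.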
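About the author: LI Xinhuan (born 1989), female, Han ethnicity, from Xiangcheng, Henan; lecturer, master's degree; research interests: data mining and data analysis.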