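Classification and Analysis of Network Comments Based on Text Mining Algorithm: Take Douban Film Review as an Example - Modern Information Technology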


Information Technology, 2021, Issue 8


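WANG Rui

(Jincheng College of Sichuan University, Chengdu, Sichuan 611731, China)

Abstract: With the rapid development of network technology, information has become abundant and disordered, and how to obtain the text information one needs has become a concern for many enterprises and organizations. Taking collected Douban film review data as an example, this project uses Python and several algorithms, including Naive Bayes, to walk through the full text mining workflow: extracting features and feature subsets, clustering and classifying the text, and tuning the model with cross-validation. The goal is to predict the type of a given review and to use that prediction as one criterion for film recommendation.

Keywords: text segmentation; text vectorization; word frequency matrix; Naive Bayes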

DOI:10.19850/j.cnki.2096-4706.2021.08.005

CLC number: TP391.1  Document code: A  Article ID: 2096-4706(2021)08-0017-04

Classification and Analysis of Network Comments Based on Text Mining Algorithm: Take Douban Film Review as an Example

WANG Rui

(Jincheng College of Sichuan University, Chengdu 611731, China)

Abstract: Against the background of rapidly developing network technology, information is messy and complicated, and how to obtain the required text information has become a concern for many enterprises and organizations. Taking collected Douban film review data as an example, this project uses Python, Naive Bayes, and other algorithms to analyze the whole process of text mining, including extracting features and feature subsets, clustering and classifying the text, and adjusting the model by cross-validation, so as to predict the types of relevant reviews and use the prediction as a standard for film recommendation.

Keywords: text segmentation; text vectorization; word frequency matrix; Naive Bayes
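The workflow named in the abstract (word frequency matrix, Naive Bayes classification, cross-validation) can be illustrated with a minimal sketch. It is not the paper's implementation: the toy reviews, the `pos`/`neg` labels, and all function names are hypothetical, and the reviews are assumed to be already segmented into space-separated tokens (real Douban reviews would first need a Chinese word segmenter). Only the Python standard library is used.

```python
import math
from collections import Counter

def build_vocab(docs):
    """Map each distinct token to a column index of the word-frequency matrix."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    return {tok: j for j, tok in enumerate(vocab)}

def vectorize(docs, vocab):
    """Turn pre-segmented documents into rows of raw term counts."""
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, n in Counter(doc.split()).items():
            if tok in vocab:  # tokens unseen during training are ignored
                row[vocab[tok]] = n
        rows.append(row)
    return rows

def train_nb(X, y, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        counts = [sum(col) for col in zip(*rows)]
        total = sum(counts) + alpha * n_features
        log_prior = math.log(len(rows) / len(y))
        log_like = [math.log((k + alpha) / total) for k in counts]
        model[c] = (log_prior, log_like)
    return model

def predict_nb(model, x):
    """Return the class with the highest log posterior for count vector x."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(n * w for n, w in zip(x, log_like))
    return max(model, key=score)

def cross_validate(docs, labels, k=3):
    """Mean accuracy over k folds; fold i holds out items i, i+k, i+2k, ..."""
    accs = []
    for i in range(k):
        test_idx = set(range(i, len(docs), k))
        train_docs = [d for j, d in enumerate(docs) if j not in test_idx]
        train_y = [l for j, l in enumerate(labels) if j not in test_idx]
        fold_vocab = build_vocab(train_docs)  # vocabulary from training fold only
        fold_model = train_nb(vectorize(train_docs, fold_vocab), train_y)
        hits = 0
        for j in sorted(test_idx):
            x = vectorize([docs[j]], fold_vocab)[0]
            hits += predict_nb(fold_model, x) == labels[j]
        accs.append(hits / len(test_idx))
    return sum(accs) / k

# Hypothetical toy data standing in for segmented film reviews.
docs = [
    "great plot great acting",
    "boring plot weak acting",
    "moving story great cast",
    "weak story boring pacing",
    "great pacing moving ending",
    "boring ending weak cast",
]
labels = ["pos", "neg", "pos", "neg", "pos", "neg"]

vocab = build_vocab(docs)
model = train_nb(vectorize(docs, vocab), labels)
print(predict_nb(model, vectorize(["great moving cast"], vocab)[0]))
print(cross_validate(docs, labels, k=3))
```

Building the vocabulary inside each fold keeps held-out reviews from leaking into training, which is the usual reason to vectorize per fold rather than once up front. On a real corpus, the smoothing parameter `alpha` is the natural quantity to tune by comparing cross-validated accuracy.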


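About the author: WANG Rui (born September 2000), female, Han ethnicity, from Bozhou, Anhui; undergraduate student; research interest: big data algorithms.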
