Information Technology, 2021, Issue 18

Research on a Parallel Big Data Algorithm Based on the MapReduce Framework

FENG Zhanwei

(Heilongjiang International University, Harbin 150025, Heilongjiang, China)

DOI: 10.19850/j.cnki.2096-4706.2021.18.009

CLC number: TP311    Document code: A    Article ID: 2096-4706(2021)18-0031-04

Abstract: Big data mining algorithms can be used to process social media data and locate relevant content in it. This paper proposes a big data algorithm for mining Twitter data; to ensure scalability, the algorithm is parallelized on the MapReduce framework. Experiments show that the algorithm is efficient: execution speed still improves significantly as the data set grows, and the speedup ratio increases with both the size of the data set and the number of data nodes.

Keywords: social media; data mining; big data algorithm; Twitter data; MapReduce
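The abstract names two technical ideas, the MapReduce programming model and the speedup ratio, without spelling them out. The following is a minimal single-process sketch of both, not the paper's algorithm: the term-counting task, the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`, `speedup`), and the sample tweets are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(tweet):
    """Map step: emit a (term, 1) pair for every whitespace-separated term."""
    return [(term.lower(), 1) for term in tweet.split()]


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one term."""
    return key, sum(values)


def run_job(tweets):
    """Run map, shuffle, and reduce in sequence over a list of tweets."""
    pairs = chain.from_iterable(map_phase(t) for t in tweets)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


def speedup(t_single, t_parallel):
    """Speedup ratio: run time on one node divided by run time on n nodes."""
    return t_single / t_parallel


# Hypothetical sample input, not data from the paper.
counts = run_job(["big data on Twitter", "Twitter data mining"])
```

In a real deployment the map and reduce calls would run on separate data nodes; here they run in one process so the data flow is easy to follow. The `speedup` helper uses the usual definition of the speedup ratio, which is presumably the quantity the abstract reports as growing with data set size and node count.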



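About the author: FENG Zhanwei (b. 1981), male, Han ethnicity, from Bayan, Heilongjiang; lecturer, master's degree. Research interest: software engineering.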