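Abstract: Among the many approaches to big data analysis, building analysis models with machine learning is currently one of the more effective. Big data is characterized by very large data volumes and long computation cycles, and distributed computing is an effective way to speed up computation and shorten those cycles. This paper describes how to construct a distributed big data analysis model, focusing on machine learning algorithms, distributed computing frameworks, the data processing flow in distributed computing, and distributed program design, in the hope of offering researchers working on distributed big data computing and big data analysis some methods they can draw on.
Keywords: big data analysis; distributed computing; machine learning
CLC number: TP181  Document code: A  Article ID: 2096-4706(2018)09-0085-03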
Construction Method of Distributed Big Data Analysis Model for Machine Learning
LU Hong
(Beijing Information Technology College, Beijing 100018, China)
Abstract: There are many methods of big data analysis, and building a big data analysis model with machine learning is currently one of the more effective. Big data is characterized by large data volumes and long computing cycles. Distributed computing is an effective way to speed up computation and shorten the computing cycle. This paper introduces a method for constructing a distributed big data analysis model, with emphasis on machine learning algorithms, distributed computing frameworks, the data processing flow in distributed computing, and distributed program design. It is intended to provide reference methods for researchers engaged in distributed big data computing and big data analysis.
Keywords: big data analysis; distributed computing; machine learning
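References:
[1] Srinath Perera, Thilina Gunarathne. Hadoop MapReduce Cookbook (Chinese edition) [M]. Beijing: Posts & Telecom Press, 2015.
[2] Donald Miner, Adam Shook. MapReduce Design Patterns (Chinese edition) [M]. Beijing: Posts & Telecom Press, 2014.
[3] Willi Richert, Luis Pedro Coelho. Building Machine Learning Systems with Python (Chinese edition) [M]. Beijing: Posts & Telecom Press, 2014.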
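About the author: LU Hong (1963-), male, born in Beijing, institute director, associate professor, master's degree. Research interests: big data and artificial intelligence.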