(School of Data Science and Information Engineering, Guizhou Minzu University, Guiyang, Guizhou 550025)

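Abstract: To address house price prediction in the Shenzhen second-hand housing market, a price prediction model is trained with a random forest on eight relevant feature variables. To improve the accuracy and generalization ability of the model, cross-validation and grid search are used, and learning curves are plotted to find the optimal parameters. The resulting second-hand house price prediction model reaches a prediction accuracy of 82.22%. In light of relevant policies, the main conclusions are that the average price of second-hand housing in Shenzhen, although it will continue to rise, has been generally stable in recent years with small increases, and that the growth in the supply of small units has declined over the past ten years.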



Prediction and Analysis of the Price of Second-hand House in Shenzhen Based on Random Forest

LI Hanyu, WEI Jiayin, LU Youjun

(School of Data Science and Information Engineering, Guizhou Minzu University, Guiyang 550025, China)

Abstract: To address house price prediction in the Shenzhen second-hand housing market, a random forest model is trained on eight relevant feature variables. To improve the accuracy and generalization ability of the model, cross-validation and grid search are used, and learning curves are plotted to find the optimal parameters. The resulting second-hand house price prediction model reaches a prediction accuracy of 82.22%. In light of relevant policies, it is concluded that the average price of second-hand housing in Shenzhen, although it will continue to rise, has been generally stable in recent years with small increases, and that the growth in the supply of small units has declined over the past ten years.

Keywords: web crawler; random forest; Shenzhen second-hand house price; grid search


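About the authors: LI Hanyu (b. 1996), male, Miao ethnicity, from Guiyang, Guizhou, master's student; research interests: statistical modeling and analysis, big data processing and analysis. WEI Jiayin (b. 1986), male, Han ethnicity, from Sanming, Fujian, associate professor, master's supervisor, Ph.D.; research interests: algorithm design and analysis, big data processing and analysis. LU Youjun (b. 1985), male, Han ethnicity, from Zunyi, Guizhou, master's supervisor, Ph.D.; research interest: complex networks.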