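Abstract: Splitting large data documents is a routine need, and as such documents become more numerous, choosing the number of rows per split is a question worth studying. PyCharm Community and Python are used to split large documents, and the number of small documents produced and the time taken are compared under different row counts. The results show that a moderate number of splits is best, and that the larger the document, the less stable the time consumed. By studying the row counts that give the shortest splitting time for the same document, a pattern in splitting time is obtained and the best row count is selected, thereby improving the efficiency of document splitting.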
CLC number: TP309 Document code: A Article ID: 2096-4706(2022)06-0107-03
Research on Big Data Document Splitting Rule Based on Python
DING Sirong, HE Jingru, LI Zhen
(Chengdu Jincheng College, Chengdu 611731, China)
Abstract: Splitting big data documents is a routine task, and as such documents grow in number, choosing how many rows to place in each split becomes a problem worth studying. This paper uses PyCharm Community and Python to split large documents, and compares the number of small documents produced and the time taken under different row counts. The results indicate that a moderate number of rows per split works best, and that the larger the document, the less stable the time consumed. By studying the row counts that give the shortest splitting time for the same document, a pattern in splitting time is obtained, and the best row count is selected to improve the efficiency of document splitting.
Keywords: big data document splitting; comparative analysis; number of rows per split
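
The experiment summarized in the abstract (splitting one large document at several row counts and comparing the number of output files and the elapsed time) can be illustrated with a minimal Python sketch. This is not the authors' script: the function names `split_by_lines` and `time_split`, the `part_00001.txt` naming scheme, the UTF-8 encoding, and the sample row counts are all assumptions made for illustration.

```python
import tempfile
import time
from pathlib import Path


def split_by_lines(src: Path, out_dir: Path, lines_per_file: int) -> int:
    """Split src into files of at most lines_per_file lines; return the file count.

    Hypothetical helper: the output naming scheme is an assumption.
    """
    if lines_per_file <= 0:
        raise ValueError("lines_per_file must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)
    part_count = 0
    out = None
    try:
        with src.open("r", encoding="utf-8") as f:
            for i, line in enumerate(f):
                # Start a new output file every lines_per_file lines.
                if i % lines_per_file == 0:
                    if out is not None:
                        out.close()
                    part_count += 1
                    part_path = out_dir / f"part_{part_count:05d}.txt"
                    out = part_path.open("w", encoding="utf-8")
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return part_count


def time_split(src: Path, lines_per_file: int) -> tuple[int, float]:
    """Return (number of files, seconds elapsed) for one split of src."""
    with tempfile.TemporaryDirectory() as tmp:
        start = time.perf_counter()
        parts = split_by_lines(src, Path(tmp), lines_per_file)
        elapsed = time.perf_counter() - start
    return parts, elapsed


if __name__ == "__main__":
    # Build a small synthetic document; a real run would use a large file.
    with tempfile.TemporaryDirectory() as work:
        sample = Path(work) / "sample.txt"
        sample.write_text(
            "".join(f"row {i}\n" for i in range(20_000)), encoding="utf-8"
        )
        # Assumed row counts, chosen only to show the comparison.
        for n in (100, 1_000, 10_000):
            parts, seconds = time_split(sample, n)
            print(f"{n:>6} rows per file: {parts} files, {seconds:.4f} s")
```

Because a single timing is noisy, and the abstract reports that the noise grows with document size, repeating each measurement several times and comparing the minimum or median is a reasonable way to look for the fastest row count.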
