Information Technology, 2020, Issue 1

CLC number: TP391.1    Document code: A    Article ID: 2096-4706(2020)01-0007-03


A Segmentation Method for Customer Service Texts of China Mobile

ZHONG Jian, GAO Haiyang

(China Mobile Group Sichuan Co., Ltd., Chengdu 610041, China)

Abstract: To improve the efficiency of customer service, analyze and resolve customer problems quickly, and fully turn customer demands and complaints into momentum and resources for China Mobile’s development, we propose a word segmentation framework for China Mobile customer service chat records. Based on the characteristics of customer service chat text, we design data preprocessing steps that combine text error correction, stop word expansion, keyword extraction, and part-of-speech analysis. This framework improves the quality of word segmentation on the text data, and a dictionary-mapping approach corrects errors that recur commonly across the text.
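The preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. The paper does not publish its code or dictionaries, so everything below is an assumption: `CORRECTION_DICT`, `LEXICON` (with toy part-of-speech tags), and the stop word sets are hypothetical examples; forward maximum matching is a deliberately simple stand-in for whatever segmenter the authors actually used; and keywords are ranked by plain term frequency rather than by any method the paper specifies.

```python
from collections import Counter

# Hypothetical correction dictionary (illustrative, not from the paper):
# recurring homophone typos in chat text mapped to their correct form.
CORRECTION_DICT = {"套参": "套餐", "宽代": "宽带", "话废": "话费"}

# Hypothetical lexicon with toy part-of-speech tags
# (n = noun, v = verb, r = pronoun, u = particle, e = interjection).
LEXICON = {
    "我": "r", "的": "u", "了": "u", "您好": "e", "请问": "v",
    "套餐": "n", "宽带": "n", "话费": "n", "流量": "n",
    "查询": "v", "办理": "v", "怎么": "r", "不能": "v", "上网": "v",
}

# Assumed stop word expansion: a general base list plus chat-specific fillers.
BASE_STOP_WORDS = {"我", "的", "了", "吗", "呢"}
CHAT_STOP_WORDS = {"您好", "请问", "亲", "谢谢"}
STOP_WORDS = BASE_STOP_WORDS | CHAT_STOP_WORDS


def correct_text(text, corrections):
    """Dictionary-mapping correction: replace each known error with its fix."""
    # Apply longer keys first so overlapping entries resolve predictably.
    for wrong in sorted(corrections, key=len, reverse=True):
        text = text.replace(wrong, corrections[wrong])
    return text


def segment(text, lexicon):
    """Forward maximum matching: at each position take the longest lexicon word,
    falling back to a single character when nothing matches."""
    max_len = max(map(len, lexicon))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def extract_keywords(tokens, lexicon, stop_words, keep_pos=("n", "v"), top_k=3):
    """Drop stop words, keep only nouns and verbs, rank by term frequency."""
    candidates = [
        t for t in tokens
        if t not in stop_words and lexicon.get(t) in keep_pos
    ]
    # most_common orders equal counts by first occurrence.
    return [word for word, _ in Counter(candidates).most_common(top_k)]


def preprocess(text):
    """Correct, segment, then extract keywords; returns (tokens, keywords)."""
    corrected = correct_text(text, CORRECTION_DICT)
    tokens = segment(corrected, LEXICON)
    return tokens, extract_keywords(tokens, LEXICON, STOP_WORDS)


if __name__ == "__main__":
    tokens, keywords = preprocess("您好，请问我的套参怎么查询，宽代不能上网了，宽代怎么办理")
    print(tokens)
    print(keywords)  # ['宽带', '套餐', '查询']
```

Correction runs before segmentation on purpose: a lexicon-based segmenter cannot match a misspelled form such as 宽代, so without the mapping it would be split into single characters and lost from the keyword counts. After correction, 宽带 (broadband) occurs twice and ranks first.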

Keywords: data preprocessing; stop words; keywords; error correction dictionary


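About the author: ZHONG Jian (1969-), male, Han ethnicity, from Chengdu, Sichuan; senior engineer; master's degree; research interests: construction, maintenance, and optimization of mobile networks.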