DOI:10.19850/j.cnki.2096-4706.2021.17.021
Fund Project: School (Institute) Talent Team Construction Project (RS2021-CY04)
CLC Number: TP391.4    Document Code: A    Article ID: 2096-4706(2021)17-0087-05
Research on Multi-Label Text Classification of User Comments Based on the ERNIE 2.0 Model
MENG Xiaolong1,2
(1.Shanghai Institute of Tourism, Shanghai 201418, China; 2.Shanghai Normal University, Shanghai 201418, China)
Abstract: Aiming at the much-studied problem of multi-label text classification, this paper adopts the "pre-trained model + fine-tuning strategy" paradigm: it applies the continual-learning semantic understanding framework ERNIE 2.0 and the knowledge-distillation-based compressed pre-trained model ERNIE Tiny, together with the slanted triangular learning rate (STLR) fine-tuning strategy, to a multi-label text dataset of user comments. Compared with the classical semantic representation model BERT, the ERNIE 2.0 model improves the results by more than 1%, and the ERNIE Tiny model runs roughly 3 times faster; compared with the default fine-tuning strategy, the STLR fine-tuning strategy yields a further improvement of about 1%.
Keywords: multi-label text classification; pre-trained model; fine-tuning strategy; knowledge distillation
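To make the "pre-trained model + fine-tuning strategy" setup in the abstract concrete, the following minimal Python sketch shows (a) the slanted triangular learning rate (STLR) schedule as defined in Howard and Ruder's ULMFiT, and (b) how multi-label predictions are typically read off independent per-label sigmoid scores with a fixed threshold. The hyperparameter values (peak learning rate, cut_frac, ratio, threshold) are illustrative assumptions, not the settings used in this paper, and the snippet stands in for, rather than reproduces, the actual ERNIE 2.0 / ERNIE Tiny fine-tuning code.

```python
import math


def slanted_triangular_lr(step, total_steps, lr_max=2e-5, cut_frac=0.1, ratio=32):
    """STLR schedule from ULMFiT (Howard & Ruder, 2018).

    The learning rate rises linearly for the first cut_frac of training,
    then decays linearly back toward lr_max / ratio. lr_max, cut_frac and
    ratio are illustrative values, not the paper's settings.
    """
    cut = max(1, math.floor(total_steps * cut_frac))  # step at which the peak is reached
    if step < cut:
        p = step / cut                                           # short linear warm-up
    else:
        p = 1 - (step - cut) / (cut * (1 / cut_frac - 1))        # long linear decay
    return lr_max * (1 + p * (ratio - 1)) / ratio


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multi_label_predict(logits, threshold=0.5):
    """Multi-label decision rule: each label gets an independent sigmoid score,
    and every label whose score exceeds the threshold is predicted, so several
    labels can fire at once (unlike a softmax single-label classifier)."""
    return [int(sigmoid(z) > threshold) for z in logits]


if __name__ == "__main__":
    # Learning rate at a few points of a hypothetical 1000-step fine-tuning run.
    for step in (0, 50, 100, 500, 999):
        print(f"step {step:4d}  lr = {slanted_triangular_lr(step, 1000):.2e}")

    # Toy classifier output for one user comment over 4 hypothetical labels.
    print(multi_label_predict([2.3, -1.1, 0.7, -3.0]))  # -> [1, 0, 1, 0]
```

In an actual fine-tuning loop, slanted_triangular_lr(step, total_steps) would be written into the optimizer's learning rate at every step, and the per-label sigmoid scores would be trained with a binary cross-entropy loss; both pieces are framework-agnostic sketches rather than the paper's implementation.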
About the Author: MENG Xiaolong (1988-), male, Han nationality, born in Shanghai; lecturer, master's degree; main research interests: data mining and machine learning.