
Computer Technology, 2022, Issue 2

Research on a Graph-Model Text Summarization Method Based on BERT
HUANG Feifei
(Henan University of Economics and Law, Zhengzhou 450046, China)

Abstract: Summaries produced by the graph-based TextRank method are drawn directly from the document, so they never stray from its content. When extracting text features, however, traditional word-vector methods cannot distinguish the senses of polysemous words, whereas BERT-based word vectors fully exploit the semantic information in the text and alleviate the polysemy problem. Comparative experiments with different word-embedding methods verify the effectiveness of the BERT model. Similarity measures based on word-frequency statistics likewise ignore sentence-level semantics, so this paper adopts a vector-form similarity measure for summary generation. Experiments on the TTNews dataset show a clear improvement.


Keywords: Chinese text summarization; BERT; TextRank; similarity
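
For reference, the contrast the abstract draws can be written out explicitly. The original TextRank paper scores a sentence pair (S_i, S_j) by normalized word overlap, which depends only on surface word counts, while the vector-form alternative is the cosine similarity between sentence vectors v_i and v_j (how the paper pools BERT outputs into sentence vectors is not detailed in this abstract):

\[ \mathrm{Similarity}(S_i, S_j) = \frac{\left|\{\, w_k \mid w_k \in S_i \wedge w_k \in S_j \,\}\right|}{\log |S_i| + \log |S_j|}, \qquad \mathrm{sim}(v_i, v_j) = \frac{v_i \cdot v_j}{\lVert v_i \rVert \, \lVert v_j \rVert} \]

Either measure supplies the edge weights w_{ji} of the sentence graph, over which TextRank runs the weighted PageRank recurrence

\[ WS(V_i) = (1 - d) + d \sum_{V_j \in \mathrm{In}(V_i)} \frac{w_{ji}}{\sum_{V_k \in \mathrm{Out}(V_j)} w_{jk}} \, WS(V_j) \]

where d is the damping factor, commonly set to 0.85.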



DOI: 10.19850/j.cnki.2096-4706.2022.02.023


Funding: Young Scientists Fund Project (61806073); Henan Province Science and Technology Research Project (222102210339)


CLC Number: TP391        Document Code: A        Article ID: 2096-4706(2022)02-0091-06
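
The implementation itself is not included on this page, but the pipeline the abstract describes (BERT sentence vectors, cosine similarity between them, TextRank ranking) can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the author's code: the checkpoint name bert-base-chinese, mean pooling over the last hidden layer, clipping negative similarities to zero, and fixed top-k sentence selection are all choices made here for concreteness.

```python
# Minimal sketch of BERT-embedding TextRank summarization (assumptions noted above).
import numpy as np
import networkx as nx
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
model = BertModel.from_pretrained("bert-base-chinese")
model.eval()

def sentence_vector(sentence: str) -> np.ndarray:
    # Encode one sentence and mean-pool the last hidden layer into a single vector.
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[0].mean(dim=0).numpy()

def summarize(sentences, top_k=3):
    # 1) BERT sentence vectors.
    vectors = np.stack([sentence_vector(s) for s in sentences])
    # 2) Pairwise cosine similarity as edge weights of the sentence graph.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sim = np.clip(unit @ unit.T, 0.0, None)  # drop negative weights for PageRank
    np.fill_diagonal(sim, 0.0)               # no self-loops
    # 3) TextRank = weighted PageRank over the sentence graph.
    graph = nx.from_numpy_array(sim)
    scores = nx.pagerank(graph, weight="weight")
    ranked = sorted(range(len(sentences)), key=scores.get, reverse=True)
    # Return the top-k sentences restored to their original document order.
    return [sentences[i] for i in sorted(ranked[:top_k])]

if __name__ == "__main__":
    # Toy input, not from the paper's dataset.
    doc = [
        "文本摘要的目标是压缩原文并保留关键信息。",
        "抽取式方法直接从原文中选取重要句子。",
        "今天的天气非常好。",
    ]
    print("\n".join(summarize(doc, top_k=2)))
```

Selecting the top-k sentences by score and then restoring document order is a common extractive convention; it keeps the summary readable without affecting the ranking itself.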






About the author: HUANG Feifei (1995—), female, Han nationality, from Shangqiu, Henan; master's student; research interests: natural language processing.