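Abstract: Word embeddings are a foundational technique in natural language processing, and their algorithms have advanced alongside big data and deep neural networks. In recent years in particular, new algorithms and ideas have appeared in rapid succession and have greatly improved the accuracy of natural language processing. This paper describes each word embedding algorithm in turn, interleaving examples and figures so that general readers can clearly understand how each algorithm works and what its strengths and weaknesses are. By reviewing the development of word embedding algorithms as a whole, it aims to deepen the understanding of word embeddings and to support a better-informed choice of word embedding for the problem at hand.
Keywords: word embedding; one-hot encoding; vector space model; static word embedding; dynamic word embedding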
DOI:10.19850/j.cnki.2096-4706.2021.05.008
CLC number: TP391  Document code: A  Article ID: 2096-4706(2021)05-0036-04
Research on the Development Stages of Word Embedding Algorithm
LI Mengning
(School of Statistics, University of International Business and Economics, Beijing 100029, China)
Abstract: As a fundamental technology of natural language processing (NLP), word embedding algorithms have developed alongside big data and deep neural networks. In recent years especially, new algorithms and ideas have emerged in rapid succession, greatly improving the accuracy of NLP. This paper elaborates each word embedding algorithm with examples and charts, so that general readers can more clearly understand each algorithm's process, advantages, and disadvantages. By reviewing the overall development of word embedding algorithms, it deepens the understanding of word embedding and supports a better judgment about which word embedding to select for a given problem.
Keywords: word embedding; one-hot encoding; vector space model; static word embedding; dynamic word embedding
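About the author: LI Mengning (born November 1991), male, Han ethnicity, from Shijiazhuang, Hebei; graduate student; research interest: natural language processing.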