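Abstract: LSTMP extends LSTM with a projection layer whose output is fed back to the LSTM input. Because the recurrent connection passes through the projection layer, the high-dimensional cell output is reduced in dimension before it is reused, which shrinks the recurrent representation and therefore the number of parameters in the related weight matrices. The drawback of this structure is that the projection layer output has to serve two purposes: it is both the history information carried to the next time step and the input to the next layer. To address this, the authors propose a Re-dimension method that lets the network itself select part of the parameters to serve as history information, which yields a measurable improvement. With this method, the speech recognition rate improves by roughly 4% to 5% relative.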
CLC number: TN912.34  Document code: A  Article ID: 2096-4706(2019)11-0019-03
Research and Improvement of Speech Recognition Method Based on LSTMP
SUN Youyu, SUN Baoshan, LU Yang
(School of Computer Science and Technology, Tianjin Polytechnic University, Tianjin 300387, China)
Abstract: LSTMP is built on LSTM by adding a projection layer and connecting this layer to the input of the LSTM. By recurrently connecting the projection layer, it reduces the dimension of high-dimensional information, shrinks the recurrent representation of the cell units, and thus reduces the number of parameters in the related parameter matrices. However, the disadvantage of the LSTMP network structure is that the output of the projection layer has to fulfil two functions: it must act both as history information and as the input of the next layer. To address this problem, the authors propose a Re-dimension method, which allows the network itself to select some of the parameters as history information, and which achieves a certain degree of improvement. With this method, the speech recognition rate improves by about 4% to 5% relative.
Keywords: long short-term memory (LSTM); dimensionality reduction; speech recognition
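The parameter saving and the dual role of the projection output described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the common LSTMP formulation without peephole connections and with the four gate weight blocks stacked in one matrix. The function names, argument names, and layer sizes are hypothetical and chosen only for illustration. The Re-dimension method itself is not sketched, because the abstract does not specify how the network selects the history part.

```python
import numpy as np


def lstm_param_count(n_in, n_cell):
    """Weights and biases of a plain LSTM layer (assumption: no peepholes)."""
    # Four gates, each reading the input and the full cell output.
    return 4 * n_cell * (n_in + n_cell) + 4 * n_cell


def lstmp_param_count(n_in, n_cell, n_proj):
    """Same layer, but the recurrence goes through an n_proj-dim projection."""
    gates = 4 * n_cell * (n_in + n_proj) + 4 * n_cell
    projection = n_proj * n_cell
    return gates + projection


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstmp_step(x, r_prev, c_prev, W, R, b, P):
    """One LSTMP time step.

    x: input (n_in,), r_prev: previous projection output (n_proj,),
    c_prev: previous cell state (n_cell,),
    W: (4*n_cell, n_in), R: (4*n_cell, n_proj), b: (4*n_cell,),
    P: (n_proj, n_cell) projection matrix.
    """
    z = W @ x + R @ r_prev + b
    i, f, g, o = np.split(z, 4)          # input, forget, candidate, output
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)          # high-dimensional cell output
    r = P @ h                            # reduced to n_proj dimensions
    # r plays both roles named in the abstract: it is the history fed back
    # at the next time step and the input passed on to the next layer.
    return r, c


if __name__ == "__main__":
    n_in, n_cell, n_proj = 40, 1024, 256  # hypothetical sizes
    print("LSTM parameters: ", lstm_param_count(n_in, n_cell))
    print("LSTMP parameters:", lstmp_param_count(n_in, n_cell, n_proj))
```

With these illustrative sizes the projected layer needs roughly a third of the parameters of the plain LSTM layer, because the recurrent weight block scales with the projection size instead of the cell size.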