
Computer Technology, 2022, Issue 6

Multi-Task Chinese Keyword Recognition Based on Attention Mechanism

HE Zhenhua¹, HU Hengbo¹, JIN Xin², AN Da², LI Jingtao¹

(1. Zhengzhou Xinda Institute of Advanced Technology, Zhengzhou 450000, China; 2. China Railway Beijing Group Co., Ltd., Beijing 100036, China)

Abstract: To improve the performance of spoken keyword recognition, this paper takes an end-to-end keyword recognition model that requires no automatic speech recognition as its starting point and improves it with a soft attention mechanism combined with multi-task training. The improved attention-based keyword recognition model consists of four parts: the keyword embedding module and the acoustic module use soft attention to produce feature vectors, while the discriminator module and the classifier module take these feature vectors as input to perform keyword recognition. Experimental results show that the accuracy of the improved model is 37.3% higher than that of the baseline model and 3.1% higher than that of the traditional keyword search method.
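To make the four-part architecture described in the abstract concrete, the following is a minimal PyTorch sketch of how the keyword embedding module, acoustic module, discriminator, and classifier could fit together with soft attention and a multi-task loss. The use of GRU encoders, the hidden size, the single-layer attention scoring, the auxiliary keyword-classification head, and the 0.5 loss weight are all illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of an attention-based, ASR-free keyword spotting model
# with multi-task training. Module sizes and loss weighting are assumptions.
import torch
import torch.nn as nn


class SoftAttentionPool(nn.Module):
    """Collapses a sequence of hidden states into one vector via soft attention."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, states):                              # (batch, time, hidden)
        weights = torch.softmax(self.score(states), dim=1)  # (batch, time, 1)
        return (weights * states).sum(dim=1)                # (batch, hidden)


class KeywordSpotter(nn.Module):
    def __init__(self, num_chars, feat_dim, num_keywords, hidden_dim=256):
        super().__init__()
        # Keyword embedding module: encodes the query keyword (character IDs).
        self.char_embed = nn.Embedding(num_chars, hidden_dim)
        self.kw_rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.kw_attn = SoftAttentionPool(hidden_dim)
        # Acoustic module: encodes acoustic features (e.g. filterbank frames).
        self.ac_rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.ac_attn = SoftAttentionPool(hidden_dim)
        # Discriminator module: does the keyword occur in the utterance?
        self.discriminator = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1))
        # Classifier module: auxiliary keyword-class prediction (multi-task head).
        self.classifier = nn.Linear(hidden_dim, num_keywords)

    def forward(self, keyword_ids, acoustic_feats):
        kw_states, _ = self.kw_rnn(self.char_embed(keyword_ids))
        kw_vec = self.kw_attn(kw_states)                    # keyword feature vector
        ac_states, _ = self.ac_rnn(acoustic_feats)
        ac_vec = self.ac_attn(ac_states)                    # acoustic feature vector
        occur_logit = self.discriminator(torch.cat([kw_vec, ac_vec], dim=-1))
        class_logits = self.classifier(ac_vec)
        return occur_logit.squeeze(-1), class_logits


# One multi-task training step: detection loss plus auxiliary classification loss.
model = KeywordSpotter(num_chars=4000, feat_dim=80, num_keywords=20)
keyword_ids = torch.randint(0, 4000, (8, 4))                # 8 queries, 4 characters
feats = torch.randn(8, 200, 80)                             # 8 utterances, 200 frames
occur_label = torch.randint(0, 2, (8,)).float()
class_label = torch.randint(0, 20, (8,))
occur_logit, class_logits = model(keyword_ids, feats)
loss = nn.functional.binary_cross_entropy_with_logits(occur_logit, occur_label) \
       + 0.5 * nn.functional.cross_entropy(class_logits, class_label)
loss.backward()
```

In this reading, the auxiliary classifier shares the attention-pooled acoustic vector with the discriminator, which is one common way to realize the multi-task training the abstract describes; the paper's actual sharing scheme and loss weights may differ.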


Keywords: keyword recognition; attention mechanism; multi-task training



DOI:10.19850/j.cnki.2096-4706.2022.06.020


Funding Project: Science and Technology Research and Development Program of China Railway Beijing Group Co., Ltd. (2021AY02)


CLC Number: TP183        Document Code: A        Article ID: 2096-4706(2022)06-0082-05






Author Biographies: HE Zhenhua (1983-), male, Han nationality, from Zhengzhou, Henan, intermediate engineer, bachelor's degree, research interests: speech recognition and machine translation; HU Hengbo (1994-), male, Han nationality, from Zhengzhou, Henan, master's degree candidate, research interests: speech recognition and spoken keyword recognition.