Identification of Guizhou Dialect Based on Improved Convolutional Neural Network
AI Hu1, LI Fei2
(1. Department of Criminal Technology, Guizhou Police College, Guiyang 550005, China;
2. The Education University of Hong Kong, Hong Kong 999077, China)
CLC number: TP391.4 Document code: A Article ID: 2096-4706(2019)01-0005-06
Abstract: Dialect identification can provide important clues for criminal investigations. This paper proposes a dialect identification model for Guizhou dialects. Speech samples of varying duration were collected from six regions of Guizhou Province, and Mel-frequency cepstral coefficients (MFCC) were extracted from them. A multi-level two-dimensional discrete wavelet transform (2-D DWT) then extracts the low-frequency components of the MFCC and compresses them at the same time. A sliding window splits the result into overlapping blocks, singular value decomposition (SVD) is applied to each block, and the feature vectors with high contribution rates are retained. The blocks are then merged and converted into a three-dimensional matrix that serves as the input to the dialect identification model. The model is built on an improved convolutional neural network (CNN) and is trained and validated with cross experiments, which are also used to optimize the number of wavelet transform levels and the width of the sliding window. The experimental results show that the model is efficient for Guizhou dialect identification.
Keywords: Chinese dialect identification; Mel-frequency cepstral coefficients; two-dimensional discrete wavelet transform; singular value decomposition; convolutional neural network
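The feature-preparation steps named in the abstract (multi-level 2-D DWT low-frequency extraction, overlapping sliding-window blocks, per-block SVD, and stacking into a three-dimensional matrix) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the wavelet basis, window width, step, or number of retained vectors, so the Haar wavelet, the parameter values, the function names, and the synthetic MFCC matrix below are all assumptions made for illustration.

```python
import numpy as np

def haar_ll(x):
    """One level of a 2-D Haar DWT, keeping only the low-frequency (LL) subband.
    Haar is an assumption; the abstract does not name the wavelet."""
    rows, cols = x.shape
    x = x[: rows - rows % 2, : cols - cols % 2]  # trim to even dimensions
    return (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 2.0

def multilevel_ll(x, levels):
    """Apply the LL extraction repeatedly; each level halves both dimensions."""
    for _ in range(levels):
        x = haar_ll(x)
    return x

def sliding_blocks(x, width, step):
    """Cut overlapping blocks along the time axis (columns); step < width overlaps."""
    starts = range(0, x.shape[1] - width + 1, step)
    return [x[:, s : s + width] for s in starts]

def svd_reduce(block, k):
    """Keep the k leading left singular vectors, scaled by their singular values,
    and report the fraction of squared singular values they account for."""
    u, s, _ = np.linalg.svd(block, full_matrices=False)
    contribution = float((s[:k] ** 2).sum() / (s ** 2).sum())
    return u[:, :k] * s[:k], contribution

def build_input(mfcc, levels=2, width=8, step=4, k=3):
    """Hypothetical pipeline: MFCC matrix -> LL subband -> blocks -> SVD -> 3-D array."""
    low = multilevel_ll(mfcc, levels)
    blocks = sliding_blocks(low, width, step)
    reduced = [svd_reduce(b, k)[0] for b in blocks]
    return np.stack(reduced, axis=0)  # shape: (n_blocks, n_rows, k)

# Synthetic stand-in for an MFCC matrix (40 coefficients x 400 frames).
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((40, 400))
features = build_input(mfcc)
print(features.shape)  # (24, 10, 3)
```

In this sketch, `levels` and `width` are the two quantities the abstract says were tuned through the cross experiments. A fixed `k` is used so that every block has the same shape and the blocks can be stacked; selecting `k` from a contribution-rate threshold would be an alternative.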
About the author: AI Hu (b. 1974), male, from Yiyang, Jiangxi; Ph.D., associate professor. Research interests: sound and images.