Undefined Length CAPTCHA Recognition Based on Convolutional Recurrent Neural Network
LI Qiuyu
(College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350117, China)
Abstract: CAPTCHAs are a Turing-test mechanism that most websites currently use to block malicious automated operations such as bulk registration and vote manipulation. To help developers design more rigorous security strategies, this paper generates 5,000 variable-length CAPTCHAs of mixed letters and digits as a training set and, using the PyTorch framework, trains a convolutional recurrent neural network (CRNN), with the connectionist temporal classification (CTC) algorithm aligning the labels, to recognize variable-length CAPTCHAs. Experiments show that the final model reaches a recognition accuracy of 99.2% on variable-length CAPTCHAs.
Keywords: CAPTCHA recognition; convolutional neural network; recurrent neural network; convolutional recurrent neural network
DOI: 10.19850/j.cnki.2096-4706.2021.07.034
CLC number: TP391.41  Document code: A  Article ID: 2096-4706(2021)07-0133-03
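As an illustration of how CTC lets a fixed number of per-frame outputs yield a variable-length string, the following minimal sketch shows greedy CTC decoding: consecutive repeated labels are merged and blanks are then dropped. It is not the paper's implementation. The blank index, the 62-character alphabet, and the function name `ctc_greedy_decode` are assumptions made for this example only.

```python
from itertools import groupby
import string

BLANK = 0  # assumed index of the CTC blank symbol
# Hypothetical alphabet (digits, upper case, lower case); label k maps to ALPHABET[k - 1].
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def ctc_greedy_decode(frame_labels):
    """Collapse a per-frame label sequence into a variable-length string."""
    # Step 1: merge runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(frame_labels)]
    # Step 2: drop blanks and map the remaining labels to characters.
    return "".join(ALPHABET[label - 1] for label in collapsed if label != BLANK)


# Eight frames: blank, 'a', 'a', blank, 'a', '7', '7', blank
frames = [0, 37, 37, 0, 37, 8, 8, 0]
decoded = ctc_greedy_decode(frames)
```

The blank between the two runs of label 37 is what allows a doubled character to survive the merge step, so the eight frames above decode to a three-character string.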
About the author: LI Qiuyu (b. 2000), male, Han ethnicity, from Longyan, Fujian; undergraduate student; research interest: deep learning.