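Abstract: This paper designs an algorithm for recognizing images of enterprise business licenses that automatically extracts key fields such as the unified social credit code and the company name. Built on the open-source PaddleOCR framework, it adds a series of improvements, including automatic image orientation correction, structured text output, and local secondary recognition, to handle several kinds of poor-quality images for which PaddleOCR alone cannot recognize the information accurately. Overall recognition accuracy rises to above 90%, and detection completes within seconds. The result has been put into production use, helping front-desk operators quickly check whether the business license information they entered is accurate and improving the efficiency of manual entry.
Keywords: PaddleOCR; image recognition; enterprise business license; AI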
DOI:10.19850/j.cnki.2096-4706.2021.09.018
CLC number: TP391.4; TP18 Document code: A Article ID: 2096-4706(2021)09-0065-06
Improvement and Practice of Open Source PaddleOCR Technology in Enterprise Business License Recognition
QIU Jianmin
(China Telecom Corporation Limited Jiangsu Branch, Nanjing 210037, China)
Abstract: This paper designs an algorithm for recognizing enterprise business license images that automatically extracts the unified social credit code, company name, and other key fields. Based on the open-source PaddleOCR framework, a series of improvements, such as automatic image orientation adjustment, structured text output, and local secondary recognition, solve the problem that PaddleOCR alone cannot accurately recognize information in several kinds of poor-quality images. The overall recognition accuracy is improved to more than 90%, and detection completes within seconds. The result has been put into practical use to help front-desk operators quickly check whether the business license information they filled in is accurate, improving the efficiency of manual entry.
Keywords: PaddleOCR; image recognition; enterprise business license; AI
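About the author: QIU Jianmin (born June 1988), male, Han ethnicity, from Yangzhou, Jiangsu; intermediate engineer with a bachelor's degree. Research interests: IT system construction and operations, big data platforms, data warehouses, and AI development and applications (text classification, image recognition).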