Research on Fine-Grained Recognition of Vehicle Model Year Based on Transformer
XU Tianshi, WEN Li, ZHANG Huajun
(GRGBanking Equipment Co., Ltd., Guangzhou 510663, China)
Abstract: Extracting vehicle model-year information from video surveillance scenes is of great significance for digital and intelligent urban governance. To recognize fine-grained vehicle model years accurately, first, a million-scale scene dataset covering diverse capture conditions and common vehicle model years is constructed; second, an efficient Transformer-based extractor of fine-grained model-year features is proposed; finally, a hierarchical-label multi-task joint learning method is designed around the characteristics of the task, yielding highly robust features that capture both global and local information. Experimental results show that the proposed method reaches a Top-1 accuracy of 95.79% on the scene dataset, a substantial improvement over single-task CNN-based methods.
Keywords: video surveillance; vehicle model year recognition; fine-grained classification; vision transformer
DOI: 10.19850/j.cnki.2096-4706.2023.01.020
Funding: Guangzhou Science and Technology Plan Project (202206030001)
CLC number: TP391.4 Document code: A Article ID: 2096-4706(2023)01-0075-05
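The abstract names two ideas that a small example can make concrete: joint learning over hierarchical labels, and the Top-1 accuracy metric. The following is a minimal sketch in plain Python, not the authors' implementation. The three-level hierarchy (brand, model, year), the loss weights, and all function names are assumptions introduced here for illustration; this section does not specify the actual hierarchy, heads, or weighting.

```python
import math

# Hypothetical three-level label hierarchy: brand -> model -> model year.
# Level names and weights are illustrative assumptions, not values from the paper.
LEVEL_WEIGHTS = {"brand": 0.25, "model": 0.25, "year": 0.5}


def cross_entropy(logits, target):
    """Cross-entropy of one sample, using a numerically stable log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def joint_loss(logits_by_level, targets_by_level, weights=LEVEL_WEIGHTS):
    """Weighted sum of the per-level cross-entropies for one sample."""
    return sum(
        weights[level]
        * cross_entropy(logits_by_level[level], targets_by_level[level])
        for level in weights
    )


def top1_accuracy(batch_logits, batch_targets):
    """Fraction of samples whose highest-scoring class equals the label."""
    hits = sum(
        1
        for logits, target in zip(batch_logits, batch_targets)
        if max(range(len(logits)), key=logits.__getitem__) == target
    )
    return hits / len(batch_targets)


# Usage: one sample with a separate classification head per hierarchy level.
sample_logits = {
    "brand": [2.0, 0.1],
    "model": [0.3, 1.5, 0.2],
    "year": [0.1, 0.2, 2.2, 0.4],
}
sample_targets = {"brand": 0, "model": 1, "year": 2}
print(joint_loss(sample_logits, sample_targets))
```

In this sketch the coarse levels act as auxiliary tasks that share one feature extractor with the fine-grained year head. Giving the finest level the largest weight is one plausible choice, not one stated in the source.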
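About the author: XU Tianshi (b. 1990), male, Han ethnicity, from Ruichang, Jiangxi; technical manager, master's degree; research interests: computer vision and artificial intelligence systems.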