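Abstract: Object detection is one of the three major tasks in computer vision, and it is also among the most fundamental and challenging active research topics in the field. Over the past year, object detection algorithms based on the Transformer have attracted a surge of research interest. This paper outlines the state of research on the Transformer framework in object detection, introduces its basic principles, commonly used datasets, and common evaluation methods, and compares different algorithms on several public datasets to analyze their strengths and weaknesses. Building on this survey and drawing on industry applications, it concludes with a summary of and outlook for Transformer-based object detection.
Keywords: object detection; Transformer; computer vision; deep learning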
DOI:10.19850/j.cnki.2096-4706.2021.07.004
Funding: General Program of the Natural Science Foundation of Guangdong Province (2021A1515011605)
CLC number: TP391  Document code: A  Article ID: 2096-4706(2021)07-0014-04
A Summary of Research on Target Detection Based on Transformer
YIN Hang,FAN Wenting
(College of Information Science and Technology,Zhongkai University of Agriculture and Engineering,Guangzhou 510225,China)
Abstract: Object detection is one of the three major tasks in computer vision and is also a fundamental, challenging, and active research topic in the field. Over the past year, research on Transformer-based object detection algorithms has surged. This paper outlines the research status of the Transformer framework in object detection, introduces its basic principles, common datasets, and common evaluation methods, and compares different algorithms on several public datasets to analyze their advantages and disadvantages. Building on this survey and drawing on industry applications, the paper summarizes Transformer-based object detection and discusses its prospects.
Keywords: object detection; Transformer; computer vision; deep learning
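About the author: YIN Hang (born 1978), male, Han ethnicity, from Dongming, Shandong; associate professor, Ph.D.; research interest: machine learning.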