
Computer Technology, 2021, Issue 23

Fully Convolutional Neural Network Model Based on Attention Mechanism
LIU Mengxuan, ZHANG Rui, ZENG Zhiyuan, JIN Wei, WU Yichao
(North China University of Water Resources and Electric Power, Zhengzhou 450046, Henan, China)

Abstract: When the fully convolutional neural network FCN-8S performs multi-scale feature fusion, it fails to account for the distinct characteristics of the features at each scale, which limits the accuracy of its segmentation results. To address this problem, this paper proposes a fully convolutional neural network model that fuses multi-scale features with an attention mechanism. The model uses attention to assign weights to the different-scale features of FCN-8S before fusing them, so that the complementary information carried by each scale is used more fully and the segmentation performance of the network improves. The model is validated on the public datasets PASCAL VOC2012 and Cityscapes, where its MIoU exceeds that of FCN-8S by 2.2% and 0.8%, respectively.
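The fusion scheme described in the abstract replaces the plain element-wise addition of skip connections in FCN-8S with an attention-weighted combination of the multi-scale score maps. The PyTorch sketch below illustrates one way such weighted fusion can be wired up; it is not the authors' released code, and the SE-style channel gate, the layer names (score_pool3, score_pool4, score_conv7) and all hyperparameters are assumptions made only for illustration.

```python
# Minimal sketch of attention-weighted multi-scale fusion in an FCN-8S style
# decoder. The skip-connection names follow the standard FCN-8S layout; the
# squeeze-and-excitation style channel gate is an assumed choice of attention.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """SE-style gate: re-weights the channels of one scale's feature map."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))      # global average pool -> (N, C)
        return x * w[:, :, None, None]       # broadcast channel weights


class AttentionFusionHead(nn.Module):
    """Fuses FCN-8S score maps from three scales with per-scale attention."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.att_pool3 = ChannelAttention(num_classes)
        self.att_pool4 = ChannelAttention(num_classes)
        self.att_conv7 = ChannelAttention(num_classes)

    def forward(self, score_pool3, score_pool4, score_conv7, out_size):
        # Upsample the coarser predictions to the pool3 resolution.
        up4 = F.interpolate(score_pool4, size=score_pool3.shape[2:],
                            mode="bilinear", align_corners=False)
        up7 = F.interpolate(score_conv7, size=score_pool3.shape[2:],
                            mode="bilinear", align_corners=False)
        # Weighted fusion instead of FCN-8S's plain element-wise sum.
        fused = (self.att_pool3(score_pool3)
                 + self.att_pool4(up4)
                 + self.att_conv7(up7))
        # Final 8x upsampling back to the input resolution.
        return F.interpolate(fused, size=out_size,
                             mode="bilinear", align_corners=False)
```

A spatial attention map or a learned softmax over the three scales would be equally plausible readings of the abstract; the channel gate is used here only because it is the simplest weighted-fusion variant.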


Keywords: semantic segmentation; fully convolutional neural network; attention mechanism; feature fusion
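The 2.2% and 0.8% gains quoted in the abstract are measured in MIoU (mean intersection over union), the standard semantic segmentation metric: the per-class IoU between predicted and ground-truth label maps, averaged over classes. The short self-contained sketch below shows how MIoU is computed from a confusion matrix; the class count and toy label maps are invented for the example and are not taken from the paper.

```python
# Illustrative MIoU computation; the arrays below are toy data, not results.
import numpy as np


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean IoU over classes, ignoring classes absent from both pred and target."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(target.ravel(), pred.ravel()):
        conf[t, p] += 1                       # rows: ground truth, cols: prediction
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    valid = union > 0
    return float((intersection[valid] / union[valid]).mean())


if __name__ == "__main__":
    target = np.array([[0, 0, 1], [1, 2, 2]])
    pred = np.array([[0, 1, 1], [1, 2, 0]])
    print(f"MIoU = {mean_iou(pred, target, num_classes=3):.3f}")  # 0.500
```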



DOI:10.19850/j.cnki.2096-4706.2021.23.024


Funding: Science and Technology Research Project of Henan Province (192102210265, 202102210141)


CLC number: TP391.4        Document code: A        Article ID: 2096-4706(2021)23-0092-04






About the authors: LIU Mengxuan (1997—), male, Han nationality, from Luoyang, Henan, master's degree candidate, research interest: image semantic segmentation; ZHANG Rui (1980—), female, Han nationality, from Puyang, Henan, Ph.D., master's supervisor, research interests: image processing, 3D scene semantic segmentation, LiDAR point cloud data processing; ZENG Zhiyuan (1997—), male, Han nationality, from Zhumadian, Henan, master's degree candidate, research interest: image semantic segmentation; JIN Wei (1996—), male, Han nationality, from Zhoukou, Henan, master's degree candidate, research interest: image processing; WU Yichao (1999—), male, Han nationality, from Anyang, Henan, master's degree candidate, research interest: point cloud semantic segmentation.