Multi-scale Vertical Cross-layer Feature Aggregation and Attention Fusion Network for Object Detection

Gao, Wenting; Li, Xiaojuan; Han, Yu; Liu, Yue

doi:10.1007/978-3-031-15937-4_12

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13532))

Included in the following conference series:

International Conference on Artificial Neural Networks

2152 Accesses

Abstract

Scale imbalance is one of the primary limitations for object detection. To tackle such a problem, existing methods such as FPN usually integrate the features at different scales, which suffers from the inconsistence of different high-level and low-level features due to the straightforward combination. In this paper, we propose a multi-scale vertical cross-layer feature aggregation and attention fusion network which not only has bottom-up and top-down pathways with lateral connections, but also adds cross-layer paths in the vertical direction. The proposed method can boost information flow and shorten the information path between high-level and low-level features. An attention fusion module is also introduced to obtain the internal correlation between local, global and contextual information of other feature layers. In order to optimize the anchor configurations, a differential evolution algorithm is employed to reconfigure the ratios and scales of anchors. Experimental results show that the proposed method achieves superior detection performance on the public dataset PASCAL VOC.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

€32.70 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: EUR 29.95; Price includes VAT (France)

eBook: EUR 42.79; Price includes VAT (France)

Softcover Book: EUR 52.74; Price includes VAT (France)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Stacked Pyramid Attention Network for Object Detection

Article 07 April 2021

AG-YOLO: Attention-guided network for real-time object detection

Article 04 September 2023

Cross-Layer Feature Attention Module for Multi-scale Object Detection

References

Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
Google Scholar
Uijlings, J.R., Van De Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vision 104(2), 154–171 (2013)
Article Google Scholar
Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)
Article Google Scholar
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28, 1–9 (2015)
Google Scholar
Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. Adv. Neural Inf. Process. Syst. 29, 1–9 (2016)
Google Scholar
Cai, Z., Vasconcelos, N.: Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6154–6162 (2018)
Google Scholar
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Google Scholar
Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision – ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Chapter Google Scholar
Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
Google Scholar
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
Google Scholar
Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8759–8768 (2018)
Google Scholar
Pang, J., Chen, K., Shi, J., Feng, H., Ouyang, W., Lin, D.: Libra R-CNN: towards balanced learning for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 821–830 (2019)
Google Scholar
Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803 (2018)
Google Scholar
Zhang, H., Goodfellow, I., Metaxas, D., Odena, A.: Self-attention generative adversarial networks. In: International Conference on Machine Learning, pp. 7354–7363. PMLR (2019)
Google Scholar
Zlocha, M., Dou, Q., Glocker, B.: Improving RetinaNet for CT lesion detection with dense masks from weak RECIST labels. In: Shen, D., et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. LNCS, vol. 11769, pp. 402–410. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32226-7_45
Chapter Google Scholar

Download references

Acknowledgments

This work was supported by the Key-Area Research and Development Program of Guangdong Province (No.2019B010149001) and the National Natural Science Foundation of China (No. 61960206007, No. 61731003) and the 111 Project (B18005).

Author information

Authors and Affiliations

Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
Wenting Gao, Xiaojuan Li, Yu Han & Yue Liu

Authors

Wenting Gao
View author publications
You can also search for this author in PubMed Google Scholar
Xiaojuan Li
View author publications
You can also search for this author in PubMed Google Scholar
Yu Han
View author publications
You can also search for this author in PubMed Google Scholar
Yue Liu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yue Liu .

Editor information

Editors and Affiliations

University of the West of England, Bristol, UK
Elias Pimenidis
Lancaster University, Lancaster, UK
Plamen Angelov
Digital Innovation, Teeside University, Middlesbrough, UK
Chrisina Jayne
Democritus University of Thrace, Xanthi, Greece
Antonios Papaleonidas
The University of the West of England, Bristol, UK
Mehmet Aydin

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Gao, W., Li, X., Han, Y., Liu, Y. (2022). Multi-scale Vertical Cross-layer Feature Aggregation and Attention Fusion Network for Object Detection. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13532. Springer, Cham. https://doi.org/10.1007/978-3-031-15937-4_12

Download citation

DOI: https://doi.org/10.1007/978-3-031-15937-4_12
Published: 07 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15936-7
Online ISBN: 978-3-031-15937-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Multi-scale Vertical Cross-layer Feature Aggregation and Attention Fusion Network for Object Detection

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Stacked Pyramid Attention Network for Object Detection

AG-YOLO: Attention-guided network for real-time object detection

Cross-Layer Feature Attention Module for Multi-scale Object Detection

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Multi-scale Vertical Cross-layer Feature Aggregation and Attention Fusion Network for Object Detection

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Stacked Pyramid Attention Network for Object Detection

AG-YOLO: Attention-guided network for real-time object detection

Cross-Layer Feature Attention Module for Multi-scale Object Detection

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation