Abstract
Since image-wise supervised labels can be obtained effortlessly, weakly supervised semantic segmentation has made great achievements in recent years. Most advanced methods use the class activation maps (CAMs) to generate the initial localization map. However, the CAMs generated by the convolutional neural network only contain the discriminative parts of the object, and it is not sufficient for segmenting the image. In this paper, we propose an effective two-stage weakly supervised semantic segmentation method called SRANet with sub-category exploration network (SEN) and self-correlation module (SCM). It enriches the object information and adjusts the generated CAM by applying improved sub-category task and second-order self-supervision mechanism. Specifically, we perform clustering on features to obtain the sub-category pseudolabels, which are employed to generate high qualitative CAMs. Then, we design a self-attention module to further improve the quality of the response map. The extensive experiments results with some state-of-the-art methods show that the proposed SRANet model can achieve 71.5% and 72.8% mIoU on the PASCAL VOC 2012 training set and testing set, respectively. It also obtains 44.6% mIoU on MS COCO 2014 dataset, which is the best performance in all comparable algorithms. All these verify the performance and superiority of the proposed model.









Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
Ahn J, Cho S, Kwak S (2019) Weakly supervised learning of instance segmentation with inter-pixel relations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 2209–2218
Ahn J, Kwak S (2018) Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4981–4990
Aneja J, Deshpande A, Schwing AG (2018) Convolutional image captioning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5561–5570
Azzopardi G, Strisciuglio N, Vento M, Petkov N (2015) Trainable cosfire filters for vessel delineation with application to retinal images. Med Image Anal 19(1):46–57
Bearman A, Russakovsky O, Ferrari V, Fei-Fei L (2016) What’s the point: Semantic segmentation with point supervision. In: European conference on computer vision, pp 549–565. Springer
Chang YT, Wang Q, Hung WC, Piramuthu R, Tsai YH, Yang MH (2020) Weakly-supervised semantic segmentation via sub-category exploration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 8991–9000
Chen K, Chan PP, Xiang T, Kees N, Yeung DS (2021) Class-specific affinity based weakly supervised semantic segmentation with neutral region exploration. In: 2021 International Joint Conference on Neural Networks (IJCNN), pp 1–8. IEEE
Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
Chen X, Lian Y, Jiao L, Wang H, Gao Y, Lingling S (2020) Supervised edge attention network for accurate image instance segmentation. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVII 16, pp 617–631. Springer
Cheng J, Liu J, Xu Y, Yin F, Wong DWK, Tan NM, Tao D, Cheng CY, Aung T, Wong TY (2013) Superpixel classification based optic disc and optic cup segmentation for glaucoma screening. IEEE Trans Med Imaging 32(6):1019–1032
Cherukuri V, Ssenyonga P, Warf BC, Kulkarni AV, Monga V, Schiff SJ (2017) Learning based segmentation of ct brain images: application to postoperative hydrocephalic scans. IEEE Trans Biomed Eng 65(8):1871–1884
Dai J, He K, Sun J (2015) Boxsup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation. In: Proceedings of the IEEE international conference on computer vision, pp 1635–1643
Dai J, Li Y, He K, Sun J (2016) R-fcn: Object detection via region-based fully convolutional networks. In: Advances in neural information processing systems, pp 379–387
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp 248–255. Ieee
Fan J, Zhang Z, Tan T, Song C, Xiao J (2020) Cian: Cross-image affinity net for weakly supervised semantic segmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp 10762–10769
Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H (2019) Dual attention network for scene segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3146–3154
Hariharan B, Arbeláez P, Bourdev L, Maji S, Malik J (2011) Semantic contours from inverse detectors. In: 2011 International Conference on Computer Vision, pp 991–998. IEEE
Hou Q, Jiang PT, Wei Y, Cheng MM (2018) Self-erasing network for integral object attention. arXiv preprint arXiv:1810.09821
Huang Z, Wang X, Wang J, Liu W, Wang J (2018) Weakly-supervised semantic segmentation network with deep seeded region growing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7014–7023
Jiang PT, Han LH, Hou Q, Cheng MM, Wei Y (2021) Online attention accumulation for weakly supervised semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence
Jiang PT, Yang Y, Hou Q, Wei Y (2022) L2g: A simple local-to-global knowledge transfer framework for weakly supervised semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 16886–16896
Kaneko AM, Yamamoto K (2016) Landmark recognition based on image characterization by segmentation points for autonomous driving. In: 2016 SICE International Symposium on Control Systems (ISCS), pp 1–8. https://doi.org/10.1109/SICEISCS.2016.7470160
Karpathy A, Fei-Fei L (2015) Deep visual-semantic alignments for generating image descriptions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3128–3137
Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
Kolesnikov A, Lampert CH (2016) Seed, expand and constrain: Three principles for weakly-supervised image segmentation. In: European conference on computer vision, pp 695–711. Springer
Krähenbühl P, Koltun V (2011) Efficient inference in fully connected crfs with gaussian edge potentials. Adv Neural Inf Process Syst 24:109–117
Lee J, Kim E, Lee S, Lee J, Yoon S (2019) Ficklenet: Weakly and semi-supervised semantic image segmentation using stochastic inference. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 5267–5276
Lee J, Oh SJ, Yun S, Choe J, Kim E, Yoon S (2022) Weakly supervised semantic segmentation using out-of-distribution data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 16897–16906
Lee S, Lee M, Lee J, Shim H (2021) Railroad is not a train: Saliency as pseudo-pixel supervision for weakly supervised semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5495–5505
Li J, Fan J, Zhang Z (2022) Towards noiseless object contours for weakly supervised semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 16856–16865
Lin D, Dai J, Jia J, He K, Sun J (2016) Scribblesup: Scribble-supervised convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3159–3167
Lin M, Chen Q, Yan S (2013) Network in network. arXiv preprint arXiv:1312.4400
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: Common objects in context. In: European conference on computer vision, pp 740–755. Springer
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Pan X, Gao Y, Lin Z, Tang F, Dong W, Yuan H, Huang F, Xu C (2021) Unveiling the potential of structure preserving for weakly supervised object localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 11642–11651
Song TH, Sanchez V, Eidaly H, Rajpoot NM (2017) Dual-channel active contour model for megakaryocytic cell segmentation in bone marrow trephine histology images. IEEE Trans Biomed Eng 64(12):2913–2923
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
Vernaza P, Chandraker M (2017) Learning random-walk label propagation for weakly-supervised semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7158–7166
Wang C, Bai X, Wang S, Zhou J, Ren P (2018) Multiscale visual attention networks for object detection in vhr remote sensing images. IEEE Geosci Remote Sens Lett 16(2):310–314
Wang X, You S, Li X, Ma H (2018) Weakly-supervised semantic segmentation by iteratively mining common object features. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1354–1362
Wang Y, Zhang J, Kan M, Shan S, Chen X (2020) Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 12275–12284
Wei Y, Feng J, Liang X, Cheng MM, Zhao Y, Yan S (2017) Object region mining with adversarial erasing: A simple classification to semantic segmentation approach. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1568–1576
Wu Z, Shen C, Van Den Hengel A (2019) Wider or deeper: revisiting the resnet model for visual recognition. Pattern Recogn 90:119–133
Xie J, Hou X, Ye K, Shen L (2022) Clims: Cross language image matching for weakly supervised semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 4483–4492
Yang K, Hu X, Fang Y, Wang K, Stiefelhagen R (2020) Omnisupervised omnidirectional semantic segmentation. IEEE Transactions on Intelligent Transportation Systems
Yao Q, Gong X (2020) Saliency guided self-attention network for weakly and semi-supervised semantic segmentation. IEEE Access 8:14413–14423
Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2016) Learning deep features for discriminative localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2921–2929
Zhou T, Zhang M, Zhao F, Li J (2022) Regional semantic contrast and aggregation for weakly supervised semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 4299–4309
Acknowledgements
This study was funded by the Natural Science Foundation of Liaoning Province (NO. 2020-MS-080), the National Key Research and Development Program of China (NO. 2017YFF0108800).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhu, H., Geng, T., Wang, J. et al. Improved sub-category exploration and attention hybrid network for weakly supervised semantic segmentation. Neural Comput & Applic 35, 10573–10587 (2023). https://doi.org/10.1007/s00521-023-08250-4
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00521-023-08250-4