Abstract
Medical image segmentation is of significant importance for computer-aided diagnosis. Methods based on Convolutional Neural Networks (CNNs) perform well at extracting local features, but they cannot capture global dependencies, which are crucial for medical images. Transformer-based methods, in contrast, can establish global dependencies through self-attention, complementing local convolution; however, the expensive matrix multiplications in the self-attention of a vanilla Transformer and its memory usage remain bottlenecks. In this work, we propose a segmentation model named EMF-former. By combining DWConv, channel shuffle, and PWConv, we design a Depthwise Separable Shuffled Convolution Module (DSPConv) to reduce the parameter count of convolutions. We further employ an efficient Vector Aggregation Attention (VAA) that replaces key-value interactions with element-wise multiplication of two broadcast vectors, reducing computational complexity. Moreover, we substitute the parallel multi-head attention module with a Serial Multi-Head Attention Module (S-MHA) to reduce feature redundancy and memory usage in multi-head attention. Combining these modules, EMF-former performs medical image segmentation efficiently, with fewer parameters, lower computational complexity, and lower memory usage, while preserving segmentation accuracy. Experimental evaluations on the ACDC and Hippocampus datasets achieve mIoU values of 80.5% and 78.8%, respectively.
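To make the channel-shuffle step in a DSPConv-style block concrete, here is a minimal pure-Python sketch of the ShuffleNet-style shuffle the abstract refers to: after a depthwise/grouped convolution, channels are interleaved across groups so that the following pointwise (PWConv) layer mixes information between groups. This is an illustrative stand-in operating on a flat list of per-channel features, not the authors' implementation; the function name and list-based representation are assumptions for the sketch.

```python
def channel_shuffle(channels, groups):
    """Interleave `channels` (a list of per-channel features) across `groups`.

    Equivalent to reshaping the channel axis to (groups, C // groups),
    transposing, and flattening -- the standard channel-shuffle operation.
    """
    c = len(channels)
    assert c % groups == 0, "channel count must be divisible by groups"
    per_group = c // groups
    # Channel at index g * per_group + i moves to index i * groups + g.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]

# With 6 channels in 2 groups, channels from the two groups alternate:
print(channel_shuffle(list("abcdef"), 2))  # ['a', 'd', 'b', 'e', 'c', 'f']
```

Without this interleaving, a grouped convolution followed by another grouped convolution would keep each group's information isolated; the shuffle is what lets DSPConv retain cross-channel mixing while keeping the parameter count of depthwise separable convolutions.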
Ethics declarations
Disclosure of Interests
We have no competing interests relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Hao, Z., Quan, H., Lu, Y. (2024). EMF-Former: An Efficient and Memory-Friendly Transformer for Medical Image Segmentation. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15008. Springer, Cham. https://doi.org/10.1007/978-3-031-72111-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72110-6
Online ISBN: 978-3-031-72111-3