Abstract
Medical image segmentation is of significant importance for computer-aided diagnosis. Methods based on Convolutional Neural Networks (CNNs) perform well at extracting local features, but they cannot capture global dependencies, which are crucial for medical images. Transformer-based methods, in contrast, can establish global dependencies through self-attention, complementing local convolution; however, the expensive matrix multiplications in the self-attention of a vanilla Transformer and its memory usage remain bottlenecks. In this work, we propose a segmentation model named EMF-former. By combining DWConv, channel shuffle, and PWConv, we design a Depthwise Separable Shuffled Convolution Module (DSPConv) to reduce the parameter count of convolutions. We further employ an efficient Vector Aggregation Attention (VAA) that replaces key-value interactions with element-wise multiplication of two broadcast vectors, reducing computational complexity. Moreover, we substitute the parallel multi-head attention module with a Serial Multi-Head Attention Module (S-MHA) to reduce feature redundancy and memory usage in multi-head attention. Combining these modules, EMF-former performs medical image segmentation efficiently, with fewer parameters, lower computational complexity, and lower memory usage, while preserving segmentation accuracy. Experimental evaluations on the ACDC and Hippocampus datasets achieve mIoU values of 80.5% and 78.8%, respectively.
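To make the channel-shuffle step in a DSPConv-style block concrete, here is a minimal pure-Python sketch of the ShuffleNet-style shuffle the abstract refers to: after a depthwise/grouped convolution, channels are interleaved across groups so that the following pointwise (PWConv) layer mixes information between groups. This is an illustrative stand-in operating on a flat list of per-channel features, not the authors' implementation; the function name and list-based representation are assumptions for the sketch.

```python
def channel_shuffle(channels, groups):
    """Interleave `channels` (a list of per-channel features) across `groups`.

    Equivalent to reshaping the channel axis to (groups, C // groups),
    transposing, and flattening -- the standard channel-shuffle operation.
    """
    c = len(channels)
    assert c % groups == 0, "channel count must be divisible by groups"
    per_group = c // groups
    # Channel at index g * per_group + i moves to index i * groups + g.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]

# With 6 channels in 2 groups, channels from the two groups alternate:
print(channel_shuffle(list("abcdef"), 2))  # ['a', 'd', 'b', 'e', 'c', 'f']
```

Without this interleaving, a grouped convolution followed by another grouped convolution would keep each group's information isolated; the shuffle is what lets DSPConv retain cross-channel mixing while keeping the parameter count of depthwise separable convolutions.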
Ethics declarations
Disclosure of Interests
We have no competing interests relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Hao, Z., Quan, H., Lu, Y. (2024). EMF-Former: An Efficient and Memory-Friendly Transformer for Medical Image Segmentation. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15008. Springer, Cham. https://doi.org/10.1007/978-3-031-72111-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72110-6
Online ISBN: 978-3-031-72111-3