A 28.6 mJ/iter Stable Diffusion Processor for Text-to-Image Generation with Patch Similarity-based Sparsity Augmentation and Text-based Mixed-Precision

Choi, Jiwon; Jo, Wooyoung; Hong, Seongyon; Kwon, Beomseok; Park, Wonhoon; Yoo, Hoi-Jun

doi:10.1109/ISCAS58744.2024.10558026

arXiv:2403.04982 (cs)

[Submitted on 8 Mar 2024 (v1), last revised 14 Mar 2024 (this version, v3)]

Abstract:This paper presents an energy-efficient Stable Diffusion processor for text-to-image generation. Although Stable Diffusion has attracted attention for its high-quality image synthesis, its inherent characteristics hinder deployment on mobile platforms. The proposed processor achieves high throughput and energy efficiency through three key features: 1) Patch similarity-based sparsity augmentation (PSSA), which reduces the external memory access (EMA) energy of the self-attention score by 60.3 %, leading to a 37.8 % reduction in total EMA energy. 2) Text-based important pixel spotting (TIPS), which allows 44.8 % of the feed-forward network (FFN) layer workload to be processed with low-precision activations. 3) A dual-mode bit-slice core (DBSC) architecture, which improves energy efficiency in FFN layers by 43.0 %. The processor is implemented in 28 nm CMOS technology and achieves 3.84 TOPS peak throughput with 225.6 mW average power consumption. Overall, it achieves text-to-image generation at 28.6 mJ/iteration on the MS-COCO dataset.
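
The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is a reader's back-of-the-envelope calculation, not something taken from the paper: the function names are hypothetical, and each derived quantity rests on an assumption the abstract does not state. In particular, it assumes that the 225.6 mW average power applies over a whole iteration and accounts for all of the 28.6 mJ, that the PSSA savings are the only contribution to the 37.8 % total EMA reduction, and that dividing peak throughput by average power gives a meaningful upper-bound efficiency figure.

```python
# Figures quoted in the abstract.
ENERGY_PER_ITER_MJ = 28.6     # energy per iteration, mJ
AVG_POWER_MW = 225.6          # average power consumption, mW
PEAK_TOPS = 3.84              # peak throughput, TOPS
SCORE_EMA_REDUCTION = 0.603   # EMA energy reduction on the self-attention score
TOTAL_EMA_REDUCTION = 0.378   # resulting reduction in total EMA energy


def implied_iteration_time_ms(energy_mj: float, power_mw: float) -> float:
    """Time per iteration implied by energy / power.

    Assumption (not from the paper): the average power holds over the
    whole iteration and covers all of the reported energy.
    """
    return energy_mj / power_mw * 1000.0  # mJ / mW = s, then s -> ms


def implied_score_share(total_reduction: float, score_reduction: float) -> float:
    """Share of total EMA energy attributable to the self-attention score.

    Assumption (not from the paper): the score savings are the only
    source of the total EMA reduction, so total = share * score_reduction.
    """
    return total_reduction / score_reduction


def peak_tops_per_watt(peak_tops: float, power_mw: float) -> float:
    """Rough efficiency figure: peak throughput over average power.

    Assumption (not from the paper): mixing a peak figure with an
    average one, so treat the result as an optimistic bound at best.
    """
    return peak_tops / (power_mw / 1000.0)


if __name__ == "__main__":
    t_ms = implied_iteration_time_ms(ENERGY_PER_ITER_MJ, AVG_POWER_MW)
    share = implied_score_share(TOTAL_EMA_REDUCTION, SCORE_EMA_REDUCTION)
    eff = peak_tops_per_watt(PEAK_TOPS, AVG_POWER_MW)
    print(f"implied time per iteration: {t_ms:.1f} ms")
    print(f"implied self-attention score share of EMA energy: {share:.1%}")
    print(f"peak throughput / average power: {eff:.1f} TOPS/W")
```

Under those assumptions, the numbers work out to roughly 127 ms per iteration, a self-attention score share of about 63 % of EMA energy, and about 17 TOPS/W. If the reported energy per iteration also includes external memory energy that is not part of the 225.6 mW figure, the implied iteration time would be shorter than this estimate.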

Comments:	Accepted at 2024 IEEE International Symposium on Circuits and Systems (ISCAS)
Subjects:	Hardware Architecture (cs.AR)
Cite as:	arXiv:2403.04982 [cs.AR]
	(or arXiv:2403.04982v3 [cs.AR] for this version)
	https://doi.org/10.48550/arXiv.2403.04982
Related DOI:	https://doi.org/10.1109/ISCAS58744.2024.10558026

Submission history

From: Jiwon Choi
[v1] Fri, 8 Mar 2024 01:41:02 UTC (19,071 KB)
[v2] Mon, 11 Mar 2024 01:29:38 UTC (19,071 KB)
[v3] Thu, 14 Mar 2024 20:34:43 UTC (16,687 KB)
