ChartLlama: A Multimodal LLM for Chart Understanding and Generation

Han, Yucheng; Zhang, Chi; Chen, Xin; Yang, Xu; Wang, Zhibin; Yu, Gang; Fu, Bin; Zhang, Hanwang

arXiv:2311.16483 (cs)

[Submitted on 27 Nov 2023]

Abstract: Multimodal large language models have demonstrated impressive performance on most vision-language tasks. However, these models generally lack the ability to understand domain-specific data, particularly chart figures. This is mainly due to the lack of relevant multimodal instruction-tuning datasets. In this article, we create a high-quality instruction-tuning dataset leveraging GPT-4. We develop a multi-step data generation process in which separate steps are responsible for generating tabular data, creating chart figures, and designing instruction-tuning data. The flexibility of this process enables us to generate diverse, high-quality instruction-tuning data consistently and efficiently while keeping resource expenditure low. It also allows us to incorporate a wider variety of chart and task types than existing datasets cover. Next, we introduce ChartLlama, a multimodal large language model trained on our dataset. ChartLlama outperforms all prior methods on the ChartQA, Chart-to-text, and Chart-extraction benchmarks. It also improves significantly over the baseline on our specially compiled chart dataset, which includes new chart and task types. These results confirm the value and strong potential of our proposed data generation method for enhancing chart comprehension.
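The abstract describes a multi-step generation process in which tabular data, chart figures, and instruction-tuning data are produced by separate steps. The following is only a minimal sketch of that decoupling under loud assumptions: the GPT-4 calls used in the paper are replaced by deterministic stubs, the chart step emits matplotlib source text instead of rendering an image, and every name here (`generate_table`, `make_chart_script`, `make_instructions`, `build_sample`) is hypothetical and not taken from the ChartLlama code release.

```python
import random
from dataclasses import dataclass


@dataclass
class Table:
    title: str
    columns: list[str]
    rows: list[list]


def generate_table(topic: str, n_rows: int = 4, seed: int = 0) -> Table:
    # Step 1 (hypothetical stand-in for a GPT-4 prompt): synthesize tabular data.
    rng = random.Random(seed)
    rows = [[f"{topic} {i + 1}", rng.randint(10, 100)] for i in range(n_rows)]
    return Table(title=f"{topic} by category", columns=["category", "value"], rows=rows)


def make_chart_script(table: Table, chart_type: str) -> str:
    # Step 2: emit plotting code (matplotlib source text) that would render the chart.
    labels = [r[0] for r in table.rows]
    values = [r[1] for r in table.rows]
    plot_call = {"bar": "ax.bar", "line": "ax.plot"}[chart_type]
    return (
        "import matplotlib.pyplot as plt\n"
        "fig, ax = plt.subplots()\n"
        f"{plot_call}({labels!r}, {values!r})\n"
        f"ax.set_title({table.title!r})\n"
        "fig.savefig('chart.png')\n"
    )


def make_instructions(table: Table) -> list[dict]:
    # Step 3: derive instruction/answer pairs from the table itself,
    # so every answer is grounded in the underlying data.
    best = max(table.rows, key=lambda r: r[1])
    total = sum(r[1] for r in table.rows)
    csv = "\n".join([",".join(table.columns)] + [f"{r[0]},{r[1]}" for r in table.rows])
    return [
        {"instruction": "Which category has the highest value?", "answer": best[0]},
        {"instruction": "What is the sum of all values?", "answer": str(total)},
        {"instruction": "Convert the chart into a table.", "answer": csv},
    ]


def build_sample(topic: str, chart_type: str, n_rows: int = 4, seed: int = 0) -> dict:
    # Chain the three independent steps into one training sample.
    table = generate_table(topic, n_rows, seed)
    return {
        "table": table,
        "chart_script": make_chart_script(table, chart_type),
        "instructions": make_instructions(table),
    }
```

Because the answers are computed from the table rather than read off a rendered image, each step can be swapped (a new chart type in step 2, a new task template in step 3) without touching the others. This is an illustration of the idea, not the authors' pipeline.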

Comments:	Code and model on this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2311.16483 [cs.CV]
	(or arXiv:2311.16483v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2311.16483

Submission history

From: Yucheng Han
[v1] Mon, 27 Nov 2023 15:20:23 UTC (3,004 KB)