ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation (ACL 2025 Main)

This repository is the official implementation of ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation.

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation

Xuanle Zhao*, Xianzhen Luo*, Qi Shi†, Chi Chen†, Shuo Wang, Zhiyuan Liu, Maosong Sun

Notes

Our earlier evaluation on ChartMimic used the 'no_filter' option, which led to performance discrepancies. After re-evaluating with the default 'code_pass' setting, the low-level score changes to 72.5, while the high-level score remains unchanged.

More information about evaluation

ChartCoder is tested on the new version of ChartMimic, which contains 600 samples. The ICLR version of ChartMimic is available at https://huggingface.co/datasets/ChartMimic/ChartMimic/blob/main/dataset-iclr.tar.gz.
The evaluation code we use is the Supplementary Material of https://openreview.net/forum?id=sGpCzsfd1K.

All results in Table 3 of the paper (for both the baselines and our models) are evaluated under the two settings above. Evaluating under other settings may give different numbers. To replicate the results in the paper, we recommend using these settings.
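The dataset link above points to a Hugging Face web page ('blob') URL rather than a direct file download. As a minimal sketch, assuming the archive is publicly accessible without authentication, the helper below rewrites the URL to its direct-download ('resolve') form and unpacks the archive. The function names and the output directory are hypothetical and are not part of this repository.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL quoted from the evaluation notes above.
BLOB_URL = "https://huggingface.co/datasets/ChartMimic/ChartMimic/blob/main/dataset-iclr.tar.gz"


def to_resolve_url(blob_url: str) -> str:
    """Turn a Hub web-page ('blob') URL into its direct-download ('resolve') form."""
    return blob_url.replace("/blob/", "/resolve/", 1)


def fetch_and_extract(blob_url: str, out_dir: str) -> Path:
    """Download the archive into out_dir and unpack it there (hypothetical helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    archive = out / blob_url.rsplit("/", 1)[-1]  # e.g. dataset-iclr.tar.gz
    urllib.request.urlretrieve(to_resolve_url(blob_url), archive)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(out)
    return out
```

Calling `fetch_and_extract(BLOB_URL, "data/chartmimic")` would place the extracted files under `data/chartmimic`, a directory name chosen here only for illustration.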

News

[2025.5.17] ChartCoder has been accepted by ACL 2025 Main.

[2025.3.13] We have uploaded our dataset Chart2Code-160k(HF) to Hugging Face.

[2025.2.19] We have released our dataset Chart2Code-160k to ModelScope.

[2025.1.16] We have updated our data generation code data_generator, built on Multi-modal-Self-instruct. Please follow their instructions and our code to generate the <chart, code> data pairs.

Overview

Installation

Clone this repo

git clone https://github.com/thunlp/ChartCoder.git

Create environment

cd ChartCoder
conda create -n chartcoder python=3.10 -y
conda activate chartcoder
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

Additional packages required for training

pip install -e ".[train]"
pip install flash-attn --no-build-isolation

Models

Model	Download Link
MLP Connector	projector
ChartCoder	ChartCoder

The MLP Connector contains our pre-trained MLP weights, which you can use directly for SFT.

Data

Dataset	Download Link
Chart2Code-160k	HuggingFace
Chart2Code-160k	ModelScope

Train

Training consists of two stages. Before training ChartCoder, download siglip-so400m-patch14-384 and deepseek-coder-6.7b-instruct.
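One possible way to fetch the two base checkpoints is sketched below. It assumes they are hosted on the Hugging Face Hub under the repository IDs `google/siglip-so400m-patch14-384` and `deepseek-ai/deepseek-coder-6.7b-instruct` (this README names the models but not where they are hosted) and that `huggingface_hub` is installed. The helper names and directory layout are hypothetical.

```python
from pathlib import Path

# Assumed Hub repository IDs; the README gives only the model names.
BASE_MODELS = {
    "vision_tower": "google/siglip-so400m-patch14-384",
    "language_model": "deepseek-ai/deepseek-coder-6.7b-instruct",
}


def local_model_dirs(root):
    """Map each role to the local directory where its checkpoint will be stored."""
    root = Path(root)
    return {role: root / repo_id.split("/")[-1] for role, repo_id in BASE_MODELS.items()}


def download_base_models(root):
    """Download both checkpoints under root; needs `pip install huggingface_hub`."""
    from huggingface_hub import snapshot_download  # imported lazily on purpose

    dirs = local_model_dirs(root)
    for role, repo_id in BASE_MODELS.items():
        snapshot_download(repo_id=repo_id, local_dir=str(dirs[role]))
    return dirs
```

The directories returned by `download_base_models("checkpoints")` are the local paths that would then be written into the training scripts.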

For Pre-training, run

bash scripts/train/pretrain_siglip.sh

For SFT, run

bash scripts/train/finetune_siglip_a4.sh

Please change the model paths in the scripts to your local paths; see the corresponding .sh file for details. We also provide other training scripts, such as variants that use CLIP (_clip) and variants for multi-machine training (_m). See scripts/train for further information.

Inference

Please see inference.py for details.
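After inference, the generated plotting code usually needs to be separated from any surrounding text before it can be run. The sketch below is not taken from inference.py; it assumes the model's reply may wrap the code in a Markdown code fence and falls back to the whole reply otherwise. `extract_code` and `save_script` are hypothetical helpers.

```python
import re
from pathlib import Path

# Build the fence marker programmatically so this example can sit inside a fence.
FENCE = "`" * 3

# Matches an optional "python"/"py" tag after the opening fence, then the body.
_CODE_BLOCK = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_code(response: str) -> str:
    """Return the first fenced code block, or the whole reply if none is found."""
    match = _CODE_BLOCK.search(response)
    return (match.group(1) if match else response).strip()


def save_script(response: str, path: str = "generated_chart.py") -> Path:
    """Write the extracted code to a file so it can be reviewed and run manually."""
    out = Path(path)
    out.write_text(extract_code(response) + "\n", encoding="utf-8")
    return out
```

Saving the code to a file, rather than executing it directly, leaves room to inspect model-generated code before running it.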

Results

Please refer to our paper for detailed performance on the ChartMimic, Plot2Code, and ChartX benchmarks. We thank the authors of these benchmarks for their contributions to the chart-to-code field.

Contact

For any questions, you can contact 2429527z@gmail.com.

Citation

If you find this work useful, consider giving this repository a star ⭐️ and citing 📝 our paper as follows:

@misc{zhao2025chartcoderadvancingmultimodallarge,
      title={ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation}, 
      author={Xuanle Zhao and Xianzhen Luo and Qi Shi and Chi Chen and Shuo Wang and Wanxiang Che and Zhiyuan Liu and Maosong Sun},
      year={2025},
      eprint={2501.06598},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2501.06598}, 
}

Acknowledgement

The code is based on LLaVA-NeXT. Thanks to its authors for their great work and for open-sourcing it!
