Yuhang Han1* ,
Xuyang Liu2*,
Zihan Zhang3,
Pengxiang Ding4,
Donglin Wang4,
Honggang Chen2,
Qingsen Yan1,
Siteng Huang5✉
1Northwestern Polytechnical University, 2Sichuan University,
3Johns Hopkins University, 4Westlake University, 5Zhejiang University
- **2025.01.10** 🤗 We release our latest work GlobalCom2, a "global-to-local" approach for training-free acceleration of high-resolution MLLMs. Code is available!
- **2024.11.17** 🤗 We release our work FiCoCo, which proposes a unified paradigm to demystify popular works and guide future designs of training-free token reduction for MLLMs.
**TL;DR:** This study introduces a unified "filter-correlate-compress" paradigm to streamline training-free token reduction in Multimodal Large Language Models (MLLMs), achieving up to 82.4% FLOPs reduction with minimal performance impact and outperforming existing methods across 10 benchmarks.
- Clone this repository.

  ```shell
  git clone https://github.com/kawhiiiileo/FiCoCo.git
  cd FiCoCo
  ```

- Environment Setup and Preparation

  ```shell
  conda create -n FiCoCo python=3.10 -y
  conda activate FiCoCo
  pip install -e .
  ```

- Download Multimodal Benchmark

  Please follow the detailed instructions in LLaVA-Evaluation.

- Download LLaVA and put it under `./liuhaotian/llava-v1.5-7b`.
To configure FiCoCo with these parameters, update the corresponding settings in your code or configuration file. For example:

```yaml
merge_visual: true          # Enable FiCoCo-V for visual token compression
AT: true                    # Enable FiCoCo-L for visual token compression
r: 42                       # Compress 42 tokens per layer
control_encoding_layer: 11  # Start compression from the 12th transformer layer
```
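To see how `r` and `control_encoding_layer` shape the token budget, here is a small sketch of the resulting per-layer token counts. The defaults (576 visual tokens, 24 encoder layers) match CLIP ViT-L/14 as used in LLaVA-1.5; they are assumptions here and should be adjusted for other backbones.

```python
def remaining_tokens(num_tokens=576, r=42, start_layer=11, num_layers=24):
    """Count the visual tokens surviving each layer when r tokens are
    compressed per layer from start_layer (0-indexed) onward.
    576 tokens / 24 layers correspond to CLIP ViT-L/14 in LLaVA-1.5."""
    counts = []
    for layer in range(num_layers):
        if layer >= start_layer:
            num_tokens = max(num_tokens - r, 0)
        counts.append(num_tokens)
    return counts

counts = remaining_tokens()
print(counts[-1])  # 30 tokens remain after the final layer
```

With `r=42` and `control_encoding_layer=11`, compression runs over the last 13 layers, shrinking 576 tokens down to 30.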
Example for evaluating SQA results (`r=42`, `control_encoding_layer=11`, `merge_visual=True`):

```shell
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/sqa.sh
```
To calculate the theoretical computational efficiency shown above, we recommend the methodology of LLM-Viewer. We deeply appreciate their outstanding contribution to this field.
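For a quick intuition of why token reduction cuts FLOPs, here is a back-of-envelope estimate of one decoder layer's forward cost as a function of sequence length. This is a generic transformer approximation, not LLM-Viewer's exact cost model, and the token counts in the usage lines are illustrative assumptions.

```python
def layer_flops(n, d=4096, ffn_mult=4):
    """Rough forward-pass FLOPs for one transformer decoder layer:
    QKV/output projections (8*n*d^2), attention scores and values
    (4*n^2*d), and the two FFN matmuls (4*ffn_mult*n*d^2).
    A back-of-envelope approximation only."""
    proj = 8 * n * d * d
    attn = 4 * n * n * d
    ffn = 4 * ffn_mult * n * d * d
    return proj + attn + ffn

full = layer_flops(576 + 64)    # e.g. 576 visual + 64 text tokens (assumed)
reduced = layer_flops(64 + 64)  # visual tokens compressed to 64 (hypothetical)
print(f"FLOPs reduction: {1 - reduced / full:.1%}")
```

Because both the projection and FFN terms scale linearly in `n` (and attention quadratically), cutting most visual tokens removes the bulk of per-layer compute.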
Considering that some MLLM visual encoders do not involve a [CLS] token, we propose a feasible alternative. Specific results and further details can be found in the paper.
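As a generic illustration of scoring tokens without a [CLS] anchor, one common heuristic is to score each token by the average attention it receives from all other tokens. This sketch is an assumption for illustration only; FiCoCo's actual [CLS]-free criterion is described in the paper.

```python
import numpy as np

def saliency_without_cls(attn):
    """Score each token by the mean attention it receives from all
    queries, as a stand-in for [CLS]-to-patch attention.
    attn: (num_heads, num_tokens, num_tokens) attention weights.
    Generic heuristic for illustration; see the paper for FiCoCo's
    actual [CLS]-free criterion."""
    # Average over heads, then over query positions -> attention received
    return attn.mean(axis=0).mean(axis=0)

# Toy attention map: 12 heads over 576 visual tokens (assumed sizes)
rng = np.random.default_rng(0)
logits = rng.standard_normal((12, 576, 576))
attn = np.exp(logits)
attn /= attn.sum(axis=-1, keepdims=True)  # softmax over keys

scores = saliency_without_cls(attn)
keep = np.argsort(scores)[42:]  # drop the 42 least-attended tokens
```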
If you use FiCoCo in your research, please cite our work by using the following BibTeX entry:
```bibtex
@misc{han2025filtercorrelatecompresstrainingfree,
      title={Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acceleration},
      author={Yuhang Han and Xuyang Liu and Zihan Zhang and Pengxiang Ding and Donglin Wang and Honggang Chen and Qingsen Yan and Siteng Huang},
      year={2025},
      eprint={2411.17686},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.17686},
}
```
We extend our gratitude to the open-source efforts of LLaVA, ToMe, and Open-LLaVA-NeXT.

For any questions about our paper or code, please email yuhangh984@gmail.com or liuxuyang@stu.scu.edu.cn.