ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems
Implementation for the paper "ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems".
Xiangyuan Xue
Zeyu Lu
Di Huang
Zidong Wang
Wanli Ouyang
Lei Bai*
(a) ComfyBench is a comprehensive benchmark for evaluating agents' ability to design collaborative AI systems in ComfyUI. Given a task instruction, agents are required to learn from documents and create workflows that describe collaborative AI systems. Performance is measured by pass rate and resolve rate, reflecting whether the workflow can be executed correctly and whether the task requirements are fulfilled. (b) ComfyAgent builds collaborative AI systems in ComfyUI by generating workflows. The workflows are converted into equivalent code so that LLMs can better understand them. ComfyAgent can learn from existing workflows and autonomously design new ones. The generated workflows can be interpreted as collaborative AI systems that complete the given tasks.
- [2024/11/20] Our code is updated to the latest version.
- [2024/11/14] The work is further extended and renamed to ComfyBench.
- [2024/09/04] Our code implementation is released on GitHub.
- [2024/09/02] The initial version of our paper is submitted to arXiv.
First, clone the repository and navigate to the project directory:
```bash
git clone https://github.com/xxyQwQ/ComfyBench
cd ComfyBench
```
Then, create a new conda environment and install the dependencies:
```bash
conda create -n comfybench python=3.12
conda activate comfybench
pip install -r requirements.txt
```
Finally, modify `config.yaml` to set your ComfyUI server and API key. Feel free to change the proxies and models if necessary.
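Optionally, you can verify that the configured ComfyUI server is reachable before running anything. The sketch below assumes the server listens on ComfyUI's default address `127.0.0.1:8188`; adjust it to match your configuration:

```bash
# Query ComfyUI's HTTP API; a JSON response indicates the server is up.
curl -s http://127.0.0.1:8188/system_stats
```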
Although some models can be installed automatically, other models need to be downloaded manually and placed in the corresponding directories. You can find them on Hugging Face or download them directly from our Cloud Drive. Besides, we provide a list of extensions in `assets/extension.md` so that you can install them manually. You can verify the completeness of your setup with the workflows in `dataset/benchmark/workflow`.
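For example, a missing checkpoint can be fetched from Hugging Face with the `huggingface-cli` tool. The repository ID, file name, and ComfyUI path below are placeholders, and some weights belong in other subfolders such as `models/vae` or `models/controlnet`:

```bash
# Download a checkpoint and place it where ComfyUI looks for model weights.
huggingface-cli download <repo-id> <checkpoint>.safetensors \
    --local-dir /path/to/ComfyUI/models/checkpoints
```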
Run the following commands to execute the ComfyAgent pipeline:
```bash
# activate the conda environment
conda activate comfybench

# execute the main script
# see `main.py` for more parameter settings
python main.py \
    --instruction "task-instruction" \
    --agent_name "comfy" \
    --save_path "path/to/save/result"
```
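For example, a concrete invocation might look like this (the instruction text and save path are illustrative, not taken from the benchmark):

```bash
# Ask ComfyAgent to design and run a workflow for a simple generation task.
python main.py \
    --instruction "Generate an image of a cat wearing a wizard hat." \
    --agent_name "comfy" \
    --save_path "cache/example"
```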
The log file together with the workflow will be saved in the specified path. If your ComfyUI server is working properly, the workflow will be executed automatically to produce the result.
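If you want to inspect or re-run a generated workflow yourself, you can also submit it to the ComfyUI server's HTTP API manually. The sketch below assumes the saved workflow is in ComfyUI's API (prompt) format, that the server runs at `127.0.0.1:8188`, and that `jq` is installed; the file name is illustrative:

```bash
# Wrap the workflow JSON in the payload expected by ComfyUI's /prompt endpoint
# and submit it to the server for execution.
jq -n --slurpfile wf path/to/save/result/workflow.json '{prompt: $wf[0]}' \
    | curl -s -X POST http://127.0.0.1:8188/prompt \
        -H "Content-Type: application/json" \
        --data-binary @-
```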
ComfyBench is provided under `dataset/benchmark`. The `document` folder contains documentation for 3205 nodes, where `meta.json` records the metadata of each node. The `workflow` folder contains 20 curriculum workflows for agents to learn from. The `instruction` folder provides all the tasks in ComfyBench, where `complete.json` contains 200 task instructions and `sample.json` contains a subset of 10 task instructions for validation.
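As a quick sanity check of the dataset layout, you can list the documentation files and count the task instructions. This sketch assumes `complete.json` sits in the `instruction` folder and is a flat JSON collection of tasks; adjust the paths if your checkout differs:

```bash
# Count the node documentation files and the benchmark tasks.
ls dataset/benchmark/document | wc -l
python -c "import json; print(len(json.load(open('dataset/benchmark/instruction/complete.json'))))"  # should report 200
```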
Before evaluating on ComfyBench, you should copy the resource files in `dataset/benchmark/resource` into the `input` folder of ComfyUI, so that ComfyUI can load them during workflow execution (see the sketch below). Then you can evaluate a specific agent by running the inference and evaluation scripts; here we take ComfyAgent as an example.
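A minimal sketch of the copy step, assuming your ComfyUI installation lives at `/path/to/ComfyUI` (replace with your actual path):

```bash
# Copy the benchmark resources into ComfyUI's input folder so that workflows
# referencing these files can find them during execution.
cp -r dataset/benchmark/resource/* /path/to/ComfyUI/input/
```

With the resources in place, run the following commands: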
```bash
# activate the conda environment and set up environment variables
conda activate comfybench
export PYTHONPATH=./

# execute the inference script
# see `script/inference.py` for more parameter settings
python script/inference.py \
    --agent_name "comfy" \
    --save_path "cache/benchmark/comfy"

# execute the evaluation script
# see `script/evaluation.py` for more parameter settings
python script/evaluation.py \
    --submit_folder "cache/benchmark/comfy/workflow" \
    --cache_path "cache/benchmark/comfy/outcome"
```
In this example, the log files and generated workflows will be saved in `cache/benchmark/comfy/logging` and `cache/benchmark/comfy/workflow`, respectively. The produced results will be saved in `cache/benchmark/comfy/outcome`, together with a `result.json` recording whether each task is passed and resolved, and a `summary.txt` summarizing the overall metrics.
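To look over the outputs, you can print the summary and pretty-print the per-task records; the exact schema of `result.json` depends on the evaluation script:

```bash
# Show the overall pass rate and resolve rate.
cat cache/benchmark/comfy/outcome/summary.txt
# Pretty-print the per-task pass/resolve records.
python -m json.tool cache/benchmark/comfy/outcome/result.json | head -n 20
```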
Here are some examples of the results produced by ComfyAgent on ComfyBench. Visit our Project Page for more details.