NOTE: The project has moved to https://github.com/mrcabbage972/GitChameleonBenchmark. Please use that link for everything related to the GitChameleon benchmark.

[OUTDATED] GitChameleon: A Benchmark for Version-Conditioned Code Generation

Benchmark associated with the paper "GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models"

We thank Terry Zhuo and the BigCodeBench project (https://github.com/bigcode-project/bigcodebench) for providing a starting point for our codebase.

Downloading the Dataset

The dataset used in our benchmark is available in CSV format at data/combined_dataset.csv.
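For a quick look at the data, the CSV can be loaded with pandas. The snippet below is a minimal sketch for inspecting it; it makes no assumptions about the column schema.

import pandas as pd

# Load the benchmark CSV shipped with the repository.
df = pd.read_csv("data/combined_dataset.csv")

# Inspect the column names and the first few rows.
print(df.columns.tolist())
print(df.head())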

Setting Up the Environment

  1. Create a Python 3.10 Environment:

    • (optional) Use conda to create the environment:
      conda create -n GitChameleon python=3.10
      
    • Install the required packages:
      pip install vllm -r requirements.txt
      
  • Note: vllm-cpu (experimental): requirements.txt installs vllm with GPU support. For vllm-cpu, please follow the instructions in the official documentation. This has not been tested end-to-end with this repository, so it may break; full support is planned for the near future.
  2. Prepare Virtual Environments for Evaluation:

    • Run the following script to install the virtual environments with the necessary Python libraries:
      python src/create_venvs.py
      

    This step sets up the specific library versions required for execution-based evaluation.

Running Generations and Evaluations

  • Main Scripts:
    • generate.py: Runs the model to generate outputs.
    • evaluate.py: Evaluates the generated outputs.

We support all models that are compatible with vLLM.

Example: Generating Outputs

To generate outputs:

python generate.py --n_samples $n_samples --temperature $temperature --model $model --save_path $save_path

This command will create a .jsonl file with the generated outputs.

Complete Example: Generating with meta-llama/Llama-3.2-1B-Instruct, using vLLM as the backend on a GPU with sufficient memory, with 1 sample and a temperature of 0 (greedy):

python generate.py --n_samples 1 --temperature 0 --greedy --model meta-llama/Llama-3.2-1B-Instruct --save_path generations/Llama-3.2-1B-Instruct_T=0.jsonl
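Each line of the resulting .jsonl file is one standalone JSON object. A minimal sketch for inspecting it (the exact field names are defined by generate.py and are not assumed here):

import json

# Path produced by the complete example above.
with open("generations/Llama-3.2-1B-Instruct_T=0.jsonl") as f:
    for line in f:
        record = json.loads(line)
        # Print only the keys; the schema is whatever generate.py wrote.
        print(sorted(record.keys()))
        break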

Example: OpenAI-compatible serving

To generate code with an OpenAI-compatible server, start the server with the following command, replacing the model and API key with your own:

vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123

To call the server, you can use the official OpenAI Python client library, or any other HTTP client (see https://docs.vllm.ai/en/v0.6.1/serving/openai_compatible_server.html and https://docs.vllm.ai/en/v0.6.1/serving/distributed_serving.html for multi-GPU serving).
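For example, the server started above can be queried with the official OpenAI Python client. This is a minimal sketch that assumes the default vLLM port of 8000 and the api-key shown above:

from openai import OpenAI

# Point the client at the local vLLM server (vLLM serves an OpenAI-compatible
# API on port 8000 by default) and pass the api-key given to `vllm serve`.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")

completion = client.chat.completions.create(
    model="NousResearch/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Write a hello-world script in Python."}],
    temperature=0,
)
print(completion.choices[0].message.content)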

Example: Running Evaluations

For standard evaluation:

python evaluate.py --json-out-file $json_outputs --output-path $out_dir --model-name $model_name --temperature $temperature

Parameter Descriptions:

  • --model-name: Name of the model used.
  • --json-out-file: Path to the generated outputs (e.g., generations/starcoder2-15b-instruct-v0.1_temperature0.0.jsonl).
  • --output-path: Directory to save the evaluation results.
  • --n-jobs: Number of parallel evaluation jobs (-1 uses all available CPUs).

Finishing the Example:

python evaluate.py --json-out-file generations/Llama-3.2-1B-Instruct_T=0.jsonl --model-name meta-llama/Llama-3.2-1B-Instruct --temperature 0.0

Full test:

bash tests/test_readme.sh

This will test the given README example to ensure that everything works as intended.

To test URL serving:

bash tests/test_url.sh

To-Do Items

  • Specify the number of CPUs used in generation.

Supported Backends

Currently supported backends:

  • vllm
  • openai
  • anthropic
  • url-serving (openai-compatible)

Planned support:

  • openrouter

Docker

To build the Docker image, run make docker-build.

To open an interactive shell in a Docker container with a specific Python version, run make docker-run PYTHON_VERSION={desired version}. The following Python versions are configured to work: 3.7, 3.9, 3.10. The local working directory is mounted into the container at /app/repo.

Converting CSV to JSONL

To convert a list of CSV files to JSONL format, use the script csv2jsonl.py. Example usage:

python csv2jsonl.py file1.csv file2.csv -o merged.jsonl
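Conceptually, the conversion reads each CSV row as a dictionary and writes it as one JSON object per line. The sketch below illustrates the idea; it is not the actual csv2jsonl.py implementation.

import csv
import json

def csv_files_to_jsonl(csv_paths, out_path):
    # Merge rows from several CSV files into a single JSONL file.
    with open(out_path, "w") as out:
        for path in csv_paths:
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    out.write(json.dumps(row) + "\n")

# Roughly equivalent to: python csv2jsonl.py file1.csv file2.csv -o merged.jsonl
csv_files_to_jsonl(["file1.csv", "file2.csv"], "merged.jsonl")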

Running the Environment Creation in the Docker Image

make docker-build
make docker-run

Then you can run the evaluation environment creation:

cd repo 
pyenv shell 3.10.14
python -m venv --clear --copies eval_main_venv
source eval_main_venv/bin/activate
pip install -r requirements.txt

# create the virtual env
python src/create_venvs.py --dataset dataset/final_fix_dataset.jsonl --base_path eval_venvs

Running Dataset Verification with pytest

To run dataset verification with pytest, the eval_venvs directory must have been created first:

python verify_dataset.py dataset/final_fix_dataset.jsonl eval_venvs dataset/solutions/tests

Running on Compute Canada

  1. Export the Docker container:
  docker save gc:1.0 | gzip > gc_1.0.tar.gz 
  2. Copy the tar file to Compute Canada:
scp gc_1.0.tar.gz username@cedar.computecanada.ca:path
  3. Within an interactive job, build the container from the tar file by loading the Apptainer module:
module load apptainer
apptainer build gc_1.0.sif docker-archive://gc_1.0.tar.gz
  4. Run the container to create the venvs:
apptainer exec \
  --bind "$PWD:/app/repo" \
  --env PYENV_VERSION=3.10.14 \
  --env REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt \
  gc_1.0.sif \
  bash -lc "\
    cd /app/repo && \
    python -m venv --clear --copies eval_main_venv && \
    source eval_main_venv/bin/activate && \
    pip install -r requirements.txt && \
    python src/create_venvs.py --dataset dataset/final_fix_dataset.jsonl --base_path eval_venvs"
  5. Run the container to verify the dataset:
apptainer exec \
  --bind "$PWD:/app/repo" \
  --env PYENV_VERSION=3.10.14 \
  gc_1.0.sif \
  bash -lc "\
    cd /app/repo && \
    source eval_main_venv/bin/activate && \
    python verify_dataset.py dataset/final_fix_dataset.jsonl eval_venvs dataset/solutions/tests"
  6. Run the parallel eval script (inside Docker):
  python parallel_eval_jsonl.py \
    dataset/final_fix_dataset.jsonl \
    "${SOLUTIONS_JSONL_FILE}" \
    eval_venvs \
    dataset/solutions/tests \
    --wandb
