
Commit 48bf7e4

Authored by Nathan Cooper, lvwerra, and ncoop57
Code parrot minor fixes/niceties (huggingface#14666)
* Add some nicety flags for better controlling evaluation.
* Fix dependency issue with outdated requirement
* Add additional flag to example to ensure eval is done
* Wrap code into main function for accelerate launcher to find
* Fix valid batch size flag in readme
* Add note to install git-lfs when initializing/training the model
* Update examples/research_projects/codeparrot/scripts/arguments.py
  Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* Update examples/research_projects/codeparrot/README.md
  Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* Revert "Wrap code into main function for accelerate launcher to find"
  This reverts commit ff11df1.
* Fix formatting issue
* Move git-lfs instructions to installation section
* Add a quick check before code generation for code evaluation
* Fix styling issue
* Update examples/research_projects/codeparrot/scripts/human_eval.py
  Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* Make iterable dataset use passed in tokenizer rather than globally defined one

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: ncoop57 <nac33@students.uwf.edu>
1 parent 91f3dfb commit 48bf7e4

File tree: 5 files changed, +28 -6 lines changed


examples/research_projects/codeparrot/README.md

Lines changed: 5 additions & 2 deletions
````diff
@@ -31,6 +31,8 @@ Before you run any of the scripts make sure you are logged in and can push to th
 huggingface-cli login
 ```
 
+Additionally, sure you have git-lfs installed. You can find instructions for how to install it [here](https://git-lfs.github.com/).
+
 ## Dataset
 The source of the dataset is the GitHub dump available on Google's [BigQuery](https://cloud.google.com/blog/topics/public-datasets/github-on-bigquery-analyze-all-the-open-source-code). The database was queried for all Python files with less than 1MB in size resulting in a 180GB dataset with over 20M files. The dataset is available on the Hugging Face Hub [here](https://huggingface.co/datasets/transformersbook/codeparrot).
 
@@ -96,7 +98,7 @@ If you want to train the small model you need to make some modifications:
 accelerate launch scripts/codeparrot_training.py \
 --model_ckpt lvwerra/codeparrot-small \
 --train_batch_size 12 \
---eval_batch_size 12 \
+--valid_batch_size 12 \
 --learning_rate 5e-4 \
 --num_warmup_steps 2000 \
 --gradient_accumulation 1 \
@@ -125,7 +127,8 @@ python scripts/human_eval.py --model_ckpt lvwerra/codeparrot \
 --do_sample True \
 --temperature 0.2 \
 --top_p 0.95 \
---n_samples=200
+--n_samples=200 \
+--HF_ALLOW_CODE_EVAL="0"
 ```
 
 The results as well as reference values are shown in the following table:
````
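
The `--HF_ALLOW_CODE_EVAL` flag added to the README example maps onto the environment variable of the same name that the `code_eval` metric checks before it will execute any model-generated code. A minimal sketch of that gate (the `os.environ` export is an assumed stand-in for how the script forwards the CLI flag; the metric call itself is the documented `datasets` API):

```python
# Minimal sketch: how HF_ALLOW_CODE_EVAL gates the `code_eval` metric.
# The env-var export below is an assumption about the script's own plumbing.
import os

from datasets import load_metric

os.environ["HF_ALLOW_CODE_EVAL"] = "0"  # the README example keeps code execution disabled
code_eval = load_metric("code_eval")

try:
    code_eval.compute(references=[""], predictions=[[""]], k=[1])
except ValueError as err:
    # The metric refuses to run untrusted generated code until the variable is set to "1".
    print(err)
```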

examples/research_projects/codeparrot/requirements.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -4,4 +4,4 @@ accelerate==0.5.1
 wandb==0.12.0
 tensorboard==2.6.0
 torch==1.9.0
-huggingface-hub==0.0.19
+huggingface-hub==0.1.0
```
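
The bump resolves the "dependency issue with outdated requirement" noted in the commit message. A throwaway check, not part of the project, to confirm the environment picked up the new pin:

```python
# Throwaway sanity check that the installed huggingface_hub matches the new pin.
import huggingface_hub

print(huggingface_hub.__version__)  # expected: 0.1.0 after `pip install -r requirements.txt`
```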

examples/research_projects/codeparrot/scripts/arguments.py

Lines changed: 10 additions & 0 deletions
```diff
@@ -83,6 +83,10 @@ class HumanEvalArguments:
         metadata={"help": "Model name or path of model to be evaluated."},
     )
     num_workers: Optional[int] = field(default=None, metadata={"help": "Number of workers used for code evaluation."})
+    num_tasks: Optional[int] = field(
+        default=None,
+        metadata={"help": "The number of human-eval tasks to run. If not included all tasks are evaluated."},
+    )
     do_sample: Optional[bool] = field(
         default=True, metadata={"help": "Sample from the language model's output distribution."}
     )
@@ -101,6 +105,12 @@ class HumanEvalArguments:
     HF_ALLOW_CODE_EVAL: Optional[str] = field(
         default="0", metadata={"help": "Allow `code_eval` to execute Python code on machine"}
     )
+    device_int: Optional[int] = field(
+        default=-1,
+        metadata={
+            "help": "Determine which device to run the `text-generation` Pipeline on. -1 is CPU and any zero or positive number corresponds to which GPU device id to run on."
+        },
+    )


@dataclass
```
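
Once added to the dataclass, the two fields become `--num_tasks` and `--device_int` CLI flags through `transformers.HfArgumentParser`, which these example scripts use for their dataclass arguments. A pared-down sketch with only the new fields (the real `HumanEvalArguments` has many more):

```python
# Pared-down sketch: how the new dataclass fields surface as CLI flags.
from dataclasses import dataclass, field
from typing import Optional

from transformers import HfArgumentParser


@dataclass
class HumanEvalArguments:
    num_tasks: Optional[int] = field(
        default=None, metadata={"help": "Number of human-eval tasks to run; None means all of them."}
    )
    device_int: Optional[int] = field(
        default=-1, metadata={"help": "-1 runs the pipeline on CPU; any id >= 0 selects that GPU."}
    )


if __name__ == "__main__":
    # e.g. python human_eval.py --num_tasks 10 --device_int 0
    args = HfArgumentParser(HumanEvalArguments).parse_args_into_dataclasses()[0]
    print(args.num_tasks, args.device_int)
```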

examples/research_projects/codeparrot/scripts/codeparrot_training.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -59,7 +59,7 @@ def __iter__(self):
                     else:
                         more_examples = False
                         break
-            tokenized_inputs = tokenizer(buffer, truncation=False)["input_ids"]
+            tokenized_inputs = self.tokenizer(buffer, truncation=False)["input_ids"]
             all_token_ids = []
             for tokenized_input in tokenized_inputs:
                 all_token_ids.extend(tokenized_input + [self.concat_token_id])
```
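
The one-line change matters because `__iter__` previously reached for a module-level `tokenizer` even though one is passed to the dataset's constructor, so a caller supplying a different tokenizer would silently be ignored. A simplified sketch of the intended pattern (buffering and the infinite-restart logic of the script's constant-length dataset are omitted):

```python
# Simplified sketch of an iterable dataset that only uses the tokenizer it was given.
from torch.utils.data import IterableDataset


class ConstantLengthDataset(IterableDataset):
    def __init__(self, tokenizer, dataset, seq_length=1024):
        self.tokenizer = tokenizer                     # stored once at construction time
        self.concat_token_id = tokenizer.bos_token_id  # token used to join documents
        self.dataset = dataset
        self.seq_length = seq_length

    def __iter__(self):
        buffer = [example["content"] for example in self.dataset]
        # Use self.tokenizer, never a global: the dataset stays self-contained and testable.
        tokenized_inputs = self.tokenizer(buffer, truncation=False)["input_ids"]
        all_token_ids = []
        for tokenized_input in tokenized_inputs:
            all_token_ids.extend(tokenized_input + [self.concat_token_id])
        for i in range(0, len(all_token_ids) - self.seq_length + 1, self.seq_length):
            yield all_token_ids[i : i + self.seq_length]
```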

examples/research_projects/codeparrot/scripts/human_eval.py

Lines changed: 11 additions & 2 deletions
```diff
@@ -51,14 +51,23 @@ def main():
     # Load model and tokenizer
     tokenizer = AutoTokenizer.from_pretrained(args.model_ckpt)
     model = AutoModelForCausalLM.from_pretrained(args.model_ckpt)
-    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=-1)
+    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=args.device_int)
 
     # Load evaluation dataset and metric
     human_eval = load_dataset("openai_humaneval")
     code_eval_metric = load_metric("code_eval")
 
+    # Run a quick test to see if code evaluation is enabled
+    try:
+        _ = code_eval_metric.compute(references=[""], predictions=[[""]])
+    except ValueError as exception:
+        print(
+            'Code evaluation not enabled. Read the warning below carefully and then use `--HF_ALLOW_CODE_EVAL="1"` flag to enable code evaluation.'
+        )
+        raise exception
+
     # Generate completions for evaluation set
-    n_tasks = 4  # len(human_eval["test"])
+    n_tasks = args.num_tasks if args.num_tasks is not None else len(human_eval["test"])
     generations, references = [], []
     for task in tqdm(range(n_tasks)):
         task_generations = []
```
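
Two effects of the new block: the dummy `compute()` call makes the script fail immediately if code evaluation is still disabled, instead of after hours of generation, and `n_tasks` is now driven by `--num_tasks` rather than a hard-coded debug value. A condensed sketch of the probe-then-score flow (generation through the pipeline is elided; canonical solutions stand in for model completions):

```python
# Condensed sketch of the fail-fast probe plus the final pass@k computation.
import os

from datasets import load_dataset, load_metric

os.environ["HF_ALLOW_CODE_EVAL"] = "1"  # opt in to executing generated code

human_eval = load_dataset("openai_humaneval")
code_eval_metric = load_metric("code_eval")

# Probe: raises ValueError right away if code evaluation is not enabled,
# long before any completions have been sampled.
_ = code_eval_metric.compute(references=[""], predictions=[[""]])

n_tasks = 2  # the script takes this from args.num_tasks (or all tasks if unset)
generations, references = [], []
for task in range(n_tasks):
    problem = human_eval["test"][task]
    # Stand-in for pipeline output: prompt + canonical solution is a known-good completion.
    generations.append([problem["prompt"] + problem["canonical_solution"]])
    references.append(problem["test"] + f"\ncheck({problem['entry_point']})")

pass_at_k, _ = code_eval_metric.compute(references=references, predictions=generations, k=[1])
print(pass_at_k)  # expected to be {'pass@1': 1.0} for canonical solutions
```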
