Commit e79a0fa

Added missing code in exemplary notebook - custom datasets fine-tuning (huggingface#15300)
* Added missing code in the `tokenize_and_align_labels` function in the example notebook on custom datasets (token classification). The missing code assigns labels to all but the first token of each word. The added code was taken directly from the official Hugging Face example, this [colab notebook](https://github.com/huggingface/notebooks/blob/master/transformers_doc/custom_datasets.ipynb).
* Changes requested in the review: keep the code as simple as possible.
1 parent 0501beb commit e79a0fa

File tree

1 file changed: +3 additions, -1 deletion


docs/source/custom_datasets.mdx

Lines changed: 3 additions & 1 deletion
```diff
@@ -326,7 +326,9 @@ def tokenize_and_align_labels(examples):
                 label_ids.append(-100)
             elif word_idx != previous_word_idx:  # Only label the first token of a given word.
                 label_ids.append(label[word_idx])
-            previous_word_idx = word_idx
+            else:
+                label_ids.append(-100)
+            previous_word_idx = word_idx
         labels.append(label_ids)

     tokenized_inputs["labels"] = labels
```
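The patched branch handles subword tokens: when a tokenizer splits one word into several tokens, only the first token keeps the word's label and the rest receive `-100`, the index PyTorch's cross-entropy loss ignores by default. A minimal sketch of that alignment logic, with a hypothetical helper name (`align_labels_with_tokens`) and hand-written `word_ids` standing in for the tokenizer's `word_ids()` output:

```python
def align_labels_with_tokens(word_ids, labels):
    """Align word-level labels to subword tokens.

    word_ids maps each token to the index of the word it came from;
    None marks special tokens such as [CLS] and [SEP].
    """
    label_ids = []
    previous_word_idx = None
    for word_idx in word_ids:
        if word_idx is None:
            label_ids.append(-100)  # special token: ignored by the loss
        elif word_idx != previous_word_idx:
            label_ids.append(labels[word_idx])  # first token of a word
        else:
            label_ids.append(-100)  # later subword of the same word (the branch this commit adds)
        previous_word_idx = word_idx
    return label_ids


# Three words with labels [3, 0, 7]; the second word splits into two subwords.
word_ids = [None, 0, 1, 1, 2, None]
print(align_labels_with_tokens(word_ids, [3, 0, 7]))  # → [-100, 3, 0, -100, 7, -100]
```

Without the `else:` branch, the repeated `word_idx` for the second subword would fall through and no label would be appended for that token, leaving `labels` shorter than `input_ids`.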
