Commit e73a3e1

Add note to resize token embeddings matrix when adding new tokens to voc (huggingface#10331)
1 parent 19e737b commit e73a3e1

1 file changed: +6 -0 lines changed

src/transformers/tokenization_utils_base.py

Lines changed: 6 additions & 0 deletions
@@ -971,6 +971,12 @@ def add_tokens(
         Add a list of new tokens to the tokenizer class. If the new tokens are not in the vocabulary, they are added to
         it with indices starting from length of the current vocabulary.

+        .. Note::
+            When adding new tokens to the vocabulary, you should make sure to also resize the token embedding matrix of
+            the model so that its embedding matrix matches the tokenizer.
+
+            In order to do that, please use the :meth:`~transformers.PreTrainedModel.resize_token_embeddings` method.
+
         Args:
             new_tokens (:obj:`str`, :obj:`tokenizers.AddedToken` or a list of `str` or :obj:`tokenizers.AddedToken`):
                 Tokens are only added if they are not already in the vocabulary. :obj:`tokenizers.AddedToken` wraps a
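
The note added in this commit can be illustrated with a minimal sketch. It is not part of the commit. The toy vocabulary, the tiny `BertConfig` sizes, and the new token strings are all assumptions chosen so the example runs offline; in practice the tokenizer and model would usually come from `from_pretrained`.

```python
import os
import tempfile

from transformers import BertConfig, BertModel, BertTokenizer

# Toy vocabulary (hypothetical) so the example needs no download.
toy_vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]

with tempfile.TemporaryDirectory() as tmp_dir:
    vocab_path = os.path.join(tmp_dir, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(toy_vocab) + "\n")
    # The vocabulary is read into memory when the tokenizer is constructed.
    tokenizer = BertTokenizer(vocab_file=vocab_path)

# Tiny, randomly initialised model (assumed sizes) whose embedding matrix
# starts out with one row per token in the tokenizer.
config = BertConfig(
    vocab_size=len(tokenizer),
    hidden_size=16,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=32,
)
model = BertModel(config)

size_before = len(tokenizer)

# add_tokens returns the number of tokens actually added to the vocabulary.
num_added = tokenizer.add_tokens(["new_tok1", "my_new-tok2"])

# Without this call the new token ids would index past the embedding matrix.
model.resize_token_embeddings(len(tokenizer))

embedding_rows = model.get_input_embeddings().weight.shape[0]
print(size_before, num_added, embedding_rows)
```

The rows appended by `resize_token_embeddings` are newly initialised, so a model typically needs further training before the added tokens carry useful representations.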
