convert : fix rwkv bos/eos token #13844

Merged
merged 1 commit into from
May 30, 2025

@CISC (Collaborator) commented May 28, 2025

SpecialVocab gets the token IDs from config.json, but they are overridden to 0 in hf_rwkv_tokenizer.py since they all share the same value (except eos_token, but that's incorrect and we use eot_token here).
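The override described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from this PR: `read_config_token_ids` and `override_rwkv_bos_eos` are hypothetical helpers standing in for the converter's real logic, and the IDs written to `config.json` are placeholders invented for the example.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def read_config_token_ids(model_dir: Path) -> Dict[str, int]:
    """Hypothetical stand-in for a loader that reads special token IDs from config.json."""
    config = json.loads((model_dir / "config.json").read_text(encoding="utf-8"))
    ids: Dict[str, int] = {}
    for name in ("bos", "eos"):
        value = config.get(f"{name}_token_id")
        if isinstance(value, int):
            ids[name] = value
    return ids


def override_rwkv_bos_eos(ids: Dict[str, int]) -> Dict[str, int]:
    """Return a copy with bos/eos forced to 0, mirroring the tokenizer-side override."""
    fixed = dict(ids)
    fixed["bos"] = 0
    fixed["eos"] = 0
    return fixed


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    # Placeholder IDs, invented for illustration only (not from any real RWKV config).
    (model_dir / "config.json").write_text(
        json.dumps({"bos_token_id": 1, "eos_token_id": 2}), encoding="utf-8"
    )
    from_config = read_config_token_ids(model_dir)
    fixed = override_rwkv_bos_eos(from_config)

print(from_config)  # {'bos': 1, 'eos': 2}
print(fixed)        # {'bos': 0, 'eos': 0}
```

The point of the sketch is only the ordering: values picked up from `config.json` are replaced afterwards so that the converted model agrees with what the tokenizer actually uses.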

cc/ @MollySophia

@CISC CISC requested a review from compilade May 28, 2025 07:46
@github-actions github-actions bot added the python python script changes label May 28, 2025
@MollySophia (Collaborator) commented

cc/ @zhiyuan1i

@MollySophia (Collaborator) left a comment

LGTM, since the RWKV base models also sometimes output \x00 as the EOS token.

@CISC CISC merged commit db38704 into master May 30, 2025
7 checks passed
@CISC CISC deleted the cisc/fix-rwkv-bos-eos-token branch May 30, 2025 12:50