Hi,
I am trying to quantize my custom fine-tuned deepseek-7b instruct model and am unable to do so. I followed the document:
# Convert to fp16
fp16 = f"{MODEL_NAME}/{MODEL_NAME.lower()}.fp16.bin"
!python llama.cpp/convert.py {MODEL_NAME} --outtype f16 --outfile {fp16}
but it produces this error:
/content/llama.cpp/gguf-py
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00001-of-00003.safetensors
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00001-of-00003.safetensors
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00002-of-00003.safetensors
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00003-of-00003.safetensors
params = Params(n_vocab=32256, n_embd=4096, n_layer=32, n_ctx=16384, n_ff=11008, n_head=32, n_head_kv=32, f_norm_eps=1e-06, n_experts=None, n_experts_used=None, rope_scaling_type=<RopeScalingType.LINEAR: 'linear'>, f_rope_freq_base=100000, f_rope_scale=4.0, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('deepseek-coder-6.7b-instruct-finetuned'))
Found vocab files: {'tokenizer.model': None, 'vocab.json': None, 'tokenizer.json': PosixPath('deepseek-coder-6.7b-instruct-finetuned/tokenizer.json')}
Loading vocab file 'deepseek-coder-6.7b-instruct-finetuned/tokenizer.json', type 'spm'
Traceback (most recent call last):
  File "/content/llama.cpp/convert.py", line 1662, in <module>
    main(sys.argv[1:])  # Exclude the first element (script name) from sys.argv
  File "/content/llama.cpp/convert.py", line 1618, in main
    vocab, special_vocab = vocab_factory.load_vocab(args.vocab_type, model_parent_path)
  File "/content/llama.cpp/convert.py", line 1422, in load_vocab
    vocab = SentencePieceVocab(
  File "/content/llama.cpp/convert.py", line 449, in __init__
    self.sentencepiece_tokenizer = SentencePieceProcessor(str(fname_tokenizer))
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 447, in Init
    self.Load(model_file=model_file, model_proto=model_proto)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
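What stands out in the log: the only vocab file found is tokenizer.json, yet it is loaded as type 'spm', i.e. with SentencePiece, which expects a serialized protobuf .model file rather than a HuggingFace JSON tokenizer, and that mismatch is what trips the ParseFromArray check. A minimal sketch that reproduces the same failure outside convert.py (assuming sentencepiece is installed and using the paths from the log above):

# Sketch: show why the 'spm' vocab loader fails on this checkpoint.
from pathlib import Path
from sentencepiece import SentencePieceProcessor

model_dir = Path("deepseek-coder-6.7b-instruct-finetuned")  # directory from the log above
for name in ("tokenizer.model", "vocab.json", "tokenizer.json"):
    print(name, "present:", (model_dir / name).exists())

try:
    # This mirrors what convert.py does once it decides the vocab type is 'spm'.
    SentencePieceProcessor(str(model_dir / "tokenizer.json"))
except RuntimeError as err:
    print("SentencePiece cannot parse tokenizer.json:", err)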
I cannot seem to find similar errors in the GitHub issues. Any insight into this would be greatly appreciated.
One can replicate this experiment by quantizing a DeepSeek Coder 7B Instruct model.
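A possible workaround to try, hedged: the traceback shows convert.py reads args.vocab_type, so forcing a non-SentencePiece vocab loader might get past this error. The flag spelling and accepted values below ('hfft' for the HuggingFace fast tokenizer, 'bpe' as a fallback) are assumptions and should be checked against python llama.cpp/convert.py --help for this checkout:

# Sketch: re-run the conversion while forcing a non-'spm' vocab loader.
# "--vocab-type hfft" is an assumption; if this convert.py only knows "spm"/"bpe", try "bpe".
fp16 = f"{MODEL_NAME}/{MODEL_NAME.lower()}.fp16.bin"
!python llama.cpp/convert.py {MODEL_NAME} --outtype f16 --vocab-type hfft --outfile {fp16}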