Hi,
I am trying to quantize my custom fine-tuned deepseek-7b instruct model and am unable to do so. I followed the document:
# Convert to fp16
fp16 = f"{MODEL_NAME}/{MODEL_NAME.lower()}.fp16.bin"
!python llama.cpp/convert.py {MODEL_NAME} --outtype f16 --outfile {fp16}
but it produces this error:
/content/llama.cpp/gguf-py
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00001-of-00003.safetensors
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00001-of-00003.safetensors
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00002-of-00003.safetensors
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00003-of-00003.safetensors
params = Params(n_vocab=32256, n_embd=4096, n_layer=32, n_ctx=16384, n_ff=11008, n_head=32, n_head_kv=32, f_norm_eps=1e-06, n_experts=None, n_experts_used=None, rope_scaling_type=<RopeScalingType.LINEAR: 'linear'>, f_rope_freq_base=100000, f_rope_scale=4.0, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('deepseek-coder-6.7b-instruct-finetuned'))
Found vocab files: {'tokenizer.model': None, 'vocab.json': None, 'tokenizer.json': PosixPath('deepseek-coder-6.7b-instruct-finetuned/tokenizer.json')}
Loading vocab file 'deepseek-coder-6.7b-instruct-finetuned/tokenizer.json', type 'spm'
Traceback (most recent call last):
  File "/content/llama.cpp/convert.py", line 1662, in <module>
    main(sys.argv[1:])  # Exclude the first element (script name) from sys.argv
  File "/content/llama.cpp/convert.py", line 1618, in main
    vocab, special_vocab = vocab_factory.load_vocab(args.vocab_type, model_parent_path)
  File "/content/llama.cpp/convert.py", line 1422, in load_vocab
    vocab = SentencePieceVocab(
  File "/content/llama.cpp/convert.py", line 449, in __init__
    self.sentencepiece_tokenizer = SentencePieceProcessor(str(fname_tokenizer))
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 447, in Init
    self.Load(model_file=model_file, model_proto=model_proto)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
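What stands out in the log: the only vocab file found is tokenizer.json, yet it is loaded as type 'spm', i.e. with SentencePiece, which expects a serialized protobuf .model file rather than a HuggingFace JSON tokenizer, and that mismatch is what trips the ParseFromArray check. A minimal sketch that reproduces the same failure outside convert.py (assuming sentencepiece is installed and using the paths from the log above):

# Sketch: show why the 'spm' vocab loader fails on this checkpoint.
from pathlib import Path
from sentencepiece import SentencePieceProcessor

model_dir = Path("deepseek-coder-6.7b-instruct-finetuned")  # directory from the log above
for name in ("tokenizer.model", "vocab.json", "tokenizer.json"):
    print(name, "present:", (model_dir / name).exists())

try:
    # This mirrors what convert.py does once it decides the vocab type is 'spm'.
    SentencePieceProcessor(str(model_dir / "tokenizer.json"))
except RuntimeError as err:
    print("SentencePiece cannot parse tokenizer.json:", err)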
I cannot seem to find similar errors in the GitHub issues. Any insight into this would be greatly appreciated.
One can replicate this experiment by quantizing a DeepSeek Coder 7B Instruct model.
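A possible workaround to try, hedged: the traceback shows convert.py reads args.vocab_type, so forcing a non-SentencePiece vocab loader might get past this error. The flag spelling and accepted values below ('hfft' for the HuggingFace fast tokenizer, 'bpe' as a fallback) are assumptions and should be checked against python llama.cpp/convert.py --help for this checkout:

# Sketch: re-run the conversion while forcing a non-'spm' vocab loader.
# "--vocab-type hfft" is an assumption; if this convert.py only knows "spm"/"bpe", try "bpe".
fp16 = f"{MODEL_NAME}/{MODEL_NAME.lower()}.fp16.bin"
!python llama.cpp/convert.py {MODEL_NAME} --outtype f16 --vocab-type hfft --outfile {fp16}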