Description
Motivation
With the recent introduction of the eval-callback example, we now have more tools for debugging when working with llama.cpp. However, one tool that I feel is missing is the ability to dump everything inside a gguf file into a human-readable (and interactive) interface.
Inspired by huggingface.js, where users can visualize the KV and list of tensors on huggingface.com, I would like to implement the same thing in llama.cpp. I find this helpful in these situations:
- Debugging the convert.py script when adding a new architecture
- Debugging tokenizers
- Debugging changes related to gguf (model splits for example)
- Debugging tensors (e.g. display the first N elements of a tensor, just like eval-callback)
- Debugging control vectors
- ... (maybe other usages in the future)
The reason I can't use huggingface.js is that it's browser-based, which makes it tricky to read a huge local file. It also doesn't have access to quantized types (same for gguf-py).
Possible Implementation
Ideally, I want the implementation to be a binary named gguf-viewer that, when run, will open a web page at localhost:8080. Users can then go to the web page to explore the gguf file. It will have these sections:
- Complete list of KV
- Tokenizer-related info (for example: list all tokens, lookup one token)
- List of all tensors
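
To give a rough idea of what the "Complete list of KV" section would need to read, here is a minimal sketch of parsing the GGUF header and metadata. It is written in Python only for brevity (the proposed gguf-viewer binary would live in llama.cpp itself), it assumes a little-endian GGUF v2 or v3 file, and the function names (read_gguf_metadata and the helpers) are hypothetical, not part of llama.cpp or gguf-py. The demo at the bottom builds a tiny in-memory sample instead of opening a real model.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"

# Scalar metadata value types: GGUF type id -> little-endian struct format.
SCALAR_FORMATS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
TYPE_STRING = 8
TYPE_ARRAY = 9


def _read(stream, fmt):
    """Read one fixed-size value described by a struct format."""
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(stream):
    """Read a GGUF string: uint64 length followed by UTF-8 bytes."""
    length = _read(stream, "<Q")
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8", errors="replace")


def _read_value(stream, type_id):
    """Read one metadata value of the given GGUF type id."""
    if type_id in SCALAR_FORMATS:
        return _read(stream, SCALAR_FORMATS[type_id])
    if type_id == TYPE_STRING:
        return _read_string(stream)
    if type_id == TYPE_ARRAY:
        # Arrays store the element type, then the count, then the elements.
        elem_type = _read(stream, "<I")
        count = _read(stream, "<Q")
        return [_read_value(stream, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {type_id}")


def read_gguf_metadata(stream):
    """Return the version, tensor count and KV pairs of a GGUF stream."""
    if stream.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version = _read(stream, "<I")
    if version < 2:
        # Assumption: only v2+ is handled (v1 used 32-bit counts).
        raise ValueError("GGUF v1 is not handled by this sketch")
    tensor_count = _read(stream, "<Q")
    kv_count = _read(stream, "<Q")
    kv = {}
    for _ in range(kv_count):
        key = _read_string(stream)
        type_id = _read(stream, "<I")
        kv[key] = _read_value(stream, type_id)
    return {"version": version, "tensor_count": tensor_count, "kv": kv}


if __name__ == "__main__":
    # Tiny hand-built sample: version 3, no tensors, one string KV.
    key, value = b"general.architecture", b"llama"
    sample = GGUF_MAGIC + struct.pack("<IQQ", 3, 0, 1)
    sample += struct.pack("<Q", len(key)) + key
    sample += struct.pack("<I", TYPE_STRING)
    sample += struct.pack("<Q", len(value)) + value
    print(read_gguf_metadata(io.BytesIO(sample)))
```

Because the reader only needs a sequential stream, a huge local file can be handled without loading it into memory, which is the main limitation of the browser-based approach. The tensor list follows the KV section in the file and could be read in the same style.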