

gabe-l-hart
Collaborator

Previously, llama-eval-callback's printed sum covered only the values that were actually printed, since the printing function compacts its view by skipping indices. This PR makes two passes over each tensor: one over all values to compute the sum, and one, as before, to build the printed view. This makes it much easier to compare between llama.cpp and transformers.

… printed values

This makes it much easier to compare between llama.cpp and transformers!

https://github.com/ggml-org/llama.cpp/issues/nemotron-nano-15409
Branch: gabe-l-hart/nvidia-nemotron-nano-15409

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
@gabe-l-hart
Collaborator Author

@ggerganov I'm assigning you on this since you gave a 👍 to my comment on the other PR, but feel free to forward

@gabe-l-hart gabe-l-hart merged commit a8bca68 into ggml-org:master Aug 28, 2025
48 checks passed
@gabe-l-hart gabe-l-hart deleted the gabe-l-hart/llama-eval-callback-sum branch August 28, 2025 20:27
Minh141120 pushed a commit to menloresearch/llama.cpp that referenced this pull request Aug 29, 2025
… printed values (ggml-org#15637)
