sync : whisper.cpp #3239

ggerganov · 2025-06-10T07:13:13Z

No description provided.

Enable uniform linking with subproject and with find_package.

* gguf: prevent non-native endian models from being loaded
* gguf: update error message
* gguf: make the non-native endian check more verbose
* ggml: move ggml_assert location
* ggml: reword the endianness check error message

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
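The commit notes above do not say how the endianness mismatch is detected. One plausible approach, shown here only as a sketch, is a heuristic on the 32-bit GGUF version field: real versions are small integers, so if the low 16 bits are all zero after a native read, the file was probably written with the opposite byte order. The helper names `version_looks_byteswapped` and `bswap32` are hypothetical and are not taken from the ggml sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: swap the byte order of a 32-bit value. */
uint32_t bswap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

/* Hypothetical heuristic (an assumption, not the confirmed ggml check):
 * a small version number read with the wrong byte order ends up in the
 * high bytes, leaving the low 16 bits zero. Version 0 is treated as a
 * separate error and is not reported as a byte-order problem here. */
bool version_looks_byteswapped(uint32_t version) {
    return version != 0 && (version & 0x0000FFFFu) == 0;
}
```

A loader built this way would refuse the file with an explicit endianness message instead of failing later on nonsensical tensor counts.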

…(llama/13826)

* [WIP]: fuse q8 quantization and reorder
* wip2: fuse q8 quantization and reorder
* working q8 reorder commit
* restored common.hpp
* remove debug prints
* remove unnecessary headers and remove trailing whitespace
* Update ggml/src/ggml-sycl/ggml-sycl.cpp

Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com>

…ma/13966)

Some systems report the CPU implementation as "Power11" instead of "POWER11". The existing CMake logic uses a case-sensitive regular expression to extract the CPU generation, which fails when the casing doesn't exactly match "POWER". This patch provides a fix by first converting the string to uppercase before applying the regex.

Signed-off-by: root <root@rheldb2v.pperf.tadn.ibm.com>
Co-authored-by: root <root@rheldb2v.pperf.tadn.ibm.com>
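The actual fix lives in CMake, but the normalize-then-match idea can be illustrated in C. The sketch below uppercases the reported CPU string and only then looks for the "POWER" prefix and the generation number that follows it. The function name `power_generation` and the 64-byte buffer are assumptions made for this example.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the POWER generation found in a CPU
 * description string, or -1 if none is found. Matching is made
 * case-insensitive by uppercasing a bounded copy of the input first. */
int power_generation(const char *cpu) {
    char buf[64];
    size_t n = strlen(cpu);
    if (n >= sizeof(buf)) {
        n = sizeof(buf) - 1;          /* truncate overly long input */
    }
    for (size_t i = 0; i < n; i++) {
        buf[i] = (char)toupper((unsigned char)cpu[i]);
    }
    buf[n] = '\0';

    const char *p = strstr(buf, "POWER");
    int gen;
    if (p != NULL && sscanf(p + 5, "%d", &gen) == 1) {
        return gen;
    }
    return -1;
}
```

With this normalization, "Power11" and "POWER11" yield the same generation, which is the behavior the patch describes.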

ggml-ci

* add concat, pad, repeat, tsembd, tanh, upscale
* small fixes

* This is not needed by the normal use where the result is read using `tensor_get`, but it allows perf mode of `test-backend-ops` to properly measure performance.

…se (llama/13996)

* ggml-vulkan: adds op CONV_TRANSPOSE_1D
* test-backend-ops: adds more sophisticated tests for CONV_TRANSPOSE_1D
* Missing barrier added to shader. Number of additional tests reduced to 108.
* Fixes typo in variable name.
* Removes extra whitespaces.
* Adds int64->int32 casts to prevent possible warnings.
* Problem size reduced in tests to pass tests with llvmpipe.
* supports_op condition moved from unintended position

…N_VER to llama.cpp sources (llama/14013)

… (llama/14001)

* allowing B580 and U9-288V
* experimenting code to detect Xe2
* allowing coopmat only for Xe2 GPUs
* fixed comment wording
* fixed comment wording
* removed unnecessary driver check

* SYCL: Implement few same quantized type copy kernels
* Use memcpy for copying contiguous tensors
  ggml-ci
* feat(sycl): add contiguous tensor copy support and device checks
  Adds a memcpy path for contiguous tensors of the same type to optimize data transfer. Updates device support checks to recognize contiguous tensor operations, improving compatibility and performance.
* refactor: replace specific block copy functions with template
  The changes replace multiple redundant block copy functions (e.g., cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function cpy_blck_q_q. This reduces code duplication by using a generic template that works for any block type, improving maintainability while preserving the same functionality. The template is instantiated with specific block types (e.g., block_q8_0) where needed.
* Exclude BF16 support for COPY tensors for now
  ggml-ci
* perf: adjust SYCL copy kernel block sizes for efficiency
  Use ceil_div to ensure full element coverage and update nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization in copy operations.
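The last item relies on ceiling division to size the launch grid. As a rough illustration in plain C (not the SYCL code itself), rounding the block count up guarantees that a trailing partial block still gets a work-group. The names `ceil_div` and `num_blocks` here are local to this sketch; only the `ceil_div` name appears in the commit message, and its real signature is not shown there.

```c
#include <stddef.h>

/* Integer division rounded up; d must be non-zero. */
size_t ceil_div(size_t n, size_t d) {
    return (n + d - 1) / d;
}

/* Hypothetical helper: number of blocks needed so that every one of
 * n_elements is covered when each block handles block_size elements. */
size_t num_blocks(size_t n_elements, size_t block_size) {
    return ceil_div(n_elements, block_size);
}
```

Plain truncating division would launch 3 blocks of 256 for 1000 elements and leave the last 232 elements untouched; rounding up launches 4, and the kernel is then expected to bounds-check the final block.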

* Add Reorder to Q6_K mmvq implementation
* Address PR comments: clean up comments
* Remove unused parameter after refactoring q4_k
* Adding inline to function and removing unnecessary reference to int

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>

* Simplify the environment variable setting to specify the memory pool type.
* Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options.
* update
* fix CI
* update
* delete whitespace
* fix according to review
* update CANN.md
* update CANN.md
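The accepted values for GGML_CANN_ASYNC_MODE can be sketched as a small case-insensitive matcher. This is an illustration of the described behavior only; `parse_bool_option` and `equals_ignore_case` are hypothetical names and do not come from the CANN backend.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Compare two strings for equality, ignoring ASCII case. */
static bool equals_ignore_case(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return false;
        }
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Hypothetical helper: true if value is one of yes, enable, 1, or on
 * (any casing). NULL, as returned by getenv for an unset variable,
 * counts as false. */
bool parse_bool_option(const char *value) {
    static const char *const truthy[] = { "yes", "enable", "1", "on" };
    if (value == NULL) {
        return false;
    }
    for (size_t i = 0; i < sizeof(truthy) / sizeof(truthy[0]); i++) {
        if (equals_ignore_case(value, truthy[i])) {
            return true;
        }
    }
    return false;
}
```

A caller would typically pass the result of `getenv("GGML_CANN_ASYNC_MODE")` straight into such a function.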

* move ggml-cpu-aarch64 to repack
* split quantize_row_q8_0/1
* split helper functions
* split ggml_vec_dot_q4_0_q8_0
* split ggml_vec_dot_q4_1_q8_1
* split ggml_vec_dot_q5_0_q8_0
* split ggml_vec_dot_q5_1_q8_1
* split ggml_vec_dot_q8_0_q8_0
* split ggml_vec_dot_tq1_0_q8_K
* split ggml_vec_dot_tq2_0_q8_K
* split ggml_vec_dot_q2_K_q8_K
* split ggml_vec_dot_q3_K_q8_K
* split ggml_vec_dot_q4_K_q8_K
* split ggml_vec_dot_q5_K_q8_K
* split ggml_vec_dot_q6_K_q8_K
* split ggml_vec_dot_iq2_xxs_q8_K
* split ggml_vec_dot_iq2_xs_q8_K
* split ggml_vec_dot_iq2_s_q8_K
* split ggml_vec_dot_iq3_xxs_q8_K
* split ggml_vec_dot_iq3_s_q8_K
* split ggml_vec_dot_iq1_s_q8_K
* split ggml_vec_dot_iq1_m_q8_K
* split ggml_vec_dot_iq4_nl_q8_0
* split ggml_vec_dot_iq4_xs_q8_K
* fix typos
* fix missing prototypes
* rename ggml-cpu-quants.c
* rename ggml-cpu-traits
* rename arm folder
* move cpu-feats-x86.cpp
* rename ggml-cpu-hbm
* update arm detection macro in quants.c
* move iq quant tables
* split ggml_quantize_mat_q8_0/K
* split ggml_gemv_*
* split ggml_gemm_*
* rename namespace aarch64 to repack
* use weak aliases to replace test macros
* rename GGML_CPU_AARCH64 to GGML_CPU_REPACK
* rename more aarch64 to repack
* clean up rebase leftover
* fix compilation errors
* remove trailing spaces
* try to fix clang compilation errors
* try to fix clang compilation errors again
* try to fix clang compilation errors, 3rd attempt
* try to fix clang compilation errors, 4th attempt
* try to fix clang compilation errors, 5th attempt
* try to fix clang compilation errors, 6th attempt
* try to fix clang compilation errors, 7th attempt
* try to fix clang compilation errors, 8th attempt
* try to fix clang compilation errors, 9th attempt
* more cleanup
* fix compilation errors
* fix apple targets
* fix a typo in arm version of ggml_vec_dot_q4_K_q8_K

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* metal : use less stack memory in FA kernel
  ggml-ci
* cont : fix BF16 variant

ggml-ci

ggerganov · 2025-06-10T08:25:38Z

@xctan Could you recommend a fix for this Windows build:

https://github.com/ggml-org/whisper.cpp/actions/runs/15554099989/job/43790646211?pr=3239#step:6:170

xctan · 2025-06-10T08:40:00Z

> @xctan Could you recommend a fix for this Windows build:
>
> https://github.com/ggml-org/whisper.cpp/actions/runs/15554099989/job/43790646211?pr=3239#step:6:170

Because Apple targets don't support weak aliases, all quant kernels are implemented for both the x86 and arm architectures to avoid linking issues. Thus, for MSVC builds on Windows, weak aliases are not necessary. Maybe we should only enable this macro on other architectures?

xctan · 2025-06-10T08:44:27Z

Under the `__cdecl` calling convention on 32-bit x86, a leading underscore is prefixed to the symbol name (MSVC doc).
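A minimal sketch of the technique under discussion, assuming GCC/Clang-style `__attribute__((weak, alias(...)))` on ELF targets and MSVC's `/alternatename` linker option. The names `quant_kernel` and `quant_kernel_generic` are hypothetical and are not ggml symbols. The branch of interest is 32-bit x86 MSVC, where the names passed to the linker need the leading underscore that `__cdecl` decoration adds.

```c
#include <stdio.h>

/* Generic fallback implementation (hypothetical name). */
int quant_kernel_generic(int x) {
    return x * 2;
}

#if defined(_MSC_VER) && defined(_M_IX86)
/* 32-bit x86 MSVC: __cdecl symbols are decorated with a leading
 * underscore, so the linker-level names must include it. */
#pragma comment(linker, "/alternatename:_quant_kernel=_quant_kernel_generic")
int quant_kernel(int x);
#elif defined(_MSC_VER)
/* Other MSVC targets: C symbol names are used undecorated. */
#pragma comment(linker, "/alternatename:quant_kernel=quant_kernel_generic")
int quant_kernel(int x);
#elif defined(__GNUC__) && !defined(__APPLE__)
/* GCC/Clang on ELF: quant_kernel resolves to the generic version
 * unless another object file provides a strong definition. */
int quant_kernel(int x) __attribute__((weak, alias("quant_kernel_generic")));
#else
/* No weak alias support assumed (e.g. Apple targets): forward explicitly. */
int quant_kernel(int x) {
    return quant_kernel_generic(x);
}
#endif
```

When no architecture-specific `quant_kernel` is linked in, calls fall through to the generic version; an optimized definition in another translation unit would take precedence on the weak-alias branches.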

ggerganov · 2025-06-10T09:33:29Z

Adding the underscores seems to resolve the build failure.

> Thus, for MSVC builds on Windows, weak aliases are not necessary. Maybe we should only enable this macro on other architectures?

I'm not sure - whatever you think is better, let's do that.

ggml-ci

xctan · 2025-06-10T10:00:09Z

> all quant kernels are implemented in x86 and arm

That is not true for the kernels in repack.cpp. After examining the failing CI logs, I think we should keep the current weak alias implementation.

dg0yt and others added 24 commits June 10, 2025 10:11

Add in-build ggml::ggml ALIAS library (ggml/1260)

591ff6d

Enable uniform linking with subproject and with find_package.

gguf: fix failure on version == 0 (llama/13956)

85e68b0

metal : use F32 accumulators in FA kernels (llama/13975)

4aff5fc

ggml-ci

OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (llama/13840)

8c13d61

* add concat, pad, repeat, tsembd, tanh, upscale
* small fixes

opencl: add backend_synchronize (llama/13939)

8d772f5

* This is not needed by the normal use where the result is read using `tensor_get`, but it allows perf mode of `test-backend-ops` to properly measure performance.

vulkan: fix warnings in perf logger querypool code (llama/13937)

eb0918a

CUDA: fix FTZ in FA for Gemma 3 (llama/13991)

b2f6786

releases : use dl backend for linux release, remove arm64 linux relea…

34d141e

…se (llama/13996)

vulkan: automatically deduce size of push constants (llama/13936)

a419054

llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WI…

e7291cb

…N_VER to llama.cpp sources (llama/14013)

vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs…

974e319

… (llama/14001)

* allowing B580 and U9-288V
* experimenting code to detect Xe2
* allowing coopmat only for Xe2 GPUs
* fixed comment wording
* fixed comment wording
* removed unnecessary driver check

cuda : fix buffer type check with integrated GPUs (llama/14069)

b93c747

cuda : fix device sync on buffer clear (llama/14033)

59729dd

metal : use less stack memory in FA kernel (llama/14088)

b582b6b

* metal : use less stack memory in FA kernel
  ggml-ci
* cont : fix BF16 variant

sync : ggml

3ae0303

ggml-ci

talk-llama : sync llama.cpp

2705e98

ggml-ci

danbev approved these changes Jun 10, 2025

ggerganov added 5 commits June 10, 2025 10:58

files : remove old sources

79b3c0d

sync : ggml

f7b7093

ggml-ci

files : remove old sources (part 2)

66eaa61

sync : ggml

0f99fa1

ggml-ci

android : fix builds (#0)

382d421

ggml-ci

ggerganov force-pushed the sync-whisper.cpp-25-06-10 branch 2 times, most recently from 5405127 to 47e060e Compare June 10, 2025 08:51

ggml : fix weak alias win32 (#0)

76b5a7b

ggml-ci

ggerganov force-pushed the sync-whisper.cpp-25-06-10 branch from b3dc016 to 76b5a7b Compare June 10, 2025 09:40

ggerganov merged commit 93d5439 into master Jun 10, 2025
3 of 8 checks passed

ggerganov deleted the sync-whisper.cpp-25-06-10 branch June 10, 2025 09:40
