
Commit fc49ee4

ruby : support new-segment callback (ggml-org#2506)
* Add Params#new_segment_callback= method
* Add tests for Params#new_segment_callback=
* Group tests for #transcribe
* Don't use static for thread-safety
* Set new_segment_callback only when necessary
* Remove redundant check
* [skip ci] Add Ruby version README
* Revert "Group tests for #transcribe"
  This reverts commit 71b65b0.
* Revert "Add tests for Params#new_segment_callback="
  This reverts commit 81e6df3.
* Add test for Context#full_n_segments
* Add Context#full_n_segments
* Add tests for lang API
* Add lang API
* Add tests for Context#full_lang_id API
* Add Context#full_lang_id
* Add abnormal test cases for lang
* Raise appropriate errors from lang APIs
* Add tests for Context#full_get_segment_t{0,1} API
* Add Context#full_get_segment_t{0,1}
* Add tests for Context#full_get_segment_speaker_turn_next API
* Add Context#full_get_segment_speaker_turn_next
* Add tests for Context#full_get_segment_text
* Add Context#full_get_setgment_text
* Add tests for Params#new_segment_callback=
* Run new segment callback
* Split tests to multiple files
* Use container struct for new segment callback
* Add tests for Params#new_segment_callback_user_data=
* Add Whisper::Params#new_user_callback_user_data=
* Add GC-related test for new segment callback
* Protect new segment callback related structs from GC
* Add meaningful test for build
* Rename: new_segment_callback_user_data -> new_segment_callback_container
* Add tests for Whisper::Segment
* Add Whisper::Segment and Whisper::Context#each_segment
* Extract c_ruby_whisper_callback_container_allocate()
* Add test for Whisper::Params#on_new_segment
* Add Whisper::Params#on_new_egment
* Assign symbol IDs to variables
* Make extsources.yaml simpler
* Update README
* Add document comments
* Add test for calling Whisper::Params#on_new_segment multiple times
* Add file dependencies to GitHub actions config and .gitignore
* Add more files to ext/.gitignore
1 parent c0ea41f commit fc49ee4

14 files changed

+1112
-170
lines changed

.github/workflows/bindings-ruby.yml

Lines changed: 10 additions & 0 deletions
@@ -16,6 +16,9 @@ on:
       - ggml/src/ggml-quants.h
       - ggml/src/ggml-quants.c
       - ggml/src/ggml-cpu-impl.h
+      - ggml/src/ggml-metal.m
+      - ggml/src/ggml-metal.metal
+      - ggml/src/ggml-blas.cpp
       - ggml/include/ggml.h
       - ggml/include/ggml-alloc.h
       - ggml/include/ggml-backend.h
@@ -24,6 +27,8 @@ on:
       - ggml/include/ggml-metal.h
       - ggml/include/ggml-sycl.h
       - ggml/include/ggml-vulkan.h
+      - ggml/include/ggml-blas.h
+      - scripts/get-flags.mk
       - examples/dr_wav.h
   pull_request:
     paths:
@@ -41,6 +46,9 @@ on:
       - ggml/src/ggml-quants.h
       - ggml/src/ggml-quants.c
       - ggml/src/ggml-cpu-impl.h
+      - ggml/src/ggml-metal.m
+      - ggml/src/ggml-metal.metal
+      - ggml/src/ggml-blas.cpp
       - ggml/include/ggml.h
       - ggml/include/ggml-alloc.h
       - ggml/include/ggml-backend.h
@@ -49,6 +57,8 @@ on:
       - ggml/include/ggml-metal.h
       - ggml/include/ggml-sycl.h
       - ggml/include/ggml-vulkan.h
+      - ggml/include/ggml-blas.h
+      - scripts/get-flags.mk
       - examples/dr_wav.h
 
 jobs:

bindings/ruby/.gitignore

Lines changed: 0 additions & 1 deletion
@@ -1,4 +1,3 @@
-README.md
 LICENSE
 pkg/
 lib/whisper.*

bindings/ruby/README.md

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
+whispercpp
+==========
+
+![whisper.cpp](https://user-images.githubusercontent.com/1991296/235238348-05d0f6a4-da44-4900-a1de-d0707e75b763.jpeg)
+
+Ruby bindings for [whisper.cpp][], an interface of automatic speech recognition model.
+
+Installation
+------------
+
+Install the gem and add to the application's Gemfile by executing:
+
+    $ bundle add whispercpp
+
+If bundler is not being used to manage dependencies, install the gem by executing:
+
+    $ gem install whispercpp
+
+Usage
+-----
+
+```ruby
+require "whisper"
+
+whisper = Whisper::Context.new("path/to/model.bin")
+
+params = Whisper::Params.new
+params.language = "en"
+params.offset = 10_000
+params.duration = 60_000
+params.max_text_tokens = 300
+params.translate = true
+params.print_timestamps = false
+
+whisper.transcribe("path/to/audio.wav", params) do |whole_text|
+  puts whole_text
+end
+
+```
+
+### Preparing model ###
+
+Use script to download model file(s):
+
+```bash
+git clone https://github.com/ggerganov/whisper.cpp.git
+cd whisper.cpp
+sh ./models/download-ggml-model.sh base.en
+```
+
+There are some types of models. See [models][] page for details.
+
+### Preparing audio file ###
+
+Currently, whisper.cpp accepts only 16-bit WAV files.
+
+### API ###
+
+Once `Whisper::Context#transcribe` called, you can retrieve segments by `#each_segment`:
+
+```ruby
+def format_time(time_ms)
+  sec, decimal_part = time_ms.divmod(1000)
+  min, sec = sec.divmod(60)
+  hour, min = min.divmod(60)
+  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part]
+end
+
+whisper.transcribe("path/to/audio.wav", params)
+
+whisper.each_segment.with_index do |segment, index|
+  line = "[%{nth}: %{st} --> %{ed}] %{text}" % {
+    nth: index + 1,
+    st: format_time(segment.start_time),
+    ed: format_time(segment.end_time),
+    text: segment.text
+  }
+  line << " (speaker turned)" if segment.speaker_next_turn?
+  puts line
+end
+
+```
+
+You can also add hook to params called on new segment:
+
+```ruby
+def format_time(time_ms)
+  sec, decimal_part = time_ms.divmod(1000)
+  min, sec = sec.divmod(60)
+  hour, min = min.divmod(60)
+  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part]
+end
+
+# Add hook before calling #transcribe
+params.on_new_segment do |segment|
+  line = "[%{st} --> %{ed}] %{text}" % {
+    st: format_time(segment.start_time),
+    ed: format_time(segment.end_time),
+    text: segment.text
+  }
+  line << " (speaker turned)" if segment.speaker_next_turn?
+  puts line
+end
+
+whisper.transcribe("path/to/audio.wav", params)
+
+```
+
+[whisper.cpp]: https://github.com/ggerganov/whisper.cpp
+[models]: https://github.com/ggerganov/whisper.cpp/tree/master/models
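
Reviewer note: the README's `format_time` helper and segment line format can be tried without a model file. The sketch below is only an illustration under stated assumptions: `FakeSegment` and `format_segment` are hypothetical stand-ins that are not part of this commit, and the times are assumed to be in milliseconds, as the `time_ms` parameter name suggests.

```ruby
# Hypothetical stand-in for Whisper::Segment (the real class comes from the
# C extension); it only mimics the four methods the README example calls.
FakeSegment = Struct.new(:start_time, :end_time, :text, :speaker_turn) do
  def speaker_next_turn?
    speaker_turn
  end
end

# Same helper as in the README: milliseconds -> "HH:MM:SS.mmm".
def format_time(time_ms)
  sec, decimal_part = time_ms.divmod(1000) # whole seconds, leftover ms
  min, sec = sec.divmod(60)                # whole minutes, leftover seconds
  hour, min = min.divmod(60)               # whole hours, leftover minutes
  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part]
end

# Hypothetical wrapper around the README's formatting code so the result
# can be inspected instead of printed directly.
def format_segment(segment, index)
  line = "[%{nth}: %{st} --> %{ed}] %{text}" % {
    nth: index + 1,
    st: format_time(segment.start_time),
    ed: format_time(segment.end_time),
    text: segment.text
  }
  line << " (speaker turned)" if segment.speaker_next_turn?
  line
end

segments = [
  FakeSegment.new(0, 2_500, "Hello.", false),
  FakeSegment.new(3_723_045, 3_725_000, "Goodbye.", true)
]
segments.each_with_index { |segment, index| puts format_segment(segment, index) }
```

With a real context, the same formatting would apply to the objects yielded by `#each_segment` or passed to the `on_new_segment` hook.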

bindings/ruby/Rakefile

Lines changed: 8 additions & 9 deletions
@@ -5,17 +5,16 @@ require "yaml"
 require "rake/testtask"
 
 extsources = YAML.load_file("extsources.yaml")
-extsources.each_pair do |src_dir, dests|
-  dests.each do |dest|
-    src = Pathname(src_dir)/File.basename(dest)
-
-    file src
-    file dest => src do |t|
-      cp t.source, t.name
-    end
+SOURCES = FileList[]
+extsources.each do |src|
+  basename = src.pathmap("%f")
+  dest = basename == "LICENSE" ? basename : basename.pathmap("ext/%f")
+  file src
+  file dest => src do |t|
+    cp t.source, t.name
   end
+  SOURCES.include dest
 end
-SOURCES = extsources.values.flatten
 CLEAN.include SOURCES
 CLEAN.include FileList[
   "ext/*.o",

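Reviewer note: the new Rakefile loop derives each destination from the source path with Rake's `String#pathmap`. The sketch below reproduces only that mapping, outside of any Rake task. The paths in `extsources` are assumed examples for illustration; the actual contents of `extsources.yaml` are not shown in this diff.

```ruby
require "rake" # Rake adds String#pathmap

# Assumed sample entries; the real list lives in extsources.yaml.
extsources = [
  "../../LICENSE",
  "../../ggml/src/ggml.c",
  "../../include/whisper.h"
]

dests = extsources.map do |src|
  basename = src.pathmap("%f") # file name with extension, directories dropped
  # LICENSE stays at the gem root; everything else is copied into ext/.
  basename == "LICENSE" ? basename : basename.pathmap("ext/%f")
end

puts dests
```

Because the destination is now computed from the source, the YAML file can presumably be a flat list of source paths rather than a mapping from directories to destinations, which matches the "Make extsources.yaml simpler" item in the commit message.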
bindings/ruby/ext/.gitignore

Lines changed: 7 additions & 0 deletions
@@ -11,6 +11,10 @@ ggml-backend.c
 ggml-backend.h
 ggml-common.h
 ggml-cpu-impl.h
+ggml-metal.m
+ggml-metal.metal
+ggml-metal-embed.metal
+ggml-blas.cpp
 ggml-cuda.h
 ggml-impl.h
 ggml-kompute.h
@@ -20,9 +24,12 @@ ggml-quants.c
 ggml-quants.h
 ggml-sycl.h
 ggml-vulkan.h
+ggml-blas.h
+get-flags.mk
 whisper.cpp
 whisper.h
 dr_wav.h
+depend
 whisper.bundle
 whisper.so
 whisper.dll
