-
Notifications
You must be signed in to change notification settings - Fork 6.5k
V1 Speech grpc streaming #932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Yes, all of the features in v1beta1 are GA now and available on the v1 endpoint. I'm not sure I remember the SpeechStub feature you're looking for, but we have an example of performing a streaming API request via Python that may be helpful. |
Thanks, however, i am looking for the Streaming Recognition sample that uses grpc and threaded streaming audio from mic. The old guide does not work since it references beta. |
Got it. I'll see if I can come up with an update that streams from the Mic. |
Thanks. I am confused between the google-cloud-spech-v1 approach and grpc-google-cloud-speech-v1 because the new grpc /google/cloud/speech/v1/cloud_speech_pb2.py is missing StreamingRecognize |
@gguuss Do you have any sample code to stream from microphone? The previous example using grpc isn't working. |
I don't have code for streaming from the Microphone yet. The only sample that currently does this is NodeJS. This is a priority for me but I probably won't be able to do it right away. |
@gguuss Thanks |
I've got some success with streaming audio from python's queue object. https://gist.github.com/datspike/3caabf60c04ffb726c319d70a59c4552 with MockStreamFile(some_queue) as audio_file:
self.audio_sample = self.speech_client.sample(
stream=audio_file,
encoding=speech.encoding.Encoding.LINEAR16,
sample_rate_hertz=16000) It will read data from python's queue as send it to the speech api without any problems. for alternative in alternatives:
pass ...loop, but then i will lose all the time when audio did not have any speech in it. |
Here is the previous sample code that works with beta. However, this code doesn't work with V1. #!/usr/bin/python # Copyright (C) 2016 Google Inc. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """Sample that streams audio to the Google Cloud Speech API via GRPC.""" from __future__ import division import contextlib import functools import re import signal import sys import google.auth import google.auth.transport.grpc import google.auth.transport.requests from google.cloud.grpc.speech.v1beta1 import cloud_speech_pb2 from google.rpc import code_pb2 import grpc import pyaudio from six.moves import queue # Audio recording parameters RATE = 16000 CHUNK = int(RATE / 10) # 100ms # The Speech API has a streaming limit of 60 seconds of audio*, so keep the # connection alive for that long, plus some more to give the API time to figure # out the transcription. # * https://g.co/cloud/speech/limits#content DEADLINE_SECS = 60 * 3 + 5 SPEECH_SCOPE = 'https://www.googleapis.com/auth/cloud-platform' def make_channel(host, port): """Creates a secure channel with auth credentials from the environment.""" # Grab application default credentials from the environment credentials, _ = google.auth.default(scopes=[SPEECH_SCOPE]) # Create a secure channel using the credentials. http_request = google.auth.transport.requests.Request() target = '{}:{}'.format(host, port) return google.auth.transport.grpc.secure_authorized_channel( credentials, http_request, target) def _audio_data_generator(buff): """A generator that yields all available data in the given buffer. Args: buff - a Queue object, where each element is a chunk of data. Yields: A chunk of data that is the aggregate of all chunks of data in `buff`. The function will block until at least one data chunk is available. """ stop = False while not stop: # Use a blocking get() to ensure there's at least one chunk of data. data = [buff.get()] # Now consume whatever other data's still buffered. while True: try: data.append(buff.get(block=False)) except queue.Empty: break # `None` in the buffer signals that the audio stream is closed. Yield # the final bit of the buffer and exit the loop. if None in data: stop = True data.remove(None) yield b''.join(data) def _fill_buffer(buff, in_data, frame_count, time_info, status_flags): """Continuously collect data from the audio stream, into the buffer.""" buff.put(in_data) return None, pyaudio.paContinue # [START audio_stream] @contextlib.contextmanager def record_audio(rate, chunk): """Opens a recording stream in a context manager.""" # Create a thread-safe buffer of audio data buff = queue.Queue() audio_interface = pyaudio.PyAudio() audio_stream = audio_interface.open( format=pyaudio.paInt16, # The API currently only supports 1-channel (mono) audio # https://goo.gl/z757pE channels=1, rate=rate, input=True, frames_per_buffer=chunk, # Run the audio stream asynchronously to fill the buffer object. # This is necessary so that the input device's buffer doesn't overflow # while the calling thread makes network requests, etc. stream_callback=functools.partial(_fill_buffer, buff), ) yield _audio_data_generator(buff) audio_stream.stop_stream() audio_stream.close() # Signal the _audio_data_generator to finish buff.put(None) audio_interface.terminate() # [END audio_stream] def request_stream(data_stream, rate, interim_results=True): """Yields `StreamingRecognizeRequest`s constructed from a recording audio stream. Args: data_stream: A generator that yields raw audio data to send. rate: The sampling rate in hertz. interim_results: Whether to return intermediate results, before the transcription is finalized. """ # The initial request must contain metadata about the stream, so the # server knows how to interpret it. recognition_config = cloud_speech_pb2.RecognitionConfig( # There are a bunch of config options you can specify. See # https://goo.gl/KPZn97 for the full list. encoding='LINEAR16', # raw 16-bit signed LE samples sample_rate=rate, # the rate in hertz # See http://g.co/cloud/speech/docs/languages # for a list of supported languages. language_code='en-US', # a BCP-47 language tag ) streaming_config = cloud_speech_pb2.StreamingRecognitionConfig( interim_results=interim_results, config=recognition_config, ) yield cloud_speech_pb2.StreamingRecognizeRequest( streaming_config=streaming_config) for data in data_stream: # Subsequent requests can all just have the content yield cloud_speech_pb2.StreamingRecognizeRequest(audio_content=data) def listen_print_loop(recognize_stream): """Iterates through server responses and prints them. The recognize_stream passed is a generator that will block until a response is provided by the server. When the transcription response comes, print it. In this case, responses are provided for interim results as well. If the response is an interim one, print a line feed at the end of it, to allow the next result to overwrite it, until the response is a final one. For the final one, print a newline to preserve the finalized transcription. """ num_chars_printed = 0 for resp in recognize_stream: if resp.error.code != code_pb2.OK: raise RuntimeError('Server error: ' + resp.error.message) if not resp.results: continue # Display the top transcription result = resp.results[0] transcript = result.alternatives[0].transcript # Display interim results, but with a carriage return at the end of the # line, so subsequent lines will overwrite them. # # If the previous result was longer than this one, we need to print # some extra spaces to overwrite the previous result overwrite_chars = ' ' * max(0, num_chars_printed - len(transcript)) if not result.is_final: sys.stdout.write(transcript + overwrite_chars + '\r') sys.stdout.flush() num_chars_printed = len(transcript) else: print(transcript + overwrite_chars) # Exit recognition if any of the transcribed phrases could be # one of our keywords. if re.search(r'\b(exit|quit)\b', transcript, re.I): print('Exiting..') break num_chars_printed = 0 def main(): service = cloud_speech_pb2.SpeechStub( make_channel('speech.googleapis.com', 443)) # For streaming audio from the microphone, there are three threads. # First, a thread that collects audio data as it comes in with record_audio(RATE, CHUNK) as buffered_audio_data: # Second, a thread that sends requests with that data requests = request_stream(buffered_audio_data, RATE) # Third, a thread that listens for transcription responses recognize_stream = service.StreamingRecognize( requests, DEADLINE_SECS) # Exit things cleanly on interrupt signal.signal(signal.SIGINT, lambda *_: recognize_stream.cancel()) # Now, put the transcription responses to use. try: listen_print_loop(recognize_stream) recognize_stream.cancel() except grpc.RpcError as e: code = e.code() # CANCELLED is caused by the interrupt handler, which is expected. if code is not code.CANCELLED: raise if __name__ == '__main__': main() |
I'm having the same problem, I would like to understand why the grpc examples have been removed, will new examples be published? It was very helpful since I'm not that pro with python and this stuff is new for me.. Thanks |
Similarly interested in a new python streaming microphone example, if you have any idea when you will have time to get to it? |
Made a first pass at adapting the grpc sample to use the google-cloud client lib. It probably doesn't handle error cases as well, and it'll throw an error once it hits the 60-second API deadline, but it should at least give you guys a starting point. The new client lib simplifies a bit (don't have to manually manage as many threads), and complicates a bit (need to conform to a file interface), but it's a decent tradeoff IMO. Let me know what you think! Pull requests welcome, as always :-) |
@jerjou for president! |
Oh, keep in mind you have to install the Anyway, you should be able to get it to work by just running |
Thank you so much, I'm going to test it out these days.
Il 16 giu 2017 12:49 AM, "Jerjou" <notifications@github.com> ha scritto:
… Made a first pass
<https://github.com/GoogleCloudPlatform/python-docs-samples/blob/speech-mic/speech/cloud-client/transcribe_streaming_mic.py>
at adapting the grpc sample to use the google-cloud client lib. It probably
doesn't handle error cases as well, and it'll throw an error once it hits
the 60-second API deadline, but it should at least give you guys a starting
point.
The new client lib simplifies a bit (don't have to manually manage as many
threads), and complicates a bit (need to conform to a file interface), but
it's a decent tradeoff IMO.
Let me know what you think! Pull requests welcome, as always :-)
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#932 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AXvn0mmofMP1yiNtCkQByPZjIf6PIBpEks5sEajQgaJpZM4NSTVk>
.
|
@datspike unfortunately, if I understand your question correctly, this isn't something that you can reliably do with the API at the moment (the API, for instance, may take longer to transcribe some utterances than others, or may have to retry audio behind the scenes, increasing response latency). ...stay tuned, though ;-) |
@jerjou thanks for the reply. |
While I'm here - any interest in my porting the sample that attempts to work around the 1-minute time limit for streaming requests? |
@jerjou Using your code, I'm having an issue when running
After I receive the final utterance, I set
However, the recognition doesn't exit, and instead crashes after around a minute (presumably after exceeding the maximum time limit). To make the recognition exit, I have to stop the pyaudio stream:
Unfortunately, since I'm sharing the audio stream, I can't stop it without interfering in other parts of my program. Is there a way to stop speech recognition without closing the input stream? An easy way to reproduce this is to grab transcribe_streaming_mic.py and add
The desired behavior is for the program to exit after returning the single utterance. The current behavior is that the program continues running until around a minute later, until it crashes (presumably due to the 60 second limit). |
So I think you have two issues, which have separate answers: stream.closed = True doesn't stop the streamIt works for me. Did you add the
|
I did add the Thanks for all your help! The sample code was a lifesaver ❤️ |
Oh huh - that's unfortunate. It sounds like you're okay for now, but if you'd like to pursue the issue I'm happy to pursue it with you :-) Maybe open a new issue, though, since it's a bit tangential to this one. It'd also be helpful to include the |
@jerjou @thecodingwizard Hi, does the code above and at
|
Hi All, I noticed that previous example of grpc speech streaming for v1beta1 is gone now. Will it be updated for v1? I tried to replace the old library with google.cloud.speech.v1 but there looks to be no support for SpeechStub
The text was updated successfully, but these errors were encountered: