V1 Speech grpc streaming #932


Closed
V3SalC opened this issue May 5, 2017 · 24 comments

@V3SalC

V3SalC commented May 5, 2017

Hi all, I noticed that the previous example of gRPC speech streaming for v1beta1 is gone now. Will it be updated for v1? I tried replacing the old library with google.cloud.speech.v1, but there appears to be no support for SpeechStub.

@gguuss
Contributor

gguuss commented May 5, 2017

Yes, all of the features in v1beta1 are GA now and available on the v1 endpoint. I'm not sure I remember the SpeechStub feature you're looking for, but we have an example of performing a streaming API request via Python that may be helpful.

@V3SalC
Author

V3SalC commented May 8, 2017

Thanks, but I am looking for the Streaming Recognition sample that uses gRPC and threaded streaming of audio from the mic. The old guide does not work, since it references the beta.

@gguuss
Contributor

gguuss commented May 8, 2017

Got it. I'll see if I can come up with an update that streams from the Mic.

@V3SalC
Author

V3SalC commented May 9, 2017

Thanks. I am confused between the google-cloud-speech-v1 approach and grpc-google-cloud-speech-v1, because the new gRPC /google/cloud/speech/v1/cloud_speech_pb2.py is missing StreamingRecognize.

@tcsj

tcsj commented May 12, 2017

@gguuss Do you have any sample code that streams from the microphone? The previous example using gRPC isn't working.

@gguuss
Contributor

gguuss commented May 12, 2017

I don't have code for streaming from the microphone yet; the only sample that currently does this is the Node.js one. This is a priority for me, but I probably won't be able to get to it right away.

@tcsj

tcsj commented May 13, 2017

@gguuss Thanks

@datspike

datspike commented May 20, 2017

I've had some success streaming audio from Python's queue object: https://gist.github.com/datspike/3caabf60c04ffb726c319d70a59c4552
If you dump audio from the mic into a queue.Queue() object and pass that to this class, you can use it like this:

with MockStreamFile(some_queue) as audio_file:
    self.audio_sample = self.speech_client.sample(
        stream=audio_file,
        encoding=speech.encoding.Encoding.LINEAR16,
        sample_rate_hertz=16000)
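
For anyone who can't open the gist: a minimal sketch of what such a file-like queue wrapper might look like (illustrative only; the actual MockStreamFile in the gist may differ):

class MockStreamFile(object):
    """File-like wrapper that reads audio chunks from a queue.Queue.

    Illustrative sketch only; the real MockStreamFile may differ.
    """

    def __init__(self, audio_queue):
        self._queue = audio_queue
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.closed = True

    def read(self, size=-1):
        # Block until a chunk is available; a None sentinel in the queue
        # signals the end of the audio stream.
        chunk = self._queue.get()
        if chunk is None:
            self.closed = True
            return b''
        return chunk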

It will read data from Python's queue and send it to the Speech API without any problems.
I have a question, though: how can I get the current streaming-recognition state? I'm trying to get more or less correct timestamps from the recognition results, and I need a reference point, i.e. the time at which the API started recognizing the speech. Obviously I can get it from the

for alternative in alternatives:
    pass

...loop, but then I lose all the time when the audio did not contain any speech.
The second option is to save the time just before this loop, but in my testing the resulting timestamps are somewhat unstable.

@V3SalC
Author

V3SalC commented May 23, 2017

Here is the previous sample code, which works with the beta. However, it doesn't work with v1.

#!/usr/bin/python
# Copyright (C) 2016 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Sample that streams audio to the Google Cloud Speech API via GRPC."""

from __future__ import division

import contextlib
import functools
import re
import signal
import sys


import google.auth
import google.auth.transport.grpc
import google.auth.transport.requests
from google.cloud.grpc.speech.v1beta1 import cloud_speech_pb2
from google.rpc import code_pb2
import grpc
import pyaudio
from six.moves import queue

# Audio recording parameters
RATE = 16000
CHUNK = int(RATE / 10)  # 100ms

# The Speech API has a streaming limit of 60 seconds of audio*, so keep the
# connection alive for that long, plus some more to give the API time to figure
# out the transcription.
# * https://g.co/cloud/speech/limits#content
DEADLINE_SECS = 60 * 3 + 5
SPEECH_SCOPE = 'https://www.googleapis.com/auth/cloud-platform'


def make_channel(host, port):
    """Creates a secure channel with auth credentials from the environment."""
    # Grab application default credentials from the environment
    credentials, _ = google.auth.default(scopes=[SPEECH_SCOPE])

    # Create a secure channel using the credentials.
    http_request = google.auth.transport.requests.Request()
    target = '{}:{}'.format(host, port)

    return google.auth.transport.grpc.secure_authorized_channel(
        credentials, http_request, target)


def _audio_data_generator(buff):
    """A generator that yields all available data in the given buffer.

    Args:
        buff - a Queue object, where each element is a chunk of data.
    Yields:
        A chunk of data that is the aggregate of all chunks of data in `buff`.
        The function will block until at least one data chunk is available.
    """
    stop = False
    while not stop:
        # Use a blocking get() to ensure there's at least one chunk of data.
        data = [buff.get()]

        # Now consume whatever other data's still buffered.
        while True:
            try:
                data.append(buff.get(block=False))
            except queue.Empty:
                break

        # `None` in the buffer signals that the audio stream is closed. Yield
        # the final bit of the buffer and exit the loop.
        if None in data:
            stop = True
            data.remove(None)

        yield b''.join(data)


def _fill_buffer(buff, in_data, frame_count, time_info, status_flags):
    """Continuously collect data from the audio stream, into the buffer."""
    buff.put(in_data)
    return None, pyaudio.paContinue


# [START audio_stream]
@contextlib.contextmanager
def record_audio(rate, chunk):
    """Opens a recording stream in a context manager."""
    # Create a thread-safe buffer of audio data
    buff = queue.Queue()

    audio_interface = pyaudio.PyAudio()
    audio_stream = audio_interface.open(
        format=pyaudio.paInt16,
        # The API currently only supports 1-channel (mono) audio
        # https://goo.gl/z757pE
        channels=1, rate=rate,
        input=True, frames_per_buffer=chunk,
        # Run the audio stream asynchronously to fill the buffer object.
        # This is necessary so that the input device's buffer doesn't overflow
        # while the calling thread makes network requests, etc.
        stream_callback=functools.partial(_fill_buffer, buff),
    )

    yield _audio_data_generator(buff)

    audio_stream.stop_stream()
    audio_stream.close()
    # Signal the _audio_data_generator to finish
    buff.put(None)
    audio_interface.terminate()
# [END audio_stream]


def request_stream(data_stream, rate, interim_results=True):
    """Yields `StreamingRecognizeRequest`s constructed from a recording audio
    stream.

    Args:
        data_stream: A generator that yields raw audio data to send.
        rate: The sampling rate in hertz.
        interim_results: Whether to return intermediate results, before the
            transcription is finalized.
    """
    # The initial request must contain metadata about the stream, so the
    # server knows how to interpret it.
    recognition_config = cloud_speech_pb2.RecognitionConfig(
        # There are a bunch of config options you can specify. See
        # https://goo.gl/KPZn97 for the full list.
        encoding='LINEAR16',  # raw 16-bit signed LE samples
        sample_rate=rate,  # the rate in hertz
        # See http://g.co/cloud/speech/docs/languages
        # for a list of supported languages.
        language_code='en-US',  # a BCP-47 language tag
    )
    streaming_config = cloud_speech_pb2.StreamingRecognitionConfig(
        interim_results=interim_results,
        config=recognition_config,
    )

    yield cloud_speech_pb2.StreamingRecognizeRequest(
        streaming_config=streaming_config)

    for data in data_stream:
        # Subsequent requests can all just have the content
        yield cloud_speech_pb2.StreamingRecognizeRequest(audio_content=data)


def listen_print_loop(recognize_stream):
    """Iterates through server responses and prints them.

    The recognize_stream passed is a generator that will block until a response
    is provided by the server. When the transcription response comes, print it.

    In this case, responses are provided for interim results as well. If the
    response is an interim one, print a line feed at the end of it, to allow
    the next result to overwrite it, until the response is a final one. For the
    final one, print a newline to preserve the finalized transcription.
    """
    num_chars_printed = 0
    for resp in recognize_stream:
        if resp.error.code != code_pb2.OK:
            raise RuntimeError('Server error: ' + resp.error.message)

        if not resp.results:
            continue

        # Display the top transcription
        result = resp.results[0]
        transcript = result.alternatives[0].transcript

        # Display interim results, but with a carriage return at the end of the
        # line, so subsequent lines will overwrite them.
        #
        # If the previous result was longer than this one, we need to print
        # some extra spaces to overwrite the previous result
        overwrite_chars = ' ' * max(0, num_chars_printed - len(transcript))

        if not result.is_final:
            sys.stdout.write(transcript + overwrite_chars + '\r')
            sys.stdout.flush()

            num_chars_printed = len(transcript)

        else:
            print(transcript + overwrite_chars)

            # Exit recognition if any of the transcribed phrases could be
            # one of our keywords.
            if re.search(r'\b(exit|quit)\b', transcript, re.I):
                print('Exiting..')
                break

            num_chars_printed = 0


def main():
    service = cloud_speech_pb2.SpeechStub(
        make_channel('speech.googleapis.com', 443))

    # For streaming audio from the microphone, there are three threads.
    # First, a thread that collects audio data as it comes in
    with record_audio(RATE, CHUNK) as buffered_audio_data:
        # Second, a thread that sends requests with that data
        requests = request_stream(buffered_audio_data, RATE)
        # Third, a thread that listens for transcription responses
        recognize_stream = service.StreamingRecognize(
            requests, DEADLINE_SECS)

        # Exit things cleanly on interrupt
        signal.signal(signal.SIGINT, lambda *_: recognize_stream.cancel())

        # Now, put the transcription responses to use.
        try:
            listen_print_loop(recognize_stream)

            recognize_stream.cancel()
        except grpc.RpcError as e:
            # CANCELLED is caused by the interrupt handler, which is expected.
            if e.code() is not grpc.StatusCode.CANCELLED:
                raise


if __name__ == '__main__':
    main()
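
For what it's worth, part of why this fails against v1: the v1 proto renamed RecognitionConfig's sample_rate field to sample_rate_hertz and made language_code required, and the protoc gRPC plugin now puts service stubs in a separate *_pb2_grpc module (which is why cloud_speech_pb2.py appears to be missing StreamingRecognize). A hedged sketch of the minimal changes; the exact v1 module paths are assumptions based on the package names mentioned in this thread:

# v1beta1 (above):
#     from google.cloud.grpc.speech.v1beta1 import cloud_speech_pb2
#     service = cloud_speech_pb2.SpeechStub(channel)
# v1, roughly (module paths are an assumption):
from google.cloud.proto.speech.v1 import cloud_speech_pb2
from google.cloud.grpc.speech.v1 import cloud_speech_pb2_grpc

service = cloud_speech_pb2_grpc.SpeechStub(
    make_channel('speech.googleapis.com', 443))

config = cloud_speech_pb2.RecognitionConfig(
    encoding='LINEAR16',
    sample_rate_hertz=16000,  # renamed from sample_rate
    language_code='en-US',    # required in v1
)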

@JuanCruz98

I'm having the same problem. I would like to understand why the gRPC examples were removed; will new examples be published? They were very helpful, since I'm not that experienced with Python and this is all new to me. Thanks!

@msdejong

msdejong commented Jun 7, 2017

I'm similarly interested in a new Python streaming-microphone example. Any idea when you will have time to get to it?

@gguuss
Contributor

gguuss commented Jun 15, 2017

No idea how long it will take me; in the meantime, there is an example in Node. The gRPC version took @jerjou a lot of time to get right, so there's a chance he may be able to help.

@jerjou
Contributor

jerjou commented Jun 15, 2017

Made a first pass at adapting the grpc sample to use the google-cloud client lib. It probably doesn't handle error cases as well, and it'll throw an error once it hits the 60-second API deadline, but it should at least give you guys a starting point.

The new client lib simplifies things a bit (you don't have to manually manage as many threads) and complicates things a bit (you have to conform to a file interface), but it's a decent tradeoff IMO.

Let me know what you think! Pull requests welcome, as always :-)
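
Pieced together from the snippets quoted later in this thread, the usage shape of the adapted sample is roughly the sketch below. MicAsFile stands in for the sample's file-like microphone wrapper, and the exact names and client surface here are assumptions rather than a verbatim excerpt:

from google.cloud import speech

speech_client = speech.Client()

# Assumed: MicAsFile is a file-like wrapper whose read() blocks until
# microphone audio is available (compare the queue wrapper sketched above).
with MicAsFile() as stream:
    audio_sample = speech_client.sample(
        stream=stream,
        encoding=speech.encoding.Encoding.LINEAR16,
        sample_rate_hertz=16000)
    results_gen = audio_sample.streaming_recognize(
        language_code='en-US', interim_results=True)
    for result in results_gen:
        for alternative in result.alternatives:
            print(alternative.transcript)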

@gguuss
Contributor

gguuss commented Jun 15, 2017

@jerjou for president!

@jerjou
Contributor

jerjou commented Jun 15, 2017

Oh, keep in mind you have to install the pyaudio dependency. I didn't add it to requirements.txt, since it's not required by any of the other samples; not sure what the Right Thing to do there is.

Anyway, you should be able to get it to work by just running pip install pyaudio in your virtualenv (modulo the usual issues installing pyaudio that were present with the previous sample -_-). Feel free to comment on this bug and I can go into more detail, or add to the README or something.

@JuanCruz98

JuanCruz98 commented Jun 15, 2017 via email

@jerjou
Contributor

jerjou commented Jun 15, 2017

@datspike unfortunately, if I understand your question correctly, this isn't something that you can reliably do with the API at the moment (the API, for instance, may take longer to transcribe some utterances than others, or may have to retry audio behind the scenes, increasing response latency).

...stay tuned, though ;-)

@datspike

datspike commented Jun 15, 2017

@jerjou thanks for the reply.
It's kinda working; after some calculation I can get my timestamps correct to within 0.5 seconds ;)
Also, you can bypass the exception at the 60-second mark if you yield None in the generator just before those 60 seconds expire; see the sketch below.
I'm using that in my code to stop the recognition process cleanly.
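
A minimal sketch of that idea, assuming a chunk generator like _audio_data_generator from the sample above. Here the generator simply ends early, which, like the None sentinel, closes the request stream; the names and the 55-second margin are illustrative:

import time


def time_limited(audio_chunks, limit_secs=55):
    """Stop yielding audio shortly before the 60-second streaming limit.

    audio_chunks is any iterable of raw audio byte strings, e.g.
    _audio_data_generator(buff) from the sample above.
    """
    start = time.time()
    for chunk in audio_chunks:
        if time.time() - start > limit_secs:
            # Ending the generator closes the request stream cleanly, so
            # the API can finalize the transcript instead of raising an
            # error at the deadline.
            return
        yield chunk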

@jerjou
Contributor

jerjou commented Jul 5, 2017

While I'm here - any interest in my porting the sample that attempts to work around the 1-minute time limit for streaming requests?

@thecodingwizard

@jerjou Using your code, I'm having an issue when running streaming_recognize in single_utterance mode:

results_gen = audio_sample.streaming_recognize(
    language_code=language_code, interim_results=True,
    single_utterance=True)  # Note the added single_utterance

After I receive the final utterance, I set MicAsFile.closed to True, expecting the recognition to stop and exit.

for result in results:
    for alternative in result.alternatives:
        if result.is_final:
            stream.closed = True    # Set closed to True on MicAsFile instance
            print(alternative.transcript)

However, the recognition doesn't exit, and instead crashes after around a minute (presumably after exceeding the maximum time limit).

To make the recognition exit, I have to stop the pyaudio stream:

self._audio_stream.stop_stream()

Unfortunately, since I'm sharing the audio stream, I can't stop it without interfering in other parts of my program.

Is there a way to stop speech recognition without closing the input stream?

An easy way to reproduce this is to grab transcribe_streaming_mic.py and add single_utterance=True to

results_gen = audio_sample.streaming_recognize(
    language_code=language_code, interim_results=True)

The desired behavior is for the program to exit after returning the single utterance.

The current behavior is that the program keeps running for around a minute and then crashes (presumably due to the 60-second limit).

@jerjou
Contributor

jerjou commented Jul 6, 2017

So I think you have two issues, which have separate answers:

stream.closed = True doesn't stop the stream

It works for me. Did you add the stream argument to listen_print_loop and pass it in?

single_utterance=True doesn't close stream after first utterance

This appears to be true when using the raw gRPC client as well. I filed a bug, but in the meantime, it appears that closing the request stream (e.g. via stream.closed = True) effectively terminates the connection as well.

jerjou closed this as completed Jul 6, 2017
@thecodingwizard

I did add the stream argument, but for some reason setting closed to True doesn't stop the recognition for me. I have to close the pyaudio stream too, with self._audio_stream.close_stream(), for the recognition to stop. That's perfectly fine for my program, though.

Thanks for all your help! The sample code was a lifesaver ❤️

@jerjou
Contributor

jerjou commented Jul 10, 2017

Oh huh - that's unfortunate. It sounds like you're okay for now, but if you'd like to pursue the issue, I'm happy to dig into it with you :-) Maybe open a new issue, though, since it's a bit tangential to this one. It'd also be helpful to include the git diff and the platform you're running on, to help figure out why I wasn't able to reproduce it.

@share9

share9 commented Nov 27, 2018

@jerjou @thecodingwizard Hi, does the code above ("Sample that streams audio to the Google Cloud Speech API via GRPC") still work for continuous streaming transcription for however long is needed? I'm not familiar with Python, though I may try to convert it to Swift.
