Streamed Voice Agent Demo - Multiple Performance Issues #301
Comments
Thanks for raising the issue, and sorry you are running into problems with the streamed demo. We are looking into a couple of performance improvements that will hopefully reduce the response time. We do not support interruptions yet. There is some guidance in the docs (https://openai.github.io/openai-agents-python/voice/pipeline/#interruptions) on what you can do in the meantime. Overall, if your focus is on the lowest latency and the best interruption handling, my suggestion would still be our speech-to-speech model and the Realtime API.
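The interruption guidance linked above suggests listening to the pipeline's lifecycle events (turn_started when a transcribed turn begins processing, turn_ended once all audio for that turn has been dispatched) and muting the microphone in between, so the agent's own speech is not picked up as a new turn. A minimal half-duplex sketch of that idea follows, assuming you can intercept microphone frames before they reach StreamedAudioInput. LifecycleEvent and HalfDuplexMicGate are hypothetical stand-ins, not SDK types; only the event names come from the docs.

```python
from dataclasses import dataclass
from typing import Literal, Optional


# Hypothetical stand-in for a lifecycle event; only the event names
# (turn_started, turn_ended, session_ended) are taken from the voice docs.
@dataclass
class LifecycleEvent:
    event: Literal["turn_started", "turn_ended", "session_ended"]


class HalfDuplexMicGate:
    """Suppresses microphone frames while the agent is handling a turn."""

    def __init__(self) -> None:
        self.muted = False

    def on_lifecycle(self, ev: LifecycleEvent) -> None:
        if ev.event == "turn_started":
            # A user turn was transcribed; stop listening while the agent replies.
            self.muted = True
        elif ev.event in ("turn_ended", "session_ended"):
            # All audio for the turn was dispatched, or the session is over.
            self.muted = False

    def filter(self, frame: bytes) -> Optional[bytes]:
        """Return the frame to forward to the pipeline, or None to drop it."""
        return None if self.muted else frame
```

turn_ended only signals that the SDK has handed over all audio for the turn; if your player buffers output, unmute once that buffer has drained, as the docs suggest. The trade-off is explicit: while muted, the user cannot barge in, but the agent can no longer interrupt itself.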
Just to add that I also see the issues described above. Notably, the agent frequently detects speech when there is none and then switches to Spanish, making the whole thing unusable for even a simple demonstration.
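Phantom turns like this may be triggered by background noise reaching the transcription session's turn detection. One client-side mitigation, not something the SDK provides, is a simple energy-based noise gate that replaces quiet frames with digital silence before they are sent. The sketch below assumes 16-bit PCM samples; rms_dbfs and NoiseGate are hypothetical names, and the -45 dBFS threshold and 10-frame hangover are illustrative guesses to tune for your microphone.

```python
import math
from typing import List, Sequence


def rms_dbfs(samples: Sequence[int]) -> float:
    """RMS level of 16-bit PCM samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square) / 32768)


class NoiseGate:
    """Passes loud frames, stays open briefly after speech, silences the rest."""

    def __init__(self, threshold_dbfs: float = -45.0, hangover_frames: int = 10) -> None:
        self.threshold_dbfs = threshold_dbfs
        self.hangover_frames = hangover_frames
        self._open_for = 0  # frames the gate stays open after the last loud frame

    def process(self, frame: Sequence[int]) -> List[int]:
        if rms_dbfs(frame) >= self.threshold_dbfs:
            self._open_for = self.hangover_frames
            return list(frame)
        if self._open_for > 0:
            # Hangover: keep short pauses and word tails instead of clipping them.
            self._open_for -= 1
            return list(frame)
        # Replace rather than drop, so the stream keeps its timing and the
        # turn detector still receives the silence that ends a turn.
        return [0] * len(frame)
```

Zeroing frames instead of dropping them is a deliberate choice: a gap in the stream could confuse end-of-turn detection, while silence is exactly what it expects to see between turns.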
Out of curiosity, are you using any specific microphones, maybe with built-in noise cancellation?
@dkundel-openai I'm using the built-in microphones on my MacBook Air, nothing with special noise cancellation technology. Is there a timeline for when interruptions will be supported? I believe this is one of the main use cases for voice agents: it allows a more natural conversation flow where users can interrupt when needed. Thanks.
@rm-openai @dkundel-openai I tried using the VoicePipeline, but the latency is too high and it randomly transcribes Arabic text (I am using StreamedAudioInput). The realtime model is great, but how do I use it with the Agents SDK? Is support for gpt-4o-realtime planned, or are there any workarounds?
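The first thing to try against stray languages is pinning the transcription language; the SDK's STTModelSettings (passed through VoicePipelineConfig) appears to have a language field for this, but check the current API reference. As an extra guard, a custom workflow could also reject transcripts written in an unexpected script before they reach the agent, for example at the start of its run method. The helpers below are hypothetical and purely heuristic: they inspect Unicode character names, so they would catch Arabic transcripts like the ones described here but not a switch to Spanish, which also uses Latin script.

```python
import unicodedata


def script_ratio(text: str, script: str = "LATIN") -> float:
    """Fraction of letters in `text` whose Unicode name starts with `script`."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 1.0  # no letters (empty, digits, punctuation): nothing to reject
    hits = sum(1 for ch in letters if unicodedata.name(ch, "").startswith(script))
    return hits / len(letters)


def accept_transcript(text: str, script: str = "LATIN", min_ratio: float = 0.8) -> bool:
    """Hypothetical guard: keep a transcript only if it is mostly in `script`."""
    return script_ratio(text, script) >= min_ratio
```

A rejected transcript can simply be ignored (no agent run, no audio), which is usually preferable to answering a phantom question in the wrong language.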
@dkundel-openai, has any support for realtime audio models been added yet?
It's planned, but there is no timeline yet for when support will land.
@dkundel-openai, thanks. Is there any timeline for fixing the issues reported above?
Description
The streamed voice agent demo is experiencing several critical issues that affect its usability and functionality: … semantic_vad apparently being implemented in the code.
Steps to Reproduce
Expected Behavior
… The semantic_vad feature should allow interruption of the agent's responses.
Actual Behavior
… semantic_vad …
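For context on the semantic_vad question: semantic_vad is one of the turn-detection modes of OpenAI's Realtime transcription sessions (the other is server_vad), and it decides when the user has finished a turn. Per the maintainer's reply, the pipeline does not yet interrupt playback when a new turn is detected, so turn detection alone cannot stop the agent mid-response. To experiment with the setting, the SDK's STTModelSettings appears to accept a turn_detection dict; the hypothetical helper below only builds and validates such a dict. Field names and defaults reflect the Realtime API documentation as understood at the time of writing and should be checked against the current reference.

```python
from typing import Any, Dict

SEMANTIC_EAGERNESS = {"low", "medium", "high", "auto"}
SERVER_VAD_KEYS = {"threshold", "prefix_padding_ms", "silence_duration_ms"}


def make_turn_detection(kind: str = "semantic_vad", **opts: Any) -> Dict[str, Any]:
    """Hypothetical helper: build a turn-detection dict, rejecting unknown options."""
    if kind == "semantic_vad":
        unknown = set(opts) - {"eagerness"}
        if unknown:
            raise ValueError(f"unsupported semantic_vad options: {sorted(unknown)}")
        eagerness = opts.get("eagerness", "auto")
        if eagerness not in SEMANTIC_EAGERNESS:
            raise ValueError(f"eagerness must be one of {sorted(SEMANTIC_EAGERNESS)}")
        return {"type": "semantic_vad", "eagerness": eagerness}
    if kind == "server_vad":
        unknown = set(opts) - SERVER_VAD_KEYS
        if unknown:
            raise ValueError(f"unsupported server_vad options: {sorted(unknown)}")
        threshold = float(opts.get("threshold", 0.5))
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        return {
            "type": "server_vad",
            "threshold": threshold,
            "prefix_padding_ms": int(opts.get("prefix_padding_ms", 300)),
            "silence_duration_ms": int(opts.get("silence_duration_ms", 500)),
        }
    raise ValueError(f"unknown turn detection type: {kind!r}")
```

Failing loudly on a misspelled option is the point of the helper: a silently ignored key is one plausible way a setting can look "implemented but not functioning". A lower eagerness waits longer before closing a turn, which may reduce turns triggered by short noises at the cost of slower responses.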
Technical Details
From code inspection, semantic_vad appears to be implemented but is not functioning as expected. This suggests a potential issue with how the feature is integrated or configured in the current build.
Additional Notes
These issues significantly impact the user experience and demonstration value of the agent. The latency and language switching problems are particularly disruptive during presentations.
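To make latency reports comparable across setups, it helps to measure the same interval each time. A minimal sketch, assuming you call its hooks from your own event loop: on_turn_started when a turn_started lifecycle event arrives and on_audio_chunk for each audio event. Because turn_started fires only after the turn has been transcribed, this measures agent plus text-to-speech time to first audio, not the end-of-speech detection and transcription that precede it. TurnLatencyTracker is a hypothetical name.

```python
import statistics
import time
from typing import Callable, List, Optional


class TurnLatencyTracker:
    """Records the delay between a turn starting and its first audio chunk."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so the tracker can be tested deterministically
        self._turn_start: Optional[float] = None
        self.samples: List[float] = []

    def on_turn_started(self) -> None:
        self._turn_start = self._clock()

    def on_audio_chunk(self) -> None:
        # Only the first chunk after turn_started counts toward latency.
        if self._turn_start is not None:
            self.samples.append(self._clock() - self._turn_start)
            self._turn_start = None

    def median(self) -> float:
        """Median time to first audio in seconds (raises if nothing was recorded)."""
        return statistics.median(self.samples)
```

Reporting the median over several turns, rather than a single impression, makes it easier to tell whether a performance change actually moved the number.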
Possible Solutions
… semantic_vad implementation to ensure proper configuration.
Priority
High - These issues prevent effective demonstration of the voice agent's capabilities.