Streamed Voice Agent Demo - Multiple Performance Issues #301

muhammadsmalik · 2025-03-22T11:13:07Z

Streamed Voice Agent Demo - Multiple Performance Issues

Description

The streamed voice agent demo is experiencing several critical issues that affect its usability and functionality:

High Latency: There is a significant delay (3-4 seconds) before receiving responses.
Language Switching: The agent randomly switches to Spanish during conversations.
Over-sensitivity: The agent frequently detects speech and provides incorrect descriptions even when no one is speaking.
Interruption Issues: The agent cannot be interrupted despite semantic_vad apparently being implemented in the code.

Steps to Reproduce

Launch the streamed voice agent demo
Attempt to engage in conversation with the agent
Observe the delay between speaking and receiving a response
Continue conversation for several exchanges to observe language switching
Remain silent for periods to observe false speech detection
Try to interrupt the agent while it's speaking

Expected Behavior

Responses should begin within 1 second of user input
The agent should maintain the initially selected language throughout the conversation
Speech detection should only activate when actual speech is present
The semantic_vad feature should allow interruption of the agent's responses

Actual Behavior

Responses take 3-4 seconds to begin after user input
The agent randomly switches to Spanish during English conversations
The agent frequently reports detecting speech and provides descriptions when no one is speaking
The agent cannot be interrupted despite the apparent implementation of semantic_vad
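
To put a number on the delay described above, one option is to time how long the output stream takes to yield its first audio chunk. The following is a minimal sketch, not part of the demo: `time_to_first_chunk` and `fake_agent_audio` are hypothetical names, and the fake stream merely stands in for whatever async audio stream the agent returns.

```python
import asyncio
import time
from typing import AsyncIterator, Optional


async def time_to_first_chunk(stream: AsyncIterator[bytes]) -> Optional[float]:
    """Return the seconds elapsed until the stream yields its first chunk.

    Returns None if the stream ends without yielding anything.
    """
    start = time.perf_counter()
    async for _chunk in stream:
        # Stop at the first chunk: this is the "response begins" moment.
        return time.perf_counter() - start
    return None


async def fake_agent_audio(delay_s: float) -> AsyncIterator[bytes]:
    # Hypothetical stand-in for the agent's audio output stream.
    await asyncio.sleep(delay_s)
    yield b"\x00\x00"


if __name__ == "__main__":
    latency = asyncio.run(time_to_first_chunk(fake_agent_audio(0.05)))
    print(f"time to first audio chunk: {latency:.3f}s")
```

Starting the timer when the user stops speaking, rather than when the stream object is created, would give a figure closer to the perceived delay.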

Technical Details

From code inspection, semantic_vad appears to be implemented but is not functioning as expected. This suggests a potential issue with how the feature is integrated or configured in the current build.

Additional Notes

These issues significantly impact the user experience and demonstration value of the agent. The latency and language switching problems are particularly disruptive during presentations.

Possible Solutions

Investigate streaming optimization to reduce latency
Check language model configuration for potential causes of language switching
Adjust speech detection sensitivity parameters
Review semantic_vad implementation to ensure proper configuration
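
As an illustration of the speech-sensitivity suggestion, a simple client-side energy gate could drop near-silent frames before they reach the agent. This is a swapped-in technique (an RMS threshold), not the demo's own detection logic; the function names and the threshold of 500 are assumptions chosen for the sketch, and the input is assumed to be 16-bit little-endian mono PCM.

```python
import math
import struct


def rms_int16(frame: bytes) -> float:
    """Root-mean-square amplitude of a little-endian 16-bit mono PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    # Ignore a trailing odd byte, if any, so unpack always gets whole samples.
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_probably_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Crude energy gate: True when the frame is louder than the threshold.

    The default threshold is illustrative only and would need tuning
    for a given microphone and room.
    """
    return rms_int16(frame) >= threshold


if __name__ == "__main__":
    silence = struct.pack("<4h", 0, 0, 0, 0)
    loud = struct.pack("<4h", 1000, -1000, 1000, -1000)
    print(is_probably_speech(silence), is_probably_speech(loud))
```

A gate like this only filters out quiet input; it cannot tell speech from other loud sounds, so it complements rather than replaces the server-side turn detection.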

Priority

High - These issues prevent effective demonstration of the voice agent's capabilities.


rm-openai · 2025-03-22T15:59:17Z

cc @dkundel-openai

dkundel-openai · 2025-03-22T17:32:17Z

Hi @muhammadsmalik

Thanks for raising this, and sorry you are running into problems with the streamed demo.

We are looking into a couple of performance improvements that should hopefully improve the response time.

We do not support interruptions yet. There is a bit of guidance in the docs on what you can do in the meantime: https://openai.github.io/openai-agents-python/voice/pipeline/#interruptions
A proper interruption implementation will require more client-side work so that we know exactly how much of the text had been read out when the interruption happened. It's something we want to support, but in the meantime the suggestion is what is laid out in the docs.
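
As a rough sketch of the interim approach, the client could mute the microphone while the agent's turn is in progress and unmute it once that turn's audio has been played. The code below assumes the pipeline emits lifecycle events named `turn_started` and `turn_ended`, as the linked docs appear to describe; `MicGate` and its methods are hypothetical names, not SDK classes. It avoids the agent reacting to its own output but is not true interruption support.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MicGate:
    """Tracks whether microphone frames should be forwarded to the pipeline."""

    muted: bool = False
    dropped_frames: int = 0

    def on_lifecycle(self, event_name: str) -> None:
        # Event names are assumed from the docs; check them against your SDK version.
        if event_name == "turn_started":
            self.muted = True
        elif event_name == "turn_ended":
            self.muted = False

    def filter(self, frame: bytes) -> Optional[bytes]:
        """Return the frame if the mic is live, otherwise drop it and return None."""
        if self.muted:
            self.dropped_frames += 1
            return None
        return frame


if __name__ == "__main__":
    gate = MicGate()
    gate.on_lifecycle("turn_started")
    print(gate.filter(b"\x01\x02"))  # dropped while the agent's turn is active
    gate.on_lifecycle("turn_ended")
    print(gate.filter(b"\x01\x02"))  # forwarded again
```

In a real client, `on_lifecycle` would be called from the loop that consumes the pipeline's event stream, and `filter` from the microphone callback.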

Overall, if your focus is on the lowest latency and the best interruption handling, my suggestion would still be our speech-to-speech model and the Realtime API.

duncsand · 2025-03-24T11:33:47Z

Just to add that I also see the issues described above. Notably, the agent frequently detects speech when there is none and then switches to Spanish, making the whole thing unusable for even a simple demonstration.

dkundel-openai · 2025-03-24T17:46:35Z

Out of curiosity, are you using any specific microphones, maybe ones with built-in noise cancellation?

muhammadsmalik · 2025-03-25T06:04:20Z

@dkundel-openai I'm using the built-in microphones on my MacBook Air, nothing with special noise cancellation technology.

Do you have any timeline for when interruptions will be supported? I believe this is one of the main use cases for voice agents - allowing for more natural conversation flow where users can interrupt when needed. Thanks.

anuragsharanjuspay · 2025-05-05T11:13:04Z

@rm-openai @dkundel-openai I tried using the VoicePipeline, but the latency is too high and it randomly transcribes Arabic text (I am using StreamedAudioInput). The realtime model is great, but how do I use it with the Agents SDK? Is support for gpt-4o-realtime planned, or are there any workarounds?

pmmohanmishra · 2025-05-06T02:43:01Z

@dkundel-openai has any support for realtime audio models been added yet?

dkundel-openai · 2025-05-06T02:56:34Z

It's planned, but there is no timeline yet for when support will land.
In the meantime, we are working on improving latency for the chained approach.

pmmohanmishra · 2025-05-06T03:30:02Z

@dkundel-openai thanks. Any timelines for fixing the above reported issues?

muhammadsmalik added the "bug" (Something isn't working) label on Mar 22, 2025
