feat: Voice Pipeline Orchestration - TTS/STT multi-provider support #410
Summary
This PR implements Voice Pipeline Orchestration for OpenAI's Realtime API, providing a comprehensive framework for managing voice interactions with the gpt-realtime model.
Motivation
While working with the OpenAI Agents SDK, I wanted to contribute a voice pipeline orchestration feature that makes it easier to build voice-enabled applications using OpenAI's Realtime API with gpt-realtime, Whisper STT, and the new Marin/Cedar voices.
What's Included
Core Implementation (packages/agents-realtime/src/voicePipeline.ts): VoicePipeline class with event-driven architecture
Comprehensive Tests (packages/agents-realtime/test/voicePipeline.test.ts)
Documentation (docs/src/content/docs/guides/voice-pipeline.mdx)
Working Example (examples/voice-pipeline/)
Key Features
✅ OpenAI Realtime API Integration
✅ Real-time Processing
✅ Voice Activity Detection
✅ Audio Enhancement
✅ Developer Experience
Usage Example
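As a rough illustration of the event-driven design, here is a minimal sketch of how a pipeline with pluggable STT and TTS providers could be wired up. Every name in it (`SketchVoicePipeline`, `SttProvider`, `TtsProvider`, the `transcript`/`audio`/`error` events, `demo`) is hypothetical and is not the API added by this PR; the providers are in-memory fakes, not calls to the Realtime API.

```typescript
// Hypothetical sketch only: these types are NOT the VoicePipeline API from this PR.
interface SttProvider {
  transcribe(audio: Uint8Array): Promise<string>;
}

interface TtsProvider {
  synthesize(text: string): Promise<Uint8Array>;
}

// Event name -> payload type (assumed event names).
interface PipelineEventMap {
  transcript: string;
  audio: Uint8Array;
  error: Error;
}

class SketchVoicePipeline {
  private readonly stt: SttProvider;
  private readonly tts: TtsProvider;
  private readonly respond: (text: string) => Promise<string>;
  private readonly handlers = new Map<string, Array<(payload: unknown) => void>>();

  constructor(
    stt: SttProvider,
    tts: TtsProvider,
    // Default "agent" simply echoes the transcript back.
    respond: (text: string) => Promise<string> = async (text) => text,
  ) {
    this.stt = stt;
    this.tts = tts;
    this.respond = respond;
  }

  // Register a typed listener for one of the pipeline events.
  on<K extends keyof PipelineEventMap>(
    event: K,
    handler: (payload: PipelineEventMap[K]) => void,
  ): this {
    const list = this.handlers.get(event) ?? [];
    list.push(handler as (payload: unknown) => void);
    this.handlers.set(event, list);
    return this;
  }

  private emit<K extends keyof PipelineEventMap>(event: K, payload: PipelineEventMap[K]): void {
    for (const handler of this.handlers.get(event) ?? []) handler(payload);
  }

  // One turn: speech in -> transcript -> reply text -> synthesized speech out.
  async process(audio: Uint8Array): Promise<void> {
    try {
      const text = await this.stt.transcribe(audio);
      this.emit("transcript", text);
      const reply = await this.respond(text);
      this.emit("audio", await this.tts.synthesize(reply));
    } catch (err) {
      this.emit("error", err instanceof Error ? err : new Error(String(err)));
    }
  }
}

// Fake providers so the sketch runs without network access.
const fakeStt: SttProvider = {
  transcribe: async (audio) => `heard ${audio.length} bytes`,
};
const fakeTts: TtsProvider = {
  synthesize: async (text) => new Uint8Array(text.length),
};

async function demo(): Promise<{ transcripts: string[]; audioBytes: number }> {
  const transcripts: string[] = [];
  let audioBytes = 0;
  const pipeline = new SketchVoicePipeline(fakeStt, fakeTts)
    .on("transcript", (text) => { transcripts.push(text); })
    .on("audio", (chunk) => { audioBytes += chunk.length; });
  await pipeline.process(new Uint8Array(4));
  return { transcripts, audioBytes };
}
```

Because providers are passed in as plain interfaces, swapping a fake for a real STT or TTS backend would not change the listener code; failures surface through the `error` event instead of rejecting `process`.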
Testing
The new tests pass alongside the existing test suite and follow the same patterns as the existing SDK tests.
Breaking Changes
None. This is a purely additive feature that doesn't modify any existing APIs.
Checklist
Notes
This contribution provides a framework for voice pipeline orchestration that integrates with OpenAI's Realtime API. The implementation focuses on providing a clean abstraction over the complexity of audio streaming, transcription, and synthesis while maintaining low latency for real-time voice interactions.
Thank you for considering this contribution! 🙏