feat: Voice Pipeline Orchestration - TTS/STT multi-provider support #410

Joe-Simo · 2025-08-29T02:13:47Z

Summary

This PR implements Voice Pipeline Orchestration for OpenAI's Realtime API, providing a comprehensive framework for managing voice interactions with the gpt-realtime model.

Motivation

While working with the OpenAI Agents SDK, I wanted to contribute a voice pipeline orchestration feature that makes it easier to build voice-enabled applications using OpenAI's Realtime API with gpt-realtime, Whisper STT, and the new Marin/Cedar voices.

What's Included

Core Implementation (`packages/agents-realtime/src/voicePipeline.ts`)

VoicePipeline class with event-driven architecture
Integration with gpt-realtime model
Support for Marin and Cedar realtime voices
Whisper STT integration
WebRTC support for ultra-low latency (<100ms)
Voice Activity Detection (VAD)
Audio enhancement (echo/noise suppression, gain control)
Plugin pattern for easy RealtimeSession integration

Comprehensive Tests (`packages/agents-realtime/test/voicePipeline.test.ts`)

Test coverage for all features
Audio processing and synthesis tests
WebRTC integration tests
Error handling scenarios

Documentation (`docs/src/content/docs/guides/voice-pipeline.mdx`)

Complete usage guide
Configuration examples
Best practices
Integration with RealtimeSession

Working Example (`examples/voice-pipeline/`)

Full implementation example
Demonstrates all features
Ready-to-run with README

Key Features

✅ OpenAI Realtime API Integration

gpt-realtime model support
Marin and Cedar voice options
Whisper STT for transcription

✅ Real-time Processing

Low-latency audio streaming
WebRTC support for <100ms latency
Automatic buffering
Streaming responses
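
The PR text does not show how the automatic buffering works. As a rough sketch of one common approach, the class below accumulates incoming PCM16 samples of arbitrary size and releases fixed-size chunks. The name `ChunkBuffer`, its methods, and the chunk size are assumptions for illustration, not the API in `voicePipeline.ts`.

```typescript
// Illustrative only: ChunkBuffer is a hypothetical helper, not part of the PR.
class ChunkBuffer {
  private readonly chunkSize: number;
  private pending: Int16Array;
  private length = 0; // number of valid samples currently held in `pending`

  constructor(chunkSize: number) {
    if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
      throw new RangeError('chunkSize must be a positive integer');
    }
    this.chunkSize = chunkSize;
    this.pending = new Int16Array(chunkSize * 2);
  }

  /** Appends samples and returns every complete chunk now available. */
  push(samples: Int16Array): Int16Array[] {
    // Grow the backing store if the new samples do not fit.
    if (this.length + samples.length > this.pending.length) {
      const needed = this.length + samples.length;
      const grown = new Int16Array(Math.max(this.pending.length * 2, needed));
      grown.set(this.pending.subarray(0, this.length));
      this.pending = grown;
    }
    this.pending.set(samples, this.length);
    this.length += samples.length;

    // Copy out as many full chunks as are available.
    const chunks: Int16Array[] = [];
    let offset = 0;
    while (this.length - offset >= this.chunkSize) {
      chunks.push(this.pending.slice(offset, offset + this.chunkSize));
      offset += this.chunkSize;
    }

    // Move the leftover samples to the front of the buffer.
    this.pending.copyWithin(0, offset, this.length);
    this.length -= offset;
    return chunks;
  }

  /** Returns any remaining partial chunk and empties the buffer. */
  flush(): Int16Array {
    const rest = this.pending.slice(0, this.length);
    this.length = 0;
    return rest;
  }
}
```

Fixed-size chunks keep downstream latency predictable: a smaller chunk size lowers delay at the cost of more frequent sends.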

✅ Voice Activity Detection

Configurable thresholds
Debouncing support
Silence detection
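
To make the three VAD items above concrete, here is a minimal energy-based sketch. It is not the detector from this PR: `EnergyVad`, `VadOptions`, and the option names are hypothetical, and it assumes `Float32Array` frames with samples in the range -1 to 1. The threshold decides whether a frame is speech, and the two frame counts debounce the transitions so that a single loud or quiet frame does not flip the state.

```typescript
// Illustrative only: names and option shapes are assumptions, not the PR's API.
interface VadOptions {
  threshold: number;     // RMS level above which a frame counts as speech
  startFrames: number;   // consecutive speech frames needed to open a segment
  silenceFrames: number; // consecutive quiet frames needed to close a segment
}

type VadEvent = 'speech_start' | 'speech_end' | null;

/** Root-mean-square level of one frame. */
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

class EnergyVad {
  private readonly opts: VadOptions;
  private speaking = false;
  private run = 0; // consecutive frames that disagree with the current state

  constructor(opts: VadOptions) {
    this.opts = opts;
  }

  /** Feeds one frame and reports a state change, if any. */
  process(frame: Float32Array): VadEvent {
    const loud = rms(frame) >= this.opts.threshold;
    if (loud === this.speaking) {
      // The frame agrees with the current state, so reset the debounce counter.
      this.run = 0;
      return null;
    }
    this.run += 1;
    const needed = this.speaking ? this.opts.silenceFrames : this.opts.startFrames;
    if (this.run < needed) return null;
    this.speaking = !this.speaking;
    this.run = 0;
    return this.speaking ? 'speech_start' : 'speech_end';
  }
}
```

A larger `silenceFrames` value tolerates short pauses inside an utterance but delays the end-of-speech signal by the same amount.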

✅ Audio Enhancement

Echo suppression
Noise reduction
Automatic gain control
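
In a browser, these three enhancements correspond to the standard `echoCancellation`, `noiseSuppression`, and `autoGainControl` audio track constraints. The sketch below shows one possible mapping. `EnhancementOptions` and `toAudioConstraints` are hypothetical names, and whether the PR applies enhancement this way is an assumption.

```typescript
// Illustrative only: the option names on the left are assumptions;
// the constraint names on the right are the standard media track constraints.
interface EnhancementOptions {
  echoSuppression?: boolean;
  noiseReduction?: boolean;
  autoGainControl?: boolean;
}

interface AudioConstraints {
  echoCancellation: boolean;
  noiseSuppression: boolean;
  autoGainControl: boolean;
}

/** Maps pipeline-style options to browser audio constraints, defaulting each to on. */
function toAudioConstraints(opts: EnhancementOptions = {}): AudioConstraints {
  return {
    echoCancellation: opts.echoSuppression ?? true,
    noiseSuppression: opts.noiseReduction ?? true,
    autoGainControl: opts.autoGainControl ?? true,
  };
}

// Browser usage (not executed here):
//   const stream = await navigator.mediaDevices.getUserMedia({
//     audio: toAudioConstraints({ noiseReduction: false }),
//   });
```

Browsers treat these constraints as requests, so the resulting track settings are worth checking when a specific enhancement is required.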

✅ Developer Experience

TypeScript with full type safety
Event-driven API
Plugin pattern for RealtimeSession
Comprehensive metrics monitoring
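
The description does not say which metrics are collected or how they are exposed. As a sketch of what latency monitoring could look like, the class below records named latency samples and summarizes them. `LatencyMetrics` and the metric names are hypothetical and do not come from the PR.

```typescript
// Illustrative only: LatencyMetrics is a hypothetical helper, not the PR's API.
interface LatencySummary {
  count: number;
  mean: number;
  p95: number;
  max: number;
}

class LatencyMetrics {
  private samples = new Map<string, number[]>();

  /** Records one latency measurement in milliseconds under a metric name. */
  record(name: string, ms: number): void {
    const list = this.samples.get(name);
    if (list) {
      list.push(ms);
    } else {
      this.samples.set(name, [ms]);
    }
  }

  /** Summarizes a metric, or returns undefined if nothing was recorded. */
  summary(name: string): LatencySummary | undefined {
    const list = this.samples.get(name);
    if (!list || list.length === 0) return undefined;
    const sorted = list.slice().sort((a, b) => a - b);
    const sum = sorted.reduce((acc, v) => acc + v, 0);
    // Nearest-rank percentile: the smallest value with at least 95% of samples at or below it.
    const rank = Math.ceil(0.95 * sorted.length);
    return {
      count: sorted.length,
      mean: sum / sorted.length,
      p95: sorted[rank - 1],
      max: sorted[sorted.length - 1],
    };
  }
}
```

Reporting a high percentile alongside the mean helps when checking a latency target, because occasional slow responses are hidden by the average.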

Usage Example

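import { createVoicePipeline } from '@openai/agents/realtime';

// Configure the realtime model, the output voice, and the transcription model.
const pipeline = createVoicePipeline({
  model: 'gpt-realtime',
  voice: 'marin', // or 'cedar'
  stt: { model: 'whisper-1' },
});

// Log each final transcript of the user's speech.
pipeline.on('speech.final', (text) => {
  console.log('User said:', text);
});

// audioBuffer holds captured audio; obtaining it is outside this snippet.
await pipeline.processAudio(audioBuffer);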
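
The snippet above does not show where `audioBuffer` comes from or which format `processAudio` expects. Assuming it takes 16-bit little-endian PCM, which the Realtime API accepts as `pcm16`, a sketch of converting Web Audio style float samples might look like this. `floatToPcm16` is a hypothetical helper, not part of the PR.

```typescript
// Illustrative only: assumes processAudio accepts PCM16 little-endian audio.
function floatToPcm16(input: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(input.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < input.length; i++) {
    // Clamp to [-1, 1] so out-of-range samples saturate instead of wrapping.
    const s = Math.max(-1, Math.min(1, input[i]));
    // Scale negative and positive halves separately to use the full int16 range.
    const scaled = s < 0 ? s * 0x8000 : s * 0x7fff;
    view.setInt16(i * 2, Math.round(scaled), true); // true = little-endian
  }
  return buffer;
}
```

The input would also need to match the sample rate the API expects, which this sketch does not handle.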

Testing

All tests, new and existing, pass. The new tests follow the same patterns as the existing SDK tests.

Breaking Changes

None. This is a purely additive feature that doesn't modify any existing APIs.

Checklist

Notes

This contribution provides a framework for voice pipeline orchestration that integrates with OpenAI's Realtime API. The implementation focuses on providing a clean abstraction over the complexity of audio streaming, transcription, and synthesis while maintaining low latency for real-time voice interactions.

Thank you for considering this contribution! 🙏

changeset-bot · 2025-08-29T02:13:52Z

⚠️ No Changeset found

Latest commit: e839d59

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


Implements comprehensive voice pipeline orchestration for OpenAI's Realtime API:

- Voice Pipeline class for managing TTS/STT orchestration with gpt-realtime
- Support for Marin and Cedar realtime voices
- Whisper STT integration for speech-to-text
- WebRTC support for ultra-low latency (<100ms)
- Voice Activity Detection (VAD) capabilities
- Audio processing with configurable settings
- Metrics monitoring for pipeline performance
- Plugin system for easy RealtimeSession integration

The Voice Pipeline provides a framework for building voice-enabled applications using OpenAI's Realtime API, handling the complexity of audio streaming, transcription, and synthesis while maintaining low latency.

Features:

- Seamless integration with RealtimeSession
- Configurable audio processing (sample rate, encoding, buffer sizes)
- Real-time metrics (STT/TTS latency, processing time)
- WebRTC support for browser-based voice applications
- Event-driven architecture for audio and speech events

seratch added the enhancement and package:agents-realtime labels · Aug 29, 2025

Joe-Simo force-pushed the feature/voice-pipeline-orchestration branch from 0683968 to e839d59 · August 29, 2025 02:39

seratch marked this pull request as draft August 30, 2025 01:17
