Conversation

Member

@seratch seratch commented Sep 3, 2025

this is still in progress but will resolve #1614

@seratch seratch requested a review from rm-openai September 3, 2025 06:13
@seratch seratch added enhancement New feature or request feature:realtime labels Sep 3, 2025
Comment on lines 48 to 55
# Disable server-side interrupt_response to avoid truncating assistant audio
session_context = await runner.run(
    model_config={
        "initial_model_settings": {
            "turn_detection": {"type": "semantic_vad", "interrupt_response": False}
        }
    }
)
Collaborator

do we need to do this by default? why?

Member Author

I explored some changes to improve the audio output quality, but they're not related to the gpt-realtime migration, so I've reverted all of them. I'll keep looking into improvements for this example app, but that can be done in a separate pull request.

@@ -93,7 +111,9 @@ async def _serialize_event(self, event: RealtimeSessionEvent) -> dict[str, Any]:
    base_event["tool"] = event.tool.name
    base_event["output"] = str(event.output)
elif event.type == "audio":
    base_event["audio"] = base64.b64encode(event.audio.data).decode("utf-8")
    # Coalesce raw PCM and flush on a steady timer for smoother playback.
Collaborator

is this just a quality improvement? would be nice to make it be a separate PR if so

Member Author

yeah, same as above (I won't repeat this for the rest)
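
For context on the "coalesce raw PCM and flush on a steady timer" comment in the hunk above, here is a minimal sketch of the general idea. The PR's actual implementation is not shown in this thread (and was reverted), so the `PcmCoalescer` class, its method names, and the frame size are all hypothetical.

```python
class PcmCoalescer:
    """Hypothetical helper: buffer PCM16 chunks and emit fixed-size frames."""

    def __init__(self, frame_bytes: int) -> None:
        # PCM16 samples are 2 bytes each, so frames should not split a sample.
        if frame_bytes <= 0 or frame_bytes % 2:
            raise ValueError("frame_bytes must be a positive multiple of 2")
        self._frame_bytes = frame_bytes
        self._buffer = bytearray()

    def add(self, chunk: bytes) -> None:
        """Append an incoming audio chunk of any size to the buffer."""
        self._buffer.extend(chunk)

    def flush(self, final: bool = False) -> list[bytes]:
        """Return all complete frames; with final=True, also the remainder."""
        frames: list[bytes] = []
        while len(self._buffer) >= self._frame_bytes:
            frames.append(bytes(self._buffer[: self._frame_bytes]))
            del self._buffer[: self._frame_bytes]
        if final and self._buffer:
            frames.append(bytes(self._buffer))
            self._buffer.clear()
        return frames
```

A caller could invoke `flush()` from a periodic task at a fixed interval and base64-encode each returned frame before sending it to the UI, so the client receives evenly sized chunks rather than whatever sizes the model emitted.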

"type": event.data.type,
}
# Surface useful raw events to the UI with details.
if getattr(event.data, "type", None) == "transcript_delta":
Collaborator

plz no getattr
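
One way to act on this request, sketched under assumptions: if every raw event class declares a `type` field, the check can read the attribute directly and use `isinstance` to narrow before touching event-specific fields. The dataclasses below are hypothetical stand-ins, not the SDK's actual event classes.

```python
from dataclasses import dataclass
from typing import Any, Literal, Union


# Hypothetical stand-ins for raw model events; the real SDK classes may differ.
@dataclass
class TranscriptDelta:
    item_id: str
    delta: str
    type: Literal["transcript_delta"] = "transcript_delta"


@dataclass
class OtherEvent:
    type: Literal["other"] = "other"


RawEvent = Union[TranscriptDelta, OtherEvent]


def describe_raw_event(data: RawEvent) -> dict[str, Any]:
    """Build a UI payload using direct attribute access instead of getattr."""
    payload: dict[str, Any] = {"type": data.type}
    # isinstance narrows the type, so a type checker can verify .delta exists.
    if isinstance(data, TranscriptDelta):
        payload["delta"] = data.delta
    return payload
```

Compared with `getattr(event.data, "type", None)`, this fails loudly if an event lacks the expected field and lets a type checker catch misspelled attribute names.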

@@ -142,7 +195,8 @@ async def websocket_endpoint(websocket: WebSocket, session_id: str):
if message["type"] == "audio":
    # Convert int16 array to bytes
    int16_data = message["data"]
    audio_bytes = struct.pack(f"{len(int16_data)}h", *int16_data)
    # Send little-endian PCM16 to the model.
Collaborator

did this change as part of the GA?
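
The hunk is cut off after the new comment, so the actual change is not visible here. As background for the question: a bare `h` format in `struct.pack` uses the host's native byte order, while a `<` prefix forces little-endian regardless of platform. A minimal sketch of the explicit form, where the helper name `int16_samples_to_pcm16le` is hypothetical:

```python
import struct


def int16_samples_to_pcm16le(samples: list[int]) -> bytes:
    """Pack signed 16-bit samples as little-endian PCM16 bytes."""
    # "<" forces little-endian with standard sizes; without a prefix,
    # struct uses the host's native byte order.
    return struct.pack(f"<{len(samples)}h", *samples)
```

On little-endian hosts (most x86 and ARM deployments) both forms produce the same bytes, so the explicit prefix mainly documents intent and guards against big-endian platforms.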

@seratch seratch marked this pull request as ready for review September 4, 2025 10:22
Successfully merging this pull request may close these issues.

Support for gpt-realtime
2 participants