feat: Support audio_transcribe with partial ordering #1908

Draft
wants to merge 10 commits into
base: main
from

Conversation

shuoweil
Contributor

feat: Support audio transcription with partial ordering
This change also fixes a related issue where Block.join would fail on joins with null indexes when operating in this partial ordering mode.

b/430572560

@shuoweil shuoweil requested review from a team as code owners July 15, 2025 18:20
@shuoweil shuoweil requested a review from TrevorBergeron July 15, 2025 18:20
@shuoweil shuoweil self-assigned this Jul 15, 2025
@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Jul 15, 2025
@shuoweil shuoweil added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 15, 2025
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 15, 2025
@shuoweil shuoweil removed the request for review from TrevorBergeron July 15, 2025 18:36
@shuoweil shuoweil marked this pull request as draft July 15, 2025 18:36
@shuoweil shuoweil force-pushed the shuowei-transcribe-partial-order branch from 78dcbf0 to abc6dae Compare July 15, 2025 21:04
@shuoweil shuoweil force-pushed the shuowei-transcribe-partial-order branch from 4b2927f to 5560902 Compare July 15, 2025 21:38
@shuoweil shuoweil requested a review from TrevorBergeron July 15, 2025 22:12
@shuoweil shuoweil marked this pull request as ready for review July 15, 2025 22:12
@@ -2488,6 +2488,11 @@ def join(
        )
        if result is not None:
            return result

        # For block identify joins with null indices, perform cross join
Collaborator


This doesn't seem desirable. If df1 has n rows and df2 has m rows, won't this end up producing n x m rows?
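
To make the row-count concern concrete, here is a minimal sketch in plain Python. It is not the bigframes implementation: `df1_rows` and `df2_rows` are hypothetical stand-ins for two frames with no index to join on. A cross join pairs every left row with every right row, while a row-aligned join yields one output row per input row.

```python
from itertools import product

# Hypothetical stand-ins for two DataFrames that have no index to join on.
df1_rows = ["a.mp3", "b.mp3", "c.mp3"]     # n = 3 rows
df2_rows = ["text a", "text b", "text c"]  # m = 3 rows

# Cross join: every left row is paired with every right row.
cross_joined = list(product(df1_rows, df2_rows))

# Row-aligned join: the i-th left row is paired only with the i-th right row.
aligned = list(zip(df1_rows, df2_rows))

print(len(cross_joined))  # n * m rows
print(len(aligned))       # n rows
```

The `zip` call relies on list position, which partial ordering mode does not guarantee; it is only here to illustrate the cardinality a caller would likely expect from an identity join.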

result = df.to_pandas(ordered=False)

assert "transcribed_text" in result.columns
assert len(result) > 0
Collaborator


The number of rows in result should be exactly equal to the number of rows in audio_mm_df_partial_ordering.
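
As a rough illustration of why the stricter check matters, the sketch below uses plain Python lists as hypothetical stand-ins for the input frame and the transcription result (the helper `check_row_count_preserved` is made up for this example). A result inflated by a cross join still passes `len(result) > 0`, but fails an exact row-count comparison.

```python
def check_row_count_preserved(input_rows, result_rows):
    """Return True only when the result has exactly one row per input row."""
    return len(result_rows) == len(input_rows)


# Hypothetical stand-in for audio_mm_df_partial_ordering with two audio files.
input_rows = ["audio_1.mp3", "audio_2.mp3"]

# Hypothetical result inflated by a cross join: 2 x 2 = 4 rows.
inflated_result = [(a, t) for a in input_rows for t in ("text 1", "text 2")]

# Expected result: one transcription per input row.
expected_result = [("audio_1.mp3", "text 1"), ("audio_2.mp3", "text 2")]

# The weak check from the test passes even though the result is wrong.
weak_check_passes = len(inflated_result) > 0

# The exact row-count check rejects the inflated result and accepts the expected one.
strict_check_on_inflated = check_row_count_preserved(input_rows, inflated_result)
strict_check_on_expected = check_row_count_preserved(input_rows, expected_result)
```

In the PR's test, the equivalent would presumably be an assertion along the lines of `assert len(result) == len(audio_mm_df_partial_ordering)`, assuming taking `len()` of the input frame is acceptable in that test.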

@shuoweil shuoweil marked this pull request as draft July 16, 2025 18:12
@shuoweil shuoweil added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jul 16, 2025
Labels
api: bigquery (Issues related to the googleapis/python-bigquery-dataframes API.)
do not merge (Indicates a pull request not ready for merge, due to either quality or timing.)
size: m (Pull request size is medium.)

3 participants