
feat: add client side retry to GeminiTextGenerator #1242


Merged 9 commits into main on Dec 27, 2024

Conversation

GarrettWu (Contributor)

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

b/385221339

@GarrettWu GarrettWu requested review from shobsi and jiaxunwu December 26, 2024 21:41
@GarrettWu GarrettWu self-assigned this Dec 26, 2024
@GarrettWu GarrettWu requested review from a team as code owners December 26, 2024 21:41
@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Dec 26, 2024
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Dec 26, 2024
@@ -945,6 +946,7 @@ def predict(
top_k: int = 40,
top_p: float = 1.0,
ground_with_google_search: bool = False,
retry: int = 0,
Collaborator

Can we rename it to max_retries?

Contributor Author

SG


if (df[_ML_GENERATE_TEXT_STATUS] != "").any():
df_succ = df[df[_ML_GENERATE_TEXT_STATUS].str.len() == 0]
df_fail = df[df[_ML_GENERATE_TEXT_STATUS].str.len() > 0]
Collaborator

Can we just use df_succ to tell whether there is any failed row? Then it can reduce one filter op.

Contributor Author

Not really. We need df_fail for the next retry, and we append df_succ to df_result in the current round. And I don't think a DataFrame subtract would be more efficient.

@GarrettWu GarrettWu enabled auto-merge (squash) December 27, 2024 21:47
@GarrettWu GarrettWu merged commit 8193abe into main Dec 27, 2024
22 checks passed
@GarrettWu GarrettWu deleted the garrettwu-retry branch December 27, 2024 22:43
@@ -983,6 +985,10 @@ def predict(
page for details: https://cloud.google.com/vertex-ai/generative-ai/pricing#google_models
The default is `False`.

max_retries (int, default 0):
Max number of retry rounds if any rows failed in the prediction. Each round need to make progress (has succeeded rows) to continue the next retry round.
Contributor

[nit] suggest rewording

Suggested change
Max number of retry rounds if any rows failed in the prediction. Each round need to make progress (has succeeded rows) to continue the next retry round.
Max number of retries if the prediction for any rows failed. Each try needs to make progress (i.e. has successfully predicted rows) to continue the retry.

@@ -983,6 +985,10 @@ def predict(
page for details: https://cloud.google.com/vertex-ai/generative-ai/pricing#google_models
The default is `False`.

max_retries (int, default 0):
Max number of retry rounds if any rows failed in the prediction. Each round need to make progress (has succeeded rows) to continue the next retry round.
Each round will append newly succeeded rows. When the max retry rounds is reached, the remaining failed rows will be appended to the end of the result.
Contributor

[nit] suggest rewording

Suggested change
Each round will append newly succeeded rows. When the max retry rounds is reached, the remaining failed rows will be appended to the end of the result.
Each retry will append newly succeeded rows. When the max retries are reached, the remaining rows (the ones without successful predictions) will be appended to the end of the result.


if (df[_ML_GENERATE_TEXT_STATUS] != "").any():
df_succ = df[df[_ML_GENERATE_TEXT_STATUS].str.len() == 0]
Contributor

for readability

success = df[_ML_GENERATE_TEXT_STATUS].str.len() == 0
df_succ = df[success]
df_fail = df[~success]

break

df_result = (
bpd.concat([df_result, df_succ]) if not df_result.empty else df_succ
Contributor

Isn't the if-else redundant here? bpd.concat() should handle the case where any of the input DataFrames is empty.

Contributor Author

It gave me an error in tests, something related to multi-index.
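The guard the author kept can be reproduced with plain pandas as a stand-in (`bpd.concat` is the BigQuery DataFrames equivalent); this is a sketch of the pattern, not the library code:

```python
import pandas as pd

def append_rows(df_result, df_succ):
    # Guard the concat while the accumulator is still empty. The author
    # reports that the unguarded concat failed in tests with multi-index
    # inputs in BigQuery DataFrames; plain pandas is more forgiving.
    return pd.concat([df_result, df_succ]) if not df_result.empty else df_succ
```

When the accumulator is empty, the succeeded slice is returned as-is, so its (possibly multi-level) index never has to align with an empty frame.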

df_fail = df[df[_ML_GENERATE_TEXT_STATUS].str.len() > 0]

if df_succ.empty:
warnings.warn("Can't make any progress, stop retrying.", RuntimeWarning)
Contributor

Looks like this warning will surface even if the user didn't ask for any retries (the default behavior), which is not desirable.

Contributor Author

Makes sense.
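One way to address the reviewer's point, sketched with a hypothetical helper (not the merged code), is to emit the warning only when the caller actually opted into retries:

```python
import warnings

def should_stop_retrying(df_succ_empty: bool, max_retries: int) -> bool:
    """Stop when a round made no progress; warn only if retries were requested."""
    if not df_succ_empty:
        return False  # progress was made, keep going
    if max_retries > 0:
        # Only meaningful if the user asked for retries; with the default
        # max_retries=0 the loop ends silently.
        warnings.warn("Can't make any progress, stop retrying.", RuntimeWarning)
    return True
```

With `max_retries=0` the single prediction round simply ends, and the user sees no warning about a retry mechanism they never enabled.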

3 participants