Skip to content

feat: add on parameter in dataframe.rolling() and dataframe.groupby.rolling() #1556

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Apr 1, 2025

Conversation

sycai
Copy link
Contributor

@sycai sycai commented Mar 27, 2025

I didn't add this parameter for Series because it does seem to make much sense: Series doesn't have columns.

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Mar 27, 2025
@sycai sycai added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 28, 2025
@sycai sycai marked this pull request as ready for review March 28, 2025 22:27
@sycai sycai requested review from a team as code owners March 28, 2025 22:27
@sycai sycai requested a review from chelsea-lin March 28, 2025 22:27
@bigframes-bot bigframes-bot removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 28, 2025
@GarrettWu GarrettWu removed their assignment Mar 31, 2025
@sycai sycai requested a review from TrevorBergeron March 31, 2025 22:01
@github-advanced-security
Copy link

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

@@ -1052,6 +1052,8 @@ def rolling(
self,
window,
min_periods: int | None = None,
on: str | None = None,
closed: Literal["right", "left", "both", "neither"] = "right",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No code examples are added in this rolling method yet. Could you please help to fill the gaps here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Doc added

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Maybe add more examples with supported parameters.

@@ -1083,6 +1085,16 @@ def rolling(
For a window that is specified by an integer, ``min_periods`` will default
to the size of the window.

on (str, optional):
For a DataFrame, a column label or Index level on which to calculate the rolling window,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try the "Index level" cases here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I removed index level here because I do not plan to support that at this time

@sycai sycai force-pushed the sycai_rolling_window branch from 0ed41e1 to 50df383 Compare April 1, 2025 17:43
@sycai sycai requested a review from chelsea-lin April 1, 2025 18:03
Copy link
Contributor

@chelsea-lin chelsea-lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall with two nit comments. Thanks!

@@ -1052,6 +1052,8 @@ def rolling(
self,
window,
min_periods: int | None = None,
on: str | None = None,
closed: Literal["right", "left", "both", "neither"] = "right",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Maybe add more examples with supported parameters.

@@ -1035,6 +1035,16 @@ def rolling(self, *args, **kwargs):
For a window that is specified by an integer, ``min_periods`` will default
to the size of the window.

on (str, optional):
For a DataFrame, a column label or Index level on which to calculate the rolling window,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should "Index level" be removed here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.

"""

grouping_keys: Tuple[ex.DerefOp, ...] = tuple()
ordering: Tuple[orderings.OrderingExpression, ...] = tuple()
bounds: Union[RowsWindowBounds, RangeWindowBounds, None] = None
min_periods: int = 0
on: str | None = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, not sure I understand this? I don't see why it should be part of the window spec? we already have "input column".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameter removed. PTAL

@sycai sycai requested a review from TrevorBergeron April 1, 2025 22:25
@sycai sycai force-pushed the sycai_rolling_window branch from b453a96 to c27d0bc Compare April 1, 2025 22:30
):
self._block = block
self._window_spec = window_spec
self._value_column_ids = value_column_ids
self._drop_null_groups = drop_null_groups
self._is_series = is_series
# The column ID that won't be aggregated on.
# This is equivalent to pandas `on` parameter in rolling()
self._skip_agg_column_id = skip_agg_column_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will eventually be a set in order to handle multiple columns for on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be unlikely. Both pandas and SQL allow only one column to be used for on (or its equivalent)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe not for on, but for grouping columns. We diverge a bit right now I think for some grouping cases. Anyways, can always change later if need be.

@sycai sycai merged commit 45c9d9f into main Apr 1, 2025
24 checks passed
@sycai sycai deleted the sycai_rolling_window branch April 1, 2025 23:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants