docs: Add docstring code samples for Series.apply and DataFrame.map #185

Merged on Nov 9, 2023 (5 commits)
third_party/bigframes_vendored/pandas/core/frame.py: 62 changes (61 additions, 1 deletion)
@@ -2159,8 +2159,68 @@ def map(self, func, na_action: Optional[str] = None) -> DataFrame:
In pandas 2.1.0, DataFrame.applymap is deprecated and renamed to
DataFrame.map.

**Examples:**

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

We use the ``reuse=False`` flag to make sure a new ``remote_function``
is created every time the following code runs. You can omit it to
potentially reuse a previously deployed ``remote_function`` created
from the same user-defined function.

>>> @bpd.remote_function([int], float, reuse=False)
... def minutes_to_hours(x):
...     return x / 60

>>> df_minutes = bpd.DataFrame(
...     {"system_minutes": [0, 30, 60, 90, 120],
...      "user_minutes": [0, 15, 75, 90, 6]})
>>> df_minutes
   system_minutes  user_minutes
0               0             0
1              30            15
2              60            75
3              90            90
4             120             6
<BLANKLINE>
[5 rows x 2 columns]

>>> df_hours = df_minutes.map(minutes_to_hours)
>>> df_hours
   system_minutes  user_minutes
0             0.0           0.0
1             0.5          0.25
2             1.0          1.25
3             1.5           1.5
4             2.0           0.1
<BLANKLINE>
[5 rows x 2 columns]

If the data contains ``NA``/``None`` values, you can skip applying the
remote function to them by specifying ``na_action='ignore'``; those
positions remain ``<NA>`` in the result.

>>> df_minutes = bpd.DataFrame(
...     {
...         "system_minutes": [0, 30, 60, None, 90, 120, bpd.NA],
...         "user_minutes": [0, 15, 75, 90, 6, None, bpd.NA]
...     }, dtype="Int64")
>>> df_hours = df_minutes.map(minutes_to_hours, na_action='ignore')
>>> df_hours
   system_minutes  user_minutes
0             0.0           0.0
1             0.5          0.25
2             1.0          1.25
3            <NA>           1.5
4             1.5           0.1
5             2.0          <NA>
6            <NA>          <NA>
<BLANKLINE>
[7 rows x 2 columns]

Args:
- func:
func (function):
Python function wrapped by the ``remote_function`` decorator; it takes
a single value and returns a single value.
na_action (Optional[str], default None):
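The element-wise contract of ``DataFrame.map`` (one value in, one value out, with ``na_action='ignore'`` skipping missing cells) can be sketched locally without a BigQuery project. This is a minimal, hypothetical model, not bigframes or pandas code: ``map_frame`` is an illustrative helper, a dict of lists stands in for a DataFrame, and ``None`` stands in for ``NA``.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumption for this sketch: a dict of column name -> list of values stands
# in for a DataFrame, and None stands in for a missing (NA) value.
Frame = Dict[str, List[Optional[Any]]]


def map_frame(
    frame: Frame,
    func: Callable[[Any], Any],
    na_action: Optional[str] = None,
) -> Frame:
    """Hypothetical helper: apply ``func`` to every cell, column by column.

    With na_action='ignore', missing cells are copied through unchanged
    instead of being passed to ``func``.
    """
    if na_action not in (None, "ignore"):
        raise ValueError(f"na_action must be None or 'ignore', got {na_action!r}")
    return {
        name: [
            None if (v is None and na_action == "ignore") else func(v)
            for v in values
        ]
        for name, values in frame.items()
    }


def minutes_to_hours(x):
    return x / 60


df_minutes: Frame = {
    "system_minutes": [0, 30, 60, None, 90, 120, None],
    "user_minutes": [0, 15, 75, 90, 6, None, None],
}
df_hours = map_frame(df_minutes, minutes_to_hours, na_action="ignore")
print(df_hours["system_minutes"])  # [0.0, 0.5, 1.0, None, 1.5, 2.0, None]
print(df_hours["user_minutes"])    # [0.0, 0.25, 1.25, 1.5, 0.1, None, None]

# Without na_action='ignore', func receives None and `None / 60` fails.
try:
    map_frame(df_minutes, minutes_to_hours)
except TypeError as exc:
    print("TypeError:", exc)
```

The sketch only mirrors the documented per-element behavior; it says nothing about how bigframes executes the remote function.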
third_party/bigframes_vendored/pandas/core/series.py: 68 changes (62 additions, 6 deletions)
@@ -728,18 +728,74 @@ def apply(
func,
) -> DataFrame | Series:
"""
- Invoke function on values of Series.
Invoke function on values of a Series.

- Can be ufunc (a NumPy function that applies to the entire Series)
- or a Python function that only works on single values.
**Examples:**

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

We use the ``reuse=False`` flag to make sure a new ``remote_function``
is created every time the following code runs. You can omit it to
potentially reuse a previously deployed ``remote_function`` created
from the same user-defined function.

>>> @bpd.remote_function([int], float, reuse=False)
... def minutes_to_hours(x):
...     return x / 60

>>> minutes = bpd.Series([0, 30, 60, 90, 120])
>>> minutes
0      0
1     30
2     60
3     90
4    120
dtype: Int64

>>> hours = minutes.apply(minutes_to_hours)
>>> hours
0    0.0
1    0.5
2    1.0
3    1.5
4    2.0
dtype: Float64

You can turn a user-defined function that has external package
dependencies into a BigQuery DataFrames remote function by listing
the names of those packages in the ``packages`` parameter.

>>> @bpd.remote_function(
...     [str],
...     str,
...     reuse=False,
...     packages=["cryptography"],
... )
... def get_hash(input):
...     from cryptography.fernet import Fernet
...
...     # Treat a missing value as an empty string.
...     if input is None:
...         input = ""
...
...     # A new random key is generated on every call, so the result is
...     # non-deterministic Fernet ciphertext rather than a reproducible hash.
...     key = Fernet.generate_key()
...     f = Fernet(key)
...     return f.encrypt(input.encode()).decode()

>>> names = bpd.Series(["Alice", "Bob"])
>>> hashes = names.apply(get_hash)

Args:
func (function):
- Python function or NumPy ufunc to apply.
BigQuery DataFrames ``remote_function`` to apply. The function
should take a scalar and return a scalar; it is applied to
every element of the ``Series``.

Returns:
- bigframes.series.Series: If func returns a Series object the result
- will be a DataFrame.
bigframes.series.Series: A new Series whose values are the results of
applying ``func`` to each element of the original Series.
"""
raise NotImplementedError(constants.ABSTRACT_METHOD_ERROR_MESSAGE)

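The ``reuse`` flag that both docstrings mention can be pictured as a lookup keyed by the function's content. The PR does not show how bigframes names or finds deployed functions, so this is only a hypothetical sketch: it assumes a deployment key derived from a hash of the function's bytecode, constants, and names, and ``deploy``/``content_key`` are illustrative names, not bigframes APIs. Nothing is actually deployed.

```python
import hashlib
import itertools
from typing import Callable, Dict

# Hypothetical in-memory registry: content key -> deployed function name.
_deployed: Dict[str, str] = {}
# Counter that gives every fresh deployment a unique name.
_serial = itertools.count()


def content_key(func: Callable) -> str:
    """Hash the function's bytecode, constants, and names, so redeploying
    the same function yields the same key (unlike its id())."""
    code = func.__code__
    payload = code.co_code + repr((code.co_consts, code.co_names)).encode()
    return hashlib.sha256(payload).hexdigest()[:12]


def deploy(func: Callable, reuse: bool = True) -> str:
    """Return the name of a (pretend) deployment of ``func``.

    reuse=True: hand back an existing deployment with the same key, if any.
    reuse=False: always create a new, uniquely named deployment.
    """
    key = content_key(func)
    if reuse and key in _deployed:
        return _deployed[key]
    name = f"fn_{key}_{next(_serial)}"
    _deployed[key] = name  # later reuse=True calls pick up the newest one
    return name


def minutes_to_hours(x):
    return x / 60


first = deploy(minutes_to_hours)               # creates fn_<key>_0
second = deploy(minutes_to_hours)              # reuses fn_<key>_0
third = deploy(minutes_to_hours, reuse=False)  # creates fn_<key>_1
print(first == second, first == third)  # True False
```

Keying on content rather than on the Python object is what lets an unchanged definition find an earlier deployment in this model; ``reuse=False`` bypasses the lookup, which matches why the doctests pass it.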