DOC: Improve lookup documentation #61471

stevenae · 2025-05-21T15:04:30Z

closes ENH: re-implement DataFrame.lookup. #40140
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Follows from #61185

Examples available at https://colab.research.google.com/drive/1MGWX6JVJL5yHyK7BeEBPQAW4tLM3TZL9#scrollTo=DjWfk4i1SiOY

Add pd_lookup_het() and pd_lookup_hom()

rhshadrach

Thanks for the PR! No strong opposition to having both functions, but the performance gain of the _het version does not seem significant to me.

doc/source/user_guide/indexing.rst

stevenae · 2025-05-27T15:34:30Z

Addressed your concerns! Please take another look when you have time for a review.

…

On Wed, May 21, 2025, 4:33 PM Richard Shadrach ***@***.***> wrote:

***@***.*** requested changes on this pull request.

No strong opposition to having both functions, but the performance gain of the _het version does not seem significant to me.

------------------------------

In doc/source/user_guide/indexing.rst <#61471 (comment)>:

> -    df = pd.DataFrame({'col': ["A", "A", "B", "B"],
> -                       'A': [80, 23, np.nan, 22],
> -                       'B': [80, 55, 76, 67]})
> -    df
> -    idx, cols = pd.factorize(df['col'])
> -    df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
> +    def pd_lookup_hom(df, row_labels, col_labels):
> +        rows = df.index.get_indexer(row_labels)

Can you add df = df.loc[:, sorted(set(col_labels))] here.

------------------------------

In doc/source/user_guide/indexing.rst <#61471 (comment)>:

> +
> +.. code-block:: python
> +
> +    def pd_lookup_het(df, row_labels, col_labels):
> +        rows = df.index.get_indexer(row_labels)
> +        cols = df.columns.get_indexer(col_labels)
> +        sub = df.take(np.unique(cols), axis=1)
> +        sub = sub.take(np.unique(rows), axis=0)
> +        rows = sub.index.get_indexer(row_labels)
> +        values = sub.melt()["value"]
> +        cols = sub.columns.get_indexer(col_labels)
> +        flat_index = rows + cols * len(sub)
> +        result = values[flat_index]
> +        return result
> +
> +For homogeneous column types, it is fastest to skip column subsetting and go directly to numpy:

Nit: NumPy

------------------------------

In doc/source/user_guide/indexing.rst <#61471 (comment)>:

> -.. ipython:: python
> +For heterogeneous column types, we subset columns to avoid unnecessary numpy conversions:

NumPy again.
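For readers following the review, here is a minimal, runnable sketch of the two lookup strategies discussed in the quoted diff. It is not the PR's final code: the body of pd_lookup_hom is only partly visible in this thread, so lookup_hom_sketch is an assumed completion that includes the reviewer's suggested column subsetting. lookup_het_sketch follows the quoted pd_lookup_het but converts the melted values to a NumPy array before indexing. Both function names are hypothetical, and both sketches assume unique index and column labels with every requested label present (get_indexer returns -1 for missing labels).

```python
import numpy as np
import pandas as pd


def lookup_hom_sketch(df, row_labels, col_labels):
    """Assumed completion of the homogeneous-dtype variant."""
    # Reviewer's suggestion: keep only the columns that are looked up.
    df = df.loc[:, sorted(set(col_labels))]
    rows = df.index.get_indexer(row_labels)
    cols = df.columns.get_indexer(col_labels)
    # One 2-D array, indexed pairwise by (row position, column position).
    return df.to_numpy()[rows, cols]


def lookup_het_sketch(df, row_labels, col_labels):
    """Adapted from the quoted pd_lookup_het (heterogeneous dtypes)."""
    rows = df.index.get_indexer(row_labels)
    cols = df.columns.get_indexer(col_labels)
    # Restrict to the rows and columns that are actually needed.
    sub = df.take(np.unique(cols), axis=1)
    sub = sub.take(np.unique(rows), axis=0)
    rows = sub.index.get_indexer(row_labels)
    cols = sub.columns.get_indexer(col_labels)
    # melt() stacks the columns one after another, so the value at
    # (row, col) sits at position row + col * len(sub).
    values = sub.melt()["value"].to_numpy()
    return values[rows + cols * len(sub)]


# Example frame taken from the lines removed in the quoted diff.
df = pd.DataFrame({"col": ["A", "A", "B", "B"],
                   "A": [80, 23, np.nan, 22],
                   "B": [80, 55, 76, 67]})
row_labels = list(df.index)
col_labels = list(df["col"])

hom = lookup_hom_sketch(df, row_labels, col_labels)
het = lookup_het_sketch(df, row_labels, col_labels)

# The factorize/reindex idiom that the PR replaces, for comparison.
idx, cols = pd.factorize(df["col"])
old = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
```

On this frame all three approaches select the same values, one per row, from the column named in col. Whether the extra subsetting in the heterogeneous variant pays off depends on the frame's shape and dtypes, which is the performance question raised in the review.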

stevenae added 2 commits May 21, 2025 10:46

Update indexing.rst

f17b96c

Add pd_lookup_het() and pd_lookup_hom()

Update indexing.rst

12b6572

This was referenced May 21, 2025

ENH: Reimplement DataFrame.lookup #61185

Open

ENH: re-implement DataFrame.lookup. #40140

Open

mroeschke requested a review from rhshadrach May 21, 2025 16:02

mroeschke added the Docs label May 21, 2025

rhshadrach requested changes May 21, 2025

rhshadrach added the Indexing label (Related to indexing on series/frames, not to indexes themselves) May 21, 2025

rhshadrach added this to the 3.0 milestone May 21, 2025

address pandas-dev#61471 (review)

7292c17

datapythonista changed the title ~~Improve lookup documentation~~ DOC: Improve lookup documentation May 23, 2025
