DOC: Improve lookup documentation #61471

stevenae · 2025-05-21T15:04:30Z

closes ENH: re-implement DataFrame.lookup. #40140
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
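For context, the removed `DataFrame.lookup` (dropped in pandas 2.0) returned one value per (row label, column label) pair. A slow but obviously correct reference version can be useful for checking faster variants. This is a minimal sketch: `lookup_reference` is a hypothetical helper, not a pandas API, and the data is made up.

```python
import numpy as np
import pandas as pd


def lookup_reference(df, row_labels, col_labels):
    # Hypothetical helper: element-by-element reference for the old
    # DataFrame.lookup contract (one value per (row, col) label pair).
    if len(row_labels) != len(col_labels):
        raise ValueError("row_labels and col_labels must have the same length")
    # DataFrame.at performs a scalar lookup by label.
    return np.array([df.at[r, c] for r, c in zip(row_labels, col_labels)])


df = pd.DataFrame({"A": [80, 23, 11], "B": [80, 55, 76]}, index=["x", "y", "z"])
print(lookup_reference(df, ["x", "z", "y"], ["B", "A", "B"]))  # [80 11 55]
```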

Follows from #61185

Examples available at https://colab.research.google.com/drive/1MGWX6JVJL5yHyK7BeEBPQAW4tLM3TZL9#scrollTo=DjWfk4i1SiOY

Add pd_lookup_het() and pd_lookup_hom()
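The homogeneous case described in the review ("go directly to NumPy") boils down to translating labels into positions and then using NumPy fancy indexing. The sketch below illustrates that idea under stated assumptions; it is not the code in this PR. `lookup_hom_sketch` is a hypothetical name, all columns are assumed to share one dtype, and the row and column labels are assumed to be unique, which `Index.get_indexer` requires.

```python
import numpy as np
import pandas as pd


def lookup_hom_sketch(df, row_labels, col_labels):
    # Hypothetical helper, assuming one shared column dtype so that
    # DataFrame.to_numpy() does not need to upcast.
    rows = df.index.get_indexer(row_labels)    # positions of row labels
    cols = df.columns.get_indexer(col_labels)  # positions of column labels
    if (rows == -1).any() or (cols == -1).any():
        raise KeyError("one or more labels not found")
    # NumPy fancy indexing pairs rows[i] with cols[i].
    return df.to_numpy()[rows, cols]


df = pd.DataFrame(
    {"A": [80.0, 23.0, np.nan], "B": [80.0, 55.0, 76.0]}, index=["x", "y", "z"]
)
print(lookup_hom_sketch(df, ["x", "z", "y"], ["B", "B", "A"]))  # [80. 76. 23.]
```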

rhshadrach

Thanks for the PR! No strong opposition to having both functions, but the performance gain of the _het version does not seem significant to me.

doc/source/user_guide/indexing.rst
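One way to weigh the point about the `_het` version is to time a melt-based lookup against a plain `to_numpy()` lookup on the same mixed-dtype frame. This is an illustrative sketch only, not the PR's code: `lookup_het_sketch` and `lookup_numpy_sketch` are hypothetical names, the data is random, labels are assumed unique, and no timing results are claimed here.

```python
import timeit

import numpy as np
import pandas as pd


def lookup_het_sketch(df, row_labels, col_labels):
    # Hypothetical helper for frames with mixed column dtypes; assumes
    # unique row and column labels (required by Index.get_indexer).
    rows = df.index.get_indexer(row_labels)
    cols = df.columns.get_indexer(col_labels)
    if (rows == -1).any() or (cols == -1).any():
        raise KeyError("one or more labels not found")
    # Keep only the requested columns and rows before any conversion.
    sub = df.take(np.unique(cols), axis=1).take(np.unique(rows), axis=0)
    # Re-express the labels as positions within the smaller frame.
    rows = sub.index.get_indexer(row_labels)
    cols = sub.columns.get_indexer(col_labels)
    # melt() stacks the columns one after another, so cell (r, c) of
    # `sub` ends up at flat position r + c * len(sub).
    values = sub.melt()["value"]
    return values.iloc[rows + cols * len(sub)].to_numpy()


def lookup_numpy_sketch(df, row_labels, col_labels):
    # Hypothetical baseline: convert the whole frame and use NumPy fancy
    # indexing. With mixed dtypes, to_numpy() returns an object array.
    rows = df.index.get_indexer(row_labels)
    cols = df.columns.get_indexer(col_labels)
    return df.to_numpy()[rows, cols]


# Made-up mixed-dtype data, for illustration only.
rng = np.random.default_rng(0)
n = 5_000
big = pd.DataFrame(
    {
        "A": rng.integers(0, 100, n),
        "B": rng.random(n),
        "C": rng.choice(["p", "q", "r"], n),
    }
)
row_labels = rng.integers(0, n, 1_000)
col_labels = rng.choice(["A", "B"], 1_000)

for func in (lookup_het_sketch, lookup_numpy_sketch):
    seconds = timeit.timeit(lambda: func(big, row_labels, col_labels), number=10)
    print(f"{func.__name__}: {seconds:.4f} s for 10 calls")
```

The flat-index step relies on `melt()` emitting the columns one after another. Timings depend on frame shape, dtypes, and pandas version, so the printed numbers are only meaningful relative to each other on the same machine.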

stevenae · 2025-05-27T15:34:30Z

Addressed your concerns! If you have time for a review.

…

On Wed, May 21, 2025, 4:33 PM Richard Shadrach ***@***.***> wrote:

***@***.**** requested changes on this pull request.

No strong opposition to having both functions, but the performance gain of the _het version does not seem significant to me.

In doc/source/user_guide/indexing.rst <#61471 (comment)>:

> - df = pd.DataFrame({'col': ["A", "A", "B", "B"],
> -                    'A': [80, 23, np.nan, 22],
> -                    'B': [80, 55, 76, 67]})
> - df
> - idx, cols = pd.factorize(df['col'])
> - df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
> + def pd_lookup_hom(df, row_labels, col_labels):
> +     rows = df.index.get_indexer(row_labels)

Can you add df = df.loc[:, sorted(set(col_labels))] here.

In doc/source/user_guide/indexing.rst <#61471 (comment)>:

> +
> +.. code-block:: python
> +
> +    def pd_lookup_het(df, row_labels, col_labels):
> +        rows = df.index.get_indexer(row_labels)
> +        cols = df.columns.get_indexer(col_labels)
> +        sub = df.take(np.unique(cols), axis=1)
> +        sub = sub.take(np.unique(rows), axis=0)
> +        rows = sub.index.get_indexer(row_labels)
> +        values = sub.melt()["value"]
> +        cols = sub.columns.get_indexer(col_labels)
> +        flat_index = rows + cols * len(sub)
> +        result = values[flat_index]
> +        return result
> +
> +For homogeneous column types, it is fastest to skip column subsetting and go directly to numpy:

Nit: NumPy

In doc/source/user_guide/indexing.rst <#61471 (comment)>:

> -.. ipython:: python
> +For heterogeneous column types, we subset columns to avoid unnecessary numpy conversions:

NumPy again.
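The request to add `df = df.loc[:, sorted(set(col_labels))]` in the homogeneous path can be motivated with a small example. If the frame holds an unrelated column of another dtype, converting the whole frame with `to_numpy()` upcasts to `object`, while subsetting to the requested columns first keeps a numeric array. The data below is made up, and `sorted` assumes the column labels are mutually comparable.

```python
import numpy as np
import pandas as pd

# Made-up frame: two numeric columns plus an unrelated string column.
df = pd.DataFrame(
    {"A": [80.0, 23.0, np.nan], "B": [80.0, 55.0, 76.0], "note": ["p", "q", "r"]},
    index=["x", "y", "z"],
)
row_labels = ["x", "z", "y"]
col_labels = ["B", "B", "A"]

# Converting the whole frame upcasts to object because of "note".
print(df.to_numpy().dtype)  # object

# Subsetting to the requested columns first keeps the array numeric.
sub = df.loc[:, sorted(set(col_labels))]
print(sub.to_numpy().dtype)  # float64

# Positions within the subset, then NumPy fancy indexing.
rows = sub.index.get_indexer(row_labels)
cols = sub.columns.get_indexer(col_labels)
print(sub.to_numpy()[rows, cols])  # [80. 76. 23.]
```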

rhshadrach

lgtm

mroeschke · 2025-06-02T16:57:16Z

Thanks @stevenae

stevenae added 2 commits May 21, 2025 10:46

Update indexing.rst

f17b96c

Add pd_lookup_het() and pd_lookup_hom()

Update indexing.rst

12b6572

This was referenced May 21, 2025

ENH: Reimplement DataFrame.lookup #61185

Closed

ENH: re-implement DataFrame.lookup. #40140

Closed

mroeschke requested a review from rhshadrach May 21, 2025 16:02

mroeschke added the Docs label May 21, 2025

rhshadrach requested changes May 21, 2025


doc/source/user_guide/indexing.rst (resolved)

doc/source/user_guide/indexing.rst (outdated, resolved)

doc/source/user_guide/indexing.rst (outdated, resolved)

rhshadrach added the Indexing Related to indexing on series/frames, not to indexes themselves label May 21, 2025

rhshadrach added this to the 3.0 milestone May 21, 2025

address pandas-dev#61471 (review)

7292c17

datapythonista changed the title ~~Improve lookup documentation~~ DOC: Improve lookup documentation May 23, 2025

rhshadrach approved these changes May 30, 2025


mroeschke approved these changes Jun 2, 2025


mroeschke merged commit 4a09336 into pandas-dev:main Jun 2, 2025
8 checks passed
