ENH: Add Index.filter() method #51370

Dr-Irv · 2023-02-13T23:09:57Z

closes #xxxx (Replace xxxx with the GitHub issue number)
- New feature - no issue
Tests added and passed if fixing a bug or adding a new feature
- pandas\pandas\tests\indexes\test_base.py:TestIndex.test_filter_string()
- pandas\pandas\tests\indexes\test_base.py:TestIndex.test_filter_int()
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/v2.0.0.rst file if fixing a bug or adding a new feature.

This is similar to DataFrame.filter(), except that it returns an Index object and avoids whatever DataFrame.filter() may be doing under the hood with views or copies of the DataFrame. Some examples where this would be useful (and helpful when doing type checking):

Instead of `df.columns = [x for x in otherdf.columns if "at" in x]`, you can do `df.columns = otherdf.columns.filter(like="at")`
Instead of `df.drop(columns=[x for x in df.columns if x.startswith("b")])`, you can do `df.drop(columns=df.columns.filter(regex=r"b.*"))`
Instead of `df.set_index([x for x in df.columns if x.endswith("z")])`, you can do `df.set_index(df.columns.filter(regex=r".*z$"))`
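For readers who want to try the idea without this PR, here is a minimal sketch of a standalone helper. The name `filter_index` is hypothetical and this is not the code in the PR; it assumes the proposed method mirrors the `items` / `like` / `regex` semantics of DataFrame.filter() (substring test for `like`, `re.search` for `regex`) and keeps labels in the order they appear in the index.

```python
import re

import numpy as np
import pandas as pd


def filter_index(index, items=None, like=None, regex=None):
    """Hypothetical helper, not the PR's code: keep labels matching one criterion."""
    if sum(arg is not None for arg in (items, like, regex)) != 1:
        raise TypeError("pass exactly one of `items`, `like`, or `regex`")
    if items is not None:
        # Membership test against the given labels.
        mask = index.isin(items)
    elif like is not None:
        # Substring test on the string form of each label.
        mask = [like in str(label) for label in index]
    else:
        # Regex search (not a full match) on the string form of each label.
        pattern = re.compile(regex)
        mask = [pattern.search(str(label)) is not None for label in index]
    return index[np.asarray(mask, dtype=bool)]


columns = pd.Index(["cat", "dog", "bat", "bird"])
print(list(filter_index(columns, like="at")))    # ['cat', 'bat']
print(list(filter_index(columns, regex=r"^b")))  # ['bat', 'bird']
```

Because the sketch uses `re.search`, a prefix test needs an anchored pattern such as `r"^b"`.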

phofl

I might be missing something, but can't you do this mostly with the string accessors as well?

phofl

How does this work with MultiIndex, or with non-string indexes?

phofl · 2023-03-11T09:48:40Z

pandas/core/indexes/base.py

+            )
+
+        if items is not None:
+            mask = [r in items for r in self]


This is super slow I guess? You should be able to use isin here.

Dr-Irv

I just copied the pattern that is used in DataFrame.filter():

pandas/pandas/core/generic.py

Line 5531 in 4d74fbd

**{name: [r for r in items if r in labels]} # type: ignore[arg-type]
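To make the reviewer's suggestion concrete, the sketch below builds the mask both ways on a small example and checks that they agree. `Index.isin` is an existing pandas method; the sample index and `items` list are made up for illustration, and no timing claim is intended.

```python
import pandas as pd

idx = pd.Index(["cat", "dog", "bat", "bird"])
items = ["cat", "dog"]

# Pattern from the diff: one Python-level membership test per label.
mask_loop = [r in items for r in idx]

# Reviewer's suggestion: a single vectorized membership test.
mask_isin = idx.isin(items).tolist()

print(mask_loop)  # [True, True, False, False]
print(mask_isin)  # [True, True, False, False]
```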

jreback · 2023-03-11T13:03:57Z

-1 here; this is a confusing name (well, it's an OK name, except that filter for DataFrame does this)

Dr-Irv · 2023-03-11T19:59:26Z

I might be missing something, but can't you do this mostly with the string accessors as well?

Yes, but it's a bit awkward, because the string accessors return boolean arrays.

Comparison using example idx = pd.Index(["cat", "dog", "bat", "bird"], dtype=object)

| Accessors | With this PR |
| --- | --- |
| `idx[idx.isin(["cat", "dog"])]` | `idx.filter(["cat", "dog"])` |
| `idx[idx.str.contains("at")]` | `idx.filter(like="at")` |
| `idx[idx.str.match(r"b.*")]` | `idx.filter(match=r"b.*")` |

I also think this PR will be more performant, because you are making just one pass through the index.
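The accessor-based expressions in the left-hand column run on released pandas as written. A small sketch using the example index above (the variable names are only for illustration):

```python
import pandas as pd

idx = pd.Index(["cat", "dog", "bat", "bird"], dtype=object)

# Each expression builds a boolean mask first, then indexes with it.
by_items = idx[idx.isin(["cat", "dog"])]
by_substring = idx[idx.str.contains("at")]
by_prefix_regex = idx[idx.str.match(r"b.*")]

print(list(by_items))         # ['cat', 'dog']
print(list(by_substring))     # ['cat', 'bat']
print(list(by_prefix_regex))  # ['bat', 'bird']
```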

Dr-Irv · 2023-03-11T20:04:00Z

-1 here; this is a confusing name (well, it's an OK name, except that filter for DataFrame does this)

Yes, but filter for DataFrame returns the entire DataFrame, and the goal here is to return a filtered Index. Open to using a different name.

mroeschke · 2023-08-01T17:19:46Z

Looks like this PR has gone stale and might need some discussion on an issue first before moving forward. Going to close for now, but we can reopen when ready to move forward.

ENH: Add Index.filter() method

9bedf12

Dr-Irv added the Index (Related to the Index class or subclasses) label Feb 13, 2023

Dr-Irv added this to the 2.0 milestone Feb 13, 2023

Dr-Irv requested review from WillAyd and rhshadrach February 13, 2023 23:11

Dr-Irv mentioned this pull request Mar 10, 2023

RLS: 2.0 #46776

Closed

1 task

phofl reviewed Mar 11, 2023

mroeschke modified the milestones: 2.0, 2.1 Mar 16, 2023

Dr-Irv mentioned this pull request Apr 26, 2023

DEPR: NDFrame.to_period, to_timestamp, tz_localize, tz_convert #52110

Open

mroeschke closed this Aug 1, 2023

Dr-Irv mentioned this pull request Oct 24, 2023

ENH: Improve Filter function with Filter_Columns and Filter_Rows #55289

Open

3 tasks
