ENH: Add Index.filter() method #51370

Closed

Dr-Irv
Contributor

@Dr-Irv Dr-Irv commented Feb 13, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
    • New feature - no issue
  • Tests added and passed if fixing a bug or adding a new feature
    • pandas\pandas\tests\indexes\test_base.py:TestIndex.test_filter_string()
    • pandas\pandas\tests\indexes\test_base.py:TestIndex.test_filter_int()
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/v2.0.0.rst file if fixing a bug or adding a new feature.

This is similar to DataFrame.filter(), except that it returns an Index object and avoids whatever DataFrame.filter() may be doing under the hood with views or copies of the DataFrame. Some examples where this would be useful (and where it also helps with type checking):

  • Instead of df.columns = [x for x in otherdf.columns if "at" in x], you can do df.columns = otherdf.columns.filter(like="at")
  • Instead of df.drop(columns=[x for x in df.columns if x.startswith("b")]), you can do df.drop(columns=df.columns.filter(regex=r"b.*"))
  • Instead of df.set_index([x for x in df.columns if x.endswith("z")]), you can do df.set_index(df.columns.filter(regex=r".*z$"))
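For comparison, the three selections in the list above can already be written with the existing string accessors on an Index. This is only a sketch using current pandas calls, not the method proposed in this PR; the column names in `otherdf` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame; only the column labels matter here.
otherdf = pd.DataFrame(columns=["cat", "dog", "bat", "bird", "fez"])
cols = otherdf.columns

# Labels containing the substring "at" (plain substring, not a regex).
like_at = cols[cols.str.contains("at", regex=False)]

# Labels starting with "b".
starts_b = cols[cols.str.startswith("b")]

# Labels ending with "z".
ends_z = cols[cols.str.endswith("z")]

print(list(like_at), list(starts_b), list(ends_z))
```

Each accessor call returns a boolean mask, so the Index has to be indexed with it a second time to get the filtered labels back.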

@Dr-Irv Dr-Irv added the Index Related to the Index class or subclasses label Feb 13, 2023
@Dr-Irv Dr-Irv added this to the 2.0 milestone Feb 13, 2023
@Dr-Irv Dr-Irv requested review from WillAyd and rhshadrach February 13, 2023 23:11
@Dr-Irv Dr-Irv mentioned this pull request Mar 10, 2023
1 task
Member

@phofl phofl left a comment

I might be missing something, but can't you do this mostly with the string accessors as well?

Member

@phofl phofl left a comment

How does this work with MultiIndex, or with non-string indexes?

)

    if items is not None:
        mask = [r in items for r in self]
Member

This is super slow I guess? You should be able to use isin here.
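A minimal sketch of the two ways to build the mask, assuming a small object-dtype Index and a list of `items` chosen for illustration. Both produce the same boolean mask; `Index.isin` is the vectorized form suggested here, while the list comprehension tests each label in Python.

```python
import numpy as np
import pandas as pd

idx = pd.Index(["cat", "dog", "bat", "bird"], dtype=object)
items = ["cat", "dog"]

# Pattern from the diff: one Python-level membership test per label.
mask_loop = [r in items for r in idx]

# Suggested alternative: a single vectorized call.
mask_isin = idx.isin(items)

filtered_loop = idx[np.asarray(mask_loop)]
filtered_isin = idx[mask_isin]

print(list(filtered_loop), list(filtered_isin))
```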

Contributor Author

I just copied the pattern that is used in DataFrame.filter():

**{name: [r for r in items if r in labels]} # type: ignore[arg-type]
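One detail of the DataFrame.filter() pattern quoted above is that it iterates over `items`, so the result follows the order of `items` and silently skips labels that are absent. A mask over the existing labels keeps the order of the labels instead. The sketch below illustrates that difference with existing pandas calls; the frame and the `items` list are assumptions made up for the example.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=["cat", "dog", "bat"])
items = ["dog", "zebra", "cat"]  # "zebra" is not a column

# DataFrame.filter(items=...) follows the order of `items`.
by_items = list(df.filter(items=items).columns)

# A membership mask follows the order of the existing columns.
by_mask = list(df.columns[df.columns.isin(items)])

print(by_items, by_mask)
```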

@jreback
Contributor

jreback commented Mar 11, 2023

-1 here. This is a confusing name (well, it's an OK name, except that filter for DataFrame already does this).

@Dr-Irv
Contributor Author

Dr-Irv commented Mar 11, 2023

> I might be missing something, but can't you do this mostly with the string accessors as well?

Yes, but it's a bit awkward, because the string accessors return boolean arrays.

Comparison using the example idx = pd.Index(["cat", "dog", "bat", "bird"], dtype=object):

| Accessors | With this PR |
| --- | --- |
| idx[idx.isin(["cat", "dog"])] | idx.filter(["cat", "dog"]) |
| idx[idx.str.contains("at")] | idx.filter(like="at") |
| idx[idx.str.match(r"b.*")] | idx.filter(match=r"b.*") |

I also think this PR will be more performant, because you are making just one pass through the index.
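To make the comparison concrete, here is a rough stand-in built only from the accessor forms listed in the comparison. `filter_index` is a hypothetical helper written for this illustration; it is not the implementation in this PR, and its parameter names (`items`, `like`, `match`) simply mirror the calls shown in the comparison (the PR description uses `regex=` in other places).

```python
import pandas as pd


def filter_index(idx, items=None, like=None, match=None):
    """Hypothetical helper: return the labels of `idx` selected by one criterion."""
    given = [arg for arg in (items, like, match) if arg is not None]
    if len(given) != 1:
        raise TypeError("pass exactly one of items, like, or match")
    if items is not None:
        mask = idx.isin(items)                      # exact label membership
    elif like is not None:
        mask = idx.str.contains(like, regex=False)  # plain substring test
    else:
        mask = idx.str.match(match)                 # regex anchored at the start
    return idx[mask]


idx = pd.Index(["cat", "dog", "bat", "bird"], dtype=object)
by_items = filter_index(idx, ["cat", "dog"])
by_like = filter_index(idx, like="at")
by_match = filter_index(idx, match=r"b.*")

print(list(by_items), list(by_like), list(by_match))
```

This sketch assumes string labels for `like` and `match`; it says nothing about how the proposed method treats MultiIndex or non-string indexes.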

@Dr-Irv
Contributor Author

Dr-Irv commented Mar 11, 2023

> -1 here. This is a confusing name (well, it's an OK name, except that filter for DataFrame already does this).

Yes, but filter for DataFrame returns the entire DataFrame, and the goal here is to return a filtered Index. I'm open to using a different name.

@mroeschke
Member

It looks like this PR has gone stale and might need some discussion in an issue first before moving forward. Going to close for now, but we can reopen when ready to move forward.
