ENH: Add Index.filter() method #51370
I might be missing something, but can't you do most of this with the string accessors as well?
How does this work with MultiIndex, or with non-string indexes?
)

if items is not None:
    mask = [r in items for r in self]
This is super slow I guess? You should be able to use isin here.
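A minimal sketch of the suggestion above, assuming a plain string Index and a made-up list of candidate labels (neither is taken from the PR). Index.isin should produce the same boolean mask as the list comprehension in the diff, with the membership test done in one vectorized call:

```python
import pandas as pd

# Hypothetical data for illustration only.
idx = pd.Index(["bat", "cat", "dog", "zebra"])
items = ["cat", "zebra", "missing"]

# Pattern from the diff: one Python-level membership test per label.
mask_loop = [label in items for label in idx]

# Suggested alternative: a single vectorized membership test.
mask_isin = idx.isin(items)

# Either mask selects the same labels, in index order.
filtered = idx[mask_isin]
print(list(filtered))  # ['cat', 'zebra']
```

Labels in items that are not in the index ("missing" here) simply never match, in both versions.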
I just copied the pattern that is used in DataFrame.filter():
Line 5531 in 4d74fbd
    **{name: [r for r in items if r in labels]}  # type: ignore[arg-type]
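For context, the quoted line iterates over items rather than over the labels, so DataFrame.filter(items=...) returns labels in the order they were requested and skips any that are absent. A small illustration, using a made-up frame:

```python
import pandas as pd

# Hypothetical frame for illustration only.
df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

# Columns come back in the order given in `items`; the label "x"
# does not exist in the frame and is silently dropped.
result = df.filter(items=["c", "x", "a"])
print(list(result.columns))  # ['c', 'a']
```

This differs from the mask in the diff above, which keeps the index's own order.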
-1 here. This is a confusing name (well, it's an OK name, except that filter for DataFrame does this).
Yes, but it's a bit awkward, because the string accessors return boolean arrays. Comparison using example
I also think this PR will be more performant, because you are making just one pass through the index.
Yes, but filter for
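The string-accessor route mentioned above can be sketched as follows. The index contents are invented for illustration; the point is only that Index.str.contains returns a boolean mask, which then has to be applied in a second step to get an Index back:

```python
import pandas as pd

# Hypothetical labels for illustration only.
idx = pd.Index(["bat", "cat", "dog", "zebra"])

# Step 1: the string accessor yields a boolean array, not an Index.
mask = idx.str.contains("at", regex=False)

# Step 2: apply the mask to get the matching labels.
via_accessor = idx[mask]

# The list-comprehension spelling that the PR wants to replace.
via_comprehension = pd.Index([x for x in idx if "at" in x])

print(list(via_accessor))       # ['bat', 'cat']
print(list(via_comprehension))  # ['bat', 'cat']
```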
Looks like this PR has gone stale and might need some discussion on an issue first before moving forward. Going to close for now, but we can reopen when ready to move forward.
pandas\pandas\tests\indexes\test_base.py:TestIndex.test_filter_string()
pandas\pandas\tests\indexes\test_base.py:TestIndex.test_filter_int()
doc/source/whatsnew/v2.0.0.rst
file if fixing a bug or adding a new feature.

This is similar to DataFrame.filter(), except that it returns an Index object and avoids whatever DataFrame.filter() might be doing under the hood in terms of views/copies of the DataFrame. Some examples where this would be useful (and helpful when doing type checking):
- Instead of df.columns = [x for x in otherdf.columns if "at" in x], you can do df.columns = otherdf.columns.filter(like="at")
- Instead of df.drop(columns=[x for x in df.columns if x.startswith("b")]), you can do df.drop(columns=df.columns.filter(regex=r"^b.*"))
- Instead of df.set_index([x for x in df.columns if x.endswith("z")]), you can do df.set_index(df.columns.filter(regex=r".*z$"))
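Since this PR was closed, Index.filter() is not something you can call on a released pandas Index. The behavior described above can be approximated with a standalone helper. The sketch below is an assumption-laden stand-in, not the PR's implementation: the name filter_index is hypothetical, the items/like/regex keywords are borrowed from DataFrame.filter(), regex matching is assumed to use re.search, and labels are converted with str() so that non-string labels can be matched too.

```python
import re

import numpy as np
import pandas as pd


def filter_index(index, items=None, like=None, regex=None):
    """Hypothetical stand-in for the proposed Index.filter()."""
    given = sum(arg is not None for arg in (items, like, regex))
    if given != 1:
        raise TypeError("pass exactly one of items, like, or regex")

    if items is not None:
        # Keeps the index's own order (unlike DataFrame.filter,
        # which follows the order of `items`).
        mask = index.isin(items)
    elif like is not None:
        mask = np.array([like in str(label) for label in index], dtype=bool)
    else:
        pattern = re.compile(regex)
        mask = np.array(
            [pattern.search(str(label)) is not None for label in index],
            dtype=bool,
        )
    return index[mask]


# Made-up column labels for illustration.
cols = pd.Index(["bat", "cat", "buzz", "fizz"])
print(list(filter_index(cols, like="at")))      # ['bat', 'cat']
print(list(filter_index(cols, regex=r"^b")))    # ['bat', 'buzz']
print(list(filter_index(cols, regex=r"z$")))    # ['buzz', 'fizz']
print(list(filter_index(cols, items=["fizz", "bat"])))  # ['bat', 'fizz']
```

Because the helper returns an Index, its result can be passed directly to df.drop(columns=...) or assigned to df.columns, which is the use case the examples above describe.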