-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] EHN add parameter axis in safe_indexing to slice rows and columns #14035
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for the PR! This would be useful for permutation importance too! #13146 |
@NicolasHug This PR would need to be in for the PDP (and the permutation importance then ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments but mostly LGTM.
_check_key_type
isn't tested anywhere?
This is tested through |
Only very indirectly then. I won't push it more because it wasn't even tested before but I don't think that's great. Unit tests can also be used to illustrate the behavior of a function. |
Good point |
Looks like the pandas tests are not being reported again to codecov again 🤔 |
Is there any sense in just using safe_indexing(X.T, mask)?
|
Do you mean to use the same indexing but with a transposed matrix in all cases? My intent was to merge the code from the |
Sorry, I think I confused safe_mask with safe_indexing. This is good.
… |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought in ColumnTransformer we only needed to handle callables at fit time and not actually when indexing. I am surprised we need callable support here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments + some potential bug (?)
Apart from that LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would propose the following docstring which IMHO is clearer. LGTM anyway.
Parameters
----------
X : array-like, sparse-matrix, list, pandas.DataFrame, pandas.Series
Data from which to sample rows, items or columns.
indices : array-like
- When ``axis=0``, indices need to be an array of integer.
- When ``axis=1``, indices can be one of:
- integer: output is 1D, unless `X` is sparse.
- container: lists, slices, boolean masks: output is 2D.
Supported data types for containers:
- integer or boolean (positional): supported for
arrays, sparse matrices and dataframes
- string (key-based): only supported for dataframes. No keys
other than strings are allowed.
axis : int, default=0
The axis along which `X` will be subsampled. ``axis=0`` will select
rows while ``axis=1`` will select columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With regards to your improved docstring, @NicolasHug, what about the scalar string?
I understand that scalar is just another name for integer, in this context? |
I updated the documentation with @NicolasHug proposal. |
Can we confirm the behaviour with a single string? |
scalar string will return a pandas series |
Considering that the documentation is the only point of contention here, and not the specification, I'll merge. Thanks @glemaitre |
I should add that I'm okay with the most recent docstring update. |
Allows to index columns as well as rows using the code from
ColumnTransformer
.Added some tests for the supported type.
This PR should simplify the review of #14028
NB: the functions have been moved and renamed but not modified.