Skip to content

[MRG] EHN add parameter axis in safe_indexing to slice rows and columns #14035

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jul 2, 2019

Conversation

glemaitre
Copy link
Member

@glemaitre glemaitre commented Jun 6, 2019

Allows to index columns as well as rows using the code from ColumnTransformer.
Added some tests for the supported type.

This PR should simplify the review of #14028

NB: the functions have been moved and renamed but not modified.

@glemaitre glemaitre changed the title EHN add parameter axis in safe_indexing to slice rows and columns [MRG] EHN add parameter axis in safe_indexing to slice rows and columns Jun 6, 2019
@thomasjpfan thomasjpfan self-requested a review June 7, 2019 13:41
@thomasjpfan
Copy link
Member

Thank you for the PR! This would be useful for permutation importance too! #13146

@glemaitre
Copy link
Member Author

@NicolasHug This PR would need to be in for the PDP (and the permutation importance then ;)
I still need to cover the raise but you can have a look at the naming and if you are fine with the naming

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments but mostly LGTM.

_check_key_type isn't tested anywhere?

@glemaitre
Copy link
Member Author

_check_key_type isn't tested anywhere?

This is tested through _get_column

@NicolasHug
Copy link
Member

This is tested through _get_column

Only very indirectly then. I won't push it more because it wasn't even tested before but I don't think that's great. Unit tests can also be used to illustrate the behavior of a function.

@glemaitre
Copy link
Member Author

Only very indirectly then. I won't push it more because it wasn't even tested before but I don't think that's great. Unit tests can also be used to illustrate the behavior of a function.

Good point

@thomasjpfan
Copy link
Member

Looks like the pandas tests are not being reported again to codecov again 🤔

@jnothman
Copy link
Member

jnothman commented Jun 11, 2019 via email

@glemaitre
Copy link
Member Author

Is there any sense in just using safe_indexing(X.T, mask)?

Do you mean to use the same indexing but with a transposed matrix in all cases?

My intent was to merge the code from the ColumnTransformer which is generic to get columns and can be reused in several places (PDP, permutation importance). So the supported types are different from the current safe_indexing for the rows.

@jnothman
Copy link
Member

jnothman commented Jun 13, 2019 via email

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought in ColumnTransformer we only needed to handle callables at fit time and not actually when indexing. I am surprised we need callable support here.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments + some potential bug (?)

Apart from that LGTM

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose the following docstring which IMHO is clearer. LGTM anyway.

    Parameters
    ----------
    X : array-like, sparse-matrix, list, pandas.DataFrame, pandas.Series
        Data from which to sample rows, items or columns.
    indices : array-like
        - When ``axis=0``, indices need to be an array of integer.
        - When ``axis=1``, indices can be one of:
            - integer: output is 1D, unless `X` is sparse.
            - container: lists, slices, boolean masks: output is 2D.
              Supported data types for containers:
                - integer or boolean (positional): supported for
                  arrays, sparse matrices and dataframes
                - string (key-based): only supported for dataframes. No keys
                  other than strings are allowed.
    axis : int, default=0
        The axis along which `X` will be subsampled. ``axis=0`` will select
        rows while ``axis=1`` will select columns.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With regards to your improved docstring, @NicolasHug, what about the scalar string?

@NicolasHug
Copy link
Member

I understand that scalar is just another name for integer, in this context?

@glemaitre
Copy link
Member Author

I updated the documentation with @NicolasHug proposal.

@jnothman
Copy link
Member

jnothman commented Jul 1, 2019

Can we confirm the behaviour with a single string?

@glemaitre
Copy link
Member Author

scalar string will return a pandas series

@jnothman
Copy link
Member

jnothman commented Jul 2, 2019

Considering that the documentation is the only point of contention here, and not the specification, I'll merge. Thanks @glemaitre

@jnothman
Copy link
Member

jnothman commented Jul 2, 2019

I should add that I'm okay with the most recent docstring update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants