[MRG] EHN add parameter axis in safe_indexing to slice rows and columns #14035

glemaitre · 2019-06-06T20:37:35Z

Allows indexing columns as well as rows, reusing the code from ColumnTransformer.
Added some tests for the supported types.

This PR should simplify the review of #14028

NB: the functions have been moved and renamed but not modified.
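A minimal sketch of what the new `axis` parameter means, assuming NumPy arrays or DataFrame-like objects that expose `.iloc`. The function name `safe_indexing_sketch` is hypothetical and this is not the code moved from ColumnTransformer. It only illustrates selecting rows (`axis=0`) versus columns (`axis=1`), plus a check on the value of `axis`.

```python
import numpy as np


def safe_indexing_sketch(X, indices, axis=0):
    """Hypothetical helper: select rows (axis=0) or columns (axis=1) of X."""
    # Reject anything other than 0 or 1 up front.
    if axis not in (0, 1):
        raise ValueError(
            "'axis' should be either 0 (to index rows) or 1 (to index "
            "columns). Got {} instead.".format(axis)
        )
    if hasattr(X, "iloc"):
        # DataFrame-like objects: positional indexing through .iloc
        return X.iloc[indices] if axis == 0 else X.iloc[:, indices]
    # Lists and other array-likes are converted to a NumPy array first.
    X = np.asarray(X)
    return X[indices] if axis == 0 else X[:, indices]


X = np.arange(12).reshape(3, 4)
rows = safe_indexing_sketch(X, [0, 2])          # rows 0 and 2, shape (3, 4) -> (2, 4)
cols = safe_indexing_sketch(X, [1, 3], axis=1)  # columns 1 and 3, shape (3, 2)
```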

thomasjpfan · 2019-06-07T13:43:10Z

Thank you for the PR! This would be useful for permutation importance too! #13146

glemaitre · 2019-06-07T14:15:42Z

@NicolasHug This PR would need to be merged before the PDP one (and then the permutation importance one ;)
I still need to cover the error raising in the tests, but you can already have a look at the naming and tell me if you are fine with it.

NicolasHug

A few comments but mostly LGTM.

_check_key_type isn't tested anywhere?

glemaitre · 2019-06-11T08:38:09Z

> _check_key_type isn't tested anywhere?

This is tested through _get_column

NicolasHug · 2019-06-11T13:15:30Z

> This is tested through _get_column

Only very indirectly then. I won't push it more because it wasn't even tested before, but I don't think that's great. Unit tests can also be used to illustrate the behavior of a function.

glemaitre · 2019-06-11T13:25:49Z

> Only very indirectly then. I won't push it more because it wasn't even tested before, but I don't think that's great. Unit tests can also be used to illustrate the behavior of a function.

Good point
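Following the point that unit tests can document behaviour, here is a sketch of a key-type check together with a small test that reads as a specification. The name `check_key_type_sketch` and its exact rules (slices with `None` bounds accepted, NumPy arrays judged by `dtype.kind`) are assumptions for illustration, not the private helper's actual code.

```python
import numpy as np


def check_key_type_sketch(key, superclass):
    """Hypothetical: is `key` (scalar, slice, list or array) fully of `superclass`?

    `superclass` is expected to be `int` or `str`. Note that Python bools
    are a subclass of int, so they pass the int check.
    """
    if isinstance(key, superclass):
        return True
    if isinstance(key, slice):
        # Open-ended bounds (None) are compatible with either key type.
        return all(isinstance(bound, (superclass, type(None)))
                   for bound in (key.start, key.stop))
    if isinstance(key, list):
        return all(isinstance(item, superclass) for item in key)
    if hasattr(key, "dtype"):
        # NumPy arrays and scalars: inspect the dtype kind, not each element.
        if superclass is int:
            return key.dtype.kind == "i"
        return key.dtype.kind in ("O", "U", "S")
    return False


def test_check_key_type_sketch():
    # Each assertion documents one supported (or rejected) key form.
    assert check_key_type_sketch(1, int)
    assert check_key_type_sketch([0, 2], int)
    assert check_key_type_sketch(slice(None, 3), int)
    assert check_key_type_sketch(np.array(["a", "b"]), str)
    assert not check_key_type_sketch(["a", 1], str)
    assert not check_key_type_sketch(np.array([0.5]), int)


test_check_key_type_sketch()
```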

thomasjpfan · 2019-06-11T21:51:02Z

Looks like the pandas tests are not being reported to codecov again 🤔

jnothman · 2019-06-11T22:40:52Z

Is there any sense in just using safe_indexing(X.T, mask)?

glemaitre · 2019-06-12T09:14:37Z

> Is there any sense in just using safe_indexing(X.T, mask)?

Do you mean to use the same indexing but with a transposed matrix in all cases?

My intent was to merge the code from the ColumnTransformer, which is generic for getting columns and can be reused in several places (PDP, permutation importance). As a result, the supported types differ from those of the current safe_indexing for rows.
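For a plain NumPy array the transpose idea does hold: selecting rows of the transpose and transposing back gives the same result as selecting columns directly, as the NumPy-only sketch below checks. The catch, per the reply above and the docstring proposed later in this thread, is that column selection has to accept more key types (strings for dataframes, slices, boolean masks) than row indexing in safe_indexing did at the time, where only integer arrays were supported.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
# Boolean mask over the 4 columns: keep columns 0 and 2.
mask = np.array([True, False, True, False])

# Select the columns directly ...
direct = X[:, mask]
# ... or select rows of the transpose and transpose back.
via_transpose = X.T[mask].T

# For a dense array both routes give the same 3x2 result.
assert np.array_equal(direct, via_transpose)
```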

jnothman · 2019-06-13T04:44:46Z

Sorry, I think I confused safe_mask with safe_indexing. This is good.

jnothman

I thought in ColumnTransformer we only needed to handle callables at fit time and not actually when indexing. I am surprised we need callable support here.
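The distinction can be sketched as follows: a callable column specification (ColumnTransformer accepts callables that are passed `X`) is evaluated once into concrete keys, after which plain indexing is enough, so an indexing helper never has to see a callable. A later commit in this PR is titled "remove callable". The helper `resolve_columns_sketch` and the selector `positive_first_row` are hypothetical names used only for this illustration.

```python
import numpy as np


def resolve_columns_sketch(X, columns):
    """Hypothetical: turn a column specification into concrete keys.

    A callable is evaluated once on X (e.g. at fit time); any other
    specification is returned unchanged.
    """
    if callable(columns):
        columns = columns(X)
    return columns


def positive_first_row(data):
    """Hypothetical selector: indices of columns whose first-row value is > 0."""
    return np.flatnonzero(data[0] > 0)


X = np.array([[1.0, -2.0, 3.0], [4.0, 5.0, -6.0]])
keys = resolve_columns_sketch(X, positive_first_row)  # concrete integer keys
subset = X[:, keys]  # plain indexing, no callable involved
```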

NicolasHug

Minor comments, plus a potential bug (?)

Apart from that LGTM

NicolasHug

I would propose the following docstring, which IMHO is clearer. LGTM anyway.

    Parameters
    ----------
    X : array-like, sparse-matrix, list, pandas.DataFrame, pandas.Series
        Data from which to sample rows, items or columns.
    indices : array-like
        - When ``axis=0``, indices need to be an array of integers.
        - When ``axis=1``, indices can be one of:
            - integer: output is 1D, unless `X` is sparse.
            - container: lists, slices, boolean masks: output is 2D.
              Supported data types for containers:
                - integer or boolean (positional): supported for
                  arrays, sparse matrices and dataframes
                - string (key-based): only supported for dataframes. No keys
                  other than strings are allowed.
    axis : int, default=0
        The axis along which `X` will be subsampled. ``axis=0`` will select
        rows while ``axis=1`` will select columns.
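The output-dimensionality rules in this docstring can be illustrated on a dense NumPy array. The helper `select_columns_sketch` is hypothetical and covers only the dense case; the sparse exception ("output is 1D, unless `X` is sparse") and string keys on dataframes are not exercised here.

```python
import numpy as np


def select_columns_sketch(X, key):
    """Hypothetical column selection on a dense NumPy array."""
    keys = key if isinstance(key, list) else [key]
    if any(isinstance(k, str) for k in keys):
        # String (key-based) selection is reserved for dataframes.
        raise ValueError("String keys are only supported for dataframes.")
    return X[:, key]


X = np.arange(6).reshape(2, 3)
assert select_columns_sketch(X, 1).shape == (2,)              # integer -> 1D
assert select_columns_sketch(X, [0, 2]).shape == (2, 2)       # list -> 2D
assert select_columns_sketch(X, slice(0, 2)).shape == (2, 2)  # slice -> 2D
mask = np.array([True, False, True])
assert select_columns_sketch(X, mask).shape == (2, 2)         # mask -> 2D
```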

jnothman

With regard to your improved docstring, @NicolasHug, what about the scalar string?

NicolasHug · 2019-06-26T12:57:45Z

I understood "scalar" to be just another name for "integer" in this context?

glemaitre · 2019-07-01T14:43:26Z

I updated the documentation with @NicolasHug's proposal.

jnothman · 2019-07-01T22:01:06Z

Can we confirm the behaviour with a single string?

glemaitre · 2019-07-02T14:50:02Z

A scalar string will return a pandas Series.
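This matches pandas' own label-based selection, shown here with `.loc` only (the snippet does not call scikit-learn): a scalar string key gives a 1D `Series`, while a list of strings gives a 2D `DataFrame`.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

single = df.loc[:, "a"]     # scalar string key -> pandas Series (1D)
several = df.loc[:, ["a"]]  # list of string keys -> pandas DataFrame (2D)
```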

jnothman · 2019-07-02T21:58:47Z

Considering that the documentation is the only point of contention here, and not the specification, I'll merge. Thanks @glemaitre

jnothman · 2019-07-02T21:59:19Z

I should add that I'm okay with the most recent docstring update.

glemaitre added 2 commits June 6, 2019 22:35

EHN add parameter axis in safe_indexing to slice rows and columns

f4fd1ea

whats new

d4f1e85

glemaitre changed the title ~~EHN add parameter axis in safe_indexing to slice rows and columns~~ [MRG] EHN add parameter axis in safe_indexing to slice rows and columns Jun 6, 2019

NicolasHug mentioned this pull request Jun 7, 2019

[MRG] ENH Add support for dataframe in PDP #14028

Merged

thomasjpfan self-requested a review June 7, 2019 13:41

NicolasHug reviewed Jun 7, 2019

thomasjpfan reviewed Jun 9, 2019

sklearn/utils/tests/test_utils.py (outdated, resolved)

sklearn/utils/__init__.py (outdated, resolved)

sklearn/utils/__init__.py (resolved)

glemaitre added 3 commits June 11, 2019 10:56

address some comments

190d83a

address some comments

ef8c405

comments

050b344

glemaitre added 2 commits June 11, 2019 15:30

iter

17d55b1

iter

ae89f9a

check value of axis param

1fde74a

NicolasHug reviewed Jun 12, 2019

sklearn/utils/__init__.py (resolved)

address comments nicolas

bd83a75

thomasjpfan approved these changes Jun 13, 2019

sklearn/utils/tests/test_utils.py (outdated, resolved)

iter

7e2392d

NicolasHug reviewed Jun 14, 2019

sklearn/utils/__init__.py (outdated, resolved)

sklearn/utils/__init__.py (resolved)

sklearn/utils/__init__.py (outdated, resolved)

jnothman reviewed Jun 18, 2019

sklearn/utils/__init__.py (resolved)

sklearn/utils/__init__.py (outdated, resolved)

glemaitre added 2 commits June 20, 2019 11:44

fix docstring

cc5a1a4

fix doc

9950972

glemaitre added 2 commits June 20, 2019 12:03

remove callable

0b4cccb

fix pattern test

5e4216b

NicolasHug reviewed Jun 21, 2019

sklearn/utils/__init__.py (outdated, resolved)

sklearn/utils/__init__.py (resolved)

sklearn/utils/tests/test_utils.py (outdated, resolved)

iter

461d25c

NicolasHug approved these changes Jun 24, 2019

jnothman reviewed Jun 26, 2019

address nicolas comments

f939379

Merge remote-tracking branch 'origin/master' into extend_safe_indexing

af68686

update docstring regarding scalar string support

9223138

jnothman merged commit 5f37ec1 into scikit-learn:master Jul 2, 2019

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

ENH add parameter axis in safe_indexing to slice rows and columns (scikit-learn#14035)

c3cb11b

glemaitre mentioned this pull request Jul 25, 2019

[MRG] EHN add support for scalar, slice and mask in safe_indexing axis=0 #14475

Merged

avm19 mentioned this pull request Mar 10, 2022

ColumnTransformer: integer column index in dataframes unexpected behaviour and error (column selectors vs _get_column_indices) #22556

Open
