Skip to content

Conversation

Micky774
Copy link
Contributor

Reference Issues/PRs

Addresses #21308

What does this implement/fix? Explain your changes.

Implements the get_feature_names_out method to the neighbors module

Any other comments?

This is a work in progress

Initial implementation of `get_feature_names_out`
@Micky774
Copy link
Contributor Author

Micky774 commented Jan 14, 2022

Since both transformers map from a collection of feature vectors to a distance graph, the "output features" would just be the distance to each point of the fitted data right? This would be very verbose and potentially unwieldy for what I imagine the intended use of get_feature_names_out.

So the two obvious (not necessarily great) solutions to this I see are:

  • Make it so that each output feature name is the distance to point in the fitted data (e.g. output features = columns of distance graph).
  • Make it so there is a single output feature, which is just the entire distance graph.

Part of me thinks the more "romantic" solution would be to have n_neighbors -many output features, where each describes the (sorted) n_neighbors many non-zero distances but I don't really think that makes sense given that the output of the transformers is a distance graph.

I'm leaning towards the first option (output features = columns of distance graph).

@ogrisel
Copy link
Member

ogrisel commented Jan 14, 2022

For KNeighborsTransformer, n_neighbors is fixed a priori. That should answer your comment above.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

- Moved `_ClassNamePrefixFeaturesOutMixin` to specific transformers
- Added `_ClassNamePrefixFeaturesOutMixin` to NCA
- Added `self._n_features_out` to NCA
@Micky774
Copy link
Contributor Author

Micky774 commented Jan 14, 2022

For KNeighborsTransformer, n_neighbors is fixed a priori. That should answer your comment above.

I guess my question is more-so: since the outputs of KNeighborsTransformer and RadiusNeighborsTransformer are distance graphs, is it sufficient to treat them as returning n_samples_fit_ output features, and to then assign them class-prefixed feature names, as does my current implementation?

@thomasjpfan
Copy link
Member

I guess my question is more-so: since the outputs of KNeighborsTransformer and RadiusNeighborsTransformer are distance graphs, is it sufficient to treat them as returning n_samples_fit_ output features, and to then assign them class-prefixed feature names, as does my current implementation?

Yup, the number of feature names out must match the amount of output features from transform.

@Micky774 Micky774 changed the title [WIP] ENH Adds get_feature_names_out to neighbors module ENH Adds get_feature_names_out to neighbors module Jan 14, 2022
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/scikit-learn/scikit-learn/pull/22212/files#r785147042 still needs to be addressed. Once done, I think this PR LGTM.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @Micky774 !

We need tests to check the actual names of get_feature_names_out. See:

def test_pls_feature_names_out(Klass):
"""Check `get_feature_names_out` cross_decomposition module."""

@Micky774
Copy link
Contributor Author

Added in the unit tests! Thank you for all your active feedback, and for your patience in this. Let me know if there's anything else I should add here!

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merged the main branch to resolve the conflict in the changelog. I also made this an enhancement instead of an API change because those features names method did not exist under a different name in the last released scikit-learn version.

One more code style detail, but otherwise, LGTM.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to approve this PR in previous review, instead of just commenting.

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomasjpfan thomasjpfan changed the title ENH Adds get_feature_names_out to neighbors module ENH Adds get_feature_names_out to neighbors module Jan 20, 2022
@thomasjpfan thomasjpfan merged commit 330881a into scikit-learn:main Jan 20, 2022
@Micky774 Micky774 deleted the feature_names_neighbors branch February 22, 2022 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants