Skip to content

Sample properties support in transform methods with **transform_params #15370

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

hermidalc
Copy link
Contributor

@hermidalc hermidalc commented Oct 26, 2019

Reference Issues/PRs

This is relevant to both SLEP006 and #9566

What does this implement/fix? Explain your changes.

Passing of sample properties to transform methods via **transform_params and proper handling in Pipeline and Grid/RandomizedSearchCV cross-validation and scoring.

Any other comments?

This is intended to be exploratory and to help experiment with core devs on what designs might work best or not.

@hermidalc
Copy link
Contributor Author

hermidalc commented Oct 26, 2019

I know a number of tests do not pass, just wanted to first see what would fail in the sklearn test suite. I know a lot of the failures are TypeError: transform() got an unexpected keyword argument... fails which, with this first exploratory implementation, is due to when a transformer supports a particular fit param, e.g:

def fit(self, X, y, a=None):

then it also has to be defined in:

def transform(self, X, a=None):

which is I know is missing in all such transformers in the codebase.

@hermidalc hermidalc changed the title Test sample properties support with **transform_params Sample properties support to transform methods with **transform_params Oct 26, 2019
@hermidalc hermidalc changed the title Sample properties support to transform methods with **transform_params Sample properties support in transform methods with **transform_params Oct 27, 2019
@hermidalc hermidalc mentioned this pull request Oct 27, 2019
11 tasks
@hermidalc hermidalc closed this Oct 31, 2019
@hermidalc hermidalc deleted the test-metadata-transform-params branch October 31, 2019 13:38
@jnothman
Copy link
Member

Thanks for doing this @hermidalc and sorry I've not had time to engage in it, especially since we are focused on a release

@hermidalc
Copy link
Contributor Author

Thanks for doing this @hermidalc and sorry I've not had time to engage in it, especially since we are focused on a release

No worries, I actually closed the branch because I created a new branch where I took the code you did with the property routing from #9566 merged with what I did so that routes properties to train and test transform, scoring, and predict* and decision_function etc.. It works without issues within Pipeline and *SearchCV with my initial tests but of course there are multiple sklearn test suite failures.

From what I understand the property routing design (solution 3) isn't what you envision now, you prefer solution 4? It would make it cleaner. I needed something now that would pass and subset sample properties to fit and transform as well as sample_weight to scorers with Pipeline and *SearchCV and really had no idea how to start with solution 4.

I'll make a PR with the new branch in a bit so you can see.

@jnothman
Copy link
Member

jnothman commented Oct 31, 2019 via email

@hermidalc
Copy link
Contributor Author

Thanks. Cool that you're actually using one of these hacks! How do you deal with fit_transform?

I didn't do anything special, properties get routed to both fit and transform. Then in my custom extensions I specify the property in both fit and transform parameter lists even if I don't need it in transform (where I set default =None). I haven't done extensive testing but seems like with your property routing it doesn't affect the sklearn API, unlike my first implementation where it would break anything in the API that had a fit parameter that wasn't also in transform. So only in my custom extensions where I am passing a sample_prop kwarg which is a pandas DataFrame do I specify the routing and it's fine because I simply also specify the parameter in transform even if not used.

One can envision an enhanced property routing syntax for Pipeline where you can specify { 'dest1': {'fit': 'prop'}, 'dest2': 'prop', 'dest3': {'transform': 'prop'}} and get back fit_props and transform_props separately and then change the Pipeline code to not use fit_transform so that fit and transform could get the right properties.

But I like your solution 4 where children request properties in their classes and then there won't be and prop_routing configuration in user code. Right now I needed to write up something quickly that would work for a project and would have to think more about how your solution 4 would be implemented.

@hermidalc
Copy link
Contributor Author

@jnothman here's my exploratory branch here #15425

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants