Skip to content

ENH allow extra params to be copied in clone #20681

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

adrinjalali
Copy link
Member

Fixes #20585

This PR proposes a new tag: "extra_params" (open to better names), which would indicate which non __init__ parameters should be copied in clone.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be called non_init_params?

@@ -628,6 +628,13 @@ class NonConformantEstimatorNoParamSet(BaseEstimator):
def __init__(self, you_should_set_this_=None):
pass

class ConformantEstimatorWithExtraParams(BaseEstimator):
def _more_tags(self):
return {"extra_params": ["param1"]}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A first thought: this is hard to extend with inheritance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we go through MRO and handle extending attributes? I think @amueller wasn't very happy about a manual MRO walk. Also, WDYT @rth ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @amueller wasn't very happy about a manual MRO walk

I am not sure that you can otherwise if one wants to inherit extra_params here.
The problem spotted here would also be true for other tags that would accept an iterable, e.g. preserve_dtypes and X_types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, that's not an issue which we'd like to solve in this PR I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree it's an issue for any tag with a collection for a value, but I'm not sure that those other examples would warrant the use of inheritance to extend them as much as the present case, since "what gets cloned" is likely to be the sort of thing added to by a mixin. One solution is to use a prefixing system with scalar values rather than a collection value. But I'm okay leaving this concern alone, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now it's not clear to me if this would be used anywhere else than for '_metadata_request' in practice,

Yep, at least inside sklearn, I don't see us adding anything else anytime soon, or too frequently at all. And however it's used by external libraries, I'm sure they can find a workaround to the issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And however it's used by external libraries, I'm sure they can find a workaround to the issue.

I think we need to at least provide some guidance and what this work around looks like.

We can special case non_init_params here:

more_tags = base_class._more_tags(self)
collected_tags.update(more_tags)

and have it actual extend the list.

Are there any use cases we see for non_init_params besides _metadata_request?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can special case non_init_params here:

It'll make that part of the code a bit dirty, but I won't oppose. Should it be included in this PR?

I think we need to at least provide some guidance and what this work around looks like.

Since the workaround would apply to all lists, should we have it in a different PR?

Are there any use cases we see for non_init_params besides _metadata_request?

As far as I'm concerned, metadata_request is the only one, but I guess third party developers can get creative.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll make that part of the code a bit dirty, but I won't oppose. Should it be included in this PR?

I am thinking if that is the workaround third parties would have to do.

I guess third party developers can get creative.

I am starting to see it as a hack to get freezing to work. Imagine listing out all the attributes you want to "freeze" in non_init_params.

As far as I'm concerned, metadata_request is the only one

In #20350, was there any reason for a developer to know that _metadata_request exist and need to interact with metadata_request directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking if that is the workaround third parties would have to do.

The workaround would be for them to get the tags in _more_tags, and make sure the extend it, rather than returning only new ones, which is not a bad hack.

I am starting to see it as a hack to get freezing to work. Imagine listing out all the attributes you want to "freeze" in non_init_params.

It'll probably involve more than that for meta-estimators, but I like the idea.

In #20350, was there any reason for a developer to know that _metadata_request exist and need to interact with metadata_request directly?

I don't think so, they pretty much would work with the helper functions and classes.

@adrinjalali
Copy link
Member Author

Are we happy with this now?

@thomasjpfan
Copy link
Member

Are we happy with this now?

I do like this being in tags, but I am starting to feel uneasy about tags after the __sklearn_is_fitted__ PR. It becomes more apparent here since a third party estimator developer can not extend non_init_params without us changing the logic in BaseEstimator._get_tags. Opened #20804 to discuss it.

I do not think need to rush this PR because the metadata_request is targeted for 1.1?

@adrinjalali
Copy link
Member Author

@thomasjpfan this PR came about to be since I was cleaning up the sample props PR and getting it ready for a second vote. We'll need another month once that PR is finished to vote on it, and probably some more discussion time, and that process is blocked by this PR. So I'd say we're already behind the schedule for sample props to get in 1.1

I'm also not sure if we need to wait for the extendability of the lists in estimator tags to be fixed before we merge this. That issue is orthogonal to this PR, and can be resolved after this PR (also #20681 (comment)). So I'd be happy if we don't block this PR by #20804.

@thomasjpfan
Copy link
Member

thomasjpfan commented Aug 25, 2021

that process is blocked by this PR. So I'd say we're already behind the schedule for sample props to get in 1.1

@adrinjalali Do you think we can make progress on #20350 with hardcoding _metadata_request and treat _metadata_request as an implementation detail? Developers and users are not meant to interact with _metadata_request. When this PR gets merged, we can switch out the hard coding with the new tag.

I do not want to rush out this PR because of how it is using tags to change the behavior of a public function (clone). If a developer overrides non_init_params and does not extend, they would break sample props, which feels very brittle. This makes me think we should continue to hardcode _metadata_request.

@adrinjalali
Copy link
Member Author

Resolved by #24568

@adrinjalali adrinjalali deleted the private-clone branch March 14, 2023 11:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFC Private attributes copied in clone
5 participants