
[MRG] Clarified idempotence of fit #12305

Merged
merged 3 commits into from
Oct 16, 2018

Conversation

NicolasHug
Member

Reference Issues/PRs

What does this implement/fix? Explain your changes.

As far as I understand it, the fact that fit is idempotent means that repeated calls to fit with the same data don't change the estimator.

The contributing guide was a bit unclear about this.

Any other comments?

@NicolasHug
Member Author

Also, it would make more sense to me if those 2 facts were in the Fitting section rather than where they currently are:

  • fit doesn't care about previous calls
  • fit is idempotent

@amueller
Member

amueller commented Oct 5, 2018

Unless random_state=None, in which case it will use the global random state and therefore have side effects, or random_state=ARandomStateObject, in which case this object will be consumed. But random_state=seed will mean the results are identical.

Btw, it would be great if you could add tests to the common tests that explicitly test for this, as it has come up twice at least.
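
The three random_state cases described in this comment can be sketched with a toy estimator. This is only an illustration under stated assumptions: `ToySampler` and `_resolve_rng` are hypothetical names, and the standard library's `random.Random` stands in for NumPy's `RandomState`. It is not scikit-learn's implementation.

```python
import random


def _resolve_rng(random_state):
    """Hypothetical helper mirroring the three cases discussed above."""
    if random_state is None:
        # Global generator: fitting draws from (and mutates) shared state.
        return random
    if isinstance(random_state, int):
        # Integer seed: a fresh generator is built on every call to fit.
        return random.Random(random_state)
    # Generator object: used as-is, so fitting consumes it.
    return random_state


class ToySampler:
    """Toy estimator whose fitted attribute is a random subsample of X."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X):
        rng = _resolve_rng(self.random_state)
        self.sample_ = rng.sample(list(X), k=3)
        return self


X = list(range(100))

# Integer seed: two fits on the same data give the same fitted state.
seeded = ToySampler(random_state=0)
first = seeded.fit(X).sample_
second = seeded.fit(X).sample_
seeded_repeatable = first == second

# Generator object: fitting advances the generator passed in by the caller.
shared_rng = random.Random(0)
state_before = shared_rng.getstate()
ToySampler(random_state=shared_rng).fit(X)
rng_was_consumed = shared_rng.getstate() != state_before
```

In this sketch only the integer seed makes repeated fits reproducible; the other two cases leave a side effect outside the estimator.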

@amueller
Member

amueller commented Oct 5, 2018

Even stronger, est.fit(Anything).fit(X) needs to be equivalent to est.fit(X)
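
A minimal sketch of this stronger requirement, using hypothetical toy classes rather than real scikit-learn estimators: `ToyMeanEstimator` recomputes everything inside fit, while `LeakyMeanEstimator` deliberately violates the contract by keeping data from earlier calls.

```python
class ToyMeanEstimator:
    """Hypothetical estimator that learns the mean of its training data."""

    def fit(self, X):
        # Every fitted attribute is recomputed from X alone;
        # nothing from a previous call to fit is reused.
        values = list(X)
        self.n_samples_seen_ = len(values)
        self.mean_ = sum(values) / len(values)
        return self


class LeakyMeanEstimator:
    """Hypothetical counter-example: state survives across calls to fit."""

    def __init__(self):
        self._seen = []

    def fit(self, X):
        # Bug on purpose: samples from earlier fits are never cleared.
        self._seen.extend(X)
        self.mean_ = sum(self._seen) / len(self._seen)
        return self


X = [1.0, 2.0, 3.0, 4.0]
other = [100.0, 200.0]

fresh = ToyMeanEstimator().fit(X)
refit = ToyMeanEstimator().fit(other).fit(X)
same_state = vars(fresh) == vars(refit)

leaky_fresh = LeakyMeanEstimator().fit(X)
leaky_refit = LeakyMeanEstimator().fit(other).fit(X)
leaky_differs = leaky_fresh.mean_ != leaky_refit.mean_
```

Comparing the fitted attributes of a fresh estimator against one that was previously fitted on unrelated data is one way a common test could detect the leaky behaviour.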

@amueller
Member

amueller commented Oct 5, 2018

Btw, the mutation of the random state stuff should go into the glossary and probably also into the dev guide. It's pretty subtle and I don't think it's written down anywhere.

@NicolasHug
Member Author

NicolasHug commented Oct 5, 2018

> Even stronger, est.fit(Anything).fit(X) needs to be equivalent to est.fit(X)

Yes this actually covers all cases and I think this is what was intended initially.

After all, it's not so much about idempotence as it is about some kind of 'statelessness'. I'll update the doc and submit tests in another PR.

> mutation of the random state stuff should go into the glossary

You mean updating this? It's pretty clear IMHO, and it mentions the fact that fit may have different results when random_state is not set to an int.

@amueller
Member

amueller commented Oct 5, 2018

Oh, this actually looks great. I guess I didn't realize all of the detail @jnothman put into the glossary ;) You should link to it, though.

@amueller
Member

amueller commented Oct 5, 2018

See #12267 and #10978 for examples of failing tests

Member

@ogrisel ogrisel left a comment

Here is a complement to document the exception when warm_start=True. Other than that, LGTM.
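
The warm_start exception can be illustrated with a toy iterative estimator. This is a sketch under assumptions, not scikit-learn code: `ToyIterativeMean`, its `n_steps` parameter, and the fixed step size are all hypothetical. With warm_start=True, fit starts from the previous solution, so a second call does change the fitted state.

```python
class ToyIterativeMean:
    """Hypothetical estimator approaching the mean of X in fixed steps."""

    def __init__(self, n_steps=5, warm_start=False):
        self.n_steps = n_steps
        self.warm_start = warm_start

    def fit(self, X):
        values = list(X)
        target = sum(values) / len(values)
        if self.warm_start and hasattr(self, "estimate_"):
            # Warm start: continue from the previous fitted value.
            estimate = self.estimate_
        else:
            # Cold start: discard anything learned by earlier calls.
            estimate = 0.0
        for _ in range(self.n_steps):
            # Move halfway towards the target on each step.
            estimate += 0.5 * (target - estimate)
        self.estimate_ = estimate
        return self


X = [2.0, 4.0, 6.0]  # the mean is 4.0

cold = ToyIterativeMean(warm_start=False)
cold_first = cold.fit(X).estimate_
cold_second = cold.fit(X).estimate_

warm = ToyIterativeMean(warm_start=True)
warm_first = warm.fit(X).estimate_
warm_second = warm.fit(X).estimate_
```

Here the cold-start estimator gives the same result on both calls, while the warm-start one ends its second fit closer to the mean than its first.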

@jnothman
Copy link
Member

jnothman commented Oct 8, 2018

Thanks, @NicolasHug!

@amueller amueller merged commit bea1eb5 into scikit-learn:master Oct 16, 2018
@NicolasHug NicolasHug deleted the contributing_update branch October 17, 2018 20:28
anuragkapale pushed a commit to anuragkapale/scikit-learn that referenced this pull request Oct 23, 2018
jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Nov 14, 2018
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

5 participants