[MRG] Allow scoring of dummies without testsamples #11957
Conversation
As DummyClassifier and DummyRegressor operate solely on the targets, they can now be scored without passing test samples, by passing None instead. Also includes some minor renaming in the corresponding tests for more consistency.
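Assuming the behaviour described above is merged, a minimal sketch of what the change enables (the toy data below is made up for illustration):

```python
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor

y = np.array([0, 0, 1, 1, 1])

clf = DummyClassifier(strategy="most_frequent")
clf.fit(np.zeros((5, 1)), y)  # the features are irrelevant to the strategy
print(clf.score(None, y))     # accuracy of always predicting the majority class: 0.6

reg = DummyRegressor(strategy="mean")
reg.fit(np.zeros((5, 1)), y)
print(reg.score(None, y))     # R^2 of predicting the mean is exactly 0.0
```

Before this change, score required a feature matrix even though neither estimator ever looks at it.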
Not sure if the functions
Thanks for the feedback! I tried to incorporate the requested changes.
Almost there, thanks!
sklearn/dummy.py (outdated):

    return super(DummyRegressor, self).score(X, y, sample_weight)
    ...
    def _ensure_consistent_length(X, y):
For readability, I think you should just inline this code, and not use a separate function.
Well, I thought that using a function with a descriptive name would actually increase readability and avoid duplication, as the functionality is needed for both the classifier and the regressor. Maybe you can elaborate on how inlining the code would improve readability. I can of course change that, if you want.
A function name can improve readability, but I don't think that the meaning of _ensure_consistent_length is obvious enough to do that. Perhaps _handle_no_x would be clearer.
I renamed the function, let me know what you think. If you still don't like it I can inline the code.
I still suspect it would be clearer inline.
sklearn/dummy.py (outdated):

    samples used in the fitting for the estimator.
    Passing None as test samples gives the same result
    as passing real test samples, since DummyRegressor
    operates solely on the targets.
operates independently of the sampled observations.
Please add an entry to the change log at
If it is assumed that dummy estimator predictions do not depend on X, do we need to add something (in the docs or via a test) to ensure that no one later adds a strategy that uses information from X?
@albertcthomas Thanks for the review! The dummy-estimators section of the model evaluation docs (http://scikit-learn.org/stable/modules/model_evaluation.html#dummy-estimators) defines a dummy estimator as predicting the label based only on the label distribution, independently of the features. That definition makes a lot of sense to me, as this is, so to speak, the best you can get when your features have absolutely no predictive power. I could add some tests if you think it makes sense.
Ok, I added some tests checking that the score is independent of X. If someone implemented a strategy that does not operate independently of X, they would of course have to add their strategy to the test to see it fail. But I guess that should make it a bit more obvious that this is not desired.
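A test along those lines might look like this (a sketch of the idea, not the exact test added in the PR):

```python
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.RandomState(0)
y = rng.randint(0, 2, size=20)
X_a = rng.rand(20, 3)
X_b = rng.rand(20, 3)

clf = DummyClassifier(strategy="most_frequent")
clf.fit(X_a, y)

# The score must not depend on which features (if any) are passed.
assert clf.score(X_a, y) == clf.score(X_b, y) == clf.score(None, y)
```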
I agree. My point is just that if a contributor or a user adds a dummy strategy that depends on X, then returning the score independently of X would be wrong, and this could fail silently.
One possibility would be to go from
back to
so that when some computation on the made-up X is attempted, the operation would fail, if
I think we can stick with what you already implemented and the tests you added. If a user wants to pass X=None in clf.score, it is clearly mentioned in the docstring that dummy estimators operate independently of X. Thanks @JarnoRFB!
To be completely safe, another possibility would be to add a check:

    if X is None and strategy not in {current strategies}:
        raise ValueError('informative message')
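Spelled out as a runnable sketch (the function name and the strategy whitelist below are hypothetical, not part of the PR or the scikit-learn API):

```python
# Hypothetical guard: refuse X=None unless the strategy is known to
# ignore the features. This set mirrors the DummyClassifier strategies
# that existed at the time; it is an assumption, not scikit-learn API.
X_INDEPENDENT_STRATEGIES = {"stratified", "most_frequent", "prior",
                            "uniform", "constant"}

def check_scorable_without_X(X, strategy):
    if X is None and strategy not in X_INDEPENDENT_STRATEGIES:
        raise ValueError(
            "X=None is only allowed for strategies that ignore the "
            "features, got strategy=%r" % strategy)

check_scorable_without_X(None, "most_frequent")  # passes silently
```

A newly added feature-dependent strategy would then fail loudly instead of silently returning a misleading score.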
Not really sure about that. A similar check is already performed in
Adding a second strategy check to
Let's sneak this into 0.20.
@jnothman I moved the entry to
Thanks @JarnoRFB!
Thanks for guiding me through this. Very happy to contribute!
Reference Issues/PRs
Resolves #11951