
[MRG] Allow scoring of dummies without test samples #11957


Merged: 13 commits into scikit-learn:master on Sep 13, 2018

Conversation

JarnoRFB
Contributor

As DummyClassifier and DummyRegressor operate solely on the targets,
they can now be used without passing test samples, instead passing None.
Also includes some minor renaming in the corresponding tests for more
consistency.

Reference Issues/PRs

Resolves #11951
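
For illustration, a minimal sketch of the new usage (the data and strategy here are made up for the example):

import numpy as np
from sklearn.dummy import DummyClassifier

y = np.array([0, 1, 1, 1])
X = np.zeros((len(y), 1))  # features are ignored by dummy estimators

clf = DummyClassifier(strategy="most_frequent").fit(X, y)
clf.score(None, y)  # now allowed; gives the same result as clf.score(X, y)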

@JarnoRFB
Contributor Author

Not sure if the function ensure_consistent_length is in the right place there, or if it should go in some utils module.
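
For context, a sketch of roughly what the helper does (pieced together from this thread, not the exact diff):

import numpy as np

def _ensure_consistent_length(X, y):
    # Dummy estimators operate only on the targets, so when X is None,
    # fabricate placeholder samples of the same length as y.
    if X is None:
        X = np.zeros(shape=(len(y), 1))
    return X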

@JarnoRFB
Contributor Author

JarnoRFB commented Sep 2, 2018

Thanks for the feedback! I tried to incorporate the requested changes.

Member

@jnothman jnothman left a comment

Almost there, thanks!

sklearn/dummy.py Outdated
return super(DummyRegressor, self).score(X, y, sample_weight)


def _ensure_consistent_length(X, y):
Member

For readability, I think you should just inline this code, and not use a separate function.

Contributor Author

Well, I thought that using a function with a descriptive name would actually increase readability and avoid duplication, as the functionality is needed for both the classifier and the regressor. Maybe you can elaborate on how inlining the code would improve readability. I can of course change that if you want.

Member

A function name can improve readability, but I don't think that the meaning of _ensure_consistent_length is obvious enough to do that. Perhaps _handle_no_x would be clearer.

Contributor Author

I renamed the function; let me know what you think.

@JarnoRFB
Contributor Author

JarnoRFB commented Sep 5, 2018

I renamed the function; let me know what you think. If you still don't like it, I can inline the code.

Member

@jnothman jnothman left a comment

I still suspect it would be clearer inline.
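
Inlined, the check would sit directly in score, roughly like this (a sketch based on the DummyRegressor.score excerpt above, assuming numpy is imported as np):

def score(self, X, y, sample_weight=None):
    # X may be None because dummy estimators ignore the features;
    # substitute placeholder samples of matching length.
    if X is None:
        X = np.zeros(shape=(len(y), 1))
    return super(DummyRegressor, self).score(X, y, sample_weight)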

sklearn/dummy.py Outdated
samples used in the fitting for the estimator.
Passing None as test samples gives the same result
as passing real test samples, since DummyRegressor
operates solely on the targets.
Member

operates independent of the sampled observations.

@jnothman
Member

jnothman commented Sep 6, 2018

Please add an entry to the change log at doc/whats_new/v0.21.rst. Please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

Contributor

@albertcthomas albertcthomas left a comment

If it is assumed that dummy estimator predictions do not depend on X, do we need to add something (in the docs or via a test) to ensure that someone will not add a strategy that uses information from X?

@JarnoRFB
Contributor Author

@albertcthomas Thanks for the review! In the model evaluation docs, in the dummy-estimator section http://scikit-learn.org/stable/modules/model_evaluation.html#dummy-estimators, it says
"Note that with all these strategies, the predict method completely ignores the input data!".
So there is something in the docs. One could of course add tests for every dummy estimator and every strategy, so that the estimator, once fitted, produces the same prediction for different X.

The definition of a dummy estimator as predicting the label based only on the label distribution, independent of the features, makes a lot of sense to me, as this is, so to speak, the best you can get when your features have absolutely no predictive power.

I could add some tests if you think it makes sense.
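
Such a test could look roughly like this (a sketch; the test name and values are illustrative, not the exact tests added in this PR):

from numpy.testing import assert_array_equal
from sklearn.dummy import DummyClassifier

def test_classifier_prediction_independent_of_X():
    y = [0, 1, 1, 1]
    X1 = [[0]] * len(y)
    X2 = [[1]] * len(y)
    # Fitting on different X with the same y must give identical
    # predictions, since dummy estimators ignore the features.
    clf1 = DummyClassifier(strategy="most_frequent", random_state=1).fit(X1, y)
    clf2 = DummyClassifier(strategy="most_frequent", random_state=1).fit(X2, y)
    assert_array_equal(clf1.predict(X1), clf2.predict(X2))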

@JarnoRFB
Contributor Author

Ok, I added some tests checking that predictions are independent of X. If someone implemented a strategy that did not operate independently of X, they would of course have to add their strategy to the test to see it fail. But I guess that should make it a bit more obvious that this is not desired.

@albertcthomas
Contributor

The definition of a dummy estimator as predicting the label based only on the label distribution, independent of the features, makes a lot of sense to me, as this is, so to speak, the best you can get when your features have absolutely no predictive power.

I agree. My point is just that if a contributor or a user adds a dummy strategy that depends on X then returning the score independently of X would be wrong and this could fail silently.

@JarnoRFB
Contributor Author

I agree. My point is just that if a contributor or a user adds a dummy strategy that depends on X then returning the score independently of X would be wrong and this could fail silently.

One possibility would be to go from

X = np.zeros(shape=(len(y), 1))

back to

X = [[None]] * len(y)

so that if predict actually used the values from X, any computation on the made-up X would fail (see the example below). Additionally, one could add even more explicit documentation. What do you think?
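
For example (illustrative only):

import numpy as np

X = [[None]] * 4
np.asarray(X) + 1  # raises TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'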

Contributor

@albertcthomas albertcthomas left a comment

I think we can stick with what you already implemented and the tests you added. If a user wants to pass X=None to clf.score, it is clearly mentioned in the docstring that dummy estimators operate independently of X. Thanks @JarnoRFB!

@albertcthomas
Contributor

To be completely safe, another possibility would be to add a check in clf.score such as

if X is None and strategy not in {current strategies}:
    raise ValueError('informative message')

@JarnoRFB
Contributor Author

To be completely safe, another possibility would be to add a check in clf.score such as

if X is None and strategy not in {current strategies}:
    raise ValueError('informative message')

Not really sure about that. A similar check is already performed in clf.fit, so anyone implementing a new strategy would have to add it to that check anyway. Therefore, they would also add it to the check in clf.score if we put one in the code.

Adding a second strategy check to clf.score would of course make it possible to have some strategies that work independently of X and some that do not. However, I think it would make more sense to clearly define a DummyEstimator as an estimator that ignores X, and not even introduce the possibility of implementing something else.

@jnothman
Member

Let's sneak this into 0.20.
Please add an entry to the change log at doc/whats_new/v0.20.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

@JarnoRFB
Contributor Author

@jnothman I moved the entry to doc/whats_new/v0.20.rst

@jnothman jnothman merged commit 3ee1cfc into scikit-learn:master Sep 13, 2018
@jnothman
Member

Thanks @JarnoRFB!

@JarnoRFB
Contributor Author

Thanks for guiding me through this. Very happy to contribute!

jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Sep 17, 2018
Successfully merging this pull request may close these issues:

Scoring for dummy classifier does not work without test samples (#11951)