-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] Remove check_array for X in DummyRegressor/DummyClassifier and replace X.shape[0] with _num_samples #9835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a test.
Perhaps just discard the check_array and use utils.validation._num_samples
. Note sure if this will pass the common tests, though.
I haven't made tests before. Would adding the code to reproduce the bug from the report to sklearn/tests/test_dummy.py in the form of function suffice? |
ideally you would do that, but with a numpy array rather than a pandas
DataFrame.
|
sklearn/tests/test_dummy.py
Outdated
|
||
|
||
def test_dummy_regressor_on_object_value(): | ||
X = np.array(['foo', 'bar', 'baz']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this won't actually have dtype object (check for yourself) so the test is incorrectly named.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. I'll change it to string.
Changed test name from test_dummy_regressor_on_object_value to test_dummy_regressor_on_string_values.
I have changed the test name. @jnothman is there anything else that needs to be changed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
ping @vrishank97 Some personal concerns: X = np.array(['a', 'b', 'c']).reshape(-1, 1)
check_array(X, dtype='numeric') # no error (3) Seems that the original code snippet in the issue will still get an error. |
So seems that there's something wrong with check_array (#10229). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @vrishank97 Could you please take some time to finish it. Otherwise I'll pick it up.
Even with current check_array, I believe we can still construct a regression test using astype("O").
Hi. Some work came up earlier. I'll continue my work on this now. Sorry for the delay. |
We should probably check with a zero feature array as well, right? |
@qinhanmin2014 |
@vrishank97 |
sklearn/tests/test_dummy.py
Outdated
y_pred = cls.predict(X) | ||
assert_array_equal(y_pred, y_expected) | ||
|
||
test_dummy_regressor_on_string_values() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test dummy classifier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just made the test. Pushing now
@vrishank97 Please also update what's new(docs/whats_new/v0.20.rst) accordingly. |
I've updated the what's new section. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vrishank97 Please avoid making unrelated changes.
doc/whats_new/v0.20.rst
Outdated
@@ -116,6 +116,10 @@ Classifiers and regressors | |||
:class:`linear_model.BayesianRidge` for weighted linear regression. | |||
:issue:`10111` by :user:`Peter St. John <pstjohn>`. | |||
|
|||
- :class:'dummy.DummyClassifier' and :class:'dummy.DummyRegresssor' now | |||
only requires X to be an object with finite length or shape. | |||
:issue:'9832' by :user:'Vrishank Bhardwaj <vrishank97>'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use ` instead of ', please follow what others' are doing and check the result from Circle CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops.
I've removed the typos. I think I also accidentally included a few unrelated commits by others while rebasing. How should I remove them? |
At least you might recover the code according to the diff. |
Removed the diffs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ping @jnothman
Wow I have no idea about the root reason for this ... but it looks like merging this one reverted some changes in master, probably because of the weird history ...
I'll fix that in master but this is something to keep in mind (check the history has no quirks) before merging. |
Reverted the unwanted changes in master in e824fa1 |
How strange, thanks for cleaning up after me!
|
Thanks @lesteve Really sorry about that. Also curious about the reason. |
No worries! Anyone with a half-baked explanation I am interested ... |
Perhaps it's worth notifying support@github.com that there is a bug in squash and merge...? |
Maybe, before classifying it as a bug, it would be nice if someone can do a bit of research about
|
Fixes #9832
Adds dtype="None" and ensure_min_features=0 as params for check_array in DummyRegressor predict.