[MRG+1] Catch cases for different class size in MLPClassifier with warm start (#7976) #8035
Conversation
Thanks for working on this.
"""
if getattr(self, 'classes_', None) is not None:
    num_classes = len(unique_labels(y))
    if num_classes != len(self.classes_):
This is not a sufficient condition, really. What if they have different labels? I think what you want to do is change the condition here to be if not incremental or (self.warm_start and not hasattr(self, 'classes_')).
It would certainly be nice if we had some helpers to make writing this logic less error-prone :\
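A rough sketch of what that guard could look like inside _validate_input (illustrative only; it assumes the existing LabelBinarizer plumbing and is not the final patch):

if not incremental or (self.warm_start and not hasattr(self, 'classes_')):
    # Fresh fit: (re)derive classes_ from the new y.
    self._label_binarizer = LabelBinarizer()
    self._label_binarizer.fit(y)
    self.classes_ = self._label_binarizer.classes_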
I'm not entirely certain that we want to exclude the case where a subset of the classes are fit... but it is admittedly a bit weird. Perhaps it's worth checking what other classifiers with warm_start do... though they may well just be broken.
I just did a check with RandomForestClassifier with warm_start. It also breaks when a different number of classes is fitted. In addition, it allows a different set of labels when the number of classes is the same during fit.
class size meaning the number of classes? Then the second is a bug.
class size as the number of unique labels. For example, unique(y) = [1,2,3] and unique(y_alt) = [4,5,6]. fit(X, y) followed by fit(X, y_alt) will not raise an error when warm_start=True.
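A minimal reproduction of that scenario (hypothetical data and layer sizes, just to illustrate the behaviour being described):

import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.RandomState(0).rand(6, 2)
y = np.array([1, 2, 3, 1, 2, 3])
y_alt = np.array([4, 5, 6, 4, 5, 6])

clf = MLPClassifier(hidden_layer_sizes=2, warm_start=True)
clf.fit(X, y)
# Same number of classes, completely different labels: on master this second
# fit silently reuses the old coefficients; with this PR it raises ValueError.
clf.fit(X, y_alt)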
Another question: if the second fit has a subset of the labels from the first fit, should that be allowed? For example, unique(y) = [1,2,3] and the second fit is with unique(y_alt) = [1,2].
I think these are behaviours that the inventors of warm_start had not considered. I think, unlike for partial_fit, a subset should raise an error.
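For contrast, a hedged sketch of the partial_fit convention being referred to, where the full set of classes is declared up front and later batches may legitimately contain only a subset (data and sizes are illustrative):

import numpy as np
from sklearn.neural_network import MLPClassifier

X1 = np.array([[0., 0.], [1., 1.], [2., 2.]])
X2 = np.array([[0.5, 0.5], [1.5, 1.5]])

clf = MLPClassifier(hidden_layer_sizes=2)
# All classes are declared on the first call...
clf.partial_fit(X1, [1, 2, 3], classes=[1, 2, 3])
# ...so a later batch containing only a subset of them is fine.
clf.partial_fit(X2, [1, 2])
# With warm_start, by contrast, a second fit() on a subset should raise.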
@@ -556,3 +561,40 @@ def test_adaptive_learning_rate():
    clf.fit(X, y)
    assert_greater(clf.max_iter, clf.n_iter_)
    assert_greater(1e-6, clf._optimizer.learning_rate)


def test_warm_class():
*warm_start? *warm_start_classes?
y_5classes = np.array([0]*30 + [1]*30 + [2]*30 + [3]*30 + [4]*30)

with ignore_warnings(category=Warning):
    # failed in converting 7th argument `g' of _lbfgsb.setulb to
This no longer happens, right?
I don't think we need to record the error message we're avoiding.
Yes, these are the errors that occurred originally; they are replaced with a standard error message. I will remove these comments.
clf = MLPClassifier(hidden_layer_sizes=2, solver='lbfgs',
                    warm_start=True)
clf.fit(X, y)
assert_raises(ValueError, clf.fit, X, y_2classes)
could we use assert_raise_message?
clf = MLPClassifier(hidden_layer_sizes=2, solver='lbfgs',
                    warm_start=True)
clf.fit(X, y)
assert_raises(ValueError, clf.fit, X, y_4classes)
could we use assert_raise_message?
y_4classes = np.array([0]*37 + [1]*37 + [2]*38 + [3]*38)
y_5classes = np.array([0]*30 + [1]*30 + [2]*30 + [3]*30 + [4]*30)

with ignore_warnings(category=Warning):
what warnings are we ignoring?
Sometimes it throws a RuntimeWarning: underflow encountered in exp (np.exp(tmp, out=X)). I tried ignore_warnings(category=RuntimeWarning) but it didn't seem to work.
I have not investigated what's going on here, but perhaps np.errstate will work?
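For reference, a sketch of that suggestion, wrapping only the call that underflows (the placement is hypothetical):

import numpy as np

# Treat floating-point underflow as harmless within this block.
with np.errstate(under='ignore'):
    clf.fit(X, y)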
test failures
thanks
elif self.warm_start:
    classes = unique_labels(y)
    if set(classes) != set(self.classes_):
        raise ValueError("`y` has classes not in `self.classes_`."
Let's make this clearer: "warm_start can only be used where the new y has the same classes as in the previous call to fit. Previously, got %r, now %r."
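A sketch of how that wording could look at the raise site (the exact formatting is only a suggestion):

raise ValueError("warm_start can only be used where `y` has the same "
                 "classes as in the previous call to fit. Previously "
                 "got %r, now %r." % (self.classes_, classes))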
" has [0 1 2]. 'y' has [0 1 2 3].") | ||
assert_raise_message(ValueError, message, clf.fit, X, y_4classes)

# Test with 5 unique labels
This seems unnecessary. Instead can we test for y having labels [1, 2, 3] (no 0) or [0, 1, 3]: correct number of labels, incorrect set?
Initially, I was testing for the different types of errors that were popping up. But I can change this test to [1,2,3] instead of 5 classes.
I suppose if that represented a different type of error at master, commenting that it is a non-regression test and just adding [1,2,3] is fine.
Otherwise LGTM.
clf = MLPClassifier(hidden_layer_sizes=2, solver='lbfgs',
                    warm_start=True)
clf.fit(X, y)
message = ('warm_start can only be used where `y` has the same classes as in the previous call to fit.'
Please observe PEP8's limit of 79 chars per line.
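For example, the long message string can be wrapped with implicit string concatenation to stay under the limit (a sketch, not the exact final text):

message = ("warm_start can only be used where `y` has the same "
           "classes as in the previous call to fit.")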
elif self.warm_start:
    classes = unique_labels(y)
    if set(classes) != set(self.classes_):
        raise ValueError("warm_start can only be used where `y` has the same classes as in the previous "
PEP8 line length
Besides LGTM. Thanks for working on this @vincentpham1991!
    incremental = True
else:
    incremental = False
return self._fit(X, y, incremental=incremental)
can we put this in one line?
return self._fit(X, y, incremental=(self.warm_start and
                                    hasattr(self, "classes_")))
@@ -23,7 +23,8 @@
 from sklearn.preprocessing import StandardScaler, MinMaxScaler
 from scipy.sparse import csr_matrix
 from sklearn.utils.testing import (assert_raises, assert_greater, assert_equal,
-                                   assert_false, ignore_warnings)
+                                   assert_false, ignore_warnings,
+                                   assert_raise_message)
could you put this in a new line with its own import statement? (we prefer that way as it makes it easier to fix conflicts)
(from sklearn.utils.testing import assert_raise_message)


def test_warm_start():
    X = X_iris[:150]
Why? X_iris.shape[0] is 150...
y_5classes = np.array([0] * 30 + [1] * 30 + [2] * 30 + [3] * 30 + [4] * 30)

with ignore_warnings(category=Warning):

Could you remove this extra blank line :)
y = y_iris[:150]

y_2classes = np.array([0] * 75 + [1] * 75)
y_3classes = np.array([0] * 50 + [1] * 50 + [2] * 50)
This is the same as y_iris... Could you change the order / shuffle it maybe?
np.array([0] * 40 + [1] * 40 + [2] * 70)
@@ -556,3 +562,58 @@ def test_adaptive_learning_rate():
    clf.fit(X, y)
    assert_greater(clf.max_iter, clf.n_iter_)
    assert_greater(1e-6, clf._optimizer.learning_rate)


def test_warm_start():
This test could be made much shorter by using a loop, something like:
for y_i in (y, y_3classes):
    # No error raised
    MLP().fit(X, y)
    MLP().fit(X, y_i)
for y_i in (y_2classes, y_4classes):
    MLP().fit(X, y)
    msg = "..... Got %s" % list(set(y))
    assert_raise_message(ValueError, msg, MLP().fit, X, y_i)
Sweet. That saves us 32 lines of code :) +1 for merge once the tests pass...
y_4classes = np.array([0] * 37 + [1] * 37 + [2] * 38 + [3] * 38)
y_5classes = np.array([0] * 30 + [1] * 30 + [2] * 30 + [3] * 30 + [4] * 30)

with ignore_warnings(category=Warning):
Wait what warnings does this catch?
It was mentioned in a comment above: "Sometimes it throws a RuntimeWarning: underflow encountered in exp np.exp(tmp, out=X). I tried ignore_warnings(category=RuntimeWarning) but it didn't seem to work."
I also tried to use np.errstate but couldn't get it to work either.
Could you try @tguillemot's nifty fix to our @ignore_warnings decorator? (@ignore_warnings(RuntimeError))?
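Presumably the category meant here is RuntimeWarning rather than RuntimeError; as a decorator that would look something like this (a sketch):

from sklearn.utils.testing import ignore_warnings

@ignore_warnings(category=RuntimeWarning)
def test_warm_start():
    ...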
In it goes. Thanks @vincentpham1991
Catch cases for different class size in MLPClassifier with warm start (scikit-learn#7976) (scikit-learn#8035)
* added test that fails
* generate standard value error for different class size
* moved num_classes one class down
* fixed over-indented lines
* standard error occurs a layer up.
* created a different label comparison for warm_start
* spaces around multiplication sign.
* reworded error and added another edge case.
* fixed pep8 violation
* make test shorter
* updated ignore warning
Reference Issue
Fixes #7976
What does this implement/fix? Explain your changes.
This provides tests for the different cases that should throw an error when warm_start=True for MLPClassifier. Currently, vague errors are thrown when the number of classes differs between the current fit and the previous fit. This fix throws a clearer error message instead.
Any other comments?