[MRG+2] LogisticRegression convert to float64 (newton-cg) #8835
Conversation
@GaelVaroquaux Actually fixing
sklearn/linear_model/logistic.py
Outdated
@@ -1281,9 +1287,9 @@ def fit(self, X, y, sample_weight=None):
         self.n_iter_ = np.asarray(n_iter_, dtype=np.int32)[:, 0]

         if self.multi_class == 'multinomial':
-            self.coef_ = fold_coefs_[0][0]
+            self.coef_ = fold_coefs_[0][0].astype(np.float32)
My bad, it should be _dtype, i.e. fold_coefs_[0][0].astype(_dtype).
You can execute the PEP8 check locally:
That should be useful in the future.
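For instance, something like the following (a minimal sketch; the exact command in the original comment was elided, and this assumes the flake8 package is installed):

    pip install flake8
    flake8 sklearn/linear_model/logistic.py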
Test that np.float32 input data is not cast to np.float64 when using LR + newton-cg
sklearn/linear_model/logistic.py
Outdated
        _dtype = np.float64
        if self.solver in ['newton-cg'] \
                and isinstance(X, np.ndarray) and X.dtype in [np.float32]:
            _dtype = np.float32
check_X_y can take a list of acceptable dtypes as its dtype argument. I think that using this feature would be a better way of writing this code. The code would be something like:

    if self.solver in ['newton-cg']:
        _dtype = [np.float64, np.float32]
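To illustrate the dtype-list behaviour of check_X_y the reviewer is referring to, a minimal sketch (the array contents here are made up):

    import numpy as np
    from sklearn.utils import check_X_y

    X32 = np.ones((4, 2), dtype=np.float32)
    y = np.array([0, 1, 0, 1])

    # With a list of acceptable dtypes, a float32 input is kept as float32
    # instead of being upcast to the first entry of the list.
    X_checked, y_checked = check_X_y(X32, y, dtype=[np.float64, np.float32])
    assert X_checked.dtype == np.float32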
sklearn/linear_model/logistic.py
Outdated
         else:
-            self.coef_ = np.asarray(fold_coefs_)
+            self.coef_ = np.asarray(fold_coefs_, dtype=_dtype)
Is the conversion necessary here? In other words, if we get the code right, doesn't coefs_ get returned in the right dtype?
I suspect that the problem isn't really solved: if you look a bit further in the code, you will see that inside logistic_regression_path, check_X_y is called again with the np.float64 dtype. And there might be other instances of this problem.
However, when logistic_regression_path is called, check_input=False, therefore X.dtype remains np.float32.
Good catch!
Still, coefs starts as an empty list and ends up being np.float64.
I'll try to figure this out today.
Right, that might be where the problem needs to be fixed.
This reverts commit 4ac33e8.
Thanks for the PR!
@@ -1203,7 +1205,12 @@ def fit(self, X, y, sample_weight=None):
             raise ValueError("Tolerance for stopping criteria must be "
                              "positive; got (tol=%r)" % self.tol)

-        X, y = check_X_y(X, y, accept_sparse='csr', dtype=np.float64,
+        if self.solver in ['newton-cg']:
+            _dtype = [np.float64, np.float32]
Sorry if I am missing something, but why?
The idea is that previously check_X_y was converting X and y into np.float64. This is fine if the user passes a list as X, but if a user passes np.float32, forcibly converting it to np.float64 penalizes them in both memory and speed. Therefore, we are trying to keep the data in np.float32 if the user provides it in that type.
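A quick illustration of the memory cost of that upcast (a sketch with made-up array sizes):

    import numpy as np

    X32 = np.ones((10000, 100), dtype=np.float32)
    X64 = X32.astype(np.float64)   # what the old check_X_y call effectively did

    print(X32.nbytes)  # 4000000 bytes
    print(X64.nbytes)  # 8000000 bytes -- twice the memory for the same data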
The fact that @raghavrv asks a question tells us that a short comment explaining the logic would probably be useful here.
    for solver in ['newton-cg']:
        for multi_class in ['ovr', 'multinomial']:
Can you remove this newline?
def test_dtype_missmatch_to_profile():
    # Test that np.float32 input data is not cast to np.float64 when possible
and this newline too
sklearn/utils/class_weight.py
Outdated
@@ -41,12 +41,17 @@ def compute_class_weight(class_weight, classes, y):
     # Import error caused by circular imports.
     from ..preprocessing import LabelEncoder

+    if y.dtype == np.float32:
+        _dtype = np.float32
Why not _dtype = y.dtype? Is it so you can have y.dtype be int and weight be float?
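That distinction matters because class weights are fractional even when the labels are integers; a minimal sketch:

    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    y = np.array([0, 0, 1], dtype=np.int64)   # integer labels
    w = compute_class_weight('balanced', classes=np.unique(y), y=y)

    print(w)        # [0.75 1.5] -- fractional, so y.dtype (int) would truncate
    print(w.dtype)  # float64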
    # Check accuracy consistency
    lr_64 = LogisticRegression(solver=solver, multi_class=multi_class)
    lr_64.fit(X, Y1)
Can you ensure (maybe using astype?) that X, Y1 are float64 before this test? (If it is changed in the future, this test will still pass.)
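A sketch of that guard, with stand-in arrays (the real test uses its own module-level X and Y1 fixtures):

    import numpy as np

    X = np.array([[-1, 0], [0, 1], [1, 1]])   # stand-in for the test fixtures
    Y1 = np.array([0, 1, 1])

    # Pin the inputs to float64 so the 64-bit path keeps being exercised
    # even if the fixtures' default dtype changes later.
    X = X.astype(np.float64)
    Y1 = Y1.astype(np.float64)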
def test_dtype_match():
    # Test that np.float32 input data is not cast to np.float64 when possible

    X_ = np.array(X).astype(np.float32)
X_32 = ... astype(32)
X_64 = ... astype(64)
    assert_almost_equal(lr_32.coef_, lr_64.coef_.astype(np.float32))

def test_dtype_missmatch_to_profile(): |
This test can be removed...
@@ -608,10 +610,10 @@ def logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True,
     # and check length
     # Otherwise set them to 1 for all examples
     if sample_weight is not None:
-        sample_weight = np.array(sample_weight, dtype=np.float64, order='C')
+        sample_weight = np.array(sample_weight, dtype=X.dtype, order='C')
Should it be y.dtype? cc: @agramfort
We were discussing with @glemaitre (and @GaelVaroquaux) forcing X.dtype and y.dtype to be the same.
Yes, I think the idea should be that the dtype of X conditions the dtype of the computation. We should write an RFC about this and include it in the docs.
see #8976
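For context, a minimal illustration of what the changed sample_weight line does (made-up data; the point is that X's dtype now drives the weights' dtype):

    import numpy as np

    X = np.ones((3, 2), dtype=np.float32)
    sample_weight = [1, 2, 3]                 # e.g. a plain Python list

    sample_weight = np.array(sample_weight, dtype=X.dtype, order='C')
    print(sample_weight.dtype)                # float32, matching X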
    # Check accuracy consistency
    lr_64 = LogisticRegression(solver=solver, multi_class=multi_class)
    lr_64.fit(X_64, y_64)
Please add:

    assert_equal(lr_64.coef_.dtype, X_64.dtype)

otherwise this test passes even when we transform everything to 32 bits.
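Putting the review suggestions together, a condensed sketch of what the final test shape looks like (solver and multi_class fixed for brevity, data made up, loose tolerance):

    import numpy as np
    from numpy.testing import assert_almost_equal
    from sklearn.linear_model import LogisticRegression

    X_64 = np.array([[0.5, 0.1], [1.0, 0.2], [2.0, 1.1], [3.0, 1.5]])
    y_64 = np.array([0., 0., 1., 1.])
    X_32 = X_64.astype(np.float32)
    y_32 = y_64.astype(np.float32)

    lr_64 = LogisticRegression(solver='newton-cg').fit(X_64, y_64)
    lr_32 = LogisticRegression(solver='newton-cg').fit(X_32, y_32)

    # Guard both directions: 64-bit input stays 64-bit, 32-bit stays 32-bit.
    assert lr_64.coef_.dtype == X_64.dtype
    assert lr_32.coef_.dtype == X_32.dtype

    # The two fits should agree up to (roughly) float32 precision.
    assert_almost_equal(lr_32.coef_, lr_64.coef_.astype(np.float32), decimal=4)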
LGTM
+1 for MRG if travis is happy
I would convert Y to the dtype at the top.
+1
Before we merge, this warrants a whats_new entry.
appveyor is not happy :(
It looks like some of the appveyor and travis unhappiness may have been caused by a github outage.
https://status.github.com/ and otherwise due to backlog?
all green, merging
Whats_new entry would have been good :)
here you go: 8baaa0e
…rn#8835)
* Add a test to ensure not changing the input's data type: test that np.float32 input data is not cast to np.float64 when using LR + newton-cg
* [WIP] Force X to remain float32. (self.coef_ remains float64 even if X is not)
* [WIP] ensure self.coef_ same type as X
* keep the np.float32 when multi_class='multinomial'
* Avoid hardcoded type for multinomial
* pass flake8
* Ensure that the results in 32bits are the same as in 64
* Address Gael's comments for multi_class=='ovr'
* Add multi_class=='multinominal' to test
* Add support for multi_class=='multinominal'
* prefer float64 to float32
* Force X and y to have the same type
* Revert "Add support for multi_class=='multinominal'" (this reverts commit 4ac33e8)
* remvert more stuff
* clean up some commmented code
* allow class_weight to take advantage of float32
* Add a test where X.dtype is different of y.dtype
* Address @raghavrv comments
* address the rest of @raghavrv's comments
* Revert class_weight
* Avoid copying if dtype matches
* Address alex comment to the cast from inside _multinomial_loss_grad
* address alex comment
* add sparsity test
* Addressed Tom comment of checking that we keep the 64 aswell
Reference Issue
Fixes #8769

What does this implement/fix? Explain your changes.
Avoids logistic regression aggressively casting the data to np.float64 when np.float32 is supplied.

Any other comments?
(only for the newton-cg case)
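To summarize the user-visible effect, a minimal sketch (random data; newton-cg is the only solver covered by this PR):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    X = rng.rand(20, 3).astype(np.float32)
    y = (X[:, 0] > 0.5).astype(np.int64)

    clf = LogisticRegression(solver='newton-cg').fit(X, y)
    print(clf.coef_.dtype)  # float32 after this PR; float64 before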