[MRG+2] LogisticRegression convert to float64 (newton-cg) #8835


Merged (26 commits) on Jun 7, 2017
Commits (26)
520a2b3
Add a test to ensure not changing the input's data type
May 1, 2017
0d332bf
[WIP] Force X to remain float32. (self.coef_ remains float64 even if …
May 2, 2017
9ec920b
[WIP] ensure self.coef_ same type as X
May 5, 2017
d3cf91c
keep the np.float32 when multi_class='multinomial'
May 5, 2017
cb56481
Avoid hardcoded type for multinomial
May 5, 2017
14450a2
pass flake8
May 19, 2017
7adbb24
Ensure that the results in 32bits are the same as in 64
May 19, 2017
c0f6930
Address Gael's comments for multi_class=='ovr'
May 30, 2017
e6a3c23
Add multi_class=='multinominal' to test
May 30, 2017
4ac33e8
Add support for multi_class=='multinominal'
May 30, 2017
2f194b6
prefer float64 to float32
May 30, 2017
8ea7682
Force X and y to have the same type
May 30, 2017
75dc160
Revert "Add support for multi_class=='multinominal'"
May 30, 2017
03b6a57
revert more stuff
May 30, 2017
18a5ac3
clean up some commented code
May 30, 2017
05b750e
allow class_weight to take advantage of float32
May 30, 2017
4bef1da
Add a test where X.dtype is different of y.dtype
May 31, 2017
f63ef77
Address @raghavrv comments
Jun 2, 2017
1a29e0b
address the rest of @raghavrv's comments
Jun 2, 2017
366f751
Revert class_weight
Jun 2, 2017
093f25a
Avoid copying if dtype matches
Jun 2, 2017
fb545de
Address alex comment to the cast from inside _multinomial_loss_grad
Jun 6, 2017
15d5079
address alex comment
Jun 6, 2017
057e2e0
add sparsity test
Jun 6, 2017
0a07a2e
Merge pull request #3 from Henley13/is/8769
massich Jun 6, 2017
b80cd39
Addressed Tom's comment of checking that we keep the 64 as well
Jun 6, 2017
22 changes: 14 additions & 8 deletions sklearn/linear_model/logistic.py
@@ -337,7 +337,8 @@ def _multinomial_loss_grad(w, X, Y, alpha, sample_weight):
n_classes = Y.shape[1]
n_features = X.shape[1]
fit_intercept = (w.size == n_classes * (n_features + 1))
- grad = np.zeros((n_classes, n_features + bool(fit_intercept)))
+ grad = np.zeros((n_classes, n_features + bool(fit_intercept)),
+                 dtype=X.dtype)
loss, p, w = _multinomial_loss(w, X, Y, alpha, sample_weight)
Member:
Doesn't _multinomial_loss return float32 if the dtype of X and y is float32? That would avoid the diff = diff.astype(X.dtype, copy=False) below.

Contributor Author:
Actually, _multinomial_loss does keep all the types as float32. The problem is that Y in diff = sample_weight * (p - Y) (here) is of type int64, and therefore diff becomes float64.

Y is set (here) as the second parameter of args, based on target, which is in turn set from y_bin or Y_multi. The former is fine, while the latter is determined by the transform() of LabelBinarizer or LabelEncoder. (here, here)

We could transform target to X.dtype as in the saga case (here), in the following manner:

target = target.astype(X.dtype)
args = (X, target, 1. / C, sample_weight)

or we could propagate the change inside fit_transform.

Any thoughts, @agramfort?

cc: @Henley13, @GaelVaroquaux, @raghavrv
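
A minimal NumPy sketch of the promotion described above (toy shapes, not code from the patch): an integer-typed Y upcasts the float32 intermediates to float64, and casting the target to the data dtype avoids it.

import numpy as np

p = np.random.rand(4, 3).astype(np.float32)                 # probabilities, same dtype as a float32 X
sample_weight = np.ones((4, 1), dtype=np.float32)
Y = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])  # integer output of the label binarizer

diff = sample_weight * (p - Y)
print(diff.dtype)                                           # float64: the integer Y promotes the result

diff = sample_weight * (p - Y.astype(p.dtype))              # casting the target first keeps float32
print(diff.dtype)                                           # float32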

Member:
I would convert Y to the dtype at the top.

Member:
We could transform target to X.dtype as in the saga case (here), in the following manner:

target = target.astype(X.dtype)
args = (X, target, 1. / C, sample_weight)

+1

Contributor Author:
@agramfort
y is already in the correct form.
Are you proposing:

Y_multi = le.fit_transform(y).astype(X.dtype, copy=False)

Member:
yes.

sample_weight = sample_weight[:, np.newaxis]
diff = sample_weight * (p - Y)
@@ -608,10 +609,10 @@ def logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True,
# and check length
# Otherwise set them to 1 for all examples
if sample_weight is not None:
- sample_weight = np.array(sample_weight, dtype=np.float64, order='C')
+ sample_weight = np.array(sample_weight, dtype=X.dtype, order='C')
Member:
Should it be y.dtype?

cc: @agramfort

Contributor Author:
We were discussing with @glemaitre (and @GaelVaroquaux) whether to force X.dtype and y.dtype to be the same.

Member:
Yes, I think the idea should be that the dtype of X conditions the dtype of the computation.

We should have an RFC about this, and include it in the docs.

Contributor Author:
see #8976

check_consistent_length(y, sample_weight)
else:
- sample_weight = np.ones(X.shape[0])
+ sample_weight = np.ones(X.shape[0], dtype=X.dtype)

# If class_weights is a dict (provided by the user), the weights
# are assigned to the original labels. If it is "balanced", then
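
A small illustration of that principle with made-up data (not part of the patch): allocating sample_weight with X.dtype keeps the computation in float32, whereas the former hard-coded float64 helper silently promotes every product back to float64.

import numpy as np

X = np.random.rand(5, 3).astype(np.float32)

sw_old = np.ones(X.shape[0])                   # previous behaviour: float64 by default
sw_new = np.ones(X.shape[0], dtype=X.dtype)    # patched behaviour: follows X

print((sw_old[:, np.newaxis] * X).dtype)       # float64, the whole expression is promoted
print((sw_new[:, np.newaxis] * X).dtype)       # float32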
@@ -624,10 +625,10 @@ def logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True,
# For doing a ovr, we need to mask the labels first. for the
# multinomial case this is not necessary.
if multi_class == 'ovr':
- w0 = np.zeros(n_features + int(fit_intercept))
+ w0 = np.zeros(n_features + int(fit_intercept), dtype=X.dtype)
mask_classes = np.array([-1, 1])
mask = (y == pos_class)
- y_bin = np.ones(y.shape, dtype=np.float64)
+ y_bin = np.ones(y.shape, dtype=X.dtype)
y_bin[~mask] = -1.
# for compute_class_weight

@@ -645,10 +646,10 @@ def logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True,
else:
# SAG multinomial solver needs LabelEncoder, not LabelBinarizer
le = LabelEncoder()
- Y_multi = le.fit_transform(y)
+ Y_multi = le.fit_transform(y).astype(X.dtype, copy=False)

w0 = np.zeros((classes.size, n_features + int(fit_intercept)),
-               order='F')
+               order='F', dtype=X.dtype)

if coef is not None:
# it must work both giving the bias term and not
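
For reference, a rough sketch with toy labels (not from the PR) of why the explicit cast on the LabelEncoder output is needed, and why copy=False makes it free when the dtype already matches:

import numpy as np
from sklearn.preprocessing import LabelEncoder

y = np.array(['a', 'b', 'c', 'a'])
Y_multi = LabelEncoder().fit_transform(y)
print(Y_multi.dtype)                               # integer codes, which would promote float32 math

Y_cast = Y_multi.astype(np.float32, copy=False)    # matches a float32 X
print(Y_cast.dtype)                                # float32

X64 = np.zeros((4, 2), dtype=np.float64)
print(X64.astype(np.float64, copy=False) is X64)   # True: no copy when the dtype already matches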
@@ -1203,7 +1204,12 @@ def fit(self, X, y, sample_weight=None):
raise ValueError("Tolerance for stopping criteria must be "
"positive; got (tol=%r)" % self.tol)

- X, y = check_X_y(X, y, accept_sparse='csr', dtype=np.float64,
+ if self.solver in ['newton-cg']:
+     _dtype = [np.float64, np.float32]
Member:
Sorry if I am missing something, but why?

Contributor Author:
The idea is that previously check_X_y was converting X and y into np.float64. This is fine if the user passes a list as X, but if a user willingly passes np.float32 data, converting it to np.float64 penalizes them in memory and speed.

Therefore, we are trying to keep the data in np.float32 when the user provides it in that type.

Member:
The fact that @raghavrv asks a question tells us that a short comment explaining the logic would probably be useful here.

Contributor Author:
I think that @raghavrv was more concerned about the fact that we were passing a list rather than forcing one dtype or the other. Once we checked that check_X_y was taking care of it, he was OK with it.

@raghavrv any comments?

+ else:
+     _dtype = np.float64
+
+ X, y = check_X_y(X, y, accept_sparse='csr', dtype=_dtype,
                  order="C")
check_classification_targets(y)
self.classes_ = np.unique(y)
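
A rough sketch of the check_X_y behaviour this branch relies on (arbitrary toy data, not part of the patch): when dtype is a list, an input whose dtype is already in the list is left as is; otherwise it is converted to the first entry.

import numpy as np
from sklearn.utils import check_X_y

X32 = np.random.rand(6, 2).astype(np.float32)
y = np.array([0, 1, 0, 1, 0, 1])

Xc, _ = check_X_y(X32, y, accept_sparse='csr', dtype=[np.float64, np.float32], order="C")
print(Xc.dtype)    # float32: already in the accepted list, no upcast

Xc, _ = check_X_y(X32, y, accept_sparse='csr', dtype=np.float64, order="C")
print(Xc.dtype)    # float64: the other solvers keep the old behaviour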
30 changes: 30 additions & 0 deletions sklearn/linear_model/tests/test_logistic.py
@@ -1136,3 +1136,33 @@ def test_saga_vs_liblinear():
liblinear.fit(X, y)
# Convergence for alpha=1e-3 is very slow
assert_array_almost_equal(saga.coef_, liblinear.coef_, 3)


def test_dtype_match():
# Test that np.float32 input data is not cast to np.float64 when possible

X_32 = np.array(X).astype(np.float32)
y_32 = np.array(Y1).astype(np.float32)
X_64 = np.array(X).astype(np.float64)
y_64 = np.array(Y1).astype(np.float64)
X_sparse_32 = sp.csr_matrix(X, dtype=np.float32)

for solver in ['newton-cg']:
for multi_class in ['ovr', 'multinomial']:

# Check type consistency
lr_32 = LogisticRegression(solver=solver, multi_class=multi_class)
lr_32.fit(X_32, y_32)
assert_equal(lr_32.coef_.dtype, X_32.dtype)

# check consistency with sparsity
lr_32_sparse = LogisticRegression(solver=solver,
multi_class=multi_class)
lr_32_sparse.fit(X_sparse_32, y_32)
assert_equal(lr_32_sparse.coef_.dtype, X_sparse_32.dtype)

# Check accuracy consistency
lr_64 = LogisticRegression(solver=solver, multi_class=multi_class)
lr_64.fit(X_64, y_64)
Member:
please add:

assert_equal(lr_64.coef_.dtype, X_64.dtype)

otherwise this test passes when we transform everything to 32 bits

assert_equal(lr_64.coef_.dtype, X_64.dtype)
assert_almost_equal(lr_32.coef_, lr_64.coef_.astype(np.float32))