[MRG+2] LogisticRegression convert to float64 (newton-cg) #8835
Changes from all commits
@@ -337,7 +337,8 @@ def _multinomial_loss_grad(w, X, Y, alpha, sample_weight):
     n_classes = Y.shape[1]
     n_features = X.shape[1]
     fit_intercept = (w.size == n_classes * (n_features + 1))
-    grad = np.zeros((n_classes, n_features + bool(fit_intercept)))
+    grad = np.zeros((n_classes, n_features + bool(fit_intercept)),
+                    dtype=X.dtype)
     loss, p, w = _multinomial_loss(w, X, Y, alpha, sample_weight)
     sample_weight = sample_weight[:, np.newaxis]
     diff = sample_weight * (p - Y)
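As a side note (not part of the patch itself), the reason for threading `dtype=X.dtype` through here is that `np.zeros` defaults to float64, which would silently upcast the gradient even when `X` is float32. A minimal sketch with made-up shapes:

```python
import numpy as np

# Illustrative only: np.zeros defaults to float64, so an explicit dtype is
# needed to keep the gradient buffer in the same precision as X.
X = np.ones((5, 3), dtype=np.float32)

grad_default = np.zeros((2, X.shape[1] + 1))
grad_matched = np.zeros((2, X.shape[1] + 1), dtype=X.dtype)

print(grad_default.dtype)  # float64
print(grad_matched.dtype)  # float32
```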
@@ -608,10 +609,10 @@ def logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True,
     # and check length
     # Otherwise set them to 1 for all examples
     if sample_weight is not None:
-        sample_weight = np.array(sample_weight, dtype=np.float64, order='C')
+        sample_weight = np.array(sample_weight, dtype=X.dtype, order='C')
         check_consistent_length(y, sample_weight)
     else:
-        sample_weight = np.ones(X.shape[0])
+        sample_weight = np.ones(X.shape[0], dtype=X.dtype)

     # If class_weights is a dict (provided by the user), the weights
     # are assigned to the original labels. If it is "balanced", then

Review thread on the `dtype=X.dtype` change to `sample_weight`:

- Should it be […]? cc: @agramfort
- We were discussing with @glemaitre (and @GaelVaroquaux) to force […]
- Yes, I think the idea should be that the dtype of X conditions the dtype of the computation. We should have an RFC about this, and include it in the docs.
- see #8976
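To make the "dtype of X conditions the dtype of the computation" idea from the thread concrete, here is a small standalone sketch (illustrative data, not part of the patch) of how the two changed lines behave:

```python
import numpy as np

# Illustrative only: sample_weight now inherits its precision from X instead
# of being hard-coded to float64, so later products with X stay in float32.
X = np.ones((4, 2), dtype=np.float32)
user_weights = [1.0, 2.0, 1.0, 0.5]

sample_weight = np.array(user_weights, dtype=X.dtype, order='C')
default_weight = np.ones(X.shape[0], dtype=X.dtype)

print(sample_weight.dtype, default_weight.dtype)   # float32 float32
print((sample_weight[:, np.newaxis] * X).dtype)    # float32, no upcast
```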
@@ -624,10 +625,10 @@ def logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True,
     # For doing a ovr, we need to mask the labels first. for the
     # multinomial case this is not necessary.
     if multi_class == 'ovr':
-        w0 = np.zeros(n_features + int(fit_intercept))
+        w0 = np.zeros(n_features + int(fit_intercept), dtype=X.dtype)
         mask_classes = np.array([-1, 1])
         mask = (y == pos_class)
-        y_bin = np.ones(y.shape, dtype=np.float64)
+        y_bin = np.ones(y.shape, dtype=X.dtype)
         y_bin[~mask] = -1.
         # for compute_class_weight
@@ -645,10 +646,10 @@ def logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True,
         else:
             # SAG multinomial solver needs LabelEncoder, not LabelBinarizer
             le = LabelEncoder()
-            Y_multi = le.fit_transform(y)
+            Y_multi = le.fit_transform(y).astype(X.dtype, copy=False)

         w0 = np.zeros((classes.size, n_features + int(fit_intercept)),
-                      order='F')
+                      order='F', dtype=X.dtype)

         if coef is not None:
             # it must work both giving the bias term and not
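For context (not taken from the diff itself): `LabelEncoder.fit_transform` returns integer class codes, so without the `astype(X.dtype, copy=False)` the multinomial target would later promote float32 intermediates to float64. A small sketch:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Illustrative only: the encoded labels come back as integers (int64 on most
# 64-bit platforms), so they are cast to X.dtype to avoid later upcasting.
X = np.ones((3, 2), dtype=np.float32)
y = np.array(['a', 'b', 'a'])

le = LabelEncoder()
Y_multi = le.fit_transform(y)
print(Y_multi.dtype)                              # integer dtype
print(Y_multi.astype(X.dtype, copy=False).dtype)  # float32, matches X
```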
@@ -1203,7 +1204,12 @@ def fit(self, X, y, sample_weight=None):
             raise ValueError("Tolerance for stopping criteria must be "
                              "positive; got (tol=%r)" % self.tol)

-        X, y = check_X_y(X, y, accept_sparse='csr', dtype=np.float64,
+        if self.solver in ['newton-cg']:
+            _dtype = [np.float64, np.float32]
+        else:
+            _dtype = np.float64
+
+        X, y = check_X_y(X, y, accept_sparse='csr', dtype=_dtype,
                          order="C")
         check_classification_targets(y)
         self.classes_ = np.unique(y)

Review thread on the new `_dtype` selection:

- Sorry if I am missing something, but why?
- The idea is that previously […]. Therefore, we are trying to keep the data in […].
- The fact that @raghavrv asks a question tells us that a short comment explaining the logic would probably be useful here.
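To answer the "why" from the thread above in code form, here is a hedged, standalone sketch (made-up data) of how `check_X_y` treats a list of dtypes versus a single dtype: with a list, the input dtype is kept if it matches one of the entries, otherwise the data is converted to the first entry; with a single dtype, float32 input is always upcast. Only newton-cg gets the list here because it is the only solver verified in this PR to work correctly in float32.

```python
import numpy as np
from sklearn.utils import check_X_y

# Illustrative only: dtype=[np.float64, np.float32] preserves float32 input
# (the newton-cg path), while dtype=np.float64 always upcasts (other solvers).
X32 = np.array([[0., 0.], [1., 1.], [1., 0.]], dtype=np.float32)
y = np.array([0, 1, 1])

X_kept, _ = check_X_y(X32, y, accept_sparse='csr',
                      dtype=[np.float64, np.float32])
X_cast, _ = check_X_y(X32, y, accept_sparse='csr', dtype=np.float64)

print(X_kept.dtype)  # float32 -- kept, since float32 is in the allowed list
print(X_cast.dtype)  # float64 -- upcast
```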
@@ -1136,3 +1136,33 @@ def test_saga_vs_liblinear():
     liblinear.fit(X, y)
     # Convergence for alpha=1e-3 is very slow
     assert_array_almost_equal(saga.coef_, liblinear.coef_, 3)
+
+
+def test_dtype_match():
+    # Test that np.float32 input data is not cast to np.float64 when possible
+
+    X_32 = np.array(X).astype(np.float32)
+    y_32 = np.array(Y1).astype(np.float32)
+    X_64 = np.array(X).astype(np.float64)
+    y_64 = np.array(Y1).astype(np.float64)
+    X_sparse_32 = sp.csr_matrix(X, dtype=np.float32)
+
+    for solver in ['newton-cg']:
+        for multi_class in ['ovr', 'multinomial']:
+
+            # Check type consistency
+            lr_32 = LogisticRegression(solver=solver, multi_class=multi_class)
+            lr_32.fit(X_32, y_32)
+            assert_equal(lr_32.coef_.dtype, X_32.dtype)
+
+            # check consistency with sparsity
+            lr_32_sparse = LogisticRegression(solver=solver,
+                                              multi_class=multi_class)
+            lr_32_sparse.fit(X_sparse_32, y_32)
+            assert_equal(lr_32_sparse.coef_.dtype, X_sparse_32.dtype)
+
+            # Check accuracy consistency
+            lr_64 = LogisticRegression(solver=solver, multi_class=multi_class)
+            lr_64.fit(X_64, y_64)
+            assert_equal(lr_64.coef_.dtype, X_64.dtype)
+            assert_almost_equal(lr_32.coef_, lr_64.coef_.astype(np.float32))

Review comment on `lr_64.fit(X_64, y_64)`:

- please add: […] otherwise this test passes when we transform everything to 32 bits
Further review discussion:

- `_multinomial_loss` does not return float32 if dtype is float32 for X and y? That would avoid the `diff = diff.astype(X.dtype, copy=False)` below.
- Actually, `_multinomial_loss` does keep all the types as float32. The problem is that `Y` in `diff = sample_weight * (p - Y)` (here) is of type `int64`, and therefore `diff` becomes `float64`. `Y` is set (here) as the second parameter of `args`, based on `target`, which is set in turn by `y_bin` or `Y_multi`. The former is fine, while the latter is determined by the `transform()` of `LabelBinarizer` or `LabelEncoder` (here, here). We could transform `target` to `X.dtype` as in the `saga` case (here), in the following manner: […] or we could propagate the change inside `transform_fit`. Any thoughts @agramfort? cc: @Henley13, @GaelVaroquaux, @raghavrv
- I would convert Y to dtype on the top.
- +1
- @agramfort, `y` is already in the correct form. Are you proposing: […]
- Yes.
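A minimal, self-contained sketch of the promotion issue described in this thread (variable names and values are illustrative, not taken from the scikit-learn code):

```python
import numpy as np

# An int64 label matrix promotes the float32 probabilities to float64 ...
p = np.array([[0.2, 0.5, 0.3],
              [0.6, 0.1, 0.3]], dtype=np.float32)   # predicted probabilities
Y = np.array([[0, 1, 0],
              [1, 0, 0]], dtype=np.int64)           # binarized labels
sample_weight = np.ones((2, 1), dtype=np.float32)

diff = sample_weight * (p - Y)
print(diff.dtype)    # float64 -- the unwanted upcast

# ... whereas casting the target to the data dtype up front keeps float32.
diff32 = sample_weight * (p - Y.astype(np.float32))
print(diff32.dtype)  # float32
```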