Fitting default LogisticRegression on large sparse csr np.float32 matrix fails with n_jobs > 1 #15924
Comments
Thanks for the report and the clear investigation. I think you are right about the conversion mechanism, which is basically blocked by [...]. The only way to avoid this issue would be to convert X to 64 bits before memmapping it, which would require introspecting the different check_array calls of the algorithm used. That already seems complicated :) ping @ogrisel, who has a much better picture of the internals: can you think of an easy way to solve the issue?
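A minimal user-side sketch of the idea in the comment above, assuming NumPy and SciPy: cast the float32 CSR matrix to float64 once, before calling fit, so the data that joblib memmaps already has the dtype lbfgs expects and no conversion is left for the workers. to_float64_csr is a hypothetical helper, not a scikit-learn function, and marking the buffers read-only only simulates joblib's read-only memory map.

```python
import numpy as np
import scipy.sparse as sp


def to_float64_csr(X):
    """Return a float64 CSR copy of X in canonical form (hypothetical helper).

    Only freshly allocated arrays are written to, so X itself may be backed
    by read-only buffers such as a memory-mapped file.
    """
    X = X.tocsr()  # returns X itself for CSR input; other formats become a new matrix
    X64 = sp.csr_matrix(
        (X.data.astype(np.float64), X.indices.copy(), X.indptr.copy()),
        shape=X.shape,
    )
    # Sort indices and merge duplicates on the copy, never on the original.
    X64.sum_duplicates()
    return X64


# A tiny float32 CSR matrix whose first row has unsorted column indices.
X32 = sp.csr_matrix(
    (
        np.array([1.0, 2.0, 3.0], dtype=np.float32),
        np.array([2, 0, 1], dtype=np.int32),
        np.array([0, 2, 3], dtype=np.int32),
    ),
    shape=(2, 3),
)

# Simulate the read-only memory map that joblib hands to the workers.
for buf in (X32.data, X32.indices, X32.indptr):
    buf.flags.writeable = False

X64 = to_float64_csr(X32)
print(X64.dtype, X64.has_sorted_indices)
```

The converted matrix would then be passed to something like LogisticRegression(n_jobs=2).fit(X64, y). The trade-off is that the values now take 8 bytes each instead of 4, which is exactly the memory saving that using np.float32 was meant to provide.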
We are relying on the [...]
Sounds reasonable. However, I wonder if [...] After a bit more investigation, I ran into #6614, which apparently points to a bug in [...]
OK, then I guess we can close this as a duplicate of #6614, thanks!
I am reopening this, since the script in your issue is correct and it would be good to discuss the underlying problem.
I am closing this issue, since it should be solved in the next SciPy release now that scipy/scipy#18192 has been merged.
Description
When fitting a LogisticRegression on a sparse csr matrix with dtype set to np.float32, the call to fit fails with the exception below, indicating that we are trying to write to a read-only variable.
I have read #4597, and the issue seems similar here: joblib thinks the data matrix is large enough to be exposed as a read-only memory-mapped file, and logistic.py runs the following check, which can lead to an implicit cast to np.float64:
Here, check_X_y tries to convert X to np.float64, which apparently cannot be done.
Solutions appear to be either:
[...] np.float64 type, or [...] lbfgs, which I guess is acceptable. I am not sure whether check_X_y could also do the type conversion without having to write to X (the write seems to happen as part of sorting).
The logistic regression API and documentation could also be improved to be more user-friendly:
The documentation for LogisticRegression does not mention that lbfgs only supports np.float64. I think using np.float32 is a relatively common way to limit memory usage, but here, even when memory mapping is not used, it silently leads to a copy with a different type being made.
Should check_X_y check whether X is read-only before attempting a type conversion? It could then fail with a clearer exception when the copy cannot be made without touching the original matrix.
Code
Expected Results
More user-friendly exception/warning
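The expected result above could look roughly like the following. This is a hypothetical sketch, not scikit-learn's check_X_y: check_can_convert is an invented helper that assumes NumPy and SciPy, and it is deliberately conservative, raising whenever a dtype conversion is needed and any of X's buffers is read-only, rather than detecting whether the conversion path would actually write.

```python
import numpy as np
import scipy.sparse as sp


def check_can_convert(X, dtype=np.float64):
    """Fail early with a clear message instead of a cryptic read-only error.

    Hypothetical sketch of the check proposed in this issue.
    """
    target = np.dtype(dtype)
    if sp.issparse(X):
        # CSR/CSC matrices keep their values and structure in three arrays.
        names = ("data", "indices", "indptr")
        buffers = [getattr(X, name) for name in names if hasattr(X, name)]
    else:
        X = np.asarray(X)
        buffers = [X]
    if X.dtype == target:
        return  # no conversion needed, so nothing will be written
    if any(not buf.flags.writeable for buf in buffers):
        raise ValueError(
            f"X has dtype {X.dtype}, but {target} is required and X is "
            "read-only (for example, a memory-mapped array created by joblib). "
            f"Convert X to {target} before calling fit."
        )
```

A real fix would more likely perform the conversion on a copy without touching X; this sketch only shows where a clearer message could come from.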
Actual Results
Versions
System:
python: 3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:09:58) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
executable: /opt/venv/python3/bin/python
machine: Linux-4.14.146-93.123.amzn1.x86_64-x86_64-with-glibc2.2.5
Python deps:
pip: 19.3.1
setuptools: 41.4.0
sklearn: 0.21.3
numpy: 1.13.1
scipy: 1.1.0
Cython: 0.26.1
pandas: 0.23.4