Fitting default LogisticRegression on large sparse csr np.float32 matrix fails with n_jobs > 1

## Description

When fitting a `LogisticRegression` on a sparse `csr` matrix with `dtype` set to `np.float32`

```
<378019x13491 sparse matrix of type '<class 'numpy.float32'>'
    with 13405727 stored elements in Compressed Sparse Row format>
```

the call to `fit` fails, with the exception below, indicating we are trying to write to a read-only variable.

I have read https://github.com/scikit-learn/scikit-learn/issues/4597, and the issue seems similar here:

- `joblib` thinks data matrix is large enough to be exposed as a read-only memory-mapped file
- `logistic.py` runs the following check, which can lead to an implicit cast to `np.float64`:

```py
        if solver == 'lbfgs':
            _dtype = np.float64
        else:
            _dtype = [np.float64, np.float32]

        X, y = check_X_y(X, y, accept_sparse='csr', dtype=_dtype, order="C",
                         accept_large_sparse=solver != 'liblinear')
```

Here, `check_X_y` tries to convert `X` to `np.float64` which apparently cannot be done.

Solutions appear to be either:

- Pass in a matrix with `np.float64` type
- Use a different solver than `lbfgs`

which I guess is acceptable. I am not sure if `check_X_y` could also do the type conversion without having to write X (this seems to be done as part of sorting)>

The logistic regression API and documentation could also be improved to be more user-friendly:

- The documentation for `LogisticRegression` does not mention that `lbfgs` only supports `np.float64`. I think using `np.float32` is a relatively common use case to limit memory usage, but here, even when memory mapping is not used, it silently leads to a copy with a different type being made.

- Should `check_X_y` check whether `X` is read-only before attempting to do a type conversion? This could fail with a clearer exception, if we are not able to do the copy without touching the original matrix.

#### Code

```py
from sklearn.model_selection import GridSearchCV, GroupKFold, StratifiedKFold
from sklearn.metrics import accuracy_score, balanced_accuracy_score, make_scorer
from sklearn.linear_model import LogisticRegression

parameters = {"C": np.logspace(-4, 1, 6), "penalty": ["l2"], "max_iter": [800]}

clf = LogisticRegression(verbose=3, class_weight="balanced")
cv = StratifiedKFold(3)
grid_cv = GridSearchCV(
    clf,
    parameters,
    refit=False,
    cv=cv,
    scoring=make_scorer(balanced_accuracy_score),
    n_jobs=20,
)

clf_ovr = OneVsRestClassifier(grid_cv, n_jobs=20)
clf_ovr.fit(X_np32, y)
```

#### Expected Results

More user-friendly exception/warning

#### Actual Results

```
Traceback (most recent call last):
  File "/opt/venv/python3/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 418, in _process_worker
    r = call_item()
  File "/opt/venv/python3/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/opt/venv/python3/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 600, in __call__
    return self.func(*args, **kwargs)
  File "/opt/venv/python3/lib/python3.6/site-packages/joblib/parallel.py", line 256, in __call__
    for func, args, kwargs in self.items]
  File "/opt/venv/python3/lib/python3.6/site-packages/joblib/parallel.py", line 256, in <listcomp>
    for func, args, kwargs in self.items]
  File "/opt/venv/python3/lib/python3.6/site-packages/sklearn/multiclass.py", line 80, in _fit_binary
    estimator.fit(X, y)
  File "/opt/venv/python3/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 715, in fit
    self.best_estimator_.fit(X, y, **fit_params)
  File "/opt/venv/python3/lib/python3.6/site-packages/sklearn/linear_model/logistic.py", line 1532, in fit
    accept_large_sparse=solver != 'liblinear')
  File "/opt/venv/python3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 719, in check_X_y
    estimator=estimator)
  File "/opt/venv/python3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 486, in check_array
    accept_large_sparse=accept_large_sparse)
  File "/opt/venv/python3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 309, in _ensure_sparse_format
    spmatrix = spmatrix.astype(dtype)
  File "/opt/venv/python3/lib/python3.6/site-packages/scipy/sparse/data.py", line 71, in astype
    self._deduped_data().astype(dtype, casting=casting, copy=copy),
  File "/opt/venv/python3/lib/python3.6/site-packages/scipy/sparse/data.py", line 34, in _deduped_data
    self.sum_duplicates()
  File "/opt/venv/python3/lib/python3.6/site-packages/scipy/sparse/compressed.py", line 1013, in sum_duplicates
    self.sort_indices()
  File "/opt/venv/python3/lib/python3.6/site-packages/scipy/sparse/compressed.py", line 1059, in sort_indices
    self.indices, self.data)
ValueError: UPDATEIFCOPY base is read-only
```

#### Versions
ystem:
    python: 3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:09:58)  [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
executable: /opt/venv/python3/bin/python
   machine: Linux-4.14.146-93.123.amzn1.x86_64-x86_64-with-glibc2.2.5

Python deps:
       pip: 19.3.1
setuptools: 41.4.0
   sklearn: 0.21.3
     numpy: 1.13.1
     scipy: 1.1.0
    Cython: 0.26.1
    pandas: 0.23.4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Fitting default LogisticRegression on large sparse csr np.float32 matrix fails with n_jobs > 1 #15924

Description

Code

Expected Results

Actual Results

Versions

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Fitting default LogisticRegression on large sparse csr np.float32 matrix fails with n_jobs > 1 #15924

Description

Description

Code

Expected Results

Actual Results

Versions

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions