Closed
Description
The following pattern will always trigger a memory copy, even when X
already has the right type:
X = X.astype(dtype)
In recent numpy, it's possible to avoid this problem with:
X = X.astype(dtype, copy=False)
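A minimal sketch of the difference, assuming a numpy version recent enough to accept the copy argument and to provide np.shares_memory. The array contents and variable names are illustrative only:

```python
import numpy as np

X = np.arange(6, dtype=np.float64)

# Default behaviour (copy=True): a new buffer is allocated even though
# X already has the requested dtype.
copied = X.astype(np.float64)

# copy=False: the dtype already matches, so the input array is returned.
same = X.astype(np.float64, copy=False)

# copy=False with a different dtype: a conversion (and thus a copy)
# cannot be avoided.
converted = X.astype(np.float32, copy=False)

copy_is_new = not np.shares_memory(copied, X)
nocopy_is_same = same is X
converted_is_new = not np.shares_memory(converted, X)
```

Note that copy=False only skips the copy when no conversion is needed; it never returns an array of the wrong dtype.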
However, we need to keep backward compatibility with numpy 1.6.1, which does not provide this option. Therefore we should instead use:
from sklearn.utils.fixes import astype
X = astype(X, dtype, copy=False)
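For readers wondering how such a helper can work on a numpy that lacks the copy argument, here is a rough sketch. The name astype_compat is hypothetical and this is not necessarily how sklearn.utils.fixes implements it; it only illustrates the idea of checking the dtype before converting:

```python
import numpy as np


def astype_compat(array, dtype, copy=True):
    """Hypothetical backport-style helper (not the actual sklearn code).

    Returns the input array unchanged when copy=False and the dtype
    already matches; otherwise falls back to a plain astype call, which
    works on numpy versions without the copy keyword.
    """
    if not copy and array.dtype == np.dtype(dtype):
        return array
    return array.astype(dtype)


X = np.ones(3, dtype=np.float64)
unchanged = astype_compat(X, np.float64, copy=False)   # no copy made
forced_copy = astype_compat(X, np.float64)             # copy=True by default
downcast = astype_compat(X, np.float32, copy=False)    # conversion needed
```

The default of copy=True in this sketch mirrors numpy's own astype, so callers have to opt in to the no-copy behaviour explicitly.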
The following search reveals that we might have several places in our code where we do unnecessary memory copies:
$ find sklearn -name "*.py" | xargs grep "astype" | grep -v "copy=" | grep -v "test_"
It would be great if someone could scan those and contribute one or several pull requests that use astype(X, dtype, copy=False)
wherever we find a pattern that could potentially do large, unwanted memory copies. This issue can be tackled by new contributors to scikit-learn.
Slightly related to #4555.