Audit our usage of numpy.astype to remove unnecessary memory copies #4573

Closed
@ogrisel

The following pattern always triggers a memory copy, even when X already has the right dtype:

X = X.astype(dtype)

In recent numpy, it's possible to avoid this problem with:

X = X.astype(dtype, copy=False)
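
To make the difference concrete, here is a minimal sketch assuming a numpy version recent enough to support the copy argument and np.shares_memory. The array X and the chosen dtypes are illustrative only and are not taken from the scikit-learn code base.

```python
import numpy as np

# Illustrative input: X already has the target dtype (float64).
X = np.arange(6, dtype=np.float64)

# Default behaviour (copy=True): a new buffer is allocated even though
# no conversion is needed.
copied = X.astype(np.float64)

# With copy=False, numpy returns the input array itself when the dtype
# (and memory layout) requirements are already satisfied.
same = X.astype(np.float64, copy=False)

# A real conversion still has to allocate a new array, even with copy=False.
converted = X.astype(np.float32, copy=False)

print(np.shares_memory(X, copied))     # False: an unnecessary copy was made
print(same is X)                       # True: no copy
print(np.shares_memory(X, converted))  # False: conversion required a new array
```

Note that copy=False is only a permission to skip the copy, not a guarantee: whenever the dtype differs, a new array is still created.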

However, we need to keep backward compatibility with numpy 1.6.1, which does not provide this option. Therefore we should instead use:

from sklearn.utils.fixes import astype

X = astype(X, dtype, copy=False)
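
For readers wondering what such a compatibility helper might do, the sketch below shows one possible approach. It is an assumption for illustration, not the actual code in sklearn.utils.fixes, and the name astype_compat is hypothetical. It avoids passing copy= to ndarray.astype, so it does not depend on that argument being available.

```python
import numpy as np


def astype_compat(X, dtype, copy=True):
    """Hypothetical backward-compatible astype (not the sklearn implementation).

    Returns X unchanged when copy=False and X already has the requested
    dtype; otherwise falls back to the plain astype call, which always
    allocates a new array.
    """
    dtype = np.dtype(dtype)
    if not copy and X.dtype == dtype:
        # Nothing to convert and the caller accepts sharing memory.
        return X
    # Either a conversion is needed or the caller asked for a copy.
    return X.astype(dtype)


X = np.zeros(3, dtype=np.float64)
print(astype_compat(X, np.float64, copy=False) is X)     # True: no copy
print(astype_compat(X, np.float64) is X)                 # False: copy requested
print(astype_compat(X, np.int32, copy=False).dtype)      # int32
```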

The following search reveals that there may be several places in our code where we make unnecessary memory copies:

$ find sklearn -name "*.py" | xargs grep "astype" | grep -v "copy=" | grep -v "test_"

It would be great if someone could scan those and contribute one or several pull requests to use astype(X, dtype, copy=False) each time we find a pattern that could potentially make large, unwanted memory copies. This issue can be tackled by new contributors to scikit-learn.

Slightly related to #4555.

Labels: Easy (well-defined and straightforward way to resolve)
