-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Audit our usage of numpy.astype
to remove unecessary memory copies
#4573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This sounds like a good place to start contributing, I'll take a look! |
Ok I put in a PR which still failed on Travis for two things, so I added a commit to fix them, but it didn't take, so I tried a different PR and it just has a merge with recent master commits but not my changes O_o |
Please never merge the master branch to a feature branch. Start you feature branch from a recent version of master, then later you can rebase it from time to time on top of master. |
#4645 is merged, this issue can be closed |
I think there are still a couple left-over. How did you check?
well and a couple more... |
have been solved in #4645 :)
I am not sure about the memory/copy behavior of this line. Is it possible not to copy? Otherwise, I have checked with I am also not sure about the following:
where the data type changes from float64 to float32. Is it useful? |
There is no need to optimize the code in examples or tests functions. Just the code in the library itself. For X.data[mask] = valid_statistics[indexes].astype(X.dtype) could indeed be optimized as: X.data[mask] = astype(valid_statistics[indexes], X.dtype, copy=False) as |
whoops looks like I was on an outdated branch when checking. sorry. |
@TomDLT would you like to open a new PR to fix the remaining case in |
ok I will |
Thanks! |
The following pattern will always trigger a memory copy even when
X
as the right type:In recent numpy, it's possible to avoid this problem with:
However we need to keep the backward compatibility with numpy 1.6.1 that does not provide this option. Therefore we should indeed use:
The following search reveals that we might have several places in our code were we do unecessary memory copy:
It would be great if someone could scan those and contribute one or several pull requests to use as
astype(X, dtype, copy=False)
each time we find a pattern that could potentially do large, unwanted memory copies. This issue can be tackled by new contributors to scikit-learn.Slightly related to #4555.
The text was updated successfully, but these errors were encountered: