-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] Astype fix #4645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] Astype fix #4645
Conversation
thanks @TomDLT Travis is not happy. You have failing tests. |
OK maybe I will start testing before pushing |
Also it would be great to rebase this branch on top of current master while removing the |
Sounds like a plan ;) |
meh, you can also just check travis after pushing ;) but that utually takes longer. |
0222c8b
to
411b868
Compare
Big thanks to @ogrisel that worked on this with me TODO: I will look at this monday |
To fix the test that fail on travis, you might want to setup a conda env with the same versions for numpy and scipy.
|
Good catch, travis is now green. I re-wrapped and moved your todo list to put it in the description of the PR. Once all the boxes are ticked please edit the title of the PR to replace |
In note : |
I'm not sure how great an idea of |
This is what we are doing in this PR when you pass |
+1 for using |
X_train = X_train.astype(np.float64) | ||
X_test = X_test.astype(np.float64) | ||
X_train = astype(X_train, np.float64, copy=False) | ||
X_test = astype(X_test, np.float64, copy=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to set copy=False
for this use case as the original data type is int16
for sure as explained in the previous commit. There is no way to avoid the malloc / dtype conversion in that case. We can keep:
X_train = X_train.astype(np.float64)
X_test = X_test.astype(np.float64)
Is there a clean way to revert some old commits without a total rebase? |
Run a |
# not a data type (e.g. a column named dtype in a pandas DataFrame) | ||
dtype_orig = None | ||
|
||
if dtype_numeric: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure if we want to keep this behavior. In what cases do we want to pass through integer data? Or bytes? But maybe that is for another PR? I'm not sure....
We could also just set dtype = NUMERIC_DTYPES (which would be a long list) to remove this branch at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also thought about that option but I think it's fine to leave it this way for now.
This is awesome, thanks! I'm not sure how good the "numeric" option was, and maybe we should remove it now, and use explicit dtypes instead? But I think we can merge this before that refactoring. |
Do you want me to squash all commits in only one? |
Maybe we should deprecate |
+1 for thinking about this point again once this PR has been merged. |
I think so yes. |
I think removing |
+1 on squashing everything and +1 on merge. |
See #4685 for tracking the possible removal of "numeric". |
Squashed |
Just curiosity, to deprecate |
from . import deprecated maybe? |
It did not work for me |
@ogrisel merge? |
Fine with the removal of Can you please rebase on top of the current master once again? There is a small conflict in |
ENH improve check_array to warn on dtype conversions ENH make check_array accept several dtypes ENH change validation with improved check_array ENH change astype to avoid copy if possible ENH remove warn_if_not_float
Rebased on top of master |
Thanks :) |
This is really great btw 🍻 |
Thanks :) |
🍻 !
|
I forgot to drink a beer on that PR: 🍻 |
I extended
check_array
to add support foraccept_sparse=True
and solve the 3 errors by changing
astype
intocheck_array
when sparse matrix are possibleI will check the other uses of
astype
tomorrow.TODO: