Skip to content

MNT More replacements of numpy aliases with built-in types #17707

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Jun 26, 2020

Conversation

alfaro96
Copy link
Member

@alfaro96 alfaro96 commented Jun 24, 2020

Reference Issues/PRs

Coming from #17687.

What does this implement/fix? Explain your changes.

This PR continues the deprecations of numpy aliases by built-in types.

Any other comments?

This PR was generated by:

$ find . -name "*.py" -print0 | xargs -0 sed -i "s/\bnp.object\b/object/g"

Two manual changes were applied in sklearn/ensemble/_gradient_boosting.pyx and sklearn/neural_network/tests/test_rbm.py.

CC @thomasjpfan @glemaitre @adrinjalali

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alfaro96 !

@glemaitre
Copy link
Member

Do we try to correct the one mentioned by Travis in this PR or it is preferable to merge as it is?

@alfaro96
Copy link
Member Author

alfaro96 commented Jun 24, 2020

Do we try to correct the one mentioned by Travis in this PR or it is preferable to merge as it is?

I think it will be better to correct the errors shown by Travis before merging this PR to ensure that everything is properly working.

@alfaro96
Copy link
Member Author

I have applied the corresponding changes. Let us wait for the green in travis.

@alfaro96
Copy link
Member Author

Travis is failing because of these deprecations coming from pandas.

I think we should ignore these warnings until pandas decides to solve them (similarly to #17652).

@glemaitre
Copy link
Member

True but it seems that we have some PicklingError :S

@alfaro96
Copy link
Member Author

True but it seems that we have some PicklingError :S

This is what I am worried about 😕.

@alfaro96
Copy link
Member Author

alfaro96 commented Jun 25, 2020

Pandas has already fixed these warnings (see pandas-dev/pandas#34835). I do not think that it is necessary to silent these warnings.

I think that this PR is ready to merge (we should handle the PicklingError in another PR).

WDYT @glemaitre?

@adrinjalali
Copy link
Member

Could you maybe explain where the pickling error comes from?

@alfaro96
Copy link
Member Author

alfaro96 commented Jun 25, 2020

Could you maybe explain where the pickling error comes from?

The PicklingError comes from sklearn.cluster.AgglomerativeClustering while trying to cache the output of the computation of the tree.

In particular, the exception raises in lines:

out = memory.cache(tree_builder)(X, connectivity=connectivity,
n_clusters=n_clusters,
return_distance=return_distance,
**kwargs)

while sklearn/cluster/tests/test_hierarchical.py::test_agglomerative_clustering test is executed.

The exact error is:

 _pickle.PicklingError: ("Can't pickle <class 'numpy.dtype[float64]'>: it's not found as numpy.dtype[float64]", 'PicklingError while hashing {\'**\': {\'connectivity\': <100x100 sparse matrix of type \'<class \'numpy.int64\'>\'\n\twith 460 stored elements in COOrdinate format>, \'n_clusters\': None, \'return_distance\': False}, \'*\': [array([[ 1.76405235,  0.40015721,  0.97873798, ...,  0.77749036,\n        -1.61389785, -0.21274028],\n       [-0.89546656,  0.3869025 , -0.51080514, ...,  1.78587049,\n         0.12691209,  0.40198936],\n       [ 1.8831507 , -1.34775906, -1.270485  , ...,  1.11701629,\n        -1.31590741, -0.4615846 ],\n       ...,\n       [ 0.94853292,  1.23127609, -0.28818653, ...,  1.59242571,\n        -0.04144268,  1.37594158],\n       [ 1.35560413,  0.62532518,  1.67550176, ...,  0.52308533,\n         1.73676704,  0.50865349],\n       [ 1.16503885,  0.71312032,  1.3199769 , ...,  0.92918181,\n         0.22941801,  0.41440588]])]}: PicklingError("Can\'t pickle <class \'numpy.dtype[float64]\'>: it\'s not found as numpy.dtype[float64]")')

@adrinjalali
Copy link
Member

Does the

PicklingError("Can\'t pickle <class \'numpy.dtype[float64]\'>: it\'s not found as numpy.dtype[float64]")')

error make sense to you? Besides, It doesn't seem to be relevant to this PR since you're changing the object ones, right?

@alfaro96
Copy link
Member Author

alfaro96 commented Jun 25, 2020

Does the

PicklingError("Can\'t pickle <class \'numpy.dtype[float64]\'>: it\'s not found as numpy.dtype[float64]")')

error make sense to you? Besides, It doesn't seem to be relevant to this PR since you're changing the object ones, right?

IMHO, this error does not make sense (why <class numpy.dtype[float64]> is not pickable as numpy.dtype[float64] 😕?).

I have applied a few of manual changes apart from np.object to object (specified in the description) but neither related with the sklearn.cluster module.

Nevertheless, I would like to investigate this issue and a few more related with the Linux_Nightly pylatest_pip_scipy_dev job to keep the CRON jobs properly working.

@adrinjalali
Copy link
Member

right now the best thing to do is to isolate the issue and figure out what exactly the issue is. And if it's not related to your changes (which I suspect is the case) then we can safely merge this PR. The issue right now is that we're in the dark.

Alternatively, if you confirm that the error exists on master, then we could also merge this and tackle it in a separate PR.

@alfaro96
Copy link
Member Author

right now the best thing to do is to isolate the issue and figure out what exactly the issue is. And if it's not related to your changes (which I suspect is the case) then we can safely merge this PR. The issue right now is that we're in the dark.

Alternatively, if you confirm that the error exists on master, then we could also merge this and tackle it in a separate PR.

You are absolutely right, the issue seems quite strange and we need to check if the error comes from this PR or also exists on master.

I will open a PR with the version of the master and check if this issue happens in travis and Linux_Nightly pylatest_pip_scipy_dev.

@adrinjalali
Copy link
Member

Ok this is becoming tricky (see #17719). I'm happy to merge this one and have the pickling issue investigated after.

@adrinjalali
Copy link
Member

WDYT @glemaitre

@alfaro96
Copy link
Member Author

alfaro96 commented Jun 26, 2020

Bad news. PicklingError is issued on master for the Linux_Nightly pylatest_pip_scipy_dev job (lines 16924-17076 in https://dev.azure.com/scikit-learn/d9c6f05f-7f58-4a17-b143-2a4e7ff015af/_build/results?buildId=18821&view=logs&jobId=dfe99b15-50db-5d7b-b1e9-4105c42527cf).

This error is not related with this PR, but needs further investigation. I will try to have a look and determine why this is happening.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good then, let's merge this one. It'd be really nice if you could look into the pickling error.

@adrinjalali adrinjalali merged commit 547ead6 into scikit-learn:master Jun 26, 2020
@glemaitre
Copy link
Member

pickling make me think that something is happening with cloudpickle maybe. ping @ogrisel :)

@alfaro96 alfaro96 deleted the replace_deprecated_np_items branch June 26, 2020 12:26
@lesteve
Copy link
Member

lesteve commented Jun 26, 2020

pickling make me think that something is happening with cloudpickle maybe. ping @ogrisel :)

From what I can see, the error seems to be related to joblib.Memory argument hashing and its special dtype treatment:
https://github.com/joblib/joblib/blob/ac0f1528ffec9cd53cabbafaca887ac880697862/joblib/hashing.py#L222-L235

This comes from the fact that np.dtype('float64').__class__ is not picklable in numpy dev:

import pickle

import numpy as np


print('numpy version:', np.__version__)
dt = np.dtype('float64')
print('class: ', dt.__class__)
try:
    pickle.dumps(dt.__class__)
    print('Can pickle dt.__class__')
except Exception as exc:
    print('Exception while pickling dt.__class__')
    print(exc)

Output numpy-dev:

numpy version: 1.20.0.dev0+be8ab91
class:  <class 'numpy.dtype[float64]'>
Exception while pickling dt.__class__
Can't pickle <class 'numpy.dtype[float64]'>: attribute lookup dtype[float64] on numpy failed

Ouptut numpy 1.19.0:

numpy version: 1.19.0
class:  <class 'numpy.dtype'>
Can pickle dt.__class__

@alfaro96
Copy link
Member Author

pickling make me think that something is happening with cloudpickle maybe. ping @ogrisel :)

From what I can see, the error seems to be related to joblib.Memory argument hashing and its special dtype treatment:
https://github.com/joblib/joblib/blob/ac0f1528ffec9cd53cabbafaca887ac880697862/joblib/hashing.py#L222-L235

This comes from the fact that np.dtype('float64').__class__ is not picklable in numpy dev:

import pickle

import numpy as np


print('numpy version:', np.__version__)
dt = np.dtype('float64')
print('class: ', dt.__class__)
try:
    pickle.dumps(dt.__class__)
    print('Can pickle dt.__class__')
except Exception as exc:
    print('Exception while pickling dt.__class__')
    print(exc)

Output numpy-dev:

numpy version: 1.20.0.dev0+be8ab91
class:  <class 'numpy.dtype[float64]'>
Exception while pickling dt.__class__
Can't pickle <class 'numpy.dtype[float64]'>: attribute lookup dtype[float64] on numpy failed

Ouptut numpy 1.19.0:

numpy version: 1.19.0
class:  <class 'numpy.dtype'>
Can pickle dt.__class__

Thanks @lesteve for investigating this!

I will submit a PR and trigger the nightly jobs to check if these errors appear with the master version (using numpy==1.19.0).

If so, I would say that there is no much left we can do to solve this issue, just to wait until numpy makes np.float64 picklable.

@lesteve
Copy link
Member

lesteve commented Jun 26, 2020

FYI I have opened numpy/numpy#16692 to see whether that was an intended change from numpy.

@lesteve
Copy link
Member

lesteve commented Jun 26, 2020

If so, I would say that there is no much left we can do to solve this issue, just to wait until numpy makes np.float64 picklable.

Just for completeness the thing that is not picklable in numpy development version is numpy.dtype('float64').__class__ (and we were relying on it being picklable in joblib for working around some quirks of numpy.dtype pickling)

@alfaro96
Copy link
Member Author

If so, I would say that there is no much left we can do to solve this issue, just to wait until numpy makes np.float64 picklable.

Just for completeness the thing that is not picklable in numpy development version is numpy.dtype('float64').__class__ (and we were relying on it being picklable in joblib for working around some quirks of numpy.dtype pickling)

You are absolutely right. Thanks for the correction!

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Jul 17, 2020
…arn#17707)

* MNT More replacements of numpy aliases with built-in types [scipy-dev]

* MNT More (manual) replacements of numpy aliases

* MNT More (manual) replacements of numpy aliases

* FIX Minor change

* MNT Trigger build [scipy-dev]

* FIX Minor change

* MNT Trigger build [scipy-dev]

* FIX More deprecations of numpy aliases [scipy-dev]

* MNT Silent numpy aliases warnings [scipy-dev]

* FIX Fix silent deprecations

* Trigger build [scipy-dev]

* FIX Add scape characters [scipy-dev]

* FIX Remove package specification from silent

* Trigger build [scipy-dev]

* Revert
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
…arn#17707)

* MNT More replacements of numpy aliases with built-in types [scipy-dev]

* MNT More (manual) replacements of numpy aliases

* MNT More (manual) replacements of numpy aliases

* FIX Minor change

* MNT Trigger build [scipy-dev]

* FIX Minor change

* MNT Trigger build [scipy-dev]

* FIX More deprecations of numpy aliases [scipy-dev]

* MNT Silent numpy aliases warnings [scipy-dev]

* FIX Fix silent deprecations

* Trigger build [scipy-dev]

* FIX Add scape characters [scipy-dev]

* FIX Remove package specification from silent

* Trigger build [scipy-dev]

* Revert
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants