Skip to content

[MRG] Fix sklearn.base.clone when estimator has any kind of sparse matrix as attribute #6910

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

lesteve
Copy link
Member

@lesteve lesteve commented Jun 20, 2016

Reference Issue

Fix #6855.

@TomDLT
Copy link
Member

TomDLT commented Jun 20, 2016

The test fails with Python 2.6.
Otherwise LGTM

data = arr.data if sparse.issparse(arr) else arr
return data.flat[0], data.flat[-1]
except AttributeError:
# Sparse matrices without .data attribute. Only dok_matrix at
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a strong test, but okay.

@jnothman
Copy link
Member

LGTM when test failure sorted.

@lesteve lesteve force-pushed the fix-clone-with-sparse-matrix-attribute branch 3 times, most recently from a98038a to a56522c Compare June 21, 2016 08:58

PY26 = sys.version_info[:2] == (2, 6)
if PY26:
# sp.dok_matrix can not be deepcopied in Python 2.6
Copy link
Member Author

@lesteve lesteve Jun 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record:

from copy import deepcopy

import numpy as np

from scipy import sparse as sp

m = sp.dok_matrix(np.eye(5))

deepcopy(m)

fails on Python 2.6 with the error:

AttributeError: shape not found

For some reason it looks like the reconstructed matrix is missing some attributes. I think it's fine not testing dok_matrix for Python 2.6.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it's because Py2.6 has a specialised reconstruction routine for dicts and fails to handle the subclassing correctly... I'm happy with this solution.

when estimator has any kind of sparse matrix as attribute. Add test.
@lesteve lesteve force-pushed the fix-clone-with-sparse-matrix-attribute branch from a56522c to 808d584 Compare June 21, 2016 09:30
@lesteve
Copy link
Member Author

lesteve commented Jun 21, 2016

@TomDLT @jnothman merge? You had both +1 conditioned on the tests passing and now they are.

@jnothman jnothman merged commit ccefc2e into scikit-learn:master Jun 21, 2016
@jnothman
Copy link
Member

Thanks @lesteve. I'll throw in a what's new.

@lesteve lesteve deleted the fix-clone-with-sparse-matrix-attribute branch June 21, 2016 13:30
imaculate pushed a commit to imaculate/scikit-learn that referenced this pull request Jun 23, 2016
agramfort pushed a commit that referenced this pull request Jun 23, 2016
… and documentation. Fixes #6862 (#6907)

* Make KernelCenterer a _pairwise operation

Replicate solution to 9a52077 except that `_pairwise` should always be `True` for `KernelCenterer` because it's supposed to receive a Gram matrix. This should make `KernelCenterer` usable in `Pipeline`s.

Happy to add tests, just tell me what should be covered.

* Adding test for PR #6900

* Simplifying imports and test

* updating changelog links on homepage (#6901)

* first commit

* changed binary average back to macro

* changed binomialNB to multinomialNB

* emphasis on "higher return values are better..." (#6909)

* fix typo in comment of hierarchical clustering (#6912)

* [MRG] Allows KMeans/MiniBatchKMeans to use float32 internally by using cython fused types (#6846)

* Fix sklearn.base.clone for all scipy.sparse formats (#6910)

* DOC If git is not installed, need to catch OSError

Fixes #6860

* DOC add what's new for clone fix

* fix a typo in ridge.py (#6917)

* pep8

* TST: Speed up: cv=2

This is a smoke test. Hence there is no point having cv=4

* Added support for sample_weight in linearSVR, including tests and documentation

* Changed assert to assert_allclose and assert_almost_equal, reduced the test tolerance

* Fixed pep8 violations and sampleweight format

* rebased with upstream
olologin pushed a commit to olologin/scikit-learn that referenced this pull request Aug 24, 2016
olologin pushed a commit to olologin/scikit-learn that referenced this pull request Aug 24, 2016
… and documentation. Fixes scikit-learn#6862 (scikit-learn#6907)

* Make KernelCenterer a _pairwise operation

Replicate solution to scikit-learn@9a52077 except that `_pairwise` should always be `True` for `KernelCenterer` because it's supposed to receive a Gram matrix. This should make `KernelCenterer` usable in `Pipeline`s.

Happy to add tests, just tell me what should be covered.

* Adding test for PR scikit-learn#6900

* Simplifying imports and test

* updating changelog links on homepage (scikit-learn#6901)

* first commit

* changed binary average back to macro

* changed binomialNB to multinomialNB

* emphasis on "higher return values are better..." (scikit-learn#6909)

* fix typo in comment of hierarchical clustering (scikit-learn#6912)

* [MRG] Allows KMeans/MiniBatchKMeans to use float32 internally by using cython fused types (scikit-learn#6846)

* Fix sklearn.base.clone for all scipy.sparse formats (scikit-learn#6910)

* DOC If git is not installed, need to catch OSError

Fixes scikit-learn#6860

* DOC add what's new for clone fix

* fix a typo in ridge.py (scikit-learn#6917)

* pep8

* TST: Speed up: cv=2

This is a smoke test. Hence there is no point having cv=4

* Added support for sample_weight in linearSVR, including tests and documentation

* Changed assert to assert_allclose and assert_almost_equal, reduced the test tolerance

* Fixed pep8 violations and sampleweight format

* rebased with upstream
TomDLT pushed a commit to TomDLT/scikit-learn that referenced this pull request Oct 3, 2016
TomDLT pushed a commit to TomDLT/scikit-learn that referenced this pull request Oct 3, 2016
… and documentation. Fixes scikit-learn#6862 (scikit-learn#6907)

* Make KernelCenterer a _pairwise operation

Replicate solution to scikit-learn@9a52077 except that `_pairwise` should always be `True` for `KernelCenterer` because it's supposed to receive a Gram matrix. This should make `KernelCenterer` usable in `Pipeline`s.

Happy to add tests, just tell me what should be covered.

* Adding test for PR scikit-learn#6900

* Simplifying imports and test

* updating changelog links on homepage (scikit-learn#6901)

* first commit

* changed binary average back to macro

* changed binomialNB to multinomialNB

* emphasis on "higher return values are better..." (scikit-learn#6909)

* fix typo in comment of hierarchical clustering (scikit-learn#6912)

* [MRG] Allows KMeans/MiniBatchKMeans to use float32 internally by using cython fused types (scikit-learn#6846)

* Fix sklearn.base.clone for all scipy.sparse formats (scikit-learn#6910)

* DOC If git is not installed, need to catch OSError

Fixes scikit-learn#6860

* DOC add what's new for clone fix

* fix a typo in ridge.py (scikit-learn#6917)

* pep8

* TST: Speed up: cv=2

This is a smoke test. Hence there is no point having cv=4

* Added support for sample_weight in linearSVR, including tests and documentation

* Changed assert to assert_allclose and assert_almost_equal, reduced the test tolerance

* Fixed pep8 violations and sampleweight format

* rebased with upstream
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants