-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG] FIX sklearn.cluster.spectral_clustering fails with np.matrix
input (#10993)
#11070
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Not changing estimator_checks? |
At first I changed estimator_checks, then I noticed that was not necessary. Because after some testing, I noticed that all the clustering algorithms (e.g. KMeans, AgglomerativeClustering, MeanShift, DBSCAN etc.) work with np.matrix smoothly. Thus, I think the mentioned problem in this issue was limited to Spectral Clustering. |
interesting. I would still be keen on a common test to ensure matrix works
across the board, for instance. Your test whether clusterers work was
unlikely to check that attributes stored are arrays and not matrices, so I
also have my doubts about how certain you are that they completely worked...
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise this LGTM.
Though we should really have a generic test for consistency between class and function versions of clusterers that makes this check, instead of ad-hoc tests.
@@ -60,6 +60,8 @@ def _graph_connected_component(graph, node_id): | |||
for i in indices: | |||
if sparse.issparse(graph): | |||
neighbors = graph[i].toarray().ravel() | |||
elif isinstance(graph, np.matrix): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just convert and store as graph at the top of the function?
we would not want an expensive check in the function version. those should
be low overheard to enable reuse in other code.
|
I thought we didn't support np.matrix, but I guess if |
@@ -44,6 +44,8 @@ def _graph_connected_component(graph, node_id): | |||
node | |||
""" | |||
n_node = graph.shape[0] | |||
graph = sparse.csc_matrix(graph) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not look right -- it is converted to CSR on the following line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think you are right.
To be compliant with jnothman's suggestion on May 8th about converting and storing it as a graph at the top, I suggest the following actions:
Change line 48 to: graph = sparse.csr_matrix(graph)
Delete lines 49-51
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused about this. @jnothman?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be compliant with jnothman's suggestion on May 8th about converting and storing it as a graph at the top, I suggest the following actions:
What I said was "Can we just convert and store as graph at the top of the function?"
What I meant was "Can we just convert the matrix to an array and assign it back to the `graph variable at the top of the function?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not look right -- it is converted to CSR on the following line.
Actually, since it must be a symmetric matrix, its row and column matrix representations are equivalent. Thus it was passing the tests.
converting the graph to sparse.csc_matrix was passing the tests, because it must be a symmetric matrix. So its column matrix (csc) and row matrix (csr) are equivalent.
converting the graph to sparse.csc_matrix was passing the tests, because it must be a symmetric matrix. So its column matrix (csc) and row matrix (csr) are equivalent.
a508eb5
to
d4830e0
Compare
Hi @sinanh , thanks for your patience! Are you still interested in working on this PR? If so do you mind fixing conflicts? Thanks! |
292c7e4
to
fabd17d
Compare
78d236c
to
2bdf8ee
Compare
I fixed the conflicts . |
Thanks @sinanh! @jnothman , @glemaitre maybe you might want to have a look? Thanks! |
@@ -44,6 +44,9 @@ def _graph_connected_component(graph, node_id): | |||
node. | |||
""" | |||
n_node = graph.shape[0] | |||
if isinstance(graph, np.matrix): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead, can we just pass np.asarray(graph) to _graph_connected_component, instead of passing graph unchanged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jnothman OK, I did that. Also I noticed since numpy 1.15, it is advised not to use np.matrix
. The following is written in the manual:
"It is no longer recommended to use this class, even for linear algebra. Instead use regular arrays. The class may be removed in the future."
Hi @sinanh, I guess you closed as you are tired of waiting? I can't do more than relabeling as "waiting for review"... but I wonder if you could reconsider...thanks for your work, anyway. |
@cmarmo thanks, the thing is from numpy 1.15 it is advised not to use np.matrix, so this issue is not an issue anymore I think. I guess both the issue and PR can be closed. |
@sinanh, understood: thanks for clarifying. If you are still interested in working on scikit-learn, feel free to pick one among the "help wanted" issues. Thanks for your time. |
Reference Issues/PRs
Fixes #10993
What does this implement/fix? Explain your changes.
It fixes a small bug in
spectral_embedding
while finding the neighbors when the input array is anp.matrix
.Any other comments?