-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
ENH add 'cluster_qr' method to spectral segmentation #21148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
re-introducing #12316
lint fix
trailing space fixed
line too long
added "cluster_qr"
lint fix
added "cluster_qr"
added cluster_qr
E128 continuation line under-indented fixed
The linter is breaking because you need to run
Another way would be to install |
Here are some concerns that have been already raised in the past:
While the two last points need to be fulfilled, I think that we have now @rth do you have any thought regarding the inclusion in |
If there is no way around it, this looks like a bug in CI framework - breaking the ability of purely web-based PR submission and maintenance, which was available prior to requiring this black check. @glemaitre Let's move this topic to #21169 that I have just created. |
@glemaitre @jnothman @ogrisel The original PR was 3 years ago. Since then, the papers that use the clusterQR (I renamed into cluster_qr) have been cited maybe hundred times. The original original report and the paper alone are cited 25 times https://scholar.google.com/citations?view_op=view_citation&hl=en&user=Kv-Y1qYAAAAJ&citation_for_view=Kv-Y1qYAAAAJ:_FxGoFyzp5QC ; see also Our own implementation https://doi.org/10.1109/HPEC.2017.8091045 has won MIT/AWS Graph Challenge 2017 Student Innovation Awards https://graphchallenge.mit.edu/champions. 3 years later, it remains a canonical approach, answering @jnothman, most recently implemented and tested in https://ieeexplore.ieee.org/document/9286181 that has won MIT/AWS Graph Challenge 2020 Innovation Award, https://graphchallenge.mit.edu/champions. What else needs to happed to fulfil the inclusion criterion? |
black formatting
black formatting
trying to change the discretize test by itself to also test kmeans by itself and cluster_qr by itself
adding test cluster_qr by itself to the same test with discretize
error fixes
Happy to include it there, but it looks like it might be hard to integrate while leaving all the rest of
For context, I think historically there is some reluctance to include algorithms proposed by their authors in scikit-learn because there can be a conflict of interest between including some feature in the best interest of users and increased paper citations as a result of it. I'm not saying it's an issue here, and the code certainly look straightforward and short. However, IMO the most compelling argument is still, #21148 (comment)
as in how this would impact the average user. |
error fix
@rth to clarify, I am not an author of the algorithm, but a happy user. In contrast to kmeans, it actually works, which one can observe, e.g., in the included example #12164 (comment) The authors just provided the MATLAB implementation https://github.com/asdamle/QR-spectral-clustering, The algorithm is so simple that turning it from MATLAB into Python is trivial - getting it into sklearn has turned out to be not :-) The authors have provided convincing examples in their paper, using their MATLAB code, but I agree that
so someone will need to write a good benchmark for the present codes. My feeling is that cluster_qr would well outperform the competitors in terms of quality, yet being faster, and probably with a smaller memory footprint. |
cluster_qr apparently requires n_class>2, so change the test to start with n_class=3
reverted to working version
For benchmarking, we can follow the example of https://github.com/glemaitre/bench_lobpcg/blob/master/bench_lobpbcg.py and for the data use examples from the stochastic block model http://graphchallenge.mit.edu/data-sets Please let me know if there are alternative suggestions. If not, someone would probably need to download the files. We could alternatively generate the data on the fly. Please advise and let me know the preference. |
remove comments Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
an author added as suggested
doi + author
added a reference to cluster_qr
lobpcg with amg references added as requested
@jjerphan thanks for your first review! I've made all the requested changes. Please let me know if I need to do anything else. |
' error fix
added the ``"discretize"`` reference
Hi @lobpcg, I am busy with more urgent tasks but will come back to you when possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some last suggestions and discussions follow-ups before this LGTM.
We are all doing our best to have the project move forward. We prefer taking time to get familiar with new material to prevent errors both in implementations and in users' understanding of methods and their documentation.
Thanks for your patience, @lobpcg. 🙂
misc editing Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
previous missed suggestions finally committed Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
as proposed in #21148 (comment)
assert labels_float64.dtype == np.int64 naturally failed on 32-bit OS so was removed
@jjerphan all proposed changes finally made (sorry for missing your earlier suggestions). All green, should be ready to merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Thank you, @lobpcg!
I would also wait for his approval regarding your latest changes. |
@ogrisel could you please have a look, hopefully final? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just a typo that I will fix before merging.
Merged, thank you very much for the contrib! |
I am glad to see this long journey finally completed. Thanks very much everyone involved! |
Reference Issues/PRs
Re-introduces #12316 and closes #12164
What does this implement/fix? Explain your changes.
#12164 adds the new cluster_qr method to 'kmeans' and 'discretize' in spectral clustering
Any other comments?
The actual changes are just a few lines in only 3 core codes, spectral.py, test_spectral.py, and plot_coin_segmentation.py