Skip to content

[MRG] clusterQR method added to spectral segmentation #12316

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 55 commits into from
Closed

[MRG] clusterQR method added to spectral segmentation #12316

wants to merge 55 commits into from

Conversation

lobpcg
Copy link
Contributor

@lobpcg lobpcg commented Oct 6, 2018

Reference Issues/PRs

closes #12164

What does this implement/fix? Explain your changes.

#12164 adds clusterQR method to 'kmeans' and 'discretize' in spectral clustering

Any other comments?

The actual changes for #12164 are just a few lines in only 3 core codes, spectral.py, test_spectral.py, and plot_coin_segmentation.py

@eamanu
Copy link
Contributor

eamanu commented Oct 6, 2018

@lobpcg Please add [WIP] tag to name

@eamanu
Copy link
Contributor

eamanu commented Oct 6, 2018

@lobpcg look in the sepectral_clustering docs.

In the line where Parameters are defined. I think that the Travis error is because the ---- below Parameters title is longer than the title

@lobpcg
Copy link
Contributor Author

lobpcg commented Oct 6, 2018

@eamanu - thanks for your suggestions, now all implemented.

@lobpcg look in the sepectral_clustering docs.

In the line where Parameters are defined. I think that the Travis error is because the ---- below Parameters title is longer than the title

This was a typo indeed, now fixed - @eamanu thanks for noticing! The fix has however not affected the Travis error, still present unfortunately... Any other advice, please?

Since the plot_coin_segmentation example demo is involved, I have added the modified fine scikit-learn\doc\modules\clustering.rst to this PR.

I am unsure where to put the generated figures
auto_examples/cluster/images/sphx_glr_plot_coin_segmentation_001.png
auto_examples/cluster/images/sphx_glr_plot_coin_segmentation_002.png
and a new corresponding to clusterQR
auto_examples/cluster/images/sphx_glr_plot_coin_segmentation_003.png
I attach them here
sphx_glr_plot_coin_segmentation_001
sphx_glr_plot_coin_segmentation_002
sphx_glr_plot_coin_segmentation_003

@lobpcg lobpcg changed the title clusterQR method added to spectral segmentation [WIP] clusterQR method added to spectral segmentation Oct 6, 2018
@lobpcg lobpcg changed the title [WIP] clusterQR method added to spectral segmentation clusterQR method added to spectral segmentation Oct 10, 2018
@eamanu
Copy link
Contributor

eamanu commented Oct 11, 2018

If you think that is ready please add the tag [MRG]

@lobpcg lobpcg changed the title clusterQR method added to spectral segmentation [MRG] clusterQR method added to spectral segmentation Oct 11, 2018
Copy link
Contributor

@eamanu eamanu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good for me

@lobpcg
Copy link
Contributor Author

lobpcg commented Oct 30, 2018

This PR has been completed. Could someone please make more comments to react to or just commit it to the base? I start already forgetting my own proposed changes in this PR...

@lobpcg
Copy link
Contributor Author

lobpcg commented Dec 5, 2018

@eamanu This PR has been finalized over a month ago. Could someone please approve it or make new suggestions?

@rth
Copy link
Member

rth commented Dec 5, 2018

@lobpcg Thanks for the PR, but as was mentioned in the original issue #12164 by 2 maintainers, this algorithm does not meet the inclusion criteria https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms and would be more suited as a scikit-learn-contrib project until it gains wider usage and could be considered for inclusion in scikit-learn.

@lobpcg
Copy link
Contributor Author

lobpcg commented Dec 5, 2018

The actual changes are just a few lines in only 3 core codes, spectral.py, test_spectral.py, and plot_coin_segmentation.py That hardly justifies, in MHO, creating a brand new separate project at http://contrib.scikit-learn.org/ as proposed by @jnothman and @ogrisel . My hope has been that @jnothman and @ogrisel might change their recommendation, after seeing the actual changes in the code and results, waiving some of the inclusion criteria due to rather trivial changes in the code and noticeable improvements in performance... I have asked @jnothman and @ogrisel in the original issue #12164 almost 2 months ago, but got no response.

@jnothman
Copy link
Member

jnothman commented Dec 9, 2018

The actual changes are just a few lines in only 3 core codes, spectral.py, test_spectral.py, and plot_coin_segmentation.py That hardly justifies, in MHO, creating a brand new separate project at http://contrib.scikit-learn.org as proposed by @jnothman and @ogrisel .

We simply can't use that justification regularly. Methods for solving optimisation problems change frequently or vary for different applications and we cannot afford to be a complete library of them. This is why we have criteria for maturity. But perhaps if it is highly valuable, we can make spectral segmentation more customisable by allowing a function to be passed instead of one of the built-in solvers?

@lobpcg
Copy link
Contributor Author

lobpcg commented Dec 10, 2018

I surely understand and share your general concerns and the requirements for maturity. However, please consider the following:

  1. I have done all the needed code writing already in this PR, as far as I can see,
  2. the changes are very small, only several lines of code, so there is just tiny additional maintenance effort,
  3. the new function is not the default in this PR so the code is fully backward compatible,
  4. the new clusterQR is not yet another optimization method, but rather direct extraction of clusters from eigenvectors in spectral clustering, designed specifically for that purpose only,
  5. in contrast to the existing 'kmeans' and 'discretize', the new clusterQR has no tuning parameters, e.g., runs no iterations,
  6. yet the new clusterQR seems to consistently outperform both 'kmeans' and 'discretize' in terms of quality and speed, e.g., see the example above.

While not technically satisfying yet the criteria for maturity, the new clusterQR appears to set the new standard for extraction of clusters from eigenvectors in spectral clustering, that seems to me to be difficult to improve. It thus deserves in the future to become the default method instead of kmeans after intensive testing by users, that hopefully should follow if this PR is merged and released.

master merge with destro latest
merge just updated local master into clusterQR
@lobpcg
Copy link
Contributor Author

lobpcg commented Aug 5, 2019

@GaelVaroquaux could you please have a look at this PR and express your opinion?

@ogrisel
Copy link
Member

ogrisel commented Aug 29, 2019

The AMG fix #13707 is now merged in master.

@lobpcg
Copy link
Contributor Author

lobpcg commented Aug 30, 2019

Could someone please review and finally decide if this dies or gets merged?

@lobpcg
Copy link
Contributor Author

lobpcg commented Sep 26, 2019

Could someone please review?

@agramfort
Copy link
Member

in the referenced issue I see:

This doesn't meet our basic criteria for inclusion of stable and mature algorithms. What makes you think it is worth our while to maintain an implementation of this? What are the chances that this will remain a canonical approach in 5 years' time?
+1 for making a prototype Python implementation outside of the scikit-learn code base and running some benchmarks.

so there are some concerns which could eventually be ruled out with some benchmarks showing the empirical superiority of the method. Have these benchmarks done and made public?

@lobpcg
Copy link
Contributor Author

lobpcg commented Sep 26, 2019

in the referenced issue I see:

This doesn't meet our basic criteria for inclusion of stable and mature algorithms. What makes you think it is worth our while to maintain an implementation of this? What are the chances that this will remain a canonical approach in 5 years' time?
+1 for making a prototype Python implementation outside of the scikit-learn code base and running some benchmarks.

so there are some concerns which could eventually be ruled out with some benchmarks showing the empirical superiority of the method. Have these benchmarks done and made public?

@agramfort Yes, of course, please see the plots in #12316 (comment) and/or run the plot_coin_segmentation example demo in this PR yourself. A systematic comparison is performed in the original paper, cited in #12164

@agramfort
Copy link
Member

agramfort commented Sep 26, 2019 via email

@lobpcg
Copy link
Contributor Author

lobpcg commented Sep 26, 2019

the plot_coin_segmentation example demo in this PR
tells me the code leads to comparable results.

Comparable, but better - please just look at the plots.

have a script to replicate the timings given the current implementation is necessary. see for example the benchmarks https://github.com/scikit-learn/scikit-learn/tree/master/benchmarks that have been shared with the PRs to demonstrate and replicate timings on different problem dimensions and hardware.

Sure, if this is a blocker, it is doable and probably not too difficult. I'll look into it - thanks for the link!

@cmarmo cmarmo added Needs Benchmarks A tag for the issues and PRs which require some benchmarks and removed Waiting for Reviewer labels Jan 6, 2021
Base automatically changed from master to main January 22, 2021 10:50
@lobpcg
Copy link
Contributor Author

lobpcg commented Sep 25, 2021

This PR is stalled and has conflicts. I close it and open instead [WIP] cluster_qr method added to spectral segmentation #21148

@lobpcg lobpcg closed this Sep 25, 2021
@lobpcg lobpcg deleted the clusterQR branch September 25, 2021 19:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:cluster Needs Benchmarks A tag for the issues and PRs which require some benchmarks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

new feature: add clusterQR method to 'kmeans' and 'discretize' in spectral clustering
10 participants