adding adaptive learning rate for minibatch k-means #30051
Conversation
The diff is indeed relatively small; however, the paper is quite recent and the improvements are marginal. So I'll let @ogrisel, @lorentzenchr, @jeremiedbb, and @GaelVaroquaux weigh in here.
I have similar feelings. Unfortunately, arxiv.org has seemed unresponsive for me since yesterday, so I cannot check the benchmark results from the paper. @BenJourdan could you please add results for full-batch k-means to your plots? I am wondering whether this allows MB-k-means to reach the same scores as full-batch k-means on those problems.
force-pushed from 35c5b19 to 158897a
Thanks for the update. So from those experiments, it appears that the new lr scheme can empirically help MBKMeans close the (smallish) gap with full-batch KMeans in terms of clustering quality while keeping favorable runtimes for datasets with many data points (e.g. MNIST size or larger). But since the method was recently published, this PR does not technically meet our inclusion criteria, although we could be less strict in cases where this is an incremental improvement of an existing method implemented in scikit-learn. I will mention this PR at our next monthly meeting.
What was the verdict @ogrisel?
There were no clear objections to include this, and I think a few of us are in favor of including it.
@ogrisel @adrinjalali what happens next? Should I start updating the branch?
@BenJourdan seems like it.
force-pushed from 65e49cb to 320a4b4
Should I keep updating the branch until (if lol) someone gets assigned? Not sure what the convention is.
Hi @ogrisel, @adrinjalali, just checking in: since there were no objections during the meeting and there was some support for inclusion, would it make sense to remove the Needs Decision label and move toward review/approval? Please let me know what you'd recommend as the next step. Thanks again for your time and feedback so far!
I've removed "need decision" here. I think we can move forward with this.
Great! Let us know if we should do anything on our end.
@antoinebaker would you mind having a look at this PR for a review?
Thanks for the PR @BenJourdan! Here is a first round of review.
sklearn/cluster/_kmeans.py
Outdated
adaptive_lr : bool, default=False
    If True, use the adaptive learning rate described in this \
    `paper <https://arxiv.org/abs/2304.00419>`_.
    This can be more effective than the standard learning rate \
    when the input is dense.
Suggested change:
- adaptive_lr : bool, default=False
-     If True, use the adaptive learning rate described in this \
-     `paper <https://arxiv.org/abs/2304.00419>`_.
-     This can be more effective than the standard learning rate \
-     when the input is dense.
+ adaptive_lr : bool, default=False
+     If True, use the adaptive learning rate described in [1]_ which can be
+     more effective than the standard learning rate when the input is dense.
I would also suggest adding a References section below Notes:
References
----------
.. [1] :arxiv:`Ben Jourdan and Gregory Schwartzman (2024).
"Mini-Batch Kernel k-means." <2410.05902>`
Sure. I'll cite Gregory's original paper instead:
.. [1] :arxiv:`Gregory Schwartzman (2023).
"Mini-batch k-means terminates within O(d/ɛ) iterations" <2304.00419>`
n_threads : int
    The number of threads to be used by openmp.
"""
cdef:
    int n_samples = X.shape[0]
    int n_clusters = centers_old.shape[0]
    int cluster_idx

    floating b=0.0
Not sure about the naming, but something more explicit than b:

Suggested change:
- floating b=0.0
+ floating wsum_batch = 0.0
Sure. I'll also change wsum inside the update_center_* functions to wsum_cluster to indicate it's talking about the weight of the points (in the batch) that were assigned to cluster cluster_idx.
if adaptive_lr:
    """
    perform the minibatch update for the current cluster using
    C_new = C_old * (1 - alpha) + alpha * cm(B_j)

    where alpha = sqrt(b_j/b) is the learning rate from https://arxiv.org/abs/2304.00419,
    b is the weight of the batch, b_j is the weight of the batch w.r.t. the current cluster,
    and cm(B_j) is the center of mass of the batch w.r.t. the current cluster.
    """
    weight_sums[cluster_idx] += wsum
    alpha = sqrt(wsum / b)
I would introduce a learning rate, here called lr, to distinguish it from alpha = 1 / weight_sums[cluster_idx] used when adaptive_lr=False.
Suggested change:
- if adaptive_lr:
-     """
-     perform the minibatch update for the current cluster using
-     C_new = C_old * (1 - alpha) + alpha * cm(B_j)
-     where alpha = sqrt(b_j/b) is the learning rate from https://arxiv.org/abs/2304.00419,
-     b is the weight of the batch, b_j is the weight of the batch w.r.t. the current cluster,
-     and cm(B_j) is the center of mass of the batch w.r.t. the current cluster.
-     """
-     weight_sums[cluster_idx] += wsum
-     alpha = sqrt(wsum / b)
+ # Update center C_new = (1 - lr) * C_old + lr * C_X where
+ # C_X = weighted mean of the observations X assigned to the cluster
+ # lr = learning rate
+ if adaptive_lr:
+     # learning rate lr = sqrt(wsum/wsum_batch) suggested in
+     # https://arxiv.org/abs/2304.00419
I agree. I'll generalize the trick currently used by both methods to avoid explicitly computing the means of the points in the current batch. I'll add the following comment along with the changes:
# We want to compute the new center with the update formula
# C_{i+1} = C^{i}_j*(1-alpha) + alpha*CM(B_j^i).
# where:
# - C_j^i is the center representing the j-th cluster at the i-th
# iteration
# - B_j^i is the batch of samples assigned to the j-th cluster at
# the i-th iteration
# - CM(B_j^i) is the (weighted) mean of the samples assigned to
# cluster j in iteration i
# - alpha is the learning rate
# In the non-adaptive case, alpha = wsum_cluster/(wsum_cluster+old_weight)
# where:
# - wsum_cluster is the weight of the points assigned to the cluster in the
# current batch
# - old_weight is the weight of all points assigned to cluster j
# in previous iterations.
# This is equivalent to computing a weighted average of everything
# assigned to cluster j so far.
# In the adaptive case (see https://arxiv.org/abs/2304.00419),
# alpha = sqrt(wsum_cluster/wsum_batch) where wsum_batch is the weight of
# the batch. This is similar to an exponential moving average but with
# an adaptive decay rate.
# For the sake of efficiency, we don't compute the update explicitly.
# Instead, we skip computing the mean of the batch and
# compute the update by scaling the old center, adding the weighted
# sum of the batch, and then scaling again.
# Let Sigma(B_j^i) be the weighted sum of the points assigned to
# cluster j in the current batch.
# Therefore (Sigma(B_j^i) = wsum_cluster * CM(B_j^i)).
# We can rewrite the update formula as:
# C_{i+1} = C^{i}_j*(1-alpha) + (alpha/wsum_cluster)*Sigma(B_j^i)
# = (alpha/wsum_cluster)[C^{i}_j*(1-alpha)(wsum_cluster/alpha) + Sigma(B_j^i)]
# In the adaptive case, nothing simplifies so we just use the formula
# as is.
# In the non-adaptive case, things simplify and we have
# - (1-alpha)*(wsum_cluster/alpha)
# = (old_weight/(wsum_cluster+old_weight))*(wsum_cluster+old_weight) = old_weight
# - (alpha/wsum_cluster) = 1/(wsum_cluster+old_weight)
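For intuition, here is a minimal NumPy sketch (not the PR's Cython code; all variable names are illustrative) checking that the rescaling trick described in the comment reproduces the explicit convex-combination update for the adaptive learning rate:

import numpy as np

rng = np.random.default_rng(0)
center_old = rng.normal(size=5)             # C_j^i
batch = rng.normal(size=(8, 5))             # samples of the batch assigned to cluster j
weights = rng.uniform(0.5, 2.0, size=8)     # their sample weights

wsum_cluster = weights.sum()                # weight assigned to cluster j in this batch
wsum_batch = 3.0 * wsum_cluster             # pretend total weight of the batch (b in the diff)
alpha = np.sqrt(wsum_cluster / wsum_batch)  # adaptive learning rate

# Explicit update: C_j^{i+1} = (1 - alpha) * C_j^i + alpha * CM(B_j^i)
cm = np.average(batch, axis=0, weights=weights)
explicit = (1 - alpha) * center_old + alpha * cm

# Trick from the comment: scale the old center, add the weighted sum, rescale.
sigma = (weights[:, None] * batch).sum(axis=0)  # Sigma(B_j^i) = wsum_cluster * CM(B_j^i)
trick = (alpha / wsum_cluster) * ((1 - alpha) * (wsum_cluster / alpha) * center_old + sigma)

assert np.allclose(explicit, trick)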
else:
    # Undo the previous count-based scaling for this cluster center
Suggested change:
- else:
-     # Undo the previous count-based scaling for this cluster center
+ else:
+     # learning rate lr = wsum / (weight_sums + wsum)
+     # Undo the previous count-based scaling for this cluster center
Same as comment on generalizing efficient updates.
alpha = 1 / weight_sums[cluster_idx]
for feature_idx in range(n_features):
    centers_new[cluster_idx, feature_idx] *= alpha
centers_new[cluster_idx, feature_idx] = centers_old[cluster_idx, feature_idx] * (1 - alpha) * (wsum / alpha)
Suggested change:
- centers_new[cluster_idx, feature_idx] = centers_old[cluster_idx, feature_idx] * (1 - alpha) * (wsum / alpha)
+ centers_new[cluster_idx, feature_idx] = centers_old[cluster_idx, feature_idx] * (1 - lr) * (wsum / lr)
I will abstract the scaling factor (1-alpha) * (wsum/alpha) to old_scaling_factor.
@@ -150,17 +182,19 @@ def _minibatch_update_sparse(
    int n_samples = X.shape[0]
    int n_clusters = centers_old.shape[0]
    int cluster_idx

    floating b=0.0
Same as above.

Suggested change:
- floating b=0.0
+ floating wsum_batch = 0.0
Noted.
if adaptive_lr:
    """
    perform the minibatch update for the current cluster using
    C_new = C_old * (1 - alpha) + alpha * cm(B_j)

    where alpha = sqrt(b_j/b) is the learning rate from https://arxiv.org/abs/2304.00419,
    b is the weight of the batch, b_j is the weight of the batch w.r.t. the current cluster,
    and cm(B_j) is the center of mass of the batch w.r.t. the current cluster.
    """
As above.

I feel the code could be simplified by first defining a learning rate:

if adaptive_lr:
    lr = sqrt(wsum / wsum_batch)
else:
    lr = wsum / (weight_sums[cluster_idx] + wsum)

and then doing the common updates:

for feature_idx in range(n_features):
    centers_new[cluster_idx, feature_idx] = (1 - lr) * centers_old[cluster_idx, feature_idx]
for k in range(n_indices):
    sample_idx = indices[k]
    for feature_idx in range(n_features):
        weight_idx = sample_weight[sample_idx] / wsum
        centers_new[cluster_idx, feature_idx] += lr * weight_idx * X[sample_idx, feature_idx]
Thanks for the feedback. I'll add most of your suggestions as they are.
Introducing the learning rate to avoid duplicating code is a good idea. However, it's a bit messy since we need to do an optimization that avoids explicitly computing the means of each batch for a given center. I'll have a go at redrafting those parts.
Thanks!
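To make the discussion concrete, here is a rough pure-Python/NumPy sketch of the reviewer's suggestion, not the actual Cython implementation; the function and variable names are made up for illustration. It computes lr once per cluster and then applies the same update in both the adaptive and non-adaptive cases:

import numpy as np

def minibatch_update_dense_sketch(X, sample_weight, labels, centers_old,
                                  weight_sums, adaptive_lr=False):
    # X: (n_samples, n_features) batch; labels: cluster assignment of each sample
    n_clusters = centers_old.shape[0]
    centers_new = centers_old.copy()
    wsum_batch = sample_weight.sum()  # total weight of the batch

    for cluster_idx in range(n_clusters):
        mask = labels == cluster_idx
        if not np.any(mask):
            continue  # no samples assigned to this cluster in the batch
        wsum_cluster = sample_weight[mask].sum()

        if adaptive_lr:
            # adaptive rate from https://arxiv.org/abs/2304.00419
            lr = np.sqrt(wsum_cluster / wsum_batch)
        else:
            # classic count-based rate of MiniBatchKMeans
            lr = wsum_cluster / (weight_sums[cluster_idx] + wsum_cluster)

        weight_sums[cluster_idx] += wsum_cluster
        cluster_mean = np.average(X[mask], axis=0, weights=sample_weight[mask])
        centers_new[cluster_idx] = (1 - lr) * centers_old[cluster_idx] + lr * cluster_mean

    return centers_new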
force-pushed from 0bf05ba to 0094c8e
force-pushed from 0094c8e to 13fe041
Reference Issues/PRs
None
What does this implement/fix? Explain your changes.
This PR implements a recent learning rate for minibatch k-means which can be superior to the default learning rate. We implement this with the flag adaptive_lr, which defaults to False. Details can be found in this paper, which appeared at ICLR 2023. Extensive experiments can be found in this manuscript (ignore the kernel k-means results). We also added a benchmark that produces the following plot, which shows the adaptive learning rate is the same or better than the default on dense datasets.
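For reference, a hypothetical usage sketch; adaptive_lr is the flag proposed in this PR, so this assumes a scikit-learn build that includes this branch rather than a released version:

from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=10_000, centers=50, random_state=0)

default_mbkm = MiniBatchKMeans(n_clusters=50, random_state=0).fit(X)
adaptive_mbkm = MiniBatchKMeans(n_clusters=50, random_state=0, adaptive_lr=True).fit(X)

# Compare final inertia (lower is better)
print(default_mbkm.inertia_, adaptive_mbkm.inertia_)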
Any other comments?
This is a reasonably small code change. We add a flag to the MiniBatchKMeans constructor and to the _k_means_minibatch.pyx Cython file. The learning rate implementation is straightforward. In the benchmarks, the adaptive learning rate appears to take a few more iterations to converge, often resulting in better solutions. When we removed early stopping, we observed that the running time is about the same.
This should be a cleaner version of #30045 (I made a mess since I'm still pretty new to git).