Adding adaptive learning rate for MinibatchKmeans #30045
Closed
BenJourdan wants to merge 47 commits into scikit-learn:main from gregoryschwartzman:feature_mbkm_newlr
Conversation
feature. Added data and result to gitignore.
fixed bug in newlr added various params for benchmarking
_minibatch_update_dense
Reference Issues/PRs
None
What does this implement/fix? Explain your changes.
This PR implements a recently proposed adaptive learning rate for mini-batch k-means that can outperform the default learning rate. It is exposed through a new adaptive_lr flag on MiniBatchKMeans, which defaults to False. Details can be found in this paper, which appeared at ICLR 2023, and extensive experiments are reported in this manuscript (the kernel k-means results there can be ignored). We also added a benchmark that produces the following plot, showing that the adaptive learning rate matches or beats the default on dense datasets.
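As a usage sketch, assuming the adaptive_lr parameter proposed in this PR (it is not part of a released scikit-learn version), the flag would be enabled like this:

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Synthetic dense data; sizes are arbitrary, chosen only for illustration.
X, _ = make_blobs(n_samples=20_000, centers=50, n_features=10, random_state=0)

# Default behavior (adaptive_lr=False) is unchanged.
baseline = MiniBatchKMeans(n_clusters=50, random_state=0).fit(X)

# Opt in to the adaptive learning rate proposed in this PR.
adaptive = MiniBatchKMeans(n_clusters=50, adaptive_lr=True, random_state=0).fit(X)

print("default inertia: ", baseline.inertia_)
print("adaptive inertia:", adaptive.inertia_)
```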
Any other comments?
This is a reasonably small code change. We add a flag to the MiniBatchKMeans constructor and to the _k_means_minibatch.pyx Cython file; the learning-rate implementation itself is straightforward. In the benchmarks, the adaptive learning rate appears to take a few more iterations to converge, often reaching better solutions. With early stopping removed, the running time is about the same as the default.
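For context, below is a minimal NumPy sketch of the per-center mini-batch update with the learning rate factored out as a pluggable function. This is only an illustration of where the new rate plugs in, not the actual Cython code in _k_means_minibatch.pyx; the default rate shown is the standard count-based decay used by MiniBatchKMeans, and the adaptive formula from the paper would replace it.

```python
import numpy as np


def default_lr(batch_count, total_count):
    # Standard MiniBatchKMeans behavior: the per-center step size decays as
    # (points assigned in this batch) / (points assigned so far), which is
    # equivalent to keeping a weighted running mean of assigned samples.
    return batch_count / total_count


def minibatch_step(X_batch, centers, counts, lr_fn=default_lr):
    """One simplified mini-batch k-means update (dense, unweighted)."""
    # Assign each point in the batch to its nearest center.
    d2 = ((X_batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    labels = d2.argmin(axis=1)

    for j in range(centers.shape[0]):
        members = X_batch[labels == j]
        if members.shape[0] == 0:
            continue  # empty clusters are handled separately in scikit-learn
        counts[j] += members.shape[0]
        # The PR swaps this scalar for the adaptive rate from the paper.
        lr = lr_fn(members.shape[0], counts[j])
        centers[j] += lr * (members.mean(axis=0) - centers[j])
    return centers, counts
```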