ENH Add memory=joblib.Memory param to OPTICS #19024

frankier · 2020-12-17T11:54:45Z

Reference Issues/PRs

Addresses, but doesn't entirely fix, #12044

What does this implement/fix? Explain your changes.

This adds a memory parameter to OPTICS. The justification is the same as for hierarchical clustering: if we want to cut the dendrogram at different points, we can reuse an intermediate result. The same is true for DBSCAN.

Practically speaking, this avoids recomputing the most expensive step of OPTICS during grid search, as long as the search is performed serially.

As discussed in #12044, it would be good to eventually implement warm_start(...); users could then run that in parallel as an initial step before running grid search. A further embellishment would be grid searchers that understand which steps are expensive and which are not, and compute things in the right order.

However, that issue has been open for a while now, and this at least solves the serial grid search use-case.

Any other comments?

Let me know if anything is missing.

jnothman

Looks good!

Please report benchmarks just to show it has the desired effect.

frankier · 2021-03-17T13:55:19Z

I've added a benchmark now but I'm having trouble running it.

$ asv run -E virtualenv -b OPTICSGridSearch HEAD

· Creating environments
· Discovering benchmarks
· No benchmarks selected

But this works:

$ asv run -E virtualenv -b LogisticRegression HEAD

Any tips?

jnothman · 2021-03-17T22:43:57Z

I wasn't necessarily expecting benchmarks to be committed here, but at least to be presented. Thanks. I'd need to look into the asv issues with time I don't have right now!

rth · 2021-08-06T10:01:05Z

Quick benchmark adapted from the asv example,

from sklearn.datasets import make_blobs
from sklearn.cluster import OPTICS
from sklearn.model_selection import GridSearchCV

X, y = make_blobs(n_samples=500)
optics = OPTICS(
    metric="euclidean",
    cluster_method="dbscan",
    min_samples=10,
)

Then,

In [20]: %timeit GridSearchCV(optics, {"eps": [0.1, 0.2, 0.3, 0.4, 0.5]}, scoring=lambda x, y: 1.0).fit(X)
3.11 s ± 27.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [21]: %timeit GridSearchCV(optics, {"eps": [0.1, 0.2, 0.3, 0.4, 0.5], "memory": ['/tmp/cache2']}, scoring=lambda x, y: 1.0).fit(X)
22.1 ms ± 310 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

So this works as expected. I removed the asv benchmark as I don't think it would provide useful information for the future. All we want to know is that caching works as expected.

rth

LGTM, otherwise. Thanks @frankier !

adrinjalali · 2021-08-06T11:49:02Z

Nice!

Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>

Add memory=joblib.Memory param to OPTICS

9fcdf31

github-actions bot added the module:cluster label Dec 17, 2020

Base automatically changed from master to main January 22, 2021 10:53

Merge branch 'main' into optics-memory

f324074

jnothman approved these changes Mar 17, 2021

Add a benchmark for OPTICS grid search

2d02358

rth added 3 commits August 6, 2021 11:25

Merge branch 'main' into optics-memory

c0b103f

Fix black

a813780

Remove benchmark

e414bd8

Add changelog entry

b077a0f

rth approved these changes Aug 6, 2021

rth changed the title ~~Add memory=joblib.Memory param to OPTICS~~ ENH Add memory=joblib.Memory param to OPTICS Aug 6, 2021

rth merged commit ec941a8 into scikit-learn:main Aug 6, 2021

samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021

ENH Add memory=joblib.Memory param to OPTICS (scikit-learn#19024)

b171de9

Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
