[MRG] Refactor MiniBatchDictionaryLearning and add stopping criterion #18975
Conversation
I am ok with the general idea of this refactoring, which is likely to improve the maintainability of this estimator.
Are you ok with that?
@@ -1509,7 +1587,7 @@ class MiniBatchDictionaryLearning(_BaseSparseCoding, BaseEstimator):
     We can check the level of sparsity of `X_transformed`:

     >>> np.mean(X_transformed == 0)
-    0.87...
+    0.85...
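For context, here is a minimal runnable sketch of the kind of check this docstring line performs. The data and parameters below are illustrative (the actual docstring example uses `make_sparse_coded_signal`), so the measured sparsity will not exactly match the 0.85.../0.87... values above.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Illustrative random data, not the real docstring fixture.
rng = np.random.RandomState(42)
X = rng.randn(100, 20)

dict_learner = MiniBatchDictionaryLearning(
    n_components=15,
    batch_size=3,
    transform_algorithm="lasso_lars",
    transform_alpha=0.1,
    random_state=42,
)
X_transformed = dict_learner.fit_transform(X)

# Fraction of exactly-zero coefficients, i.e. the sparsity of the codes.
print(np.mean(X_transformed == 0))
```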
There's a small difference due to what I think should be considered a bug in the previous behavior. The update of the inner stats depends on the batch size, see https://github.com/scikit-learn/scikit-learn/pull/18975/files#diff-20a73e7d385ab5d19a05026b635c8b256ff568e1a0e9e2fed606fec82d3b956fR1732
The thing is that due to how batches are generated, the batches may not all have the same size:
batches = gen_batches(n_samples, self.batch_size)
batches = itertools.cycle(batches)
If n_samples is not a multiple of batch_size, the last batch will be smaller than batch_size. In this PR I currently just use the actual size of each batch for the update of the stats, hence the small difference.
Actually I wonder if we should do something about the generation of the batches, to make them all have the same size. wdyt?
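To illustrate the uneven batch sizes, here is a small standalone sketch using `sklearn.utils.gen_batches` directly; the numbers are just an example.

```python
import itertools
from sklearn.utils import gen_batches

# With n_samples not a multiple of batch_size, the last slice is smaller.
n_samples, batch_size = 10, 3
batches = list(gen_batches(n_samples, batch_size))
print(batches)
# [slice(0, 3, None), slice(3, 6, None), slice(6, 9, None), slice(9, 10, None)]

# itertools.cycle replays this uneven sequence forever, so every fourth
# mini-batch update would be computed from a batch of size 1 instead of 3.
sizes = [b.stop - b.start for b, _ in zip(itertools.cycle(batches), range(8))]
print(sizes)  # [3, 3, 3, 1, 3, 3, 3, 1]
```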
I pushed a commit to tweak the parameters of the denoising example to get a better dictionary that leads to cleaner denoising results while still being fast enough.
I am not sure why black has started to complain. According to our doc we should use the pinned version.
We upgraded to a stable version (#22474).
Just saw that :)
Posting these comments now, but this is only a partial review. I will finish it shortly.
LGTM
LGTM! Thank you very much for the clean-up.
…scikit-learn#18975)
Co-authored-by: Olivier Grisel <olivier.grisel@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Currently `MiniBatchDictionaryLearning` calls `dict_learning_online`. This PR switches to make `dict_learning_online` call the class instead.

2 main reasons for this:

- Currently `dict_learning_online` serves too many purposes and exposes private stuff useful for partial_fit, like `iter_offset` and `inner_stats`. It makes the code very hard to follow. For instance the function has 6 possible return statements, see `scikit-learn/sklearn/decomposition/_dict_learning.py` (line 866 in 4773f3e).
- It will greatly ease the implementation of [WIP] online matrix factorization with missing values #18492.

There are 2 options to do this: the first one (proposed here) is to make the function call the class; the other one is to make both the function and the class call a common new private function. I chose the former because it's what has already been done in a few other places like #14985 and #14994.
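For illustration only, here is a rough sketch of the proposed direction; this is not the actual scikit-learn code, and the function name and parameter list below are made up for the example. The idea is that the functional API becomes a thin wrapper that instantiates and fits the estimator, instead of the estimator delegating to the function.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning


def dict_learning_online_sketch(X, n_components=2, batch_size=3,
                                return_code=True, random_state=None):
    """Hypothetical wrapper: the function delegates to the class."""
    est = MiniBatchDictionaryLearning(
        n_components=n_components,
        batch_size=batch_size,
        random_state=random_state,
    ).fit(X)
    dictionary = est.components_
    if return_code:
        code = est.transform(X)
        return code, dictionary
    return dictionary


# Usage: same style of call as a functional dictionary-learning API.
X = np.random.RandomState(0).randn(50, 8)
code, dictionary = dict_learning_online_sketch(X, n_components=5, random_state=0)
print(code.shape, dictionary.shape)  # (50, 5) (5, 8)
```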