ENH Gaussian mixture bypassing unnecessary initialization computing #26021
Conversation
Thank you for the PR @jiawei-zhang-a! Please add an entry to the change log at doc/whats_new/v1.3.rst with tag |Efficiency|. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.
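For illustration, such an entry might look like the sketch below. The wording is a guess, and the display name is inferred from the GitHub handle; only the tag, PR number, and handle come from this thread:

- |Efficiency| :class:`mixture.GaussianMixture` now skips the explicit
  initialization step when `weights_init`, `means_init` and `precisions_init`
  are all provided by the user.
  :pr:`26021` by :user:`Jiawei Zhang <jiawei-zhang-a>`.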
Thank you, Mr. Fan @thomasjpfan. I have removed the new state and added a new changelog entry :)
Thank you for the PR @jiawei-zhang-a! We still need a test to make sure that the parameters are not estimated during initialization. I think a simple way is monkeypatching:
# Intended for sklearn/mixture/tests/test_gaussian_mixture.py, which already
# provides np, GaussianMixture, _estimate_gaussian_parameters, and the
# RandomData helper; Mock comes from unittest.mock (see below).
def test_gaussian_mixture_all_init_does_not_estimate_gaussian_parameters(monkeypatch):
    """When all init are provided, the Gaussian parameters are not estimated.

    Non-regression test for gh26015.
    """
    mock = Mock(side_effect=_estimate_gaussian_parameters)
    monkeypatch.setattr(
        sklearn.mixture._gaussian_mixture, "_estimate_gaussian_parameters", mock
    )
    rng = np.random.RandomState(0)
    rand_data = RandomData(rng)
    gm = GaussianMixture(
        n_components=rand_data.n_components,
        weights_init=rand_data.weights,
        means_init=rand_data.means,
        precisions_init=rand_data.precisions["full"],
        random_state=rng,
    )
    gm.fit(rand_data.X["full"])
    # The initial Gaussian parameters are not estimated;
    # _estimate_gaussian_parameters is only called once per m_step.
    assert mock.call_count == gm.n_iter_
Mock is from Python's unittest.mock module. On main, the test would fail because mock.call_count would be gm.n_iter_ + 1, due to the extra call during initialization.
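For context, here is a toy usage of the pattern the test exercises (the values below are made up, not from the PR): when weights_init, means_init, and precisions_init are all supplied, fit() no longer needs the k-means/random initialization step before EM.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2) - 2, rng.randn(100, 2) + 2])

gm = GaussianMixture(
    n_components=2,
    weights_init=[0.5, 0.5],
    means_init=[[-2.0, -2.0], [2.0, 2.0]],
    # shape (n_components, n_features, n_features) for the default
    # covariance_type="full"
    precisions_init=np.stack([np.eye(2), np.eye(2)]),
    random_state=0,
)
gm.fit(X)  # with this PR, no initialization estimate runs before EM
print(gm.n_iter_)  # number of EM iterations actually performed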
doc/whats_new/v1.3.rst (outdated)

@@ -43,6 +43,7 @@ random sampling procedures.
     :user:`Jérémie du Boisberranger <jeremiedbb>`,
     :user:`Guillaume Lemaitre <glemaitre>`.
+
For git blame purposes, can you revert this?
Sure!
Dear Mr. Fan, thank you so much for all the advice! I will follow your suggestions!
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
A few minor comments, otherwise LGTM
@@ -34,7 +34,8 @@
 from sklearn.utils._testing import assert_array_almost_equal
 from sklearn.utils._testing import assert_array_equal
 from sklearn.utils._testing import ignore_warnings
+from unittest.mock import Mock
Nit: Can you move this import to line 9, below import warnings? This way the first-party Python modules are at the top of the file.
Sure! I will do that immediately
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
…cikit-learn into GaussianMixture
Good job! Waiting for this to merge
Thanks for the PR @jiawei-zhang-a. Could you kindly resolve the conflicts by merging main and have a look at these few comments?
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
@OmarManzoor Thank you so much for your review! I have committed your suggestions and fixed the conflicts with the main branch.
Thanks for the updates. I added a few more comments; otherwise this looks good now!
Thank you so much!
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Sure! I will check that now
Reference Issues/PRs
Fixes #26015
What does this implement/fix? Explain your changes.
I added a private variable _init_weights_means_precisions_skipped in _base.py. If a user passes initial values for the weights, means, and precisions, there is no need to run the initialization (via k-means or random) to estimate the Gaussian parameters. These two steps are now skipped when _init_weights_means_precisions_skipped is True. A minimal sketch of the idea follows.
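A hypothetical sketch of the bypass, with simplified names (only weights_init/means_init/precisions_init and the flag's meaning come from this PR; the function and its helpers are not the actual scikit-learn internals):

import numpy as np

def initialize_parameters_sketch(gm, X, rng):
    # Stand-in for the private initialization step; `gm` is assumed to carry
    # weights_init / means_init / precisions_init like GaussianMixture.
    skip = (
        gm.weights_init is not None
        and gm.means_init is not None
        and gm.precisions_init is not None
    )  # this condition is what _init_weights_means_precisions_skipped records
    if skip:
        # Bypass: no k-means/random responsibilities and no call to
        # _estimate_gaussian_parameters; just use the provided values.
        return gm.weights_init, gm.means_init, gm.precisions_init
    # Fallback sketch: random responsibilities, then moment estimates.
    resp = rng.uniform(size=(X.shape[0], gm.n_components))
    resp /= resp.sum(axis=1, keepdims=True)
    weights = resp.sum(axis=0) / X.shape[0]
    means = resp.T @ X / resp.sum(axis=0)[:, None]
    return weights, means, None  # precisions left out of this sketch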
Any other comments?