FEA Add array API support for GaussianMixture #30777

lesteve · 2025-02-06T14:25:53Z

Working on it with @StefanieSenger.

Link to TODO

github-actions · 2025-02-06T14:27:12Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: d46840b. Link to the linter CI: here

…pr/lesteve/30777

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

OmarManzoor

Overall looks good. I left a few comments

sklearn/mixture/_base.py

sklearn/utils/_array_api.py

sklearn/utils/tests/test_array_api.py

OmarManzoor · 2025-06-14T10:02:44Z

sklearn/mixture/tests/test_gaussian_mixture.py

+
+
+# TODO What is the expected behavior when weights init
+# and X are not in the same namespace/device?


I think this is not resolved yet. Can we remove the commented out code?

sklearn/mixture/tests/test_gaussian_mixture.py

… are passed in + fixes

OmarManzoor · 2025-06-19T07:28:40Z

@lesteve Just one test is failing, and it has to do with array-api-strict on device with float32 data. Maybe we need to increase the tolerance further for this specific scenario.

lesteve · 2025-06-19T09:32:40Z

My honest impression is that these tests are fragile on float32 data but I don't really know if there is much we can do to improve the situation ...

Even for array-api-strict the results are different because of the difference between scipy.linalg.cholesky and numpy.linalg.cholesky and between scipy.linalg.solve_triangular and numpy.linalg.solve.

On a GPU VM I also saw some test failures (a few more than in the CI actually) and raised the atol and rtol a bit to get them to pass locally. I triggered another run of the CUDA CI, let's see what happens 🤞.
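To illustrate why float32 comparisons need looser tolerances, here is a minimal sketch that solves the same linear system through a Cholesky factor in float64 and in float32 and measures the gap. It is not the code from this PR: the matrix, the seed, and the helper name solve_via_cholesky are made up for illustration, and it uses numpy.linalg.solve for the triangular systems, as a namespace without a dedicated triangular solver would.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
# Illustrative, well-conditioned symmetric positive definite matrix.
cov = B @ B.T + n * np.eye(n)
b = rng.standard_normal(n)


def solve_via_cholesky(cov, b):
    """Solve cov @ x = b with a Cholesky factor and two generic solves.

    Hypothetical helper: the generic solver does not exploit the
    triangular structure, unlike scipy.linalg.solve_triangular.
    """
    chol = np.linalg.cholesky(cov)  # cov = chol @ chol.T, chol is lower-triangular
    y = np.linalg.solve(chol, b)
    return np.linalg.solve(chol.T, y)


x64 = solve_via_cholesky(cov, b)
x32 = solve_via_cholesky(cov.astype(np.float32), b.astype(np.float32))

# The two results agree only up to float32 rounding, so a test comparing
# them needs an rtol/atol far above the float64 defaults.
max_abs_diff = float(np.max(np.abs(x64 - x32)))
print(f"max abs difference between float64 and float32: {max_abs_diff:.2e}")
```

The exact size of the difference depends on the conditioning of the matrix and on the backend, which is one reason a single tolerance is hard to pick across NumPy, array-api-strict, and GPU libraries.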

OmarManzoor · 2025-06-19T09:41:48Z

I don't think we can do much to improve the array-api-strict tests for float32, especially with respect to accuracy. As long as array-api-strict works in general, I think that should be sufficient.

OmarManzoor

LGTM. Thank you for the work done in this PR @lesteve and @StefanieSenger

lesteve · 2025-06-20T12:50:59Z

Thanks for the reviews @OmarManzoor and @ogrisel!

One of the remaining questions in the old and long TODO list: should we implement __sklearn_tags__ to tell that GaussianMixture has array_api_support?

PCA's __sklearn_tags__ does this currently and always sets array_api_support = True, although array API support is implemented only for some values of the parameters. Not sure whether this is expected or not:

scikit-learn/sklearn/decomposition/_pca.py

Lines 848 to 857 in 8792943

    
    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.transformer_tags.preserves_dtype = ["float64", "float32"]
        tags.array_api_support = True
        tags.input_tags.sparse = self.svd_solver in (
            "auto",
            "arpack",
            "covariance_eigh",
        )
        return tags

I am guessing the array_api_support tag is only used for the common tests right now, right?

ogrisel · 2025-06-20T15:34:19Z

Good questions:

- Indeed, we could make PCA only return tags.array_api_support = True when the solver supports array API inputs.
- Similarly for GaussianMixture (depending on the choice of the init).
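A parameter-dependent tag along these lines could be sketched as follows. This is a toy illustration, not scikit-learn code: ToyTags, ToyBase, and ToyMixture are hypothetical stand-ins, and the set of init_params values treated as array API compatible is an assumption made for the example rather than something stated in this thread.

```python
from dataclasses import dataclass


@dataclass
class ToyTags:
    """Hypothetical, stripped-down stand-in for scikit-learn's tags object."""

    array_api_support: bool = False


class ToyBase:
    def __sklearn_tags__(self):
        return ToyTags()


class ToyMixture(ToyBase):
    # Assumption for illustration only: which init_params values work
    # with array API inputs.
    _ARRAY_API_INIT_PARAMS = ("random", "random_from_data")

    def __init__(self, init_params="kmeans"):
        self.init_params = init_params

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        # Advertise array API support only for compatible parameter values,
        # instead of unconditionally setting the tag to True.
        tags.array_api_support = self.init_params in self._ARRAY_API_INIT_PARAMS
        return tags


print(ToyMixture(init_params="random").__sklearn_tags__().array_api_support)
print(ToyMixture(init_params="kmeans").__sklearn_tags__().array_api_support)
```

Because the tag is computed from instance parameters, code that reads it (such as common tests) would have to instantiate the estimator with an array API compatible configuration before checking it.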

lesteve added 4 commits February 5, 2025 11:17

wip

b04a9f7

wip

e6ba4e4

stuck on linalg.cholesky array API support

2226a55

a bit further with xp.cholesky but now linalg.solve_triangular

b1fdee7

github-actions bot added the module:mixture label Feb 6, 2025

lesteve marked this pull request as draft February 6, 2025 14:26

StefanieSenger self-requested a review February 14, 2025 09:28

StefanieSenger and others added 11 commits February 14, 2025 11:54

more array api

14fb0ba

wip (problem with weights as numpy arrays)

6010ff7

array api for covariance_type='diag' and init_params='random'

aa2a383

add simple test

de4f3a5

Add comments about tricky bits

7974931

lint

08e5f9b

one more comment

0f525ef

revert unwanted change

4801e2b

fix test_bayesian_mixture

de1343c

Compare to numpy result in test

b05eca0

Use global_random_seed

c35bdd6

lesteve added the CUDA CI label Mar 12, 2025

github-actions bot removed the CUDA CI label Mar 12, 2025

StefanieSenger and others added 5 commits March 12, 2025 14:30

retrigger CI

4516920

Merge branch 'gmm-array-api' of github.com:lesteve/scikit-learn into …

61c8b5d

…pr/lesteve/30777

retrigger CI

e974051

retrigger CI [azure parallel]

1a7f262

A bit further with setting the device more correctly

fb40870

lesteve mentioned this pull request Mar 13, 2025

BUG: error for arrays on non-default device scipy/scipy#22680

Open

lesteve added 3 commits March 14, 2025 16:52

Add our own implementation of logsumexp [azure parallel]

f2eba56

Fix implementation of logsumexp

a0f8d25

Fix for older numpy versions

53e9917
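For readers unfamiliar with the technique named in these commits, a numerically stable logsumexp can be sketched as below. This is not the implementation added in the PR: the function name, signature, and the use of NumPy as the stand-in namespace xp are assumptions for illustration.

```python
import numpy as np


def logsumexp_sketch(a, axis=None, xp=np):
    """Compute log(sum(exp(a))) along `axis` without overflow.

    Hypothetical sketch: `xp` stands in for an array namespace and
    defaults to NumPy here.
    """
    a_max = xp.max(a, axis=axis, keepdims=True)
    # If the maximum is not finite (e.g. every entry is -inf), shift by 0
    # instead, to avoid computing inf - inf.
    a_max = xp.where(xp.isfinite(a_max), a_max, xp.zeros_like(a_max))
    # After subtracting the maximum, the largest exponent is 0, so exp
    # cannot overflow.
    summed = xp.sum(xp.exp(a - a_max), axis=axis, keepdims=True)
    out = xp.log(summed) + a_max
    if axis is None:
        return xp.reshape(out, ())
    return xp.squeeze(out, axis=axis)


big = np.asarray([1000.0, 1000.0])
# A naive np.log(np.sum(np.exp(big))) overflows to inf; the shifted
# version stays finite.
print(float(logsumexp_sketch(big)))
```

Subtracting the maximum before exponentiating is the standard trick: mathematically the result is unchanged, but the intermediate values stay within floating point range.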

lesteve and others added 3 commits June 13, 2025 16:47

Add all array constructor params to test

1a0e33b

[azure parallel] tweak docstring

1dca29a

Update sklearn/utils/_array_api.py

b990682

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

OmarManzoor reviewed Jun 14, 2025

View reviewed changes

lesteve added 8 commits June 16, 2025 14:48

Remove commented out test

72cd185

Handle comments

3af1470

use _call_cholesky

ecac610

More explicit use of scipy.linalg

341b659

[azure parallel] Increase rtol for float32 tests + some minor cleanups

7ffc5c7

rename variables

3b95a5f

[azure parallel] test more precisely when array constructor arguments…

45ba1ee

… are passed in + fixes

[azure parallel] Remove debug

4f89101

OmarManzoor added the CUDA CI label Jun 19, 2025

github-actions bot removed the CUDA CI label Jun 19, 2025

lesteve added 2 commits June 19, 2025 11:14

Test more attributes

d2ca209

Increase tol to make tests pass

d46840b

lesteve added the CUDA CI label Jun 19, 2025

github-actions bot removed the CUDA CI label Jun 19, 2025

OmarManzoor approved these changes Jun 19, 2025

View reviewed changes

OmarManzoor merged commit cc526ee into scikit-learn:main Jun 19, 2025
40 checks passed

github-project-automation bot moved this from In Progress to Done in Array API Jun 19, 2025

lesteve deleted the gmm-array-api branch June 19, 2025 12:18

lesteve mentioned this pull request Jun 20, 2025

MNT Use _add_to_diagonal in GaussianMixture #31607

Merged

jeremiedbb mentioned this pull request Jul 15, 2025

Release 1.7.1 #31762

Merged

13 tasks

lesteve mentioned this pull request Jul 18, 2025

MNT Add tags to GaussianMixture array API and precise them for PCA #31784

Merged
