Skip to content

TST use global_random_seed in sklearn/decomposition/tests/test_incremental_pca.py #31250

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 21, 2025

Conversation

DeaMariaLeon
Copy link
Contributor

Reference Issues/PRs

Towards #22827

What does this implement/fix? Explain your changes.

I changed the batch size on test_singular_values because it was failing.

Any other comments?

cc @glemaitre

Copy link

github-actions bot commented Apr 25, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: ec01f9f. Link to the linter CI: here

test_incremental_pca_check_projection
test_incremental_pca_inverse
test_incremental_pca_batch_signs
test_incremental_pca_batch_values
test_incremental_pca_partial_fit
test_incremental_pca_against_pca_random_data
test_singular_values
Copy link
Member

@virchan virchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the PR, @DeaMariaLeon!

This looks similar to #31213, so I don't think we need a changelog here.

@virchan virchan added No Changelog Needed Quick Review For PRs that are quick to review Waiting for Second Reviewer First reviewer is done, need a second one! labels May 13, 2025
Copy link
Member

@lesteve lesteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, let's merge this one!

@@ -360,7 +360,7 @@ def test_singular_values():
)

pca = PCA(n_components=10, svd_solver="full", random_state=rng).fit(X)
ipca = IncrementalPCA(n_components=10, batch_size=100).fit(X)
ipca = IncrementalPCA(n_components=10, batch_size=150).fit(X)
Copy link
Member

@lesteve lesteve May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record I was also getting one failure locally without increasing batch_size, here it is for completeness/posterity:

___________________________________________________________________________________________________________________________________________________ test_singular_values[47] ___________________________________________________________________________________________________________________________________________________

global_random_seed = 47

    def test_singular_values(global_random_seed):
        # Check that the IncrementalPCA output has the correct singular values
    
        rng = np.random.RandomState(global_random_seed)
        n_samples = 1000
        n_features = 100
    
        X = datasets.make_low_rank_matrix(
            n_samples, n_features, tail_strength=0.0, effective_rank=10, random_state=rng
        )
    
        pca = PCA(n_components=10, svd_solver="full", random_state=rng).fit(X)
        ipca = IncrementalPCA(n_components=10, batch_size=100).fit(X)
>       assert_array_almost_equal(pca.singular_values_, ipca.singular_values_, 2)

sklearn/decomposition/tests/test_incremental_pca.py:364: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (array([0.99955105, 0.99004802, 0.95818804, 0.91391011, 0.85194042,
       0.7787519 , 0.69739244, 0.61203967, 0.52688...0.99001237, 0.95811893, 0.91387836, 0.85180776,
       0.77866619, 0.69725655, 0.61192212, 0.52547507, 0.37600852]), 2), kwargs = {}, old_name = 'y', new_name = 'desired'

    @functools.wraps(fun)
    def wrapper(*args, **kwargs):
        for old_name, new_name in zip(old_names, new_names):
            if old_name in kwargs:
                if dep_version:
                    end_version = dep_version.split('.')
                    end_version[1] = str(int(end_version[1]) + 2)
                    end_version = '.'.join(end_version)
                    msg = (f"Use of keyword argument `{old_name}` is "
                           f"deprecated and replaced by `{new_name}`. "
                           f"Support for `{old_name}` will be removed "
                           f"in NumPy {end_version}.")
                    warnings.warn(msg, DeprecationWarning, stacklevel=2)
                if new_name in kwargs:
                    msg = (f"{fun.__name__}() got multiple values for "
                           f"argument now known as `{new_name}`")
                    raise TypeError(msg)
                kwargs[new_name] = kwargs.pop(old_name)
>       return fun(*args, **kwargs)
E       AssertionError: 
E       Arrays are not almost equal to 2 decimals
E       
E       Mismatched elements: 1 / 10 (10%)
E       Max absolute difference among violations: 0.06882675
E       Max relative difference among violations: 0.18304573
E        ACTUAL: array([1.  , 0.99, 0.96, 0.91, 0.85, 0.78, 0.7 , 0.61, 0.53, 0.44])
E        DESIRED: array([1.  , 0.99, 0.96, 0.91, 0.85, 0.78, 0.7 , 0.61, 0.53, 0.38])

../../micromamba/envs/scikit-learn-dev/lib/python3.13/site-packages/numpy/_utils/__init__.py:85: AssertionError
=================================================================================================================================================== short test summary info ====================================================================================================================================================
FAILED sklearn/decomposition/tests/test_incremental_pca.py::test_singular_values[47] - AssertionError: 
================================================================================================================================================ 1 failed, 818 passed in 6.08s =================================================================================================================================================

@lesteve lesteve merged commit 44b1f4d into scikit-learn:main May 21, 2025
45 of 46 checks passed
@DeaMariaLeon DeaMariaLeon deleted the inc-pca branch May 21, 2025 16:25
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request May 30, 2025
jeremiedbb pushed a commit that referenced this pull request Jun 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:decomposition No Changelog Needed Quick Review For PRs that are quick to review Waiting for Second Reviewer First reviewer is done, need a second one!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants