Skip to content

ENH support float32 in SpectralEmbedding for LOBPCG and PyAMG solvers #21534

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 43 commits into from
Nov 5, 2021
Merged

Conversation

lobpcg
Copy link
Contributor

@lobpcg lobpcg commented Nov 2, 2021

Remove check_array call for float64, since both lobpcg function and pyamg package support float32 for a few years now; see
#21321

Reference Issues/PRs

Fixes #21321.

What does this implement/fix? Explain your changes.

Remove check_array call for float64, since both lobpcg function and pyamg package support float32 for a few years now

Remove check_array call for float64, since both lobpcg function and pyamg package support float32 for a few years now; see
#21321
@lobpcg lobpcg marked this pull request as draft November 2, 2021 15:10
lobpcg and others added 14 commits November 2, 2021 12:38
test_spectral_embedding_two_components for dtype in [np.float32, np.float64] and @pytest.mark.parametrize("eigen_solver", ("arpack", "lobpcg", "amg"))
added an entry
PR 21534 entered
… @pytest.mark.parametrize("dtype", ("np.float32", "np.float64")) added to test_spectral_embedding_precomputed_affinity
@lobpcg lobpcg marked this pull request as ready for review November 3, 2021 02:12
@glemaitre glemaitre changed the title Update _spectral_embedding.py ENH support np.float32 in SpectralEmbedding for LOBPCG and PyAMG solver Nov 3, 2021
@glemaitre glemaitre changed the title ENH support np.float32 in SpectralEmbedding for LOBPCG and PyAMG solver ENH support np.float32 in SpectralEmbedding for LOBPCG and PyAMG solvers Nov 3, 2021
@glemaitre glemaitre changed the title ENH support np.float32 in SpectralEmbedding for LOBPCG and PyAMG solvers ENH support float32 in SpectralEmbedding for LOBPCG and PyAMG solvers Nov 3, 2021
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still needed to check the input values but we can change the checks to accept float32 data:

laplacian = check_array(laplacian, dtype=[np.float64, np.float32], accept_sparse=True)
trying everywhere:
@pytest.mark.parametrize("dtype", (float32, float64))
@pytest.mark.parametrize("dtype", ("float32", "float64"))
formatting
        laplacian = check_array(
            laplacian,
            dtype=[np.float64, np.float32],
            accept_sparse=True
        )
@lobpcg
Copy link
Contributor Author

lobpcg commented Nov 3, 2021

black fails here, but passes on my local system. I have not seen this before.

There is no way to fix it for me, since black here does not show the changes suggested.

@lobpcg
Copy link
Contributor Author

lobpcg commented Nov 4, 2021

Regarding the assertion of the output and fitted attributes data type, we could indeed create a specific function test_spectral_embeddding_preserving_dype parameterized with the float64 and float32 types that only check this particular behaviour. It will not clutter the semantic of the other tests.

@glemaitre could you please propose a specific commit?

@pytest.mark.skipif(
    is_pyamg_not_available, reason="PyAMG is not installed and thus we cannot test."
)
replaces pytest.importorskip("pyamg")
@glemaitre
Copy link
Member

This would be the test that I have in mind:

@pytest.mark.parametrize(
    "eigen_solver",
    [
        "arpack",
        "lobpcg",
        pytest.param("amg", marks=skip_if_no_pyamg),
    ],
)
@pytest.mark.parametrize("dtype", [np.float32, np.float64])
def test_spectral_embedding_preserves_dtype(eigen_solver, dtype):
    """Check that `SpectralEmbedding is preserving the dtype of the fitted
    attribute and transformed data.

    Ideally, this test should be covered by the common test
    `check_transformer_preserve_dtypes`. However, this test only run
    with transformers implementing `transform` while `SpectralEmbedding`
    implements only `fit_transform`.
    """
    X = S.astype(dtype)
    se = SpectralEmbedding(
        n_components=2, affinity="rbf", eigen_solver=eigen_solver, random_state=0
    )
    X_trans = se.fit_transform(X)

    assert X_trans.dtype == dtype
    assert se.embedding_.dtype == dtype
    assert se.affinity_matrix_.dtype == dtype

@lobpcg
Copy link
Contributor Author

lobpcg commented Nov 4, 2021

@glemaitre I'll try adding your new tests just above and #21534 (comment) after getting all the tests pass again with the new changes

lobpcg and others added 3 commits November 4, 2021 13:56
int64 for labels

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@lobpcg
Copy link
Contributor Author

lobpcg commented Nov 4, 2021

@glemaitre both your new tests pass. Does one of them cover already your suggestion #21534 (comment) ? This is the only one remains open as far as I can see?

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One final nitpick but otherwise LGTM! Thank you very much for your perseverance :)

@lobpcg
Copy link
Contributor Author

lobpcg commented Nov 5, 2021

@ogrisel thanks for reviewing promptly! "Perseverance" is my nick name. Life is too short for our coding efforts to be wasted :-)

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@glemaitre glemaitre merged commit 56c08ac into scikit-learn:main Nov 5, 2021
@glemaitre
Copy link
Member

Thanks @lobpcg all good for me.

@lobpcg
Copy link
Contributor Author

lobpcg commented Nov 5, 2021

@glemaitre @ogrisel - thanks for your help!

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I was writing this review. 😄

Comment on lines 370 to +372
X = random_state.rand(laplacian.shape[0], n_components + 1)
X[:, 0] = dd.ravel()
X = X.astype(laplacian.dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish it were possible to specify dtype when using numpy PRNG!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the same. But we need to discuss the SLEP about the random state

@lobpcg
Copy link
Contributor Author

lobpcg commented Nov 5, 2021

@jjerphan - Thanks for taking over!

glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Nov 29, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

adding full float32 support in spectral clustering and manifold embedding
4 participants