Implement SplineTransformer.inverse_transform #28551

@ogrisel

Describe the workflow you want to enable

I think it should be possible to implement a new method inverse_transform such that:

import numpy as np
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(0)
X_train = rng.normal(size=(42, 5))
X_test = rng.normal(size=(43, 5))

st = SplineTransformer().fit(X_train)
np.testing.assert_allclose(X_test, st.inverse_transform(st.transform(X_test)))

Describe your proposed solution

There might be several mathematical ways to define such a transform, in particular when passing an X_fake_transformed that contains real numbers that do not actually result from a spline expansion. For instance, when:

  • (X_fake_transformed < 0).any()
  • (X_fake_transformed > 1).any()
  • X_fake_transformed.sum(axis=1) != np.ones(n_samples).

or when all values of a given row are non-zero at once...

One possible way would be to decode based on X_fake_transformed.argmax(axis=1) and then use the relative strength of neighboring spline activations to resolve ambiguities.

Describe alternatives you've considered, if relevant

The main alternative is to not implement this. The main question is probably: why implement this in the first place?

Possible use cases:

  • fit a GMM model on spline-transformed data (to get a more axis-aligned inductive prior), generate samples in the GMM latent space and then recode those samples back into the original space,

  • fit PCA with a small rank on spline-encoded data and then reconstruct the projected data back in the original feature space,

  • fit k-means in spline space and recode the learned centroids back in the original feature space for inspection,

  • the same for NMF or dictionary learning components.

Additional context

If #28043 gets merged, missing values support should also be included when using the 'indicator' strategy.
