
ENH Add periodic extrapolation to SplineTransformer #19483


Merged
53 commits merged on Mar 17, 2021

Conversation

mlondschien
Contributor

Reference Issues/PRs

Extends #18368.

What does this implement/fix? Explain your changes.

This PR adds the option for periodic extrapolation (with periodic splines) to the SplineTransformer.
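For illustration, a minimal usage sketch of the new option (the feature values and knot positions below are hypothetical, not taken from the PR):

import numpy as np
from sklearn.preprocessing import SplineTransformer

# Day-of-year-like feature; with extrapolation="periodic" the period is the
# distance between the first and last knot (here 0 to 365).
X = np.arange(365)[:, None]
spline = SplineTransformer(
    degree=3,
    knots=np.linspace(0, 365, 13)[:, None],
    extrapolation="periodic",
)
Xt = spline.fit_transform(X)
# Values one period apart map to the same spline features.
assert np.allclose(spline.transform([[0]]), spline.transform([[365]]))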

Any other comments?

Thank you @lorentzenchr for adding the SplineTransformer! I often use periodic splines, e.g. to model effects for the day of the year, and believe this feature will be very helpful. Please let me know if we should expand the documentation / examples.

@mlondschien mlondschien changed the title Splines periodic Add periodic extrapolation to SplineTransformer Feb 18, 2021
@ogrisel
Member

ogrisel commented Feb 19, 2021

If supporting old versions of scipy is too much of a hassle, I think it's fine to raise a NotImplementedError with a message explaining that scipy needs to be upgraded to use this new feature.

@ogrisel
Member

ogrisel commented Feb 19, 2021

Alternatively, if the missing code in scipy is a rather self-contained pure-Python function, feel free to backport it to sklearn/utils/fixes.py with a comment crediting its provenance.

@mlondschien
Contributor Author

mlondschien commented Feb 19, 2021

If supporting old versions of scipy is too much of a hassle, I think it's fine to raise a NotImplementedError with a message explaining that scipy needs to be upgraded to use this new feature.

Alternatively, if the missing code in scipy is a rather self-contained pure-Python function, feel free to backport it to sklearn/utils/fixes.py with a comment crediting its provenance.

The only code needed to support scipy < 1.0.0 is the following (basically a copy of the code in scipy):

n = spl.t.size - spl.k - 1
X[:, i] = spl.t[spl.k] + (X[:, i] - spl.t[spl.k]) % (spl.t[n] - spl.t[spl.k])

IMO this is fine, but raising a NotImplementedError works as well. Adding a backport to sklearn/utils/fixes requires more code than the above and might be too much added complexity for my taste.
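For readers less familiar with scipy's spline internals, a small self-contained sketch of what that wrap-around does (the helper name is made up; spl stands for a fitted scipy.interpolate.BSpline):

import numpy as np

def _wrap_into_period(x, spl):
    # Map x into the spline's base interval [t[k], t[n]]; newer scipy versions
    # perform this wrap-around internally for extrapolate="periodic".
    n = spl.t.size - spl.k - 1
    period = spl.t[n] - spl.t[spl.k]
    return spl.t[spl.k] + (x - spl.t[spl.k]) % period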

@ogrisel
Member

ogrisel commented Feb 19, 2021

Alright, I am fine with keeping this as it is then.

@ogrisel
Member

ogrisel commented Feb 19, 2021

Could you please extend the spline example with a new section on modeling periodic data (with synthetic data with a mix of yearly and weekly signals)? You can use pandas to generate such fake data.

https://scikit-learn.org/dev/auto_examples/linear_model/plot_polynomial_interpolation.html#sphx-glr-auto-examples-linear-model-plot-polynomial-interpolation-py

Note: as I have no experience with using splines myself, that would help me review the PR by seeing how to use this new feature correctly and assessing its correctness.
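For reference, one possible way to generate such synthetic periodic data with pandas (all names and parameters here are illustrative, not taken from the final example):

import numpy as np
import pandas as pd

rng = np.random.RandomState(42)
index = pd.date_range("2019-01-01", "2020-12-31", freq="D")
dayofyear = index.dayofyear.to_numpy()
dayofweek = index.dayofweek.to_numpy()

# Mix of a yearly and a weekly signal plus noise.
y = (
    np.sin(2 * np.pi * dayofyear / 365.25)
    + 0.5 * np.sin(2 * np.pi * dayofweek / 7)
    + 0.1 * rng.randn(len(index))
)
df = pd.DataFrame(
    {"dayofyear": dayofyear, "dayofweek": dayofweek, "y": y}, index=index
)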

@mlondschien
Contributor Author

I've added the test and an example. I used a synthetic example as I believe it better visualizes how periodic splines work. I am open to other suggestions though. How do I check whether the example renders correctly?

@mlondschien mlondschien requested a review from ogrisel February 22, 2021 10:30
@mlondschien
Contributor Author

Since you implemented the transformer, it would be very helpful if you could have a look @lorentzenchr.

@lorentzenchr
Member

Since you implemented the transformer, it would be very helpful if you could have a look

@mlondschien I will. Thanks for working on this extension.

Member

@lorentzenchr lorentzenchr left a comment


On the test side, I'm missing a test that the extrapolation works. Let's say the period length is 1 and fitting is on x in [0, 1]; then transform([0, 1]) == transform([1, 2]), in particular at the knots.

Maybe one should state more clearly that the periodicity is given either by the manually specified knots or by the range of the training data. It is neither detected automatically, nor can you specify the periodicity directly (with a parameter).
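A minimal sketch of that test idea, assuming the default uniform knots so that the period learned from x in [0, 1] is 1:

import numpy as np
from numpy.testing import assert_allclose
from sklearn.preprocessing import SplineTransformer

X = np.linspace(0, 1, 101)[:, None]
spline = SplineTransformer(degree=3, extrapolation="periodic").fit(X)
# Shifting the input by one full period must not change the features,
# in particular at the knots themselves.
assert_allclose(spline.transform(X), spline.transform(X + 1), atol=1e-12)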

mlondschien and others added 2 commits February 26, 2021 19:12
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
@mlondschien
Contributor Author

Thanks @lorentzenchr for having a look.

On the test side, I'm missing a test that the extrapolation works. Let's say the period length is 1 and fitting is on x in [0, 1]; then transform([0, 1]) == transform([1, 2]), in particular at the knots.

This is done in this test, even if not as explicitly as you suggest. However, this (and the test you suggested) would pass for any periodic continuation of splines (e.g. normal B-splines with extrapolate="periodic"). I can add a test that asserts that transformers with knots at [1, 2, 3, 4, 5] and [2, 3, 4, 5, 6] yield the same result up to a permutation of coordinates.
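A rough sketch of that permutation check (the degree and the assertion style are my own choice; only membership of columns is asserted, not a particular ordering):

import numpy as np
from sklearn.preprocessing import SplineTransformer

X = np.linspace(1.0, 5.0, 50)[:, None]
knots_a = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
knots_b = knots_a + 1.0  # [2, 3, 4, 5, 6]: the same knots on a circle of period 4

Xt_a = SplineTransformer(
    degree=3, knots=knots_a, extrapolation="periodic"
).fit_transform(X)
Xt_b = SplineTransformer(
    degree=3, knots=knots_b, extrapolation="periodic"
).fit_transform(X)

# Every column obtained with knots_a matches some column obtained with knots_b.
for column in Xt_a.T:
    assert any(np.allclose(column, other) for other in Xt_b.T)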

Maybe one should state more clearly that the periodicity is given either by the manually specified knots or by the range of the training data. It is neither detected automatically, nor can you specify the periodicity directly (with a parameter).

I have tried to make this more explicit in the example. Where / how would you specify this in the docstring? One could add

If 'periodic', periodic splines with a periodicity equal to the distance between the first and last knot are used. Periodic splines enforce equal function values and derivatives at the first and last knots.

to the extrapolation parameter. Maybe this could be made more concise.

Note that this statement is no longer correct. Would you like to point out the different number of splines for periodic splines or is this a minor detail?

@lorentzenchr
Member

Periodic b-splines turn out to be surprisingly complicated. But we're close to the finish.

@mlondschien
Contributor Author

mlondschien commented Mar 2, 2021

I've expanded the example to suggest that users provide knots explicitly if the periodicity is known. I've also tried to make the smoothness test more self-explanatory.

Yes to spare compute resources.

This makes sense. I would have expected the tests to never finish (i.e. yellow dot) instead of failing though.

Actually it might be worth adding a test that checks the periodicity extrapolation of the spline transformer (without sticking it into a pipeline with a regressor) with 2 input features, e.g. feature 0 with values in [-1, 1] and feature 1 with values in [0, 5] in the training set. And then checking the transformed features we get on a test set with values in [1, 2] for feature 0 and [5, 10] for feature 1.

Sounds reasonable. I've added such a test.
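A compact sketch of such a check, with training ranges chosen so that the learned periods are exactly 2 and 5 (details are mine, not the committed test):

import numpy as np
from numpy.testing import assert_allclose
from sklearn.preprocessing import SplineTransformer

# Feature 0 spans [-1, 1] (period 2), feature 1 spans [0, 5] (period 5).
X_train = np.c_[np.linspace(-1, 1, 100), np.linspace(0, 5, 100)]
spline = SplineTransformer(degree=3, extrapolation="periodic").fit(X_train)

rng = np.random.RandomState(0)
X_test = np.c_[rng.uniform(-1, 1, 20), rng.uniform(0, 5, 20)]
# Shifting each feature by one full period must reproduce the transformed
# features of the unshifted test set.
assert_allclose(
    spline.transform(X_test),
    spline.transform(X_test + np.array([2.0, 5.0])),
    atol=1e-12,
)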

Periodic b-splines turn out to be surprisingly complicated. But we're close to the finish.

Thanks for investing the time to review.

@ogrisel
Member

ogrisel commented Mar 2, 2021

I pushed ca92728 to improve test_spline_transformer_periodic_splines_smoothness to test continuity on the raw diff, prior to dividing by delta (to get the derivative). I also changed the assertion to compare the scalar abs max to tol, to get more informative values in the pytest assertion error message in case of failure.

# Before: bound the absolute value of the (d-1)-th derivative itself
assert (np.abs(dXt) < tol).all()
# After: check continuity via the raw diff and its scalar abs max
diff = np.diff(dXt, axis=0)
assert np.abs(diff).max() < tol
Contributor Author

@mlondschien mlondschien Mar 2, 2021


Note that this (and below) depends on tol / delta ~ 10, i.e. if we reduce (or increase) the number of rows in X substantially, this test will fail.

Member


I don't understand why, though. Continuity should always be checked on the raw diff, assuming that the steps are fine-grained enough, no?

Otherwise we "check continuity" by bounding the absolute value of the derivative itself which seems wrong to me.

Contributor Author


Given continuity, the "raw diff" scales with the order of the stepsize (delta); it will be roughly of the same order. Above, your stepsize is ~1e-3, so checking against tol=1e-2 is fine. If we increase the stepsize to 0.1, the diff will be of order diff ~ 0.1 > tol = 1e-2.

@mlondschien
Contributor Author

@lorentzenchr, if you could have another (final 😉?) look, that would be great.

@lorentzenchr
Member

@mlondschien This PR looks very good and, if necessary, we can improve some minor things in later PRs. I have 2 remaining issues, a small one:

  • Add the periodic case to test_spline_transformer_unity_decomposition.

And a large one, concerning test_spline_transformer_periodic_splines_smoothness: tol=1e-2 is not a very strict test. Also, we don't need to test continuity at all values of X, just at the boundaries b of the period. Something like

eps = 1e-8
for b in [0, 8, 16]:
    Xnew = np.linspace(b - degree * eps, b + degree * eps, num=2 * degree + 1)[:, None]
    Xt = transformer.fit_transform(Xnew).squeeze()
    assert Xt[degree - 1] == pytest.approx(Xt[degree + 1])  # compare at b-eps and b+eps
    # TODO: compare left n-th derivative at b-eps with right n-th derivative at b+eps

What do you think?

@mlondschien
Contributor Author

  • Add the periodic case to test_spline_transformer_unity_decomposition.

Sure!

tol=1e-2 is not a very strict test. Also, we don't need to test continuity at all values of X, just at the boundaries b of the period. Something like

eps = 1e-8
for b in [0, 8, 16]:
    Xnew = np.linspace(b - degree * eps, b + degree * eps, num=2 * degree + 1)[:, None]
    Xt = transformer.fit_transform(Xnew).squeeze()
    assert Xt[degree - 1] == pytest.approx(Xt[degree + 1])  # compare at b-eps and b+eps
    # TODO: compare left n-th derivative at b-eps with right n-th derivative at b+eps

What do you think?

IMO the test from before, i.e. with X.shape = (1000, 1), was strict enough, as we checked that the diff was an order of magnitude above the tolerance for the last iteration, where we expect a discontinuity. This makes sure that we did not choose too large a tolerance. However, if you prefer, I can also adjust the test to only check continuity at the boundary, choose a smaller stepsize there and thus a smaller tolerance. We could then also test continuity of the constant extrapolation and C1 smoothness of the linear extrapolation in the same test via parametrization.

@mlondschien
Contributor Author

I have adjusted (but not pushed) the smoothness test as proposed by you @lorentzenchr. See here:

import numpy as np
import pytest
from numpy.testing import assert_allclose
from sklearn.preprocessing import SplineTransformer


@pytest.mark.parametrize("degree", [3, 5])
def test_spline_transformer_periodic_splines_smoothness(degree):
    """Test that spline transformation is smooth at first / last knot."""
    X = np.linspace(0, 8, 10)[:, None]

    transformer = SplineTransformer(
        degree=degree,
        extrapolation="periodic",
        knots=[[0.0], [1.0], [3.0], [4.0], [5.0], [8.0]],
    )
    transformer.fit(X)

    eps = 1e-8

    for b in [0, 8, 16]:
        dXt = transformer.transform(
            np.linspace(
                b - degree * eps,
                b + degree * eps,
                2 * degree + 1,
            )[:, None]
        )

        for d in range(degree):
            assert_allclose(dXt[0, :], dXt[-1, :], atol=10 * eps, rtol=0)
            dXt = np.diff(dXt, axis=0) / eps

This fails due to problems with precision. The original dXt has a precision of around 1e-16, and so does np.diff(dXt, axis=0). In the second iteration, due to dividing by eps, dXt has a precision of around 1e-8. As you can imagine, the test fails for degree >= 3.

Possible options:

  • Use lower precision here, such that eps ** degree >> 1e-16. For degree = 5, this implies eps > 1e-3.
  • Go back to the original test.

Let me know what you think @lorentzenchr. If you had something different in mind that does not have this problem, please let me know. I don't believe that we should hold off on merging this PR due to details about tolerances / precision. Feel free to add any changes you would like to the smoothness test onto this PR.

@lorentzenchr
Member

What about looking at b-eps and b+eps with eps=1e-12, but then using a different step, i.e. h=1e-3, for the numerical derivatives? I'm really only interested in the 2 values at b-eps and b+eps. In addition, one can use higher order formulas for numerical derivatives, see https://en.wikipedia.org/wiki/Finite_difference_coefficient and https://github.com/lorentzenchr/scikit-learn/blob/4e007384e7eb4e86553a1bb44a220d9a7ce48134/sklearn/_loss/tests/test_loss.py#L89-L104.
Ideally, one would use backward differences at b-eps and forward differences at b+eps.
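For what it's worth, a small sketch of that idea using scipy.misc.central_diff_weights (available in the scipy versions current at the time of this PR; the weights correspond to the table on the Wikipedia page, and the helper name is made up):

import numpy as np
from scipy.misc import central_diff_weights

def numerical_derivative(f, x0, n=2, h=1e-3, npoints=7):
    # Central finite-difference approximation of the n-th derivative of f at
    # x0, using an npoints-wide stencil with spacing h.
    weights = central_diff_weights(npoints, ndiv=n)
    xs = x0 + h * (np.arange(npoints) - npoints // 2)
    return np.dot(weights, f(xs)) / h ** n

print(numerical_derivative(np.sin, 1.0, n=2))  # close to -sin(1.0) ~ -0.84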

@mlondschien
Contributor Author

In addition, one can use higher order formulas for numerical derivatives

I could not find an implementation for forward / backward diff weights in scipy. Are you fine with scipy.misc.central_diff_weights?

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@lorentzenchr
Member

I could not find an implementation for forward / backward diff weights in scipy. Are you fine with scipy.misc.central_diff_weights?

It's not my intention to make this PR more complex. If you could use https://en.wikipedia.org/wiki/Finite_difference_coefficient, that'd be great. Using scipy.misc.central_diff_weights might also do the job. So I'd suggest using that and adding a comment so that a future PR could improve this with forward and backward differences.

@mlondschien
Contributor Author

It's not my intention to make this PR more complex.

To be frank, IMO it's too late for this. I have probably spent as much time on the smoothness test as on the periodic splines themselves.

If you could use https://en.wikipedia.org/wiki/Finite_difference_coefficient, that'd be great. Using scipy.misc.central_diff_weights might also do the job. So I'd suggest using that and adding a comment so that a future PR could improve this with forward and backward differences.

How would you go about using the coefficients from Wikipedia? Just copy them over?

I have tried using (central) finite differences as suggested by you. See the code in the details below. The test still fails for high degrees / small tolerances. IMO this is still due to precision. According to Wikipedia, the d-th derivative can be approximated as the sum of products of weights and function evaluations, divided by the stepsize (eps) to the d-th power. This is also how scipy.misc.derivative is implemented. According to Wikipedia, if I use coefficients of order (accuracy) o, the approximation will be correct up to a term of order the stepsize to the o-th power. However, IIUC, this assumes infinite precision in the function evaluations themselves. In our use case, the function evaluations have a precision of around 1e-16, so using a stepsize of eps, any approximation of the d-th derivative will have a precision of at best 1e-16 / eps ** d.

See the following example:

In [1]: from scipy.misc import derivative
   ...: import numpy as np
   ...: 
   ...: def f(x):
   ...:     return np.sin(x)
   ...: 

In [2]: derivative(f, 0, dx=1e-8, n=1, order=7)
Out[2]: 1.0000000000000002

In [3]: derivative(f, 0, dx=1e-8, n=2, order=7)
Out[3]: 7.237830359838991e-09

In [4]: derivative(f, 0, dx=1e-8, n=3, order=7)
Out[4]: 9.098986738083303

In [5]: derivative(f, 0, dx=1e-4, n=3, order=7)
Out[5]: -0.9999999011365072

In [6]: derivative(f, 0, dx=1e-4, n=5, order=7)
Out[6]: 5.421010862427521

In [7]: derivative(f, 0, dx=1e-2, n=5, order=7)
Out[7]: 0.999966567882815

Note how the third derivative of f using a stepsize of 1e-8 is complete garbage. Increasing the stepsize to 1e-4 produces a much more reasonable result.

I believe that chasing tighter tolerances is pointless. IMO the test as I had implemented it originally, or as it is currently, is sufficient for the use case. I have added a small commit onto @ogrisel's suggestion to make the tolerance dependent on delta. I am not willing to invest a significant amount of time into the test. If you are unhappy with the tolerance, please add a commit on top to fix this. One option would be to only test up to degree=2 with stepsize eps=1e-8, or up to degree=4 with stepsize eps=1e-4. I believe that asserting that the extrapolation is smooth for higher degrees is more important than a strict tolerance.

I sincerely hope that this test does not prevent us from merging the PR. Periodic splines are a feature that improves on the status quo. The current tests are also much more extensive than the existing ones, e.g. there are no tests for continuity / differentiability at the border for constant / linear extrapolation.

import numpy as np
import pytest
from numpy.testing import assert_allclose
from scipy.misc import central_diff_weights
from sklearn.preprocessing import SplineTransformer


@pytest.mark.parametrize("degree", [3, 5])
def test_spline_transformer_periodic_splines_smoothness(degree):
    """Test that spline transformation is smooth at first / last knot."""
    X_fit = np.linspace(-2, 10, 10)[:, None]

    transformer = SplineTransformer(
        degree=degree,
        extrapolation="periodic",
        knots=[[0.0], [1.0], [3.0], [4.0], [5.0], [8.0]],
    ).fit(X_fit)

    eps = 1e-8

    # We use scipy.misc.central_diff_weights to compute higher-order
    # derivatives of sufficient accuracy. See also
    # https://en.wikipedia.org/wiki/Finite_difference_coefficient
    weights = {
        d: central_diff_weights(2 * d + 1, d) for d in range(1, degree + 1)
    }

    for b in [0, 8, 16]:
        X_left = np.linspace(
            b - (2 * degree + 1) * eps,
            b - eps,
            2 * degree + 1,
        )[:, None]
        X_right = np.linspace(
            b + eps,
            b + (2 * degree + 1) * eps,
            2 * degree + 1,
        )[:, None]
        Xt_left = transformer.transform(X_left)
        Xt_right = transformer.transform(X_right)

        for d in range(1, degree + 1):
            dXt_left = np.matmul(
                Xt_left[(degree - d) : (degree + d + 1), :].T,
                weights[d],
            ) / (eps ** d)
            dXt_right = np.matmul(
                Xt_right[(degree - d) : (degree + d + 1), :].T,
                weights[d],
            ) / (eps ** d)
            assert_allclose(dXt_left, dXt_right, rtol=0, atol=10)

@ogrisel
Member

ogrisel commented Mar 17, 2021

I agree, let's merge. Exploring how to best test derivatives in scikit-learn in general can be done in another PR.

Member

@lorentzenchr lorentzenchr left a comment


LGTM. @mlondschien Thank you so much. This is a very useful extension. And sorry for being such a p.i.t.a. about the continuity test.

@lorentzenchr lorentzenchr changed the title Add periodic extrapolation to SplineTransformer ENH Add periodic extrapolation to SplineTransformer Mar 17, 2021
@lorentzenchr
Member

lorentzenchr commented Mar 17, 2021

@mlondschien Also thanks for being frank and patient.

@lorentzenchr lorentzenchr merged commit 95b7c68 into scikit-learn:main Mar 17, 2021
@ogrisel
Member

ogrisel commented Mar 17, 2021

Thank you very much for the very nice contribution @mlondschien!

@glemaitre glemaitre mentioned this pull request Apr 22, 2021