-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
ENH Add periodic extrapolation to SplineTransformer #19483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Add periodic extrapolation to SplineTransformer #19483
Conversation
If supporting old versions of scipy is too much of a hassle I think it's fine to raise |
Alternatively, if missing code in scipy is a rather self-contained pure-python function, feel free to backport it in |
The only code needed to support n = spl.t.size - spl.k - 1
X[:, i] = spl.t[spl.k] + (X[:, i] - spl.t[spl.k]) % (spl.t[n] - spl.t[spl.k]) IMO this is fine, but raising a |
Alright, I am fine with keeping this at it is then. |
Could you please extend the spline example with a new section on modeling periodic data (with synthetic data with a mix of yearly and weekly signals)? You can use pandas to generate such fake data. Note: as I have no experience with using splines myself, that would help me review the PR by seeing how to use this new feature correctly and assessing its correctness. |
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
…n into splines-periodic
I've added the test and an example. I used a synthetic example as I believed that this better visualized how periodic splines work. I am open for other suggestions though. How do I check whether the example renders correctly? |
Since you implemented the transformer, it would be very helpful if you could have a look @lorentzenchr. |
@mlondschien I will. Thanks for working on this extension. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the test side, I miss a test that the extrapolation works. Let's say period length is 1. Fitting is on x in [0, 1] then transform([0, 1]) == transform([1, 2]) and in particular at the knots.
Maybe, one should more clearly state, that the periodicity is given either by the manually specified knots or by the range of the training data. It is neither automatically detected, nor can you specify the periodicity directly (with a parameter).
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Thanks @lorentzenchr for having a look.
This is done in this test, even if not as explicitly as you are suggesting. However this (and the test you suggested) would pass for any periodic continuation of splines (e.g. normal B-Splines with
I have tried to make this more explicit in the example. Where / how would you specify this in the docstring? One could add
to the Note that this statement is no longer correct. Would you like to point out the different number of splines for periodic splines or is this a minor detail? |
Periodic b-splines turn out to be surprisingly complicated. But we're close to the finish. |
I've expanded the example to suggest users to provide knots explicitly if the periodicity is known. I've also tried to make the smoothness test more self-explanatory.
This makes sense. I would have expected the tests to never finish (i.e. yellow dot) instead of failing though.
Sounds reasonable. I've added such a test.
Thanks for investing the time to review. |
I pushed ca92728 to improve |
# Check continuity of the (d-1)-th derivative | ||
assert (np.abs(dXt) < tol).all() | ||
diff = np.diff(dXt, axis=0) | ||
assert np.abs(diff).max() < tol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this (and below) depends on tol / delta ~ 10
. I.e. if we reduce (or increase) the number of rows in X
substantially, this test will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why though. Continuity should always be checked on the raw diff, assuming that the steps are fine grained enough, no?
Otherwise we "check continuity" by bounding the absolute value of the derivative itself which seems wrong to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given continuity, the "raw diff" scales with the order of the stepsize (delta
). It will be roughly of the same order. Above, your stepsize is ~1e-3, so checking against tol=1e-2
is fine. If we reduce the stepsize to 0.1, the diff will be of order diff ~ 0.1 > tol = 1e-2
.
@lorentzenchr, if you could have another (final 😉?) look, that would be great. |
@mlondschien This PR looks very good and, if necessary, we can improve some minor things in later PRs. I have 2 remaining issues, a small one:
And a large one: eps = 1e-8
for b in [0, 8, 16]:
Xnew = np.linspace(b - degree*eps , b + degree*eps, num=2*degree + 1)[:, None]
Xt = transformer.fit_transform(Xnew).squeeze()
assert Xt[degree-1] = pytest.approx(Xt[degree+1]) # compare at b-eps and b+eps
# TODO: compare left n-th derivative at b-eps with right n-th derivative at b+eps What do you think? |
Sure!
IMO the test from before, I.e. with |
I have adjusted (but not pushed) the smoothness test as proposed by you @lorentzenchr. See here: @pytest.mark.parametrize("degree", [3, 5])
def test_spline_transformer_periodic_splines_smoothness(degree):
"""Test that spline transformation is smooth at first / last knot."""
X = np.linspace(0, 8, 10)[:, None]
transformer = SplineTransformer(
degree=degree,
extrapolation="periodic",
knots=[[0.0], [1.0], [3.0], [4.0], [5.0], [8.0]]
)
Xt = transformer.fit(X)
eps = 1e-8
for b in [0, 8, 16]:
dXt = transformer.transform(
np.linspace(
b - degree * eps,
b + degree * eps,
2 * degree + 1
)[:, None]
)
for d in range(degree):
assert_allclose(dXt[0, :], dXt[-1, :], atol = 10 * eps, rtol=0)
dXt = np.diff(dXt, axis=0) / eps This fails due to problems with precision. The original Possible options:
Let me know what you think @lorentzenchr. If you had something different in mind that does not have this problem, please let me know. I don't believe that we should hold off on merging this PR due to details about tolerances / precision. Feel free to add any changes you would like to the smoothness test onto this PR. |
What about looking at |
I could not find an implementation for forward / backward diff weights in scipy. Are you fine with |
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
It's not my intention to make this PR more complex. If you could use https://en.wikipedia.org/wiki/Finite_difference_coefficient, that'd be great. Using |
To be frank, IMO its too late for this. I have probably spent as much time on the smoothness test as on the periodic splines themselves.
How would you go about using the coefficients from Wikipedia? Just copy them over? I have tried using (central) finite differences as suggested by you. See the code in the details below. The test still fails for high degrees / small tolerances. IMO this is still due to precision. According to Wikipedia, the See the following example:
Note how the third derivative of I believe that hunting higher tolerances is useless. IMO the test as I had implemented it originally or as it is currently is sufficient for the use-case. I have added a small commit onto @ogrisel's suggestion to make the tolerance dependent of I sincerely hope that this test does not prevent us from merging the PR. Periodic splines are a feature that improves on the status quo. The current tests are much more extensive than those before, e.g. as there are no tests for continuity / differentiability at the border for constant / linear extrapolation. @pytest.mark.parametrize("degree", [3, 5])
def test_spline_transformer_periodic_splines_smoothness(degree):
"""Test that spline transformation is smooth at first / last knot."""
X_fit = np.linspace(-2, 10, 10)[:, None]
transformer = SplineTransformer(
degree=degree,
extrapolation="periodic",
knots=[[0.0], [1.0], [3.0], [4.0], [5.0], [8.0]]
).fit(X_fit)
eps = 1e-8
# We use scipy.misc.central_diff_weights to compute higher-order
# derivatives of sufficient accuracy. See also
# https://en.wikipedia.org/wiki/Finite_difference_coefficient
weights = {
d: central_diff_weights(2 * d + 1, d) for d in range(1, degree + 1)
}
for b in [0, 8, 16]:
X_left = np.linspace(
b - (2 * degree + 1) * eps,
b - eps,
2 * degree + 1
)[:, None]
X_right = np.linspace(
b + eps,
b + (2 * degree + 1) * eps,
2 * degree + 1
)[:, None]
Xt_left = transformer.transform(X_left)
Xt_right = transformer.transform(X_right)
for d in range(1, degree + 1):
dXt_left = np.matmul(
Xt_left[(degree - d) : (degree + d + 1), :].T,
weights[d]
) / (eps ** d)
dXt_right = np.matmul(
Xt_right[(degree - d) : (degree + d + 1), :].T,
weights[d]
) / (eps ** d)
assert_allclose(dXt_left, dXt_right, rtol = 0, atol=10) |
I agree let's merge. Exploring how to best test derivatives in scikit-learn in general can be done in another PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @mlondschien Thank you so much. This is a very useful extension. And sorry for being such a p.i.t.a. about the continuity test.
@mlondschien Also thanks for being frank and patient. |
Thank you very much for the very nice contribution @mlondschien! |
Reference Issues/PRs
Extends #18368.
What does this implement/fix? Explain your changes.
This PR adds the option for periodic extrapolation (with periodic splines) to the
SplineTransformer
.Any other comments?
Thank you @lorentzenchr for adding the
SplineTransformer
! I often use periodic splines e.g. to model effects for the day of the year and would believe this feature to be very helpful. Please let me know if we should expand documenting code / examples.