Skip to content

ENH Add periodic extrapolation to SplineTransformer #19483

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 53 commits into from
Mar 17, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
53 commits
Select commit Hold shift + click to select a range
6921060
Periodic splines.
mlondschien Feb 18, 2021
ca61d78
Add test.
mlondschien Feb 18, 2021
84d8813
Flake8.
mlondschien Feb 18, 2021
000e5a9
Larger intercept, less knots in test.
mlondschien Feb 18, 2021
a6e79e2
Do extrapolation by hand for scipy<1.6.1
mlondschien Feb 19, 2021
9bae0ec
Remove breakpoint.
mlondschien Feb 19, 2021
3d84b15
Update sklearn/preprocessing/tests/test_polynomial.py
mlondschien Feb 19, 2021
3567cf4
Avoid inplace operation, raise for too few knots.
mlondschien Feb 20, 2021
365be53
Test for ValueError.
mlondschien Feb 20, 2021
60bca96
Merge branch 'splines-periodic' of github.com:mlondschien/scikit-lear…
mlondschien Feb 20, 2021
8d8a219
Flake8.
mlondschien Feb 20, 2021
1043777
Example.
mlondschien Feb 21, 2021
b79a23c
Apply suggestions from code review
mlondschien Feb 26, 2021
3ac646d
Implement suggestions by @lorentzenchr.
mlondschien Feb 26, 2021
6925f3b
Extend test range in example
ogrisel Feb 26, 2021
175f4e4
Add test.
mlondschien Feb 27, 2021
a2a716b
Add to changelog and example.
mlondschien Feb 27, 2021
23a1d92
Merge branch 'splines-periodic' of github.com:mlondschien/scikit-lear…
mlondschien Feb 27, 2021
84e52a0
Repeat base_knot distances. (Thanks @lorentzenchr).
mlondschien Feb 27, 2021
7bfe5c6
Always use floats as knots.
mlondschien Feb 27, 2021
92ec907
Add test for smoothness.
mlondschien Feb 27, 2021
656763b
Add suggestions by @ogrisel.
mlondschien Feb 27, 2021
e250f2a
Merge branch 'main' into splines-periodic
mlondschien Feb 27, 2021
9383233
Expand explanation of periodic splines in docstring.
mlondschien Feb 27, 2021
a4d9a89
Some fixes and linting.
mlondschien Feb 27, 2021
042620e
Cleanup.
mlondschien Feb 27, 2021
b8da79f
Avoid scipy ValueError for integer knots.
mlondschien Feb 27, 2021
4aaf441
Update doc/whats_new/v1.0.rst
mlondschien Feb 28, 2021
a020493
np.float64.
mlondschien Feb 28, 2021
bb10f8d
More specific docstring.
mlondschien Feb 28, 2021
ffdc560
Add paragraph about naturally periodic features to example.
mlondschien Feb 28, 2021
8d54a1c
Apply suggestions from code review
mlondschien Feb 28, 2021
3fe9d65
Implement suggestions by @lorentzenchr.
mlondschien Feb 28, 2021
a8435c2
Merge branch 'splines-periodic' of github.com:mlondschien/scikit-lear…
mlondschien Feb 28, 2021
1438816
dXt.
mlondschien Feb 28, 2021
10e74c2
Revert and add test.
mlondschien Feb 28, 2021
425535a
Revert dtype casting to np.float64.
mlondschien Feb 28, 2021
b387600
Linting.
mlondschien Feb 28, 2021
163923b
Simplify test.
mlondschien Feb 28, 2021
39138b8
Add float64 dtype casting back in.
mlondschien Feb 28, 2021
9f1654b
Cover case with multiple features.
mlondschien Feb 28, 2021
52ec48c
Different tolerance for smoothness property.
mlondschien Feb 28, 2021
beb02fe
Pass np.float64 as dtype argument to check_array.
mlondschien Feb 28, 2021
49748b5
Fix test.
mlondschien Feb 28, 2021
1bce730
Use np.diff, test at zero and add comments to smoothness test.
mlondschien Mar 2, 2021
2965aa6
Expand example text.
mlondschien Mar 2, 2021
52c7a43
Add test.
mlondschien Mar 2, 2021
4d1d14f
Use sklearn.fixes.linspace.
mlondschien Mar 2, 2021
ca92728
Improve the spline smoothness test
ogrisel Mar 2, 2021
82f3884
Add periodic splines to unity decomposition test.
mlondschien Mar 10, 2021
287bb22
flake8.
mlondschien Mar 10, 2021
62b778a
Update sklearn/preprocessing/_polynomial.py
mlondschien Mar 16, 2021
bd41fe8
Make tol dependent of delta.
mlondschien Mar 17, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions doc/whats_new/v1.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -192,6 +192,9 @@ Changelog
polynomial ``degree`` of the splines, number of knots ``n_knots`` and knot
positioning strategy ``knots``.
:pr:`18368` by :user:`Christian Lorentzen <lorentzenchr>`.
:class:`preprocessing.SplineTransformer` also supports periodic
splines via the ``extrapolation`` argument.
:pr:`19483` by :user:`Malte Londschien <mlondschien>`.

- |Fix| :func:`preprocessing.scale`, :class:`preprocessing.StandardScaler`
and similar scalers detect near-constant features to avoid scaling them to
Expand Down
68 changes: 68 additions & 0 deletions examples/linear_model/plot_polynomial_interpolation.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@
# Author: Mathieu Blondel
# Jake Vanderplas
# Christian Lorentzen
# Malte Londschien
# License: BSD 3 clause

import numpy as np
Expand Down Expand Up @@ -145,3 +146,70 @@ def f(x):
# function has local support and is continued as a constant beyond the fitted
# range. This extrapolating behaviour could be changed by the argument
# ``extrapolation``.

# %%
# Periodic Splines
# ----------------
# In the previous example we saw the limitations of polynomials and splines for
# extrapolation beyond the range of the training observations. In some
# settings, e.g. with seasonal effects, we expect a periodic continuation of
# the underlying signal. Such effects can be modelled using periodic splines,
# which have equal function value and equal derivatives at the first and last
# knot. In the following case we show how periodic splines provide a better fit
# both within and outside of the range of training data given the additional
# information of periodicity. The splines period is the distance between
# the first and last knot, which we specify manually.
#
# Periodic splines can also be useful for naturally periodic features (such as
# day of the year), as the smoothness at the boundary knots prevents a jump in
# the transformed values (e.g. from Dec 31st to Jan 1st). For such naturally
# periodic features or more generally features where the period is known, it is
# advised to explicitly pass this information to the `SplineTransformer` by
# setting the knots manually.


# %%
def g(x):
"""Function to be approximated by periodic spline interpolation."""
return np.sin(x) - 0.7 * np.cos(x * 3)


y_train = g(x_train)

# Extend the test data into the future:
x_plot_ext = np.linspace(-1, 21, 200)
X_plot_ext = x_plot_ext[:, np.newaxis]

lw = 2
fig, ax = plt.subplots()
ax.set_prop_cycle(color=["black", "tomato", "teal"])
ax.plot(x_plot_ext, g(x_plot_ext), linewidth=lw, label="ground truth")
ax.scatter(x_train, y_train, label="training points")

for transformer, label in [
(SplineTransformer(degree=3, n_knots=10), "spline"),
(SplineTransformer(
degree=3,
knots=np.linspace(0, 2 * np.pi, 10)[:, None],
extrapolation="periodic"
), "periodic spline")
]:
model = make_pipeline(transformer, Ridge(alpha=1e-3))
model.fit(X_train, y_train)
y_plot_ext = model.predict(X_plot_ext)
ax.plot(x_plot_ext, y_plot_ext, label=label)

ax.legend()
fig.show()

# %% We again plot the underlying splines.
fig, ax = plt.subplots()
knots = np.linspace(0, 2 * np.pi, 4)
splt = SplineTransformer(
knots=knots[:, None],
degree=3,
extrapolation="periodic"
).fit(X_train)
ax.plot(x_plot_ext, splt.transform(X_plot_ext))
ax.legend(ax.lines, [f"spline {n}" for n in range(3)])
plt.show()
135 changes: 95 additions & 40 deletions sklearn/preprocessing/_polynomial.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,13 +19,13 @@

# TODO:
# - sparse support (either scipy or own cython solution)?
# - extrapolation (cyclic)
class SplineTransformer(TransformerMixin, BaseEstimator):
"""Generate univariate B-spline bases for features.

Generate a new feature matrix consisting of
`n_splines=n_knots + degree - 1` spline basis functions (B-splines) of
polynomial order=`degree` for each feature.
`n_splines=n_knots + degree - 1` (`n_knots - 1` for
`extrapolation="periodic"`) spline basis functions
(B-splines) of polynomial order=`degree` for each feature.

Read more in the :ref:`User Guide <spline_transformer>`.

Expand Down Expand Up @@ -54,14 +54,21 @@ class SplineTransformer(TransformerMixin, BaseEstimator):
`degree` number of knots are added before the first knot, the same
after the last knot.

extrapolation : {'error', 'constant', 'linear', 'continue'}, \
extrapolation : {'error', 'constant', 'linear', 'continue', 'periodic'}, \
default='constant'
If 'error', values outside the min and max values of the training
features raises a `ValueError`. If 'constant', the value of the
splines at minimum and maximum value of the features is used as
constant extrapolation. If 'linear', a linear extrapolation is used.
If 'continue', the splines are extrapolated as is, i.e. option
`extrapolate=True` in :class:`scipy.interpolate.BSpline`.
`extrapolate=True` in :class:`scipy.interpolate.BSpline`. If
'periodic', periodic splines with a periodicity equal to the distance
between the first and last knot are used. Periodic splines enforce
equal function values and derivatives at the first and last knot.
For example, this makes it possible to avoid introducing an arbitrary
jump between Dec 31st and Jan 1st in spline features derived from a
naturally periodic "day-of-year" input feature. In this case it is
recommended to manually set the knot values to control the period.

include_bias : bool, default=True
If True (default), then the last spline element inside the data range
Expand All @@ -84,7 +91,9 @@ class SplineTransformer(TransformerMixin, BaseEstimator):
n_features_out_ : int
The total number of output features, which is computed as
`n_features * n_splines`, where `n_splines` is
the number of bases elements of the B-splines, `n_knots + degree - 1`.
the number of bases elements of the B-splines,
`n_knots + degree - 1` for non-periodic splines and
`n_knots - 1` for periodic ones.
If `include_bias=False`, then it is only
`n_features * (n_splines - 1)`.

Expand Down Expand Up @@ -235,7 +244,7 @@ def fit(self, X, y=None):
X, n_knots=self.n_knots, knots=self.knots
)
else:
base_knots = check_array(self.knots)
base_knots = check_array(self.knots, dtype=np.float64)
if base_knots.shape[0] < 2:
raise ValueError(
"Number of knots, knots.shape[0], must be >= " "2."
Expand All @@ -250,55 +259,86 @@ def fit(self, X, y=None):
"constant",
"linear",
"continue",
"periodic",
):
raise ValueError(
"extrapolation must be one of 'error', "
"'constant', 'linear' or 'continue'."
"'constant', 'linear', 'continue' or 'periodic'."
)

if not isinstance(self.include_bias, (bool, np.bool_)):
raise ValueError("include_bias must be bool.")

# number of knots for base interval
n_knots = base_knots.shape[0]

if self.extrapolation == "periodic" and n_knots <= self.degree:
raise ValueError(
"Periodic splines require degree < n_knots. Got n_knots="
f"{n_knots} and degree={self.degree}."
)

# number of splines basis functions
n_splines = n_knots + self.degree - 1
if self.extrapolation != "periodic":
n_splines = n_knots + self.degree - 1
else:
# periodic splines have self.degree less degrees of freedom
n_splines = n_knots - 1

degree = self.degree
n_out = n_features * n_splines
# We have to add degree number of knots below, and degree number knots
# above the base knots in order to make the spline basis complete.
# Eilers & Marx in "Flexible smoothing with B-splines and penalties"
# https://doi.org/10.1214/ss/1038425655 advice against repeating first
# and last knot several times, which would have inferior behaviour at
# boundaries if combined with a penalty (hence P-Spline). We follow
# this advice even if our splines are unpenalized.
# Meaning we do not:
# knots = np.r_[np.tile(base_knots.min(axis=0), reps=[degree, 1]),
# base_knots,
# np.tile(base_knots.max(axis=0), reps=[degree, 1])
# ]
# Instead, we reuse the distance of the 2 fist/last knots.
dist_min = base_knots[1] - base_knots[0]
dist_max = base_knots[-1] - base_knots[-2]
knots = np.r_[
linspace(
base_knots[0] - degree * dist_min,
base_knots[0] - dist_min,
num=degree,
),
base_knots,
linspace(
base_knots[-1] + dist_max,
base_knots[-1] + degree * dist_max,
num=degree,
),
]
if self.extrapolation == "periodic":
# For periodic splines the spacing of the first / last degree knots
# needs to be a continuation of the spacing of the last / first
# base knots.
period = base_knots[-1] - base_knots[0]
knots = np.r_[
base_knots[-(degree + 1): -1] - period,
base_knots,
base_knots[1: (degree + 1)] + period
]

else:
# Eilers & Marx in "Flexible smoothing with B-splines and
# penalties" https://doi.org/10.1214/ss/1038425655 advice
# against repeating first and last knot several times, which
# would have inferior behaviour at boundaries if combined with
# a penalty (hence P-Spline). We follow this advice even if our
# splines are unpenalized. Meaning we do not:
# knots = np.r_[
# np.tile(base_knots.min(axis=0), reps=[degree, 1]),
# base_knots,
# np.tile(base_knots.max(axis=0), reps=[degree, 1])
# ]
# Instead, we reuse the distance of the 2 fist/last knots.
dist_min = base_knots[1] - base_knots[0]
dist_max = base_knots[-1] - base_knots[-2]

knots = np.r_[
linspace(
base_knots[0] - degree * dist_min,
base_knots[0] - dist_min,
num=degree,
),
base_knots,
linspace(
base_knots[-1] + dist_max,
base_knots[-1] + degree * dist_max,
num=degree,
),
]

# With a diagonal coefficient matrix, we get back the spline basis
# elements, i.e. the design matrix of the spline.
# Note, BSpline appreciates C-contiguous float64 arrays as c=coef.
coef = np.eye(n_knots + self.degree - 1, dtype=np.float64)
extrapolate = self.extrapolation == "continue"
coef = np.eye(n_splines, dtype=np.float64)
if self.extrapolation == "periodic":
coef = np.concatenate((coef, coef[:degree, :]))

extrapolate = self.extrapolation in ["periodic", "continue"]

bsplines = [
BSpline.construct_fast(
knots[:, i], coef, self.degree, extrapolate=extrapolate
Expand Down Expand Up @@ -331,7 +371,7 @@ def transform(self, X):
)

n_samples, n_features = X.shape
n_splines = self.bsplines_[0].c.shape[0]
n_splines = self.bsplines_[0].c.shape[1]
degree = self.degree

# Note that scipy BSpline returns float64 arrays and converts input
Expand All @@ -346,8 +386,23 @@ def transform(self, X):
for i in range(n_features):
spl = self.bsplines_[i]

if self.extrapolation in ("continue", "error"):
XBS[:, (i * n_splines):((i + 1) * n_splines)] = spl(X[:, i])
if self.extrapolation in ("continue", "error", "periodic"):

if self.extrapolation == "periodic":
# With periodic extrapolation we map x to the segment
# [spl.t[k], spl.t[n]].
# This is equivalent to BSpline(.., extrapolate="periodic")
# for scipy>=1.0.0.
n = spl.t.size - spl.k - 1
# Assign to new array to avoid inplace operation
x = spl.t[spl.k] + (X[:, i] - spl.t[spl.k]) % (
spl.t[n] - spl.t[spl.k]
)
else:
x = X[:, i]

XBS[:, (i * n_splines):((i + 1) * n_splines)] = spl(x)

else:
xmin = spl.t[degree]
xmax = spl.t[-degree - 1]
Expand Down
Loading