In this case we would like to know if a model trained on a particular set of
groups generalizes well to the unseen groups. To measure this, we need to
ensure that all the samples in the validation fold come from groups that are
not represented at all in the paired training fold.

The following cross-validation splitters can be used to do that.
The grouping identifier for the samples is specified via the ``groups``
parameter.
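As an illustration (a minimal sketch, using :class:`GroupKFold`, one such splitter), the ``groups`` parameter keeps all samples that share a group label on the same side of each split:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 1, 1, 0, 1])
# One group label per sample; samples sharing a label never appear
# in both the training and the validation fold of the same split.
groups = np.array([1, 1, 2, 2, 3, 3])

gkf = GroupKFold(n_splits=3)
for train, test in gkf.split(X, y, groups=groups):
    # Every validation group is absent from the paired training fold.
    assert set(groups[train]).isdisjoint(groups[test])
    print(train, test)
```

Here each of the three groups serves as the validation fold exactly once, so every validation sample comes from a group unseen during training.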
samples that are part of the validation set, and to -1 for all other samples.

Cross validation of time series data
====================================

Time series data is characterised by the correlation between observations
that are near in time (*autocorrelation*). However, classical
cross-validation techniques such as :class:`KFold` and
:class:`ShuffleSplit` assume the samples are independent and
identically distributed, and would result in unreasonable correlation
between training and testing instances (yielding poor estimates of
generalisation error) on time series data. Therefore, it is very important
to evaluate our model for time series data on the "future" observations
least like those that are used to train the model. To achieve this, one
solution is provided by :class:`TimeSeriesSplit`.
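To see the problem concretely, here is a small illustrative sketch (not part of the library's documented examples) comparing the index ordering produced by a shuffled :class:`KFold` with that of :class:`TimeSeriesSplit`:

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

X = np.arange(8).reshape(-1, 1)  # samples ordered in time

# With shuffled KFold, some training fold contains observations that
# occur *after* its test fold -- future information leaks into training.
kf = KFold(n_splits=4, shuffle=True, random_state=0)
leaks = any(train.max() > test.min() for train, test in kf.split(X))
print("KFold trains on the future:", leaks)

# TimeSeriesSplit never does: every training index strictly precedes
# every test index in each split.
tscv = TimeSeriesSplit(n_splits=4)
ordered = all(train.max() < test.min() for train, test in tscv.split(X))
print("TimeSeriesSplit respects time order:", ordered)
```

For autocorrelated data, training on observations that follow the test period makes the estimated generalisation error optimistically biased, which is exactly what :class:`TimeSeriesSplit` avoids.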
Time Series Split
-----------------

:class:`TimeSeriesSplit` is a variation of *k-fold* which
returns the first :math:`k` folds as train set and the :math:`(k+1)` th
fold as test set. Note that unlike standard cross-validation methods,
successive training sets are supersets of those that come before them.
Also, it adds all surplus data to the first training partition, which
is always used to train the model.

This class can be used to cross-validate time series data samples
that are observed at fixed time intervals.

Example of 3-split time series cross-validation on a dataset with 6 samples::
  >>> import numpy as np
  >>> from sklearn.model_selection import TimeSeriesSplit
  >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
  >>> y = np.array([1, 2, 3, 4, 5, 6])
  >>> tscv = TimeSeriesSplit(n_splits=3)
  >>> print(tscv)  # doctest: +NORMALIZE_WHITESPACE
  TimeSeriesSplit(max_train_size=None, n_splits=3)
  >>> for train, test in tscv.split(X):
  ...     print("%s %s" % (train, test))
  [0 1 2] [3]
  [0 1 2 3] [4]
  [0 1 2 3 4] [5]
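The ``max_train_size`` parameter (``None`` by default, meaning an ever-growing training window) caps the size of each training set, turning the expanding window into a rolling one. A minimal sketch:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(6).reshape(-1, 1)

# With max_train_size=2, each training window keeps only the two most
# recent samples instead of all preceding ones.
tscv = TimeSeriesSplit(n_splits=3, max_train_size=2)
for train, test in tscv.split(X):
    print(train, test)
```

Each training window now contains at most two indices, always the ones immediately preceding its test fold; older observations are dropped.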