Unclear behavior of max_train_size argument in TimeSeriesSplit #13666

#### Description

I am trying to understand the behavior of [`TimeSeriesSplit`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html#sklearn.model_selection.TimeSeriesSplit), especially the `max_train_size` parameter. I was initially surprised that it is an absolute number of samples and not a ratio, as it is in other splitting operations.
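A minimal sketch of the behavior in question, assuming a toy array of 12 time-ordered samples (the array and the values `n_splits=3` and `max_train_size=4` are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy data: 12 samples in time order (illustrative only).
X = np.arange(12).reshape(-1, 1)

# max_train_size is a sample count, not a fraction of the data.
tscv = TimeSeriesSplit(n_splits=3, max_train_size=4)
splits = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]

for train, test in splits:
    print("train:", train, "test:", test)
```

With these values the training set grows to 3 samples in the first split and is then capped at the 4 most recent samples, so the later splits behave like a sliding window: `[0, 1, 2]`, then `[2, 3, 4, 5]`, then `[5, 6, 7, 8]`, each followed by a test block of 3 samples.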

I traced this parameter to issue #8249 and PR #8282 and realized that it was added to support window-based splitting, as described [here](https://topepo.github.io/caret/data-splitting.html#data-splitting-for-time-series). This surprised me, because the documentation does not make it clear that this is what the parameter is for. Moreover, I found caret's `initialWindow`, `horizon`, and `fixedWindow` parameters much easier to understand, especially with the image on that page.

I would suggest that:
* The documentation is improved here. A visualization like the one shown in https://topepo.github.io/caret/data-splitting.html#data-splitting-for-time-series would really help a lot.
* We consider using either sample-based parameters, like `initialWindow`, `horizon`, and `fixedWindow`, or ratio/fold-based ones, but not both, because mixing the two is very confusing.

If splitting is done by number of folds (which I prefer because it adapts to different dataset sizes automatically), then the window size should also be expressed in folds. The parameters could then be:
* How many folds to do.
* Number of folds used in the horizon, i.e., used as test data. It looks like this is currently fixed to 1 in this splitting operation and cannot really be configured. I suggest we allow this to be configured.
* Number of folds used in the window, i.e., used as training data. The default could be None, which would mean a non-fixed window that uses all folds before the test data. Setting it to a number would give a fixed-size sliding window.
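To make the proposal concrete, here is a rough sketch of how such fold-based parameters could behave. Everything in it is hypothetical: the function `fold_time_series_split` and the parameter names `n_folds`, `horizon_folds`, and `window_folds` do not exist in scikit-learn, and the way leftover samples are spread over the first folds is just one possible choice.

```python
from typing import Iterator, List, Optional, Tuple


def fold_bounds(n_samples: int, n_folds: int) -> List[Tuple[int, int]]:
    """Cut range(n_samples) into n_folds contiguous (start, stop) chunks."""
    base, extra = divmod(n_samples, n_folds)
    bounds, start = [], 0
    for i in range(n_folds):
        # Assumption: the first `extra` folds get one extra sample each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def fold_time_series_split(
    n_samples: int,
    n_folds: int,
    horizon_folds: int = 1,
    window_folds: Optional[int] = None,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Hypothetical fold-based splitter; not part of scikit-learn.

    horizon_folds: number of folds used as test data in each split.
    window_folds:  number of folds used as training data; None means
                   all folds before the test data (expanding window).
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    if not 1 <= horizon_folds < n_folds:
        raise ValueError("need 1 <= horizon_folds < n_folds")
    if window_folds is not None and window_folds < 1:
        raise ValueError("window_folds must be None or >= 1")

    bounds = fold_bounds(n_samples, n_folds)
    # first_test is the index of the first fold used as test data.
    for first_test in range(1, n_folds - horizon_folds + 1):
        if window_folds is None:
            first_train = 0
        else:
            first_train = max(0, first_test - window_folds)
        train_start = bounds[first_train][0]
        train_stop = bounds[first_test - 1][1]
        test_start = bounds[first_test][0]
        test_stop = bounds[first_test + horizon_folds - 1][1]
        yield (list(range(train_start, train_stop)),
               list(range(test_start, test_stop)))


# Example: 12 samples, 4 folds, sliding window of one fold.
for train, test in fold_time_series_split(12, 4, window_folds=1):
    print("train:", train, "test:", test)
```

Because all three parameters are counted in folds, the same settings would give proportionally sized windows on datasets of different lengths, which is the main point of the suggestion.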

#### Versions

This relates to the behavior in sklearn v0.20.3.
