Commit 635d83f

DOC Add note on overlapping test sets in GroupShuffleSplit (#29676)
1 parent 76bed22 commit 635d83f

1 file changed (+14 -7 lines changed)

sklearn/model_selection/_split.py

@@ -1932,8 +1932,9 @@ class ShuffleSplit(_UnsupportedGroupCVMixin, BaseShuffleSplit):
 Yields indices to split data into training and test sets.

 Note: contrary to other cross-validation strategies, random splits
-do not guarantee that all folds will be different, although this is
-still very likely for sizeable datasets.
+do not guarantee that test sets across all folds will be mutually exclusive,
+and might include overlapping samples. However, this is still very likely for
+sizeable datasets.

 Read more in the :ref:`User Guide <ShuffleSplit>`.

@@ -2049,6 +2050,11 @@ class GroupShuffleSplit(GroupsConsumerMixin, BaseShuffleSplit):
 ``LeavePGroupsOut(p=10)`` would be
 ``GroupShuffleSplit(test_size=10, n_splits=100)``.

+Contrary to other cross-validation strategies, the random splits
+do not guarantee that test sets across all folds will be mutually exclusive,
+and might include overlapping samples. However, this is still very likely for
+sizeable datasets.
+
 Note: The parameters ``test_size`` and ``train_size`` refer to groups, and
 not to samples as in :class:`ShuffleSplit`.

@@ -2176,13 +2182,14 @@ class StratifiedShuffleSplit(BaseShuffleSplit):

 Provides train/test indices to split data in train/test sets.

-This cross-validation object is a merge of StratifiedKFold and
-ShuffleSplit, which returns stratified randomized folds. The folds
+This cross-validation object is a merge of :class:`StratifiedKFold` and
+:class:`ShuffleSplit`, which returns stratified randomized folds. The folds
 are made by preserving the percentage of samples for each class.

-Note: like the ShuffleSplit strategy, stratified random splits
-do not guarantee that all folds will be different, although this is
-still very likely for sizeable datasets.
+Note: like the :class:`ShuffleSplit` strategy, stratified random splits
+do not guarantee that test sets across all folds will be mutually exclusive,
+and might include overlapping samples. However, this is still very likely for
+sizeable datasets.

 Read more in the :ref:`User Guide <stratified_shuffle_split>`.

