[MRG+1] Repeated K-Fold and Repeated Stratified K-Fold #8120
Do you intend to close #7960?
Yes.
doc/modules/cross_validation.rst
Outdated
>>> from sklearn.model_selection import RepeatedKFold, KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> random_states = [12883823, 28827347]
>>> rkf = RepeatedKFold(KFold(n_splits=2), n_repeats=2, random_states=random_states)

We want a solution that can accept a single random_state. It can generate random states for each KFold instance.
I also don't get why you should accept a KFold instance to the constructor of RepeatedKFold.

Yes, it would be better to accept n_splits to the constructor of RepeatedKFold and create an instance of KFold inside. Likewise for RepeatedStratifiedKFold.
I'll make the changes.
…eats, random_state
sklearn/model_selection/_split.py
Outdated
        if n_repeats <= 1:
            raise ValueError("Number of repetitions must be greater than 1.")

        rng = check_random_state(random_state)

We are not certain it's the best design, but currently all the splitters do this in split, not in __init__.

Is there any other way to achieve this other than initializing self.random_states = [] in __init__ and generating the random states when split is called for the first time? Would the code at lines 946-950 then move to split, inside an if condition?

You don't need to store the random states, just generate them from the initial random state in split.
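As a rough illustration of the suggestion above, here is a minimal standard-library sketch. The class name `RepeatedShuffledFolds` and its fold logic are hypothetical stand-ins, not scikit-learn code; the only point is that `__init__` stores the initial `random_state` untouched and `split` derives one seed per repetition from it on every call.

```python
import random


class RepeatedShuffledFolds:
    """Hypothetical stand-in for a repeated K-fold splitter (not scikit-learn code)."""

    def __init__(self, n_splits=2, n_repeats=2, random_state=None):
        # Only store the parameters; nothing random is drawn here.
        self.n_splits = n_splits
        self.n_repeats = n_repeats
        self.random_state = random_state

    def split(self, n_samples):
        # Build the generator inside split, from the stored initial state.
        rng = random.Random(self.random_state)
        for _ in range(self.n_repeats):
            # One derived seed per repetition; no list of seeds is kept.
            seed = rng.randrange(2**31 - 1)
            indices = list(range(n_samples))
            random.Random(seed).shuffle(indices)
            for k in range(self.n_splits):
                test = sorted(indices[k::self.n_splits])
                train = sorted(set(indices) - set(test))
                yield train, test


splitter = RepeatedShuffledFolds(n_splits=2, n_repeats=2, random_state=0)
first = list(splitter.split(6))
second = list(splitter.split(6))
```

With an integer seed, both calls to `split` should yield the same sequence of folds, because the generator is rebuilt from the same initial state each time.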
On 28 December 2016 at 17:06, Neeraj Gangwar commented on this pull request, in sklearn/model_selection/_split.py (#8120):
> + random_state : None, int or RandomState, default=None
+ Random state to be used to generate random state for each
+ repetition.
+ """
+ def __init__(self, cv, n_repeats=5, random_state=None):
+ if not isinstance(cv, (KFold, StratifiedKFold)):
+ raise ValueError(
+ "cv must be an instance of KFold or StratifiedKFold.")
+
+ if not isinstance(n_repeats, (np.integer, numbers.Integral)):
+ raise ValueError("Number of repetitions must be of Integral type.")
+
+ if n_repeats <= 1:
+ raise ValueError("Number of repetitions must be greater than 1.")
+
+ rng = check_random_state(random_state)
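Do you mean initialize RandomState in each split call with check_random_state and generate random states? In this case, if initial random_state is int, it will work fine as check_random_state will return RandomState with the same initial seed on every call. But if it's None, it will return RandomState with different seed on every call and if it's RandomState, it'll return the same object. In both of these cases, split will produce different splits on different calls. To generate same splits on different split calls, initial state needs to be stored somewhere probably? Or are you referring to some other way?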
Currently KFold with shuffle=True will generate different splits on
different calls to split, will it not? This should behave the same.
On 29 December 2016 at 14:54, Neeraj Gangwar wrote:
Do you mean initialize RandomState in each split call with
check_random_state and generate random states? In this case, if initial
random_state is int, it will work fine as check_random_state will return
RandomState with the same initial seed on every call. But if it's None,
it will return RandomState with different seed on every call and if it's
RandomState, it'll return the same object. In both of these cases, split
will produce different splits on different calls. To generate same splits
on different split calls, initial state needs to be stored somewhere
probably?
Or are you referring to some other way?
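The three cases described in this question can be checked directly. This is a small sketch that assumes NumPy and scikit-learn are installed; the variable names are illustrative only.

```python
import numpy as np
from sklearn.utils import check_random_state

# An int seed: each call builds a fresh RandomState seeded identically,
# so the first draw from each is the same.
a = check_random_state(42)
b = check_random_state(42)
same_draws = a.randint(1000) == b.randint(1000)

# A RandomState instance: the very same object is returned, not a copy,
# so draws made through it also advance the caller's generator.
rng = np.random.RandomState(42)
returned_same_object = check_random_state(rng) is rng

# None: a shared global generator is returned, so two calls hand back
# the same object and successive splits keep consuming its state.
none_is_shared = check_random_state(None) is check_random_state(None)
```

Only the int case restarts from a known state on every call, which is why the question of where to keep the initial state matters for the other two.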
thanks
doc/modules/cross_validation.rst
Outdated
[1 2] [0 3]
[0 3] [1 2]

I think you should just mention RepeatedStratifiedKFold here and under StratifiedKFold. Also in the "see also"s of relevant classes.
doc/modules/cross_validation.rst
Outdated
@@ -409,6 +432,30 @@ two slightly unbalanced classes::
[0 1 3 4 5 8 9] [2 6 7]
[0 1 2 4 5 6 7] [3 8 9]

Repeated Stratified K-Fold

I.e. I think this is overkill
sklearn/model_selection/_split.py
Outdated
@@ -913,6 +915,238 @@ def get_n_splits(self, X, y, groups):
        return int(comb(len(np.unique(groups)), self.n_groups, exact=True))


class _RepeatedSplits(with_metaclass(ABCMeta)):
    """Repeated splits for K-Fold and Stratified K-Fold

"for an arbitrary randomized CV splitter"
sklearn/model_selection/_split.py
Outdated
class _RepeatedSplits(with_metaclass(ABCMeta)):
    """Repeated splits for K-Fold and Stratified K-Fold

    Repeats splits for cross-validators n times.

with different randomization
sklearn/model_selection/_split.py
Outdated
        return self._repeated_splits.get_n_repeats()


class RepeatedStratifiedKFold(with_metaclass(ABCMeta)):

Why is this an ABC?

Removing.
sklearn/model_selection/_split.py
Outdated
    def __init__(self, cv, n_repeats=5, random_state=None):
        if not isinstance(cv, (KFold, StratifiedKFold)):
            raise ValueError(
                "cv must be an instance of KFold or StratifiedKFold.")

Why?

I think only KFold and StratifiedKFold use random_state. That's why I added this check. Should I remove it? And also, is there a way to check if cv is an instance of cross-validator with randomized split functionality?

Yes, remove it.

It's a private class, it doesn't require such validation.
sklearn/model_selection/_split.py
Outdated
            for train_index, test_index in cv.split(X, y, groups):
                yield train_index, test_index

    def get_n_repeats(self):

I don't think we need this. n_repeats is already an attribute.

Removing from all classes.
sklearn/model_selection/_split.py
Outdated
        test : ndarray
            The testing set indices for that split.
        """
        cv = self.cv

I wonder if instead we should be constructing new CV objects in here (i.e. in split). Thus _RepeatedSplits.__init__ would take a constructor for cv rather than cv.

It might be nice, as it would remove RepeatedKFold's dependency on KFold in terms of parameters. I am thinking of something like def __init__(self, cv, n_repeats=5, random_state=None, **cvargs):. It will be called as _RepeatedSplits(KFold, n_repeats, random_state, n_splits=n_splits). This is what you meant, right?
I have one doubt though. Since shuffle should always be True and random_state will be generated inside the split function, would it be okay to just mention that the user should not pass these arguments, and that the one random_state that is passed does not correspond to the random_state parameter of KFold?

You can have **cvargs in the _RepeatedSplits class while still only allowing specified named args in RepeatedKFold.
    train, test = next(splits)
    assert_array_equal(train, [0, 1, 2])
    assert_array_equal(test, [3, 4])

Please also check that a second call to split produces the same sets.

Add a comment to explain why this is repeated.
Perhaps use a loop or a helper function to avoid duplicated code.

You also don't check here that the iterator is exhausted after 4 elements.
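The three requests above could be combined roughly as follows. This sketch assumes the constructor signature the PR converged on (`n_splits`, `n_repeats`, `random_state`) and an installed scikit-learn; it compares the two calls against each other instead of hard-coding expected index arrays.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

X = np.arange(10).reshape(5, 2)
rkf = RepeatedKFold(n_splits=2, n_repeats=2, random_state=0)

results = []
# split is called twice on purpose: with an int random_state, both calls
# are expected to produce the same sequence of train/test sets.
for _ in range(2):
    splits = rkf.split(X)
    folds = [next(splits) for _ in range(4)]  # 2 splits x 2 repeats
    # The iterator should be exhausted after those 4 elements.
    try:
        next(splits)
        exhausted = False
    except StopIteration:
        exhausted = True
    results.append((folds, exhausted))
```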
def test_repeated_stratified_kfold_errors():
    # n_repeats is not integer or <= 1
    assert_raises(ValueError, RepeatedStratifiedKFold, n_repeats=1)

Do this in a loop together with the RepeatedKFold case.
…RepeatedSplits and other review changes
This is looking much better, thanks!
sklearn/model_selection/_split.py
Outdated
    See also
    --------
    RepeatedStratifiedKFold: Repeats Stratified K-Fold n times.

Please remove blank line
…add StopIteration check in testcase
Thanks @jnothman for the review. And a very happy new year :)
LGTM!
            raise ValueError("Number of repetitions must be of Integral type.")

        if n_repeats <= 1:
            raise ValueError("Number of repetitions must be greater than 1.")

Never check values in __init__. Move it to split.

Shouldn't an error be thrown at construction time if there is some discrepancy in the parameters passed? In _BaseKFold also, values are checked in __init__.

In sklearn, for estimators, we never check for errors in __init__ because of set_params, but these classes are not estimators. I imagine this rule does not apply here.
@jnothman As I'm not 100% sure, can you confirm that?

For now, at least, CV splitters are a bit special in this regard. Checking in __init__ is consistent with other splitters.

ok thx @jnothman
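The set_params argument can be illustrated with a toy class. `ToyEstimator` is a hypothetical example written for this note, not a scikit-learn class: a check placed in `__init__` would never see a value changed later through `set_params`, so the check lives where the parameter is used.

```python
class ToyEstimator:
    """Hypothetical estimator-like class; not a scikit-learn class."""

    def __init__(self, n_repeats=10):
        # No validation here: the value can still change via set_params.
        self.n_repeats = n_repeats

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self):
        # Validation at use time sees whatever value is current.
        if self.n_repeats <= 1:
            raise ValueError("Number of repetitions must be greater than 1.")
        return self


est = ToyEstimator(n_repeats=5).set_params(n_repeats=1)  # bypasses __init__
try:
    est.fit()
    caught = False
except ValueError:
    caught = True
```

As the thread notes, the CV splitters are not estimators, so checking in `__init__` stays consistent with the other splitters.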
    **cvargs : additional params
        Constructor parameters for cv. Must not contain random_state
        and shuffle.

Not an obligation as _RepeatedSplits is private, but can you raise an error in split to check that?
sklearn/model_selection/_split.py
Outdated
        rng = check_random_state(self.random_state)

        for idx in range(n_repeats):
            random_state = rng.randint(np.iinfo(np.int32).max)

Maybe I'm missing something but why directly send rng to random_state?

Integer random state generated by rng is sent as random_state. Do you have any other way in mind?

Sorry I was not clear.
Can you remove the line random_state = rng.randint(np.iinfo(np.int32).max) and replace random_state with rng later?

Do you mean after creating the object for cv? If yes, how would it make a difference?

I mean: cv = self.cv(random_state=rng, shuffle=True, **self.cvargs)
Your code is similar to the following:

In [1]: from sklearn.utils import check_random_state

In [2]: class Foo:
   ...:     def __init__(self, random_state):
   ...:         self.rng = check_random_state(random_state)
   ...:
   ...:     def fit(self):
   ...:         print(self.rng.randint(1000))
   ...:

In [3]: rng = check_random_state(0)

In [4]: f1 = Foo(rng)

In [5]: f2 = Foo(rng)

In [6]: f1.fit()
684

In [7]: f1.fit()
559

In [8]: f2.fit()
629

In [9]: f2.fit()
192

rng is an object and it will be modified every time you call cv.split. So for me it is not necessary to generate a specific random_state at each iteration. Maybe I am missing something?
check_random_state does not create a copy of rng if rng is a RandomState instance.

I can't find any case which we'll miss by this approach. But I think the current implementation keeps the use of random_state clean. I am not really sure.
@jnothman thoughts?

I think passing rng directly should be okay. If it's not okay, we need to be able to construct a test case that proves so!

I am not able to find any such test case. So making the changes. Thanks!
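For reference, the approach agreed on here can be sketched with the public KFold class. The helper name `repeated_kfold_sketch` is hypothetical and this is not the merged implementation; it only shows one shared generator being handed to each new splitter, with reproducibility across runs when the initial seed is an int.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.utils import check_random_state


def repeated_kfold_sketch(X, n_splits, n_repeats, random_state):
    """Illustrative helper (hypothetical name), not the merged implementation."""
    rng = check_random_state(random_state)
    for _ in range(n_repeats):
        # The same rng object is handed to every KFold; each shuffle
        # advances it, so no per-repetition integer seed is drawn.
        cv = KFold(n_splits=n_splits, shuffle=True, random_state=rng)
        for train, test in cv.split(X):
            yield train, test


X = np.zeros((20, 1))
run_a = list(repeated_kfold_sketch(X, n_splits=2, n_repeats=3, random_state=0))
run_b = list(repeated_kfold_sketch(X, n_splits=2, n_repeats=3, random_state=0))
```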
…ded a check for cvargs
LGTM.
Circle seems unrelated.
sklearn/model_selection/_split.py
Outdated
    Parameters
    ----------
    n_splits : int, default=3

This is consistent with the other estimators but seems pretty useless in practice. I would make 10 times 10-fold the default. Why would you want to do 3x5 instead of 10 fold?

Changing it to 5x10 (n_splits x n_repeats) by default. Is that fine?

yeah
Looks good apart from minor changes, in particular default parameters. How do others feel about adding this to check_cv? We could have a mxn syntax to have m repetitions of n-fold with automatically detecting stratified vs not, so you could do cv="10x10". Not entirely sure about that though.
---------------

:class:`RepeatedKFold` repeats K-Fold n times. It can be used when one
requires to run :class:`KFold` n times, producing different splits in

I would say "it can be used to run KFold multiple times to increase the fidelity of the estimate"? Or can we say to decrease the variance? Is that accurate?
sklearn/model_selection/_split.py
Outdated
    Parameters
    ----------
    cv : callable
        Constructor of cross-validator.

Isn't this the cross validation class itself? That seems more natural than passing the __init__ method.

We are not passing the __init__ method. It's called as RepeatedSplits(KFold, n_repeats, random_state, n_splits=n_splits). I am changing description to "Cross-validator class.".
sklearn/model_selection/_split.py
Outdated
    cv : callable
        Constructor of cross-validator.

    n_repeats : int, default=5

I would probably do 10x10 by default, or maybe 5 times 10 fold. This is something you use when you care about accuracy but not necessarily time.

Changing default values of n_splits to 5 and n_repeats to 10.
        if n_repeats <= 1:
            raise ValueError("Number of repetitions must be greater than 1.")

        if any(key in cvargs for key in ('random_state', 'shuffle')):

if set(cvargs).intersection({'random_state', 'shuffle'})? Though not really shorter :-/

Keeping the same as both are of equal length. :P
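A quick check that the two spellings behave the same on a few sample dictionaries. The function names and sample inputs are made up for illustration.

```python
def has_reserved_any(cvargs):
    # Form kept in the PR: test each reserved key for membership.
    return any(key in cvargs for key in ('random_state', 'shuffle'))


def has_reserved_set(cvargs):
    # Suggested alternative: a non-empty intersection means a reserved key.
    return bool(set(cvargs).intersection({'random_state', 'shuffle'}))


cases = [{}, {'n_splits': 5}, {'shuffle': True}, {'n_splits': 3, 'random_state': 0}]
agree = all(has_reserved_any(c) == has_reserved_set(c) for c in cases)
```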
        rng = check_random_state(self.random_state)

        for idx in range(n_repeats):
            cv = self.cv(random_state=rng, shuffle=True,

Do we maybe want to raise nice errors if these arguments are not present?

I didn't get you. Which arguments?

Honestly I have no idea what I meant.... ?
LGTM. Can you add an entry to whatsnew.rst?
@amueller Conflict in
Fixed it (which you could have also done locally ;) omg this github feature is amazing !!
It's customary to add your username to the entry, but you don't have to.
Codecov Report
@@ Coverage Diff @@
## master #8120 +/- ##
==========================================
+ Coverage 95.48% 95.48% +<.01%
==========================================
Files 342 342
Lines 60913 60985 +72
==========================================
+ Hits 58160 58231 +71
- Misses 2753 2754 +1
Continue to review full report at Codecov.
@amueller Added my name :)
…8120)
* Add _RepeatedSplits and RepeatedKFold class
* Add RepeatedStratifiedKFold and doc for repeated cvs
* Change default value of n_repeats
* Change input parameters of repeated cv constructor to n_splits, n_repeats, random_state
* Generate random states in split function rather than store it beforehand
* Doc changes, inheriting RepeatedKFold, RepeatedStratifiedKFold from _RepeatedSplits and other review changes
* Remove blank line, put testcases for deterministic split in loop and add StopIteration check in testcase
* Using rng directly as random_state param to create cv instance and added a check for cvargs
* Fix pep8 warnings
* Changing default values for n_splits and n_repeats and add entry in changelog
* Adding name to the feature
* Missing space
Reference Issue
Fixes #7948
What does this implement/fix? Explain your changes.
Implements RepeatedKFold and RepeatedStratifiedKFold
Any other comments?
For previous discussion on this, please refer to #7960