Description
Describe the issue linked to the documentation
The documentation for the `**params` parameter of the `fit` method of `GridSearchCV` leads to confusion. Here is the current text:
> Parameters passed to the `fit` method of the estimator, the scorer, and the CV splitter.
>
> If a fit parameter is an array-like whose length is equal to `num_samples` then it will be split across CV groups along with `X` and `y`. For example, the `sample_weight` parameter is split because `len(sample_weights) = len(X)`.
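To make the ambiguity concrete, here is a minimal pure-Python sketch of the rule exactly as the docstring states it. `split_fit_params` is a hypothetical helper, not scikit-learn's internal code, and treating any sized non-string object as "array-like" is a simplifying assumption. Read literally, the rule also catches a `groups` array, because its length equals the number of samples.

```python
def split_fit_params(fit_params, n_samples, train_indices):
    """Hypothetical helper: apply the docstring's rule to one training fold.

    Values that look array-like and have length n_samples are indexed with
    train_indices; every other value is passed through unchanged.
    """
    fold_params = {}
    for name, value in fit_params.items():
        # Simplifying assumption: "array-like" means sized and not a string.
        is_array_like = hasattr(value, "__len__") and not isinstance(value, str)
        if is_array_like and len(value) == n_samples:
            fold_params[name] = [value[i] for i in train_indices]
        else:
            fold_params[name] = value
    return fold_params


# Six samples in three groups; the training fold holds out group 2.
fold = split_fit_params(
    {
        "sample_weight": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "groups": [0, 0, 1, 1, 2, 2],
        "verbose": True,
    },
    n_samples=6,
    train_indices=[0, 1, 2, 3],
)
print(fold)
# {'sample_weight': [1.0, 2.0, 3.0, 4.0], 'groups': [0, 0, 1, 1], 'verbose': True}
```

Under this literal reading `groups` is subset just like `sample_weight`, which is the interpretation that prompted this report.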
I was worried that this meant that `grid_search.fit(X, y, groups=g)` would split `g` up across the CV partitions, which is definitely not the right behavior. The correct behavior is to pass the `groups` parameter unchanged to the CV splitter, e.g. `cv.split(X, y, groups=groups)`.
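The splitter needs the full `groups` array because it decides fold membership by comparing group labels across all samples. The sketch below uses a hypothetical `leave_one_group_out` function, similar in spirit to scikit-learn's `LeaveOneGroupOut` but taking only `groups` for brevity, to show what would go wrong if `groups` were subset before reaching it.

```python
def leave_one_group_out(groups):
    """Hypothetical splitter: yield (train, test) index lists, one group held out at a time."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield train, test


# Full-length groups: every sample appears in exactly one test fold.
for train, test in leave_one_group_out([0, 0, 1, 1, 2, 2]):
    print(train, test)

# A groups array that was already subset to one fold (as a literal reading
# of the docstring would imply) silently drops samples 4 and 5.
for train, test in leave_one_group_out([0, 0, 1, 1]):
    print(train, test)
```

With the full array the folds cover all six samples; with the truncated array only indices 0 to 3 ever appear, so the splitter can only work correctly if it receives `groups` unchanged.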
I read through the source code, and it does appear that the `groups` parameter is passed through unchanged to `split`, so the behavior itself looks correct. However, the docstring should state this explicitly, so that readers do not expect `groups` to be split across folds the way `sample_weight` is.
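One way to check this empirically, assuming scikit-learn and NumPy are installed, is to pass a splitter and an estimator that record what they receive. `RecordingGroupSplitter` and `RecordingEstimator` are hypothetical helpers written only for this check, not scikit-learn classes; the module-level lists collect values only because the search runs sequentially with the default `n_jobs=None`.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import GridSearchCV, GroupKFold

# Filled in by the hypothetical recorders below (works because n_jobs=None runs in-process).
SPLIT_GROUP_LENGTHS = []
FIT_WEIGHT_LENGTHS = []


class RecordingGroupSplitter:
    """Hypothetical splitter: records len(groups), then delegates to GroupKFold."""

    def __init__(self, n_splits=3):
        self.n_splits = n_splits

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits

    def split(self, X, y=None, groups=None):
        SPLIT_GROUP_LENGTHS.append(len(groups))
        yield from GroupKFold(n_splits=self.n_splits).split(X, y, groups)


class RecordingEstimator(BaseEstimator):
    """Hypothetical estimator: records the length of the sample_weight it is fit with."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y, sample_weight=None):
        FIT_WEIGHT_LENGTHS.append(len(sample_weight))
        return self

    def score(self, X, y):
        return 0.0  # constant score; only the parameter routing matters here


X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([0, 0, 1, 1, 2, 2])  # three groups of two samples each
sample_weight = np.ones(6)

search = GridSearchCV(RecordingEstimator(), {"alpha": [1.0, 2.0]}, cv=RecordingGroupSplitter())
search.fit(X, y, groups=groups, sample_weight=sample_weight)

print("groups lengths seen by split:", SPLIT_GROUP_LENGTHS)
print("sample_weight lengths seen by fit:", FIT_WEIGHT_LENGTHS)
```

If `groups` is forwarded unchanged, `SPLIT_GROUP_LENGTHS` should contain only the full length 6, while `FIT_WEIGHT_LENGTHS` should show the per-fold training size 4 for each candidate and fold, plus 6 for the final refit on the whole dataset.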
Suggest a potential alternative/fix
No response