Hi,
I've noticed that when using MLPRegressor with shuffle=False and early_stopping=True, the internal train_test_split call still shuffles the data when carving out the validation set. According to the documentation, shuffle=False controls whether samples are shuffled in each training iteration, but the setting is not forwarded to the validation split.
While the documentation doesn't explicitly say that shuffle=False affects the validation split, users who disable shuffling are typically working with data where order matters (e.g., time series). In those cases, a shuffled validation set may not reflect how the model will perform on unseen, sequential data; an unshuffled split (the tail of the series) would give a more realistic picture of how the model behaves on ordered data.
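Here is a minimal sketch of the discrepancy (the data and split sizes are purely illustrative; the internal call corresponds to the lines referenced below):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ordered data, e.g. a short time series (values double as indices).
X = np.arange(20, dtype=float).reshape(-1, 1)
y = np.arange(20, dtype=float)

# What the estimator currently does internally for early stopping:
# train_test_split with its default shuffle=True picks random samples.
_, X_val_shuffled, _, _ = train_test_split(X, y, test_size=0.1, random_state=0)

# What one might expect when shuffle=False is set on the estimator:
# the last validation_fraction of the ordered data.
_, X_val_ordered, _, _ = train_test_split(X, y, test_size=0.1, shuffle=False)

print(X_val_shuffled.ravel())  # two randomly chosen samples
print(X_val_ordered.ravel())   # the tail of the series: [18. 19.]
```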
Could you clarify whether this behavior is intended, or whether the validation split could be adjusted to respect the shuffle=False setting?
Relevant code (commit 88c2db2):
- scikit-learn/sklearn/neural_network/_multilayer_perceptron.py, line 586
- scikit-learn/sklearn/model_selection/_split.py, line 2731
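For context, a minimal, hypothetical configuration that exercises this code path (assuming early_stopping=True is what triggers the internal split, per the lines above):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Illustrative ordered data (hypothetical, just to make the example runnable).
rng = np.random.RandomState(0)
X = np.arange(200, dtype=float).reshape(-1, 1)
y = 0.5 * X.ravel() + rng.normal(scale=0.1, size=200)

# shuffle=False only controls the order of samples within each training
# iteration; with early_stopping=True the validation set is still drawn by
# the internal train_test_split, which shuffles by default.
reg = MLPRegressor(
    hidden_layer_sizes=(10,),
    shuffle=False,
    early_stopping=True,
    validation_fraction=0.1,
    max_iter=200,
    random_state=0,
)
reg.fit(X, y)
```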
Thank you!