Hi,
I've noticed that when using MLPRegressor with shuffle=False and early_stopping=True, the internal train_test_split call still shuffles the data when carving out the validation set. According to the documentation, shuffle=False controls whether samples are shuffled in each training iteration, but the setting is not forwarded to the validation split.
While the documentation doesn't explicitly say that shuffle=False affects the validation split, users who disable shuffling are typically working with data where order matters (e.g., time series). In those cases, a shuffled validation set may not reflect how the model will perform on unseen, sequential data; an unshuffled split (the tail of the series) would give a more realistic picture of how the model behaves on ordered data.
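Here is a minimal sketch of the discrepancy (the data and split sizes are purely illustrative; the internal call corresponds to the lines referenced below):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ordered data, e.g. a short time series (values double as indices).
X = np.arange(20, dtype=float).reshape(-1, 1)
y = np.arange(20, dtype=float)

# What the estimator currently does internally for early stopping:
# train_test_split with its default shuffle=True picks random samples.
_, X_val_shuffled, _, _ = train_test_split(X, y, test_size=0.1, random_state=0)

# What one might expect when shuffle=False is set on the estimator:
# the last validation_fraction of the ordered data.
_, X_val_ordered, _, _ = train_test_split(X, y, test_size=0.1, shuffle=False)

print(X_val_shuffled.ravel())  # two randomly chosen samples
print(X_val_ordered.ravel())   # the tail of the series: [18. 19.]
```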
Could you clarify whether this behavior is intended, or whether the validation split could be adjusted to respect the shuffle=False setting?
Relevant code (commit 88c2db2):
- scikit-learn/sklearn/neural_network/_multilayer_perceptron.py, line 586
- scikit-learn/sklearn/model_selection/_split.py, line 2731
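For context, a minimal, hypothetical configuration that exercises this code path (assuming early_stopping=True is what triggers the internal split, per the lines above):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Illustrative ordered data (hypothetical, just to make the example runnable).
rng = np.random.RandomState(0)
X = np.arange(200, dtype=float).reshape(-1, 1)
y = 0.5 * X.ravel() + rng.normal(scale=0.1, size=200)

# shuffle=False only controls the order of samples within each training
# iteration; with early_stopping=True the validation set is still drawn by
# the internal train_test_split, which shuffles by default.
reg = MLPRegressor(
    hidden_layer_sizes=(10,),
    shuffle=False,
    early_stopping=True,
    validation_fraction=0.1,
    max_iter=200,
    random_state=0,
)
reg.fit(X, y)
```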
Thank you!