MLPRegressor - Validation score wrongly defined #24411
Comments
Hi @ducvinh9, thank you for reporting the issue. The coefficient of determination is calculated as

R² = 1 − Σᵢ (yᵢ − ŷᵢ)² / Σᵢ (yᵢ − ȳ)²

where ȳ is the mean of the observed targets. The denominator depends only on the validation targets, so a larger coefficient of determination is equivalent to a smaller mean squared error.
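To make that equivalence concrete, here is a small self-contained check in plain Python (hypothetical data): on a fixed validation set, the SST term depends only on `y_true`, so ranking two predictions by R² gives the same ordering as ranking them by negative MSE.

```python
# Demonstration (hypothetical data): R^2 = 1 - SSE / SST, where SST
# depends only on y_true. Hence, on a fixed validation set, a higher
# R^2 always corresponds to a lower MSE.

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    sst = sum((t - mean) ** 2 for t in y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1 - sse / sst

y_true = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.1, 2.1, 2.9, 4.2]   # better fit
pred_b = [1.5, 1.5, 3.5, 3.5]   # worse fit

assert mse(y_true, pred_a) < mse(y_true, pred_b)
assert r2(y_true, pred_a) > r2(y_true, pred_b)
```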
Hi @MaxwellLZH, thanks for your reply. It is true that a larger coefficient of determination is equivalent to a smaller mean squared error, but I think we should not use them interchangeably, for the following reasons:
Just my thinking: MLPRegressor is an awesome tool that runs faster on simple NN models than TensorFlow on CPU. I heard that people have stopped supporting additional features for the MLP module, which is sad.
I agree that usually we should monitor the loss instead of the final metric. This is indeed the default in other estimators such as gradient boosting. We could think of adding a
Note that in scikit-learn we have the naming convention that "score" always means "higher is better" and "loss" means "lower is better" when we speak about performance metrics. I think our early stopping API is rather poor and not consistent across estimators. One way to make them consistent without introducing too much code duplication would be to go through the future callback API currently being designed by @jeremiedbb in #22000.
This is not very intuitive either (but it is consistent with the scikit-learn loss/score naming convention). We need to rethink this. The callback API should allow us both to use one specific loss or scoring metric to decide when to stop, and to compute many metric values on both the training and validation data at the end of each iteration.
Describe the bug
In MLPRegressor, if the option early_stopping is set to True, the model is supposed to monitor the loss computed on the validation set instead of the training set, using the same loss formulation, i.e. the mean squared error. However, as implemented at line 719 of the source code:
The function `score`, which returns the coefficient of determination (R²), is used. This is not correct. It should be something like:
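A minimal sketch of the kind of replacement the report seems to suggest (hypothetical, not scikit-learn's actual code): evaluate the validation set with the same squared loss used for training, with "lower is better" semantics, instead of `RegressorMixin.score` (R²).

```python
# Hypothetical sketch, NOT scikit-learn's actual implementation:
# monitor the validation MSE (the training loss form) directly,
# instead of the R^2 returned by the estimator's score method.
import numpy as np

def validation_loss(y_val, y_pred):
    # Mean squared error on the validation set; MLPRegressor's
    # internal squared loss is this value halved (assumption).
    return np.mean((np.asarray(y_val) - np.asarray(y_pred)) ** 2)

# Early stopping would then track decreases of this value, e.g.:
# if validation_loss(y_val, model.predict(X_val)) > best_loss - tol: ...
```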
Steps/Code to Reproduce
Sorry, I don't have time to write a simple reproduction, but the error is quite clear from the source.
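As a hedged reproduction sketch (synthetic data; assumes the `validation_scores_` attribute populated when `early_stopping=True`): the stored values are bounded above by 1 and treated as "higher is better", which is consistent with R² rather than with a validation MSE.

```python
# Reproduction sketch (assumption: validation_scores_ is populated
# when early_stopping=True). If the stored values were validation MSE
# they would be non-negative with "lower is better"; R^2 is instead
# bounded above by 1 with "higher is better".
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(200)

reg = MLPRegressor(hidden_layer_sizes=(16,), early_stopping=True,
                   validation_fraction=0.2, max_iter=500,
                   random_state=0).fit(X, y)

print(reg.validation_scores_[:3])   # per-iteration R^2, not MSE
print(max(reg.validation_scores_))  # never exceeds 1.0
```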
Expected Results
The validation score should be the mean squared error (the same loss used for training).
Actual Results
Coefficient of determination
Versions