Add the RMSE(Root Mean Squared Error) option to the cross_val_score. #6457
Conversation
The :func:`root_mean_squared_error` function computes `root mean square
error <https://en.wikipedia.org/wiki/Root-mean-square_deviation>`_, a risk
metric corresponding to the expected value of the root mean squared error loss or
*loss or loss
I'm happy to see this separate scorer, but I don't think this needs to be a separate function to MSE, merely a parameter. Not sure if others will agree though...
I'm happy to see this separate scorer, but I don't think this needs to
be a separate function to MSE, merely a parameter. Not sure if others
will agree though...
No strong feeling on either side.
I guess I was thinking of the
As discussed at the last sprint (I can't find the notes, bad on us): we want to deprecate MSE and only use 'negated_mse', so that bigger is always consistently better: https://sourceforge.net/p/scikit-learn/mailman/message/31632671/ So we should have a negated_rmse.
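To make the "bigger is better" point concrete, here is a hedged, pure-Python sketch (not scikit-learn's actual implementation) of a negated error metric: model-selection utilities maximize the score, so an error has to be flipped in sign before it can be used as a scorer.

```python
# Sketch only: a negated MSE so that larger values mean a better model.
def negated_mse(y_true, y_pred):
    """Mean squared error, negated so that 'bigger is better' holds."""
    return -sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

perfect = negated_mse([1.0, 2.0], [1.0, 2.0])  # 0.0: best possible score
worse = negated_mse([1.0, 2.0], [1.5, 2.5])    # -0.25: lower, i.e. worse
```

A grid search that maximizes this scorer therefore prefers the model with the smaller error, with no special-casing for error-type metrics.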
@jnothman Thank you for the suggestion. Some people might see this as over-engineering. However, I am concerned that the parameter approach (such as the normalize option) might confuse people who want to use this method. I'm a fairly open-minded person and I will follow a better suggestion if there is one. Honestly speaking, I think this is the best implementation so far. Anyway, I always welcome good opinions.

@GaelVaroquaux I remember reading about that discussion. I would suggest the scikit-learn team handle negated_mse and negated_rmse in a separate pull request. If I implement negated_rmse in this pull request, we fall into a dangerous situation where we implement only negated_rmse but not negated_mse. I'll send another pull request covering both negated_mse and negated_rmse if time allows.

PS) Sorry for the test failure. :( I'll fix the problem as soon as possible.
If I implement negated_rmse in this pull request, we fall into a
dangerous situation where we implement only negated_rmse but not
negated_mse. I'll send another pull request about both negated_mse and
negated_rmse if time allows.
I find that implementing rmse is actually the dangerous issue: we would
want to deprecate it as soon as it is implemented. Deprecations are
things that are very costly, to us and to our users.
@GaelVaroquaux I understand. You have a point. I’ve decided to implement negated_rmse as you mentioned. But I am still not sure about the amendment that came out during this discussion. Is the following right?
Any updates on this?
I read the following discussion about the negated_mse issue at #2439. Consequently, I decided it was enough to just change the scoring option of cross_val_score, similar to the suggestion by @GaelVaroquaux. I welcome any opinions. :D
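The "just change the scoring option" approach looks roughly like the sketch below. Note the scorer string "neg_mean_squared_error" follows later scikit-learn releases, not necessarily the spelling discussed in this thread, so treat the name as an assumption about the installed version.

```python
# Sketch: get RMSE per fold out of cross_val_score via a negated-MSE scorer.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=5, random_state=0)
scores = cross_val_score(LinearRegression(), X, y,
                         scoring="neg_mean_squared_error", cv=3)
rmse_per_fold = np.sqrt(-scores)  # undo the negation, then take the root
```

This keeps a single maximization convention inside scikit-learn while leaving the sqrt to one line of user code.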
hmm, this seems reasonable (from an api design viewpoint). I'll take a look at the code in a few days when I get a chance.
@nelson-liu Sure you can. :D |
the implementation lgtm, @GaelVaroquaux do you have any thoughts on the api design?
can you please rebase? and it should be just
@amueller Sure. :D Wait a moment. |
…or' to 'negated_mse'
I just finished rebasing this pull request. Is there anything else I can help with?
I still would rather not see
Thanks for your opinion. :D In my understanding, I'm not sure whether we should treat this in the same way as euclidean_distances or not. Usually we can treat both However, I think the
I'll update the documentation as soon as possible, and I'll especially emphasise when this method would be helpful, and why this approach is sometimes better than MSE.
"""
error = mean_squared_error(y_true, y_pred, sample_weight, multioutput)
return error ** 0.5
btw, is this as good as np.sqrt? It might take a different code path.
@shaynekang this is a global monotonic transformation, so it doesn't change anything. If there was a sqrt on the inside that would be different. But this has exactly the same properties as mse.
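A small numeric check of the point above: sqrt is monotonic, so ranking candidate models by MSE and by RMSE gives the same order (the values here are made up for illustration).

```python
import numpy as np

mse_per_model = np.array([4.0, 1.0, 9.0, 2.25])  # hypothetical CV results
rmse_per_model = np.sqrt(mse_per_model)

# argsort gives the ranking; the sqrt does not change it.
same_ranking = bool((np.argsort(mse_per_model) == np.argsort(rmse_per_model)).all())
```

So any model selected as best under negated MSE would also be selected under negated RMSE.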
I just did timings and, to my surprise, both take the same amount of time.
Indeed, in terms of model selection, RMSE and MSE are exactly the same thing. The question is how far down we want to lower the bar for people who don't understand these things, or don't know how to compute a sqrt in Python. My hunch is usually that lowering the bar too much ends up onboarding people that we shouldn't onboard, because they will be only cost and no benefit.
though explicit is better than implicit ;)
So I get the impression that core dev consensus is something like -1 for a
On 15 October 2016 at 06:17, Andreas Mueller notifications@github.com
@jnothman yeah I think so.
Yes, thanks @cmarmo! Thanks to all contributors in this PR. Closing as resolved.
Add the RMSE(Root Mean Squared Error) option to the cross_val_score.
Many Kaggle competitions use RMSE as their official evaluation metric (Home Depot Product Search Relevance, Restaurant Revenue Prediction, Facial Keypoints Detection, etc.).
Usually, Kagglers implement RMSE by hand on top of mean_squared_error. I think this is a waste of time, so I decided to implement RMSE and add it to the cross_val_score function.
Hope this helps Kagglers :D
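For reference, the manual idiom the PR set out to replace is just mean_squared_error plus a hand-applied square root (the y values below are made up for illustration):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# MSE = (0.25 + 0.25 + 0 + 1) / 4 = 0.375; RMSE is its square root.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
```

This one-liner is exactly the boilerplate a built-in RMSE option would absorb.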