[MRG+2] rebasing pr/3474 (multioutput regression metrics) #4491
Conversation
@@ -33,7 +36,7 @@
-def _check_reg_targets(y_true, y_pred):
+def _check_reg_targets(y_true, y_pred, output_weights):
Could you check whether it is acceptable to make output_weights=None a default? This is making Travis fail on mean_absolute_error. It may be better to not set the default and be explicit in mean_absolute_error.
output_weights=None isn't a sane default: it returns a whole array. 'uniform' is better as a default. In any case I'll make it explicit to correct median_absolute_error.
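To illustrate the distinction, a sketch with made-up numbers (the keyword values follow this thread's discussion, not necessarily the final API):

    import numpy as np

    # Hypothetical per-output MAE values for a 3-output problem
    per_output_mae = np.array([0.5, 1.0, 2.5])

    # output_weights=None would hand back the whole array:
    per_output_mae              # array([0.5, 1. , 2.5])

    # output_weights='uniform' collapses it to a single scalar:
    np.average(per_output_mae)  # 1.333...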
OK, then don't put any default.
Thanks very much for taking care of this!
-def _average_and_variance(values, sample_weight=None):
+def _average_and_variance(values, sample_weight=None, axis=None):
In the other PR we had discussed whether this function is useful. Looking at it a few months later, I find it weird that this logic was extracted when it is only used twice lower down.
Could we reach a decision on whether to remove it?
I agree that it looks like over-engineering in this case. I found a related issue, numpy/numpy#5164, but it seems it was ignored.
I was thinking more of just writing it out explicitly where needed.
But since it works as it is right now, in the interest of finishing this PR, I think we should leave it the way it is.
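For reference, "writing it out explicitly" would amount to roughly this at each call site (a sketch with made-up data):

    import numpy as np

    values = np.array([[1.0, 2.0], [3.0, 5.0], [4.0, 7.0]])
    sample_weight = np.array([0.2, 0.3, 0.5])

    # Inline the weighted average and variance instead of calling the helper
    average = np.average(values, weights=sample_weight, axis=0)
    variance = np.average((values - average) ** 2,
                          weights=sample_weight, axis=0)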
We agreed on removing it that day, if I recall correctly.
I have removed that debatable function.
def r2_score(y_true, y_pred,
             output_weights='uniform',
This is a regression compared to the previous behavior.
Sorry, what do you mean exactly?
Regression with respect to the single target behavior or regression with respect to a previous version of this PR?
It's a regression with respect to the current (master) behavior of r2_score.
I was convinced that the outcome of the discussions in the last PRs and at the scikit-learn sprint last summer in Paris was that the sane default behavior for aggregating r2 scores was to average the scaled variances. The behavior in current master was added as the keyword output_weights="variance" for those who understand its consequences and need it (scores being weighted by the variances of the target variables, i.e. dependent on their scale/unit).
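A sketch with made-up per-output numbers, showing how the two aggregation schemes differ and why variance weighting depends on target scale:

    import numpy as np

    r2_per_output = np.array([0.9, 0.5])     # per-output R^2 scores
    var_per_output = np.array([1.0, 100.0])  # second target has a much larger scale

    np.average(r2_per_output)                          # uniform average: 0.7
    np.average(r2_per_output, weights=var_per_output)  # variance-weighted: ~0.504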
I find individual too misleading. collapse_output? scores_map? multioutput_averaging?
Sorry, I meant output_weights='individual' as a replacement for output_weights=None. Also, as Gaël says lower down: output_weights -> multioutput_weight.
Then I suggest an option multioutput with values raw_scores (the current None behavior), uniform_average, and variance_weighted. In this case the default None will raise a deprecation warning and fall back to variance_weighted. Is that OK for everybody?
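A minimal sketch of the proposed deprecation path (the helper name and message are hypothetical, not the merged code):

    import warnings

    def _resolve_multioutput(multioutput):
        # Proposed: None is deprecated and falls back to the old behavior
        if multioutput is None:
            warnings.warn("'multioutput' defaulting to 'variance_weighted' "
                          "is deprecated; pass a value explicitly.",
                          DeprecationWarning)
            multioutput = 'variance_weighted'
        if multioutput not in ('raw_scores', 'uniform_average',
                               'variance_weighted'):
            raise ValueError("Invalid 'multioutput' value: %r" % multioutput)
        return multioutput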
+1
I have done the refactoring, but I can't understand what the hell is going on with Travis.
Thanks. The Travis failure is indeed weird, since it didn't even start the tests (the one config that did run has all tests passing). I don't know if it is possible to ask it to try again on this commit. But what you can always do is add another commit correcting some minor issues, such as PEP8 violations. Another helpful thing would be to bring the docstrings into a more standard (numpydoc) form.
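The before/after docstring examples were not preserved on this page; as an illustration only, the standard numpydoc parameter layout looks like this (the signature follows the excerpt quoted further down, and is not the merged code):

    def mean_absolute_error(y_true, y_pred, sample_weight=None,
                            multioutput='uniform_average'):
        """Mean absolute error regression loss.

        Parameters
        ----------
        y_true : array-like of shape (n_samples,) or (n_samples, n_outputs)
            Ground truth (correct) target values.

        multioutput : string in ['raw_values', 'uniform_average']
                or array-like of shape (n_outputs)
            Defines how to aggregate errors over multiple outputs.
        """
        pass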
Well, I think it's done.
This looks good to me now. I am assuming that even if 1.0 is possibly on the horizon after 0.17, it is still OK to assume 0.18 in the deprecation message; that can always be changed when this becomes clear. @kshmelkov could you squash your commits into a single one called something like …?
Force-pushed from 22d2741 to d0b8ba1.
is ``'uniform_average'``, which entails a uniformly weighted mean over outputs. If
an ``ndarray`` of shape ``(n_outputs,)`` is passed, then its entries are
interpreted as weights and an according weighted average is returned. If
``multioutput`` is ``'raw_scores'``, then all unaltered individual scores are returned.
raw_scores doesn't make sense with loss / error based metrics.
Mostly stylistic comments; once those are addressed, +1.
These functions have a ``multioutput`` keyword argument which specifies
the way the scores for each individual target should be averaged. The default
is ``'uniform_average'``, which entails a uniformly weighted mean over outputs.
I don't think "entails" is the right word.
@kshmelkov could you please squash all your commits in this branch?
Other than that, +1 on my side as well.
Force-pushed from 110ed21 to a745318.
@ogrisel I have applied your corrections and squashed the commits.
Should this get a whatsnew entry?
We should close #4491 when merged.
        Sample weights.

    multioutput : string in ['raw_values', 'uniform_average',
            'variance_weighted'] or array-like of shape (n_outputs)
Does this render OK in the docs?
Sorry for not being part of the discussion, but apart from those 2 minor comments, this looks good to me as well. Thanks @arjoly @kshmelkov and @eickenberg for the help throughout. Let us merge? It's been 2 long years since this PR was started.
Well, it would close automatically if the PR had been started with "Closing #4491", but because of the many re-openings of PRs, this one didn't even reference the original issue.
    y_type, y_true, y_pred, multioutput = _check_reg_targets(
        y_true, y_pred, multioutput)

    y_diff_avg = np.average(y_true - y_pred, weights=sample_weight, axis=0)
Probably does not matter, but you could reuse y_true - y_pred.
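The suggested reuse would look roughly like this (a sketch with made-up data; the numerator line assumes the usual explained-variance computation that follows in the function):

    import numpy as np

    y_true = np.array([[3.0, -0.5], [2.0, 0.0], [7.0, 2.0]])
    y_pred = np.array([[2.5, 0.0], [2.0, -1.0], [8.0, 2.0]])
    sample_weight = None

    diff = y_true - y_pred  # computed once and reused
    y_diff_avg = np.average(diff, weights=sample_weight, axis=0)
    numerator = np.average((diff - y_diff_avg) ** 2,
                           weights=sample_weight, axis=0)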
@kshmelkov Can you just check the rendering, and then we can merge?
I'm on it.
Pushed, with whatsnew.rst updated.
Great!!! Thanks to all who have contributed! :-)
I have tried to rebase PR #3474. If it passes Travis, I guess the mentioned PR can be replaced.
Basically, it adds multioutput support for regression metrics (MAE, MSE, R2, explained variance) in a way that lets us get a whole array of scores instead of some kind of averaging. BTW, I corrected a small mistake in the docs: they stated that median absolute error supports multioutput while it doesn't.
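For context, the behavior this adds, using the option names from the docstring excerpt quoted above (a small self-contained example; values computed by hand):

    import numpy as np
    from sklearn.metrics import mean_absolute_error

    y_true = np.array([[0.5, 1.0], [-1.0, 1.0], [7.0, -6.0]])
    y_pred = np.array([[0.0, 2.0], [-1.0, 2.0], [8.0, -5.0]])

    # Default: uniformly averaged over the two outputs -> a single scalar
    mean_absolute_error(y_true, y_pred)                            # 0.75
    # Whole array of per-output errors instead of an average
    mean_absolute_error(y_true, y_pred, multioutput='raw_values')  # [0.5, 1.0]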