
[MRG+2] rebasing pr/3474 (multioutput regression metrics) #4491


Closed
wants to merge 1 commit into from

Conversation

kshmelkov
Contributor

I have tried to rebase PR #3474. If it passes Travis, I guess the mentioned PR can be replaced.

Basically, it adds multioutput support for regression metrics (MAE, MSE, R2, explained variance), so that we can get a whole array of per-output scores instead of some kind of averaging. BTW, I corrected a small mistake in the docs: they stated that median absolute error supports multioutput while it doesn't.
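As a rough illustration of the idea (plain NumPy, not the PR's actual code), returning per-output scores just means not collapsing the output axis:

```python
import numpy as np

# Illustrative sketch, not the PR's implementation: with two target
# columns, a multioutput MSE can return one score per output column
# instead of averaging them into a single number.
y_true = np.array([[0.5, 1.0], [-1.0, 1.0], [7.0, -6.0]])
y_pred = np.array([[0.0, 2.0], [-1.0, 2.0], [8.0, -5.0]])

raw_mse = np.mean((y_true - y_pred) ** 2, axis=0)  # one MSE per output column
avg_mse = raw_mse.mean()                           # uniformly averaged scalar
```

Here `raw_mse` keeps the per-output information that a plain average would hide.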

@@ -33,7 +36,7 @@
]


def _check_reg_targets(y_true, y_pred):
def _check_reg_targets(y_true, y_pred, output_weights):
Contributor

Could you check whether it is acceptable to make output_weights=None a default? This is making Travis fail on mean_absolute_error.

It may be better to not set the default and be explicit in mean_absolute_error

Contributor Author

output_weights=None isn't a sane default: it returns a whole array. 'uniform' is better as default. In any case I'll make it explicit to correct median_absolute_error.

Contributor

OK, then don't put any default.

@eickenberg
Contributor

Thanks very much for taking care of this!



def _average_and_variance(values, sample_weight=None):
def _average_and_variance(values, sample_weight=None, axis=None):
Contributor

In the other PR we had discussed whether this function is useful. Looking at it a few months later, I find it weird that this logic was extracted when it is only used twice lower down.

Could we reach a decision on whether to remove it?

@MechCoder @arjoly @ogrisel

Contributor Author

I agree that it looks like over-engineering in this case. I found an issue numpy/numpy#5164 about it, but it seems it was ignored.

Contributor

I was thinking more of just writing it out explicitly where needed.

But since it works the way it is right now, in the interest of terminating this PR, I think we should leave it the way it is.

Member

We agreed on removing it that day, if I recall correctly.

@kshmelkov kshmelkov changed the title rebasing pr/3474 (multioutput regression metrics) [MRG] rebasing pr/3474 (multioutput regression metrics) Apr 2, 2015
@kshmelkov
Contributor Author

I have removed that debatable function.



def r2_score(y_true, y_pred,
output_weights='uniform',
Member

This is a regression compared to the previous behavior.

Contributor Author

Sorry, what do you mean exactly?

Contributor

Regression with respect to the single target behavior or regression with respect to a previous version of this PR?

Member

It's a regression with respect to the current (master) behavior of r2_score.

Contributor

I was convinced that the outcome of the discussions in the last PRs and at the scikit-learn sprint last summer in Paris was that the sane default behavior for aggregating r2 scores was averaging the scaled variances. The behavior in current master was added as the keyword output_weights="variance" for those who understand its consequences and need it (scores being weighted by the variances of the target variables, i.e. dependent on their scale/unit).

Contributor Author

I find individual too misleading. collapse_output? scores_map? multioutput_averaging?

Contributor

Sorry, I meant output_weights='individual' as a replacement for output_weights=None. Also, as Gaël says lower down output_weights -> multioutput_weight

Contributor Author

Then I suggest an option multioutput with values raw_scores (the current None behavior), uniform_average, and variance_weighted. In this case the default None will raise a deprecation warning and fall back to variance_weighted. Is that OK for everybody?
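A minimal NumPy sketch of what the three suggested modes would compute for r2_score (names as proposed above; illustrative only, not the PR's implementation):

```python
import numpy as np

# Hypothetical sketch of the proposed ``multioutput`` modes for R^2.
y_true = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
y_pred = np.array([[1.0, 2.0], [2.0, 4.0], [2.0, 5.0]])

ss_res = ((y_true - y_pred) ** 2).sum(axis=0)               # residual sum of squares
ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)  # total sum of squares

raw_scores = 1.0 - ss_res / ss_tot                          # 'raw_scores': one R^2 per output
uniform = raw_scores.mean()                                 # 'uniform_average'
variance_weighted = np.average(raw_scores, weights=ss_tot)  # 'variance_weighted'
```

The second output has a larger variance, so variance weighting pulls the aggregate score toward its (higher) R^2.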

Contributor

+1


@kshmelkov
Contributor Author

I have done the refactoring, but I can't understand what the hell is going on with Travis.

@eickenberg
Contributor

Thanks. The Travis failure is indeed weird, since it didn't even start the tests (the one config that runs has all tests passing). I don't know if it is possible to ask it to try again on this commit. But what you can always do is add another commit, correcting some minor issues, such as PEP8 violations.

Another helpful thing would be to bring the docstrings into a more standard form, i.e.

array-like of shape = [n_samples] or [n_samples, n_outputs]

should become

array-like of shape (n_samples,) or (n_samples, n_outputs)
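As a sketch, the suggested notation would appear in a numpydoc-style docstring like this (signature and wording are illustrative, not the actual function):

```python
def mean_absolute_error(y_true, y_pred):
    """Sketch of a docstring using the suggested shape notation.

    Parameters
    ----------
    y_true : array-like of shape (n_samples,) or (n_samples, n_outputs)
        Ground truth (correct) target values.

    y_pred : array-like of shape (n_samples,) or (n_samples, n_outputs)
        Estimated target values.
    """
```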

@coveralls

Coverage Status

Coverage increased (+0.04%) to 95.16% when pulling 22d2741 on kshmelkov:rebase_pr_3474 into 6e54079 on scikit-learn:master.

@kshmelkov
Contributor Author

Well, I think it's done.

@eickenberg
Contributor

This looks good to me now.

I am assuming that even if 1.0 is possibly on the horizon after 0.17, it is still OK to assume 0.18 in the deprecation message. That can always be changed when this becomes clear.

@kshmelkov could you squash your commits into a single one called something like ENH multioutput regression metrics ?

@amueller
Member

amueller commented Apr 6, 2015

@arjoly gave +1 on the other one, right? @arjoly can you maybe have a look again to confirm?

is ``'uniform_average'``, which entails a uniformly weighted mean over outputs. If
an ``ndarray`` of shape ``(n_outputs,)`` is passed, then its entries are
interpreted as weights and an according weighted average is returned. If
``multioutput='raw_scores'`` is specified, then all unaltered individual scores
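For the array-like case, the intended semantics could be sketched as follows (hypothetical scores and weights, not sklearn code):

```python
import numpy as np

# Hypothetical per-output scores and user-supplied output weights of
# shape (n_outputs,); the result is their weighted average.
per_output_scores = np.array([0.5, 0.875])
weights = np.array([0.25, 0.75])
weighted = np.average(per_output_scores, weights=weights)
```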
Member

raw_scores doesn't make sense with loss / error based metrics.

@arjoly
Member

arjoly commented Apr 11, 2015

Mostly stylistic comments; once those are addressed, +1.


These functions have a ``multioutput`` keyword argument which specifies
the way the scores for each individual target should be averaged. The default
is ``'uniform_average'``, which entails a uniformly weighted mean over outputs. If
Member

I don't think "entails" is the right word.

@ogrisel
Member

ogrisel commented Apr 23, 2015

@kshmelkov could you please squash all your commits in this branch?

@ogrisel
Member

ogrisel commented Apr 23, 2015

Other than that +1 on my side as well.

@ogrisel ogrisel changed the title [MRG] rebasing pr/3474 (multioutput regression metrics) [MRG+1] rebasing pr/3474 (multioutput regression metrics) Apr 23, 2015
@kshmelkov
Contributor Author

@ogrisel I have applied your corrections and squashed the commits.

@amueller
Member

should this get a whatsnew?

@amueller amueller changed the title [MRG+1] rebasing pr/3474 (multioutput regression metrics) [MRG+2] rebasing pr/3474 (multioutput regression metrics) Apr 29, 2015
@amueller
Member

we should close #4491 when merged.

Sample weights.

multioutput : string in ['raw_values', 'uniform_average',
'variance_weighted'] or array-like of shape (n_outputs)
Member

does this render ok in the docs?

@MechCoder
Member

Sorry for not being part of the discussion but apart from those 2 minor comments, this looks good to me as well. Thanks @arjoly @kshmelkov and @eickenberg for the help throughout. Let us merge? It's been 2 long years since this PR was started.

@MechCoder
Member

we should close #4491 when merged.

@amueller I thought github does that automatically? :P

@amueller
Member

amueller commented May 8, 2015

Well, it would if the PR had been opened with "Closing #4491", but because of the many re-openings of PRs, this one didn't even reference the original issue.
Do you want to check if the docs render, add a whatsnew and push?

y_type, y_true, y_pred, multioutput = _check_reg_targets(
y_true, y_pred, multioutput)

y_diff_avg = np.average(y_true - y_pred, weights=sample_weight, axis=0)
Member

probably does not matter, but we could reuse ``y_true - y_pred``.
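The suggestion amounts to computing the residuals once and reusing them (a sketch with made-up data and no sample weights, not the PR's code):

```python
import numpy as np

# Compute the residuals once instead of evaluating y_true - y_pred twice.
y_true = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
y_pred = np.array([[1.0, 2.0], [2.0, 5.0], [2.0, 5.0]])

y_diff = y_true - y_pred
y_diff_avg = np.average(y_diff, axis=0)                       # mean residual per output
numerator = np.average((y_diff - y_diff_avg) ** 2, axis=0)    # residual variance per output
```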

@MechCoder
Member

@kshmelkov Can you just check the rendering and we can merge?

@amueller
Member

amueller commented May 8, 2015

I'm on it.

@amueller
Member

amueller commented May 8, 2015

pushed with whatsnew.rst

@arjoly
Member

arjoly commented May 9, 2015

Great !!! Thanks to all who have contributed ! :-)

@kshmelkov kshmelkov deleted the rebase_pr_3474 branch May 15, 2015 22:14