[MRG+1] Multi-output scoring, or 2493 cont'd for real #3474


Closed · eickenberg wants to merge 1 commit into scikit-learn:master from eickenberg:2493b2

Conversation

eickenberg (Contributor)

This time taking into account the refactoring of metrics.py into several files.

This replaces #3456, which I closed.

@@ -1006,6 +1023,13 @@ and :math:`y_i` is the corresponding true value, then the mean absolute error

\text{MAE}(y, \hat{y}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} \left| y_i - \hat{y}_i \right|.

The :func:`mean_absolute_error` function has an `output_weights` keyword
with two possible values `None` and 'uniform'. If the value provided is
Member:

To render properly in the docs, the code should be between double backticks, thus ``None``.
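
(For context, a minimal NumPy sketch of the semantics described in the snippet above. The `output_weights` keyword is the API proposed by this PR, not a released one, so the per-output and uniform averages are computed by hand here; `output_weights=None` returning the per-output scores is an assumption drawn from the surrounding discussion.)

    import numpy as np

    y_true = np.array([[0.5, 1.0], [-1.0, 1.0], [7.0, -6.0]])
    y_pred = np.array([[0.0, 2.0], [-1.0, 2.0], [8.0, -5.0]])

    # Per-output MAE (what output_weights=None would presumably return):
    per_output = np.abs(y_true - y_pred).mean(axis=0)  # array([0.5, 1.0])

    # output_weights='uniform' averages the per-output scores equally:
    uniform = per_output.mean()  # 0.75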

@ogrisel (Member) commented Jul 23, 2014

Travis reports 2 failures:

======================================================================
FAIL: Doctest: sklearn.metrics.regression.r2_score
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/doctest.py", line 2201, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for sklearn.metrics.regression.r2_score
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/metrics/regression.py", line 369, in r2_score

----------------------------------------------------------------------
File "/home/travis/build/scikit-learn/scikit-learn/sklearn/metrics/regression.py", line 426, in sklearn.metrics.regression.r2_score
Failed example:
    r2_score(y_true, y_pred)  # doctest: +ELLIPSIS
Expected:
    0.938...
Got:
    0.93680052666227787

>>  raise self.failureException(self.format_failure(<StringIO.StringIO instance at 0x60c9b00>.getvalue()))


======================================================================
FAIL: sklearn.metrics.tests.test_common.test_format_invariance_with_1d_vectors
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python2.7_with_system_site_packages/local/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/metrics/tests/test_common.py", line 484, in test_format_invariance_with_1d_vectors
    assert_raises(ValueError, metric, y1_row, y2_row)
AssertionError: ValueError not raised

@eickenberg (Contributor, Author)

I fixed the "ValueError not raised" failure by adding explained_variance_score to the list of multioutput scores in test_common.py. The other failure was due to the fact that the default averaging of scores is now 'uniform'.
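
(For reference, a pure-NumPy sketch of the two averaging modes; the inputs below are illustrative but reproduce both numbers from the Travis log above.)

    import numpy as np

    y_true = np.array([[0.5, 1.0], [-1.0, 1.0], [7.0, -6.0]])
    y_pred = np.array([[0.0, 2.0], [-1.0, 2.0], [8.0, -5.0]])

    # Per-output R^2 = 1 - SS_res / SS_tot, column by column.
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    r2_per_output = 1.0 - ss_res / ss_tot

    # 'uniform' (the new default) averages the per-output scores equally:
    print(r2_per_output.mean())                       # 0.93680052666...
    # Weighting by each output's total variance gives the old doctest value:
    print(np.average(r2_per_output, weights=ss_tot))  # 0.938...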

variance of each output is weighted by the scale of the corresponding target
variable.
One can also specify arbitrary weights: If an ``ndarray`` is given,
then a weighted average is formed accordingly.
Member:

This text is almost identical for explained_variance_score, r2_score, mean_absolute_error and mean_squared_error (and missing from the narrative doc). What do you think of creating a separate section to explain the output_weights (and maybe sample_weight) argument?

Contributor Author:

I am having trouble building the docs. I hope to get that working quickly so I can actually see my edits.

@arjoly (Member) commented Jul 30, 2014

Can you run flake8 on the code? Thanks!

output_weights = np.ones_like(output_scores)
elif output_weights == 'variance':
output_weights = denominator
elif output_weights == None:
Member:

output_weights is None
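
(Why identity comparison matters here, sketched: `output_weights` may be an ndarray, and with recent NumPy `==` on an ndarray compares elementwise.)

    import numpy as np

    w = np.ones(3)
    # Elementwise comparison: yields an array, not a usable condition.
    print(w == None)  # [False False False]
    # Identity comparison: always a plain bool.
    print(w is None)  # False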

# y_true is not interesting for scoring a regression anyway
output_scores[nonzero_numerator & ~nonzero_denominator] = 0.
if output_weights == 'uniform':
output_weights = np.ones_like(output_scores)
Member:

You can set this to None to have the same effect.
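
(The equivalence being pointed out, sketched: np.average with weights=None is already the unweighted mean, so the explicit ones array is redundant.)

    import numpy as np

    scores = np.array([0.2, 0.8, 0.5])
    explicit = np.average(scores, weights=np.ones_like(scores))
    implicit = np.average(scores)  # weights=None means an unweighted mean
    assert np.isclose(explicit, implicit)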

@arjoly (Member) commented Aug 1, 2014

> I am having trouble building the docs. I hope to get that working quickly so I can actually see my edits.

I am able to build the docs with:

Sphinx==1.2b1
Pillow==2.0.0
Pygments==1.6
PIL==1.1.7
matplotlib==1.2.1
Jinja2==2.7
MarkupSafe==0.18
docutils==0.10

I hope it helps.

variance = np.average((values - average)**2, weights=sample_weight)
sample_weight = sample_weight.reshape((n_samples, 1))
# if multi output but sample weight only specified in one column,
# then we need to broadcast it over outputs
Member:

Could we avoid the broadcast by first averaging over samples and then averaging over outputs?

Contributor Author:

I could do this if axis=None. Otherwise I need to keep the targets separate. Or bypass this function altogether ...

Member:

Ok, my fear is that it becomes slow. But further optimization could be done in another PR.

Contributor Author:

It is true that this can become slow when (n_test_samples, n_outputs) gets very large. I have around (1000, 100000) and it is still OK. Making this better would mean either getting rid of np.average in the multioutput case, because it forces the broadcast to be done explicitly, or writing an upstream suggestion to handle broadcasting implicitly, which would probably be rejected on grounds of avoiding magical behaviour. If this is a serious problem I can definitely fix it right away.
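
(A small sketch of the alternative discussed above, assuming uniform output weights and axis=None: averaging over samples first and then over outputs matches the broadcast route, so the explicit broadcast is only needed when per-output scores must be kept.)

    import numpy as np

    rng = np.random.RandomState(0)
    errors = rng.rand(1000, 50)      # (n_samples, n_outputs)
    sample_weight = rng.rand(1000)

    # Broadcast route: tile sample weights across outputs, average once.
    w = np.repeat(sample_weight.reshape(-1, 1), errors.shape[1], axis=1)
    one_shot = np.average(errors, weights=w)

    # Two-step route: weighted average over samples, then over outputs.
    two_step = np.average(errors, axis=0, weights=sample_weight).mean()

    assert np.isclose(one_shot, two_step)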

@arjoly (Member) commented Aug 4, 2014

LGTM, the Travis failure is unrelated.

@arjoly changed the title from "[MRG] Multi-output scoring, or 2493 cont'd for real" to "[MRG+1] Multi-output scoring, or 2493 cont'd for real" on Aug 4, 2014
@MechCoder (Member)

@eickenberg Thanks a lot for bringing this back to life :)

@eickenberg (Contributor, Author)

Definitely breathing again -- Just squashed it down to one commit, hope it survived ;)

@coveralls

Coverage Status

Coverage increased (+0.0%) when pulling e40235e on eickenberg:2493b2 into 83223fd on scikit-learn:master.

@arjoly (Member) commented Aug 7, 2014

Travis is happy now :-)

@MechCoder (Member)

@eickenberg hope this doesn't die till someone has to bring it back to life again :)

@arjoly (Member) commented Aug 12, 2014

@MechCoder Have you had time to review the code?

@MechCoder (Member)

@arjoly I definitely can, but I doubt I will have anything more to add beyond your inputs :)

@arjoly (Member) commented Sep 24, 2014

Could we consider this as a +1? :-)

@MechCoder (Member)

@arjoly I'm slightly busy till Thursday next week (grad school exams). I will definitely have a final look on Thursday or just after that.

@@ -114,13 +152,32 @@ def mean_absolute_error(y_true, y_pred, sample_weight=None):
y_pred : array-like of shape = [n_samples] or [n_samples, n_outputs]
Estimated target values.

output_weights : string in ['uniform'] or None
or array-like of shape [n_outputs]
Member:

Should it be ``array-like, shape (n_outputs,)``?

Member:

Ah, I see it is like this everywhere, so this can be left alone.

Contributor Author:

I tried to keep to the local choice of syntax to stay consistent. That said, I don't like it and could change it, but then I would change it all. Putting a low priority on that, but if we conclude it is necessary, I'll do it.

@MechCoder (Member)

@eickenberg Thanks for finishing up my work. If you agree with my changes and are busy with other work, let me know and I can send a PR against your branch. I would like to see this merged.

Also there are merge conflicts.

@eickenberg (Contributor, Author)

I will update the code somewhat this afternoon and then try to rebase. There may be a need for one extra round of discussion, but it looks like we are almost there.

@amueller (Member)

Closing in favor of #4491.

@amueller closed this Apr 29, 2015