
[MRG+2] #7908 Addressed issue in first iteration of RANSAC regression #7914


Merged: 16 commits into scikit-learn:master, Jan 5, 2017

Conversation

mthorrell
Contributor

Reference Issue

Fixes #7908

What does this implement/fix? Explain your changes.

On the first iteration of RANSAC regression, if no inliers are found, an error is raised and execution stops. Ideally the procedure would simply skip that iteration and continue to the next one, where a different random sample could produce valid inliers.

Generally this error occurs when n_inliers_subset and n_inliers_best are both zero. My fix sets the initial value of n_inliers_best to 1. If n_inliers_subset >= 1, the code follows the normal path; if n_inliers_subset == 0, it moves on to the next iteration. This fixes the bug as described in the issue.

However, setting n_inliers_best = 1 reintroduces a problem in the first iteration when n_inliers_subset == 1: the comparison score_subset < score_best is then made, and since score_best is initialized to np.inf, the code incorrectly skips to the next iteration even though valid inliers were found in the first iteration. The fix for this is to initialize score_best to -np.inf instead. In general, I think this is the better practice for initializing these types of variables anyway.
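The control flow described above can be sketched as follows. This is a toy stand-in for illustration only, not scikit-learn's actual ransac.py: inlier_counts and scores abstract away the per-subsample model fitting and scoring.

```python
import numpy as np

def ransac_control_flow(inlier_counts, scores, max_trials=5):
    """Walk candidate subsamples, skipping those with zero inliers."""
    n_inliers_best = 1    # was 0; 1 makes zero-inlier subsets skip cleanly
    score_best = -np.inf  # was +inf; -inf lets the first valid subset win
    best = None
    for i in range(min(max_trials, len(inlier_counts))):
        n_inliers_subset = inlier_counts[i]
        score_subset = scores[i]
        # A subset with fewer inliers than the current best is skipped;
        # with n_inliers_best = 1, zero-inlier subsets always take this branch.
        if n_inliers_subset < n_inliers_best:
            continue
        # With score_best = -inf, a tied-inlier-count subset is only rejected
        # when its score is genuinely worse, never on the first valid iteration.
        if n_inliers_subset == n_inliers_best and score_subset < score_best:
            continue
        n_inliers_best = n_inliers_subset
        score_best = score_subset
        best = i
    return best

# First subsample has no inliers; the second is valid and is kept.
print(ransac_control_flow([0, 3, 2], [0.0, 0.9, 0.5]))  # -> 1
```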

Any other comments?

@mthorrell
Contributor Author

In this small update, I adjusted the offending test so that it does not require a specific output; however, it still requires that an error be raised when residual_threshold = 0.

@amueller
Member

Thanks for the PR. I think changing the tests was the way to go! (I don't have time for a review right now, unfortunately.)

@amueller amueller added this to the 0.19 milestone Nov 28, 2016
@amueller amueller added the Bug label Nov 28, 2016
@mthorrell
Contributor Author

Thanks.

I made a couple minor changes:

  • The test now more specifically checks the error message since I think this is better practice.
  • I also cleaned up the formatting of that error message.

msg = (
"RANSAC could not find valid consensus set, because"
" either the `residual_threshold` rejected all the samples or"
" `is_data_valid` and `is_model_valid` returned False for all"
Member

Referring to internal functions is not very helpful for the users, IMHO.

Contributor Author

I believe you are referring to is_data_valid and is_model_valid. These are user-defined functions, much like residual_threshold is a user-defined parameter. When a user does not define these functions, RANSAC does not call them; they are initialized to None and evaluation is skipped. Unless you disagree, I think it is still valid to refer to them.

Perhaps a better message might read:

RANSAC could not find a valid consensus set. All max_trials iterations were skipped because each randomly chosen sub-sample either: didn't produce a valid set of inliers due to a small residual_threshold or was invalidated by is_data_valid or is_model_valid if either of these functions have been defined by the user.

Member

Oh, sorry, my fault. The current message is ok, though your suggestion is even better!

@amueller
Member

maybe @ahojnnes wants to review?

Member

@jnothman jnothman left a comment

I am not certain this is the right fix. This still fails if a 0-inlier sample is drawn at any point after the first >0-inlier sample is drawn.

We could introduce a parameter to control how many 0-inlier samples are allowed through, or just get rid of this control (the main risk being lots of time wasted).

A couple of more cosmetic things we can do to improve this behaviour:

  • If we continue to raise an error when no inliers are found, we should report the minimum residual found (and perhaps other summary statistics of the residual distribution) as well as the current threshold, so that the user has a means to tune the parameter.
  • introduce a verbose option that reports things like the median residual, number of inliers, etc., at each iteration.

Ping @CSchoel for your opinion.
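The kind of diagnostics suggested above could be sketched roughly as below. The helper name and the exact statistics reported are illustrative assumptions, not an actual scikit-learn API; the idea is just to give the user numbers against which to tune residual_threshold.

```python
import numpy as np

def residual_diagnostics(y_true, y_pred, residual_threshold):
    """Summarise absolute residuals so a user can tune residual_threshold."""
    residuals = np.abs(y_true - y_pred)
    return {
        "residual_threshold": residual_threshold,
        "min_residual": float(residuals.min()),
        "median_residual": float(np.median(residuals)),
        "n_inliers": int((residuals <= residual_threshold).sum()),
    }

# One extreme outlier; a threshold of 0.05 rejects every sample.
y_true = np.array([1.0, 2.0, 3.0, 10.0])
y_pred = np.array([1.1, 2.2, 2.9, 3.0])
print(residual_diagnostics(y_true, y_pred, residual_threshold=0.05))
```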

" iterations were skipped because each randomly chosen"
" sub-sample either: didn't produce a valid set of inliers"
" due to a small `residual_threshold` or was invalidated by"
" `is_data_valid` or `is_model_valid` if either of these"
Member

I think the "if either ..." can be removed

@jnothman
Member

jnothman commented Dec 7, 2016

Sorry, silly me. This patch does indeed fix the issue, because the check for n_inliers_subset == 0 only applies when it is >= the best subset (which defaults to 1 in this PR). But that seems to mean that the "No inliers found" error is unreachable; as a result, we may draw all trials with 0-inlier samples, and this is expensive.

If this is the right patch, you need to write a test. But we may want a more configurable parameter to control the number of 0-inlier samples. Certainly, we should be giving the user more information on how to set the threshold appropriately.

@mthorrell
Contributor Author

The if statement containing n_inliers_subset == 0 was unreachable and hence was removed, so there shouldn't be any unreachable code generated by this PR.

If this is the fix, a test already exists that generates an appropriate error. Unless I'm missing something, there would be nothing else to do here except potentially edit the error message--again assuming we go with this fix.

However, I do think it makes sense to put a limit on the number of iterations that are tried which end in skips. Also, is_data_valid or is_model_valid themselves could be computationally expensive, so I would think we would count skips due to data/model failure the same as skips due to a strict residual_threshold. I suppose one could explicitly track the causes of different skips separately (is_data_valid vs. is_model_valid vs. residual_threshold), but this seems excessive to me.

I'll submit a commit that allows a user to set a max_skips parameter. It will default to max_trials.
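A hypothetical sketch of how such a max_skips control could behave; the names here (run_with_max_skips, n_skips, subsample_results) are illustrative and not scikit-learn's actual implementation. Each entry of subsample_results stands for whether one random subsample passed the validity and inlier checks.

```python
import warnings

def run_with_max_skips(subsample_results, max_trials=10, max_skips=None):
    """Count skipped iterations and stop once max_skips is exceeded."""
    if max_skips is None:
        max_skips = max_trials  # proposed default: as many skips as trials
    n_skips = 0
    accepted = []
    for ok in subsample_results[:max_trials]:
        if n_skips > max_skips:
            warnings.warn("RANSAC: skipped more iterations than max_skips")
            break
        if not ok:  # failed is_data_valid/is_model_valid, or zero inliers
            n_skips += 1
            continue
        accepted.append(ok)
    return n_skips, len(accepted)

# Two skips exceed max_skips=1, so the loop stops before the valid subsample.
print(run_with_max_skips([False, False, True, False], max_skips=1))  # -> (2, 0)
```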

@jnothman
Member

jnothman commented Dec 8, 2016

(Sorry for my confusions. Was trying to grok the overall algorithm)

@jnothman
Member

jnothman commented Dec 8, 2016

If this is the fix, a test already exists that generates an appropriate error.

A non-regression test, which allows 0 inliers to be drawn in the first iteration, would be ideal.

@jnothman
Member

jnothman commented Dec 8, 2016

Yes, I don't mind max_skips. I also don't think there's any harm in reporting diagnostics of various kinds, summarising residuals or reason for skipping / not updating.

@mthorrell
Contributor Author

Err... I need to finish the tests, so this is still WIP.

@jnothman
Member

Still still WIP?

Member

@jnothman jnothman left a comment

It would be useful to set these diagnostics as attributes of the model, i.e. n_skips_no_inliers_, etc.

We may also then choose to remove them from the error message (and certainly from the warning, to help the warnings module ignore duplicates), pointing the user to these attributes instead.
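A toy sketch of the attribute idea: the attribute names follow the PR discussion (n_skips_no_inliers_, etc., with scikit-learn's trailing-underscore convention for fitted attributes), but the class itself is a stand-in for illustration, not RANSACRegressor.

```python
class ToyRansac:
    """Stand-in estimator exposing skip diagnostics as fitted attributes."""

    def fit(self, trial_outcomes):
        # trial_outcomes: per-iteration skip reason, or None if accepted
        self.n_skips_no_inliers_ = 0
        self.n_skips_invalid_data_ = 0
        self.n_skips_invalid_model_ = 0
        for reason in trial_outcomes:
            if reason == "no_inliers":
                self.n_skips_no_inliers_ += 1
            elif reason == "invalid_data":
                self.n_skips_invalid_data_ += 1
            elif reason == "invalid_model":
                self.n_skips_invalid_model_ += 1
        return self

est = ToyRansac().fit(["no_inliers", None, "invalid_data", "no_inliers"])
print(est.n_skips_no_inliers_, est.n_skips_invalid_data_)  # -> 2 1
```

An error or warning message can then simply point the user at these attributes instead of embedding the counts, which also keeps warning text constant so the warnings module can deduplicate it.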

@@ -111,6 +111,11 @@ class RANSACRegressor(BaseEstimator, MetaEstimatorMixin, RegressorMixin):
max_trials : int, optional
Maximum number of iterations for random sample selection.

max_skips : int, optional
Maximum number of iterations that can be skipped due to finding zero
inliers or invalid data defined by `is_data_valid` or invalid models
Member

need double-backticks

Member

@jnothman jnothman left a comment

I'd still consider including a diagnostic about the stringency of the threshold, either reporting some statistic of the residuals (e.g. average min) or some statistic of the inlier subset size.

@mthorrell
Contributor Author

I like the attributes idea. The error and warning messages seemed a bit clunky. Let me see if I can do something about the stringency of the threshold as well.

@mthorrell
Contributor Author

I added the n_skips_no_inliers etc. attributes and adjusted the error messages and tests.

I did not add anything indicating the stringency of the threshold. Should that be a different PR? This one is getting rather large, IMO.

@jnothman
Member

jnothman commented Dec 29, 2016 via email

Member

@jnothman jnothman left a comment

Otherwise LGTM!

max_skips : int, optional
Maximum number of iterations that can be skipped due to finding zero
inliers or invalid data defined by ``is_data_valid`` or invalid models
defined by ``is_data_valid``.
Member

should be is_model_valid

"RANSAC skipped more iterations than `max_skips` without"
" finding a valid consensus set. Iterations were skipped"
" because each randomly chosen sub-sample failed the"
" passing criteria. The object attributes"
Member

object -> estimator. But perhaps "See estimator attributes for diagnostics (n_skips*)." is sufficient

@jnothman jnothman changed the title [MRG] #7908 Addressed issue in first iteration of RANSAC regression [MRG+1] #7908 Addressed issue in first iteration of RANSAC regression Dec 29, 2016
@agramfort
Member

+1 for merge

@jnothman merge if you're happy

@agramfort agramfort changed the title [MRG+1] #7908 Addressed issue in first iteration of RANSAC regression [MRG+2] #7908 Addressed issue in first iteration of RANSAC regression Dec 31, 2016
Maximum number of iterations that can be skipped due to finding zero
inliers or invalid data defined by ``is_data_valid`` or invalid models
defined by ``is_model_valid``.

Member

I suppose nowadays we're meant to have .. versionadded annotations in docstrings.

Contributor Author

Is it accurate to say .. versionadded:: 0.19 or should it still be 0.18?

@jnothman
Member

jnothman commented Jan 2, 2017 via email

@mthorrell
Contributor Author

made the versionadded change. I believe this is finished, assuming the tests pass.

Member

@jnothman jnothman left a comment

Sorry, I'd forgotten to ask you to add an entry to whats_new.rst (under enhancements, I think? Bug fixes if you'd rather?)

@mthorrell
Contributor Author

I made the entry in whats_new.rst. Is there anything else needed?

@jnothman jnothman merged commit d0ce0d9 into scikit-learn:master Jan 5, 2017
@jnothman
Member

jnothman commented Jan 5, 2017

All good, thanks!

raghavrv pushed a commit to raghavrv/scikit-learn that referenced this pull request Jan 5, 2017
sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
@Przemo10 Przemo10 mentioned this pull request Mar 17, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

Successfully merging this pull request may close these issues.

Finding no inliers in one iteration of RANSAC should not raise a ValueError
4 participants