[MRG+2] #7908 Addressed issue in first iteration of RANSAC regression #7914
Conversation
In this small update, I adjusted the offending test so it did not require a specific output; however, an error is still required to be raised in the `residual_threshold = 0` case.
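For illustration, a minimal sketch of such a check (pytest-style; not the actual test in this PR, and the toy data is made up for the example):

```python
import numpy as np
import pytest
from sklearn.linear_model import LinearRegression, RANSACRegressor

# With residual_threshold=0 no consensus set can be assembled on noisy data,
# so fit() is expected to raise rather than silently return a model.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X.ravel() + rng.normal(scale=0.5, size=100)

ransac = RANSACRegressor(LinearRegression(), residual_threshold=0.0,
                         random_state=0)
with pytest.raises(ValueError):
    ransac.fit(X, y)
```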
Thanks for the PR. I think changing the tests was the way to go! (I don't have time for a review right now unfortunately.)
Thanks. I made a couple of minor changes:
msg = (
    "RANSAC could not find valid consensus set, because"
    " either the `residual_threshold` rejected all the samples or"
    " `is_data_valid` and `is_model_valid` returned False for all"
Referring to internal functions is not very helpful for the users imho.
I believe you are referring to `is_data_valid` and `is_model_valid`. These are user-defined functions, much like `residual_threshold` is a user-defined parameter. When a user does not define these functions, RANSAC does not call them: they are initialized to `None` and evaluation is skipped. Unless you disagree, these may still be valid to refer to.
Perhaps a better message might read:

RANSAC could not find a valid consensus set. All `max_trials` iterations were skipped because each randomly chosen sub-sample either: didn't produce a valid set of inliers due to a small `residual_threshold`, or was invalidated by `is_data_valid` or `is_model_valid` if either of these functions has been defined by the user.
Oh, sorry, my fault. The current message is ok, though your suggestion is even better!
Maybe @ahojnnes wants to review?
I am not certain this is the right fix. This still fails if a 0-inlier sample is drawn at any point after the first >0-inlier sample is drawn.
We could introduce a parameter to control how many 0-inlier samples are allowed through, or just get rid of this control (the main risk being lots of time wasted).
A couple more cosmetic things we can do to improve this behaviour:
- If we continue to raise an error when no inliers are found, we should report the minimum residual found (and perhaps other summary statistics of the residual distribution) as well as the current threshold, so that the user has a means to tune the parameter (see the sketch below).
- Introduce a verbose option that reports things like the median residual, number of inliers, etc., at each iteration.
Ping @CSchoel for your opinion.
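As a rough illustration of the residual-statistics idea in the first bullet, something along these lines could help a user tune `residual_threshold` (hypothetical helper, not part of this PR; the base estimator and the particular statistics reported are arbitrary choices):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def residual_summary(X, y):
    # Fit the base estimator on all the data and summarise the absolute
    # residuals, to give a rough sense of how strict a threshold would be.
    residuals = np.abs(y - LinearRegression().fit(X, y).predict(X))
    return {"min": residuals.min(),
            "median": np.median(residuals),
            "max": residuals.max()}

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X.ravel() + rng.normal(scale=0.5, size=100)
print(residual_summary(X, y))  # compare these numbers against residual_threshold
```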
" iterations were skipped because each randomly chosen" | ||
" sub-sample either: didn't produce a valid set of inliers" | ||
" due to a small `residual_threshold` or was invalidated by" | ||
" `is_data_valid` or `is_model_valid` if either of these" |
I think the "if either ..." can be removed
Sorry, silly me. This patch does indeed fix the issue, because of the check against `n_inliers_best`. If this is the right patch, you need to write a test. But we may want a more configurable parameter to control the number of 0-inlier samples. Certainly, we should be giving the user more information on how to set the threshold appropriately.
If this is the fix, a test already exists that generates an appropriate error. Unless I'm missing something, there would be nothing else to do here except potentially edit the error message, again assuming we go with this fix. However, I do think it makes sense to put a limit on the number of iterations that end in skips. Also, I'll submit a commit that allows a user to set a `max_skips` parameter.
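For context, a usage sketch of the `max_skips` parameter this PR ends up adding (toy data; the parameter values are illustrative only):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 1))
y = 3.0 * X.ravel() + rng.normal(scale=0.3, size=200)
y[::10] += 15.0                    # sprinkle in some gross outliers

ransac = RANSACRegressor(
    LinearRegression(),
    residual_threshold=1.0,        # example value; tune to the data's noise scale
    max_trials=100,
    max_skips=20,                  # give up after 20 skipped sub-samples
    random_state=0,
)
ransac.fit(X, y)
print(ransac.inlier_mask_.sum(), "inliers found")
```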
(Sorry for my confusion. Was trying to grok the overall algorithm.)
A non-regression test, in which 0 inliers are drawn in the first iteration, would be ideal.
Yes, I don't mind.
Err... need to finish the tests, so this is still WIP.
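For reference, a rough sketch of the kind of non-regression check being discussed. Rather than relying on a particular random draw, it forces the first candidate model to find zero inliers via a hypothetical `FirstFitBroken` helper (this is not the test actually added in the PR):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

class FirstFitBroken(LinearRegression):
    """Base estimator whose first fitted model is pushed far away from the
    data, so the first RANSAC iteration finds zero inliers."""

    def __init__(self):
        super().__init__()
        self._calls = 0

    def fit(self, X, y, **kwargs):
        self._calls += 1
        super().fit(X, y, **kwargs)
        if self._calls == 1:
            self.intercept_ += 1e6   # every residual now exceeds the threshold
        return self

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 1))
y = 2.0 * X.ravel() + rng.normal(scale=0.1, size=200)

ransac = RANSACRegressor(FirstFitBroken(), residual_threshold=1.0,
                         random_state=0)
ransac.fit(X, y)                      # used to error out on the first iteration
assert ransac.inlier_mask_.sum() > 0  # a consensus set is eventually found
```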
Still still WIP?
It would be useful to set these diagnostics as attributes of the model, i.e. `n_skips_no_inliers_`, etc. We may also then choose to remove them from the error message (and certainly from the warning, to help the warnings module ignore duplicates), pointing the user to these attributes instead.
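Once fitted, those counters can be read straight off the estimator; a small usage sketch (toy data; the attribute names below are the ones this PR eventually adds):

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 1))
y = 4.0 * X.ravel() + rng.normal(scale=0.2, size=100)

# Default base estimator (LinearRegression) with an explicit threshold.
ransac = RANSACRegressor(residual_threshold=0.5, random_state=0).fit(X, y)
print("skipped, no inliers:    ", ransac.n_skips_no_inliers_)
print("skipped, invalid data:  ", ransac.n_skips_invalid_data_)
print("skipped, invalid model: ", ransac.n_skips_invalid_model_)
```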
@@ -111,6 +111,11 @@ class RANSACRegressor(BaseEstimator, MetaEstimatorMixin, RegressorMixin):
    max_trials : int, optional
        Maximum number of iterations for random sample selection.

    max_skips : int, optional
        Maximum number of iterations that can be skipped due to finding zero
        inliers or invalid data defined by `is_data_valid` or invalid models
need double-backticks
I'd still consider including a diagnostic about the stringency of the threshold, either reporting some statistic of the residuals (e.g. average min) or some statistic of the inlier subset size.
I like the attributes idea. The error and warning messages seemed a bit clunky. Let me see if I can do something about the stringency of the threshold as well.
I added the `n_skips_no_inliers` etc. attributes and adjusted the error messages and tests. I did not add anything indicating the stringency of the threshold. Should that be a different PR? This one is getting rather large imo.
Okay. Large? Not so much. Unnecessarily long in the tooth? I agree. Let's leave scope as it is currently.
Otherwise LGTM!
    max_skips : int, optional
        Maximum number of iterations that can be skipped due to finding zero
        inliers or invalid data defined by ``is_data_valid`` or invalid models
        defined by ``is_data_valid``.
should be is_model_valid
"RANSAC skipped more iterations than `max_skips` without" | ||
" finding a valid consensus set. Iterations were skipped" | ||
" because each randomly chosen sub-sample failed the" | ||
" passing criteria. The object attributes" |
object -> estimator. But perhaps "See estimator attributes for diagnostics (n_skips*)." is sufficient
+1 for merge. @jnothman, merge if you're happy.
        Maximum number of iterations that can be skipped due to finding zero
        inliers or invalid data defined by ``is_data_valid`` or invalid models
        defined by ``is_model_valid``.
I suppose nowadays we're meant to have ``.. versionadded`` annotations in docstrings.
Is it accurate to say ``.. versionadded:: 0.19`` or should it still be ``0.18``?
0.19.
Made the ``.. versionadded:: 0.19`` change.
Sorry, I'd forgotten to ask you to add an entry to whats_new.rst (under enhancements, I think? Bug fixes if you'd rather?)
I made the entry in whats_new.rst.
All good, thanks!
…-learn#7914) Fixes scikit-learn#7908 Adds RANSACRegressor attributes n_skips_* for diagnostics
Reference Issue
Fixes #7908
What does this implement/fix? Explain your changes.
On the first iteration of RANSAC regression, if no inliers are found, an error is produced and the code is stopped. Ideally the procedure would just skip that iteration and continue on to the next iteration where it would use a different random sample which could produce valid inliers.
Generally this error is produced when `n_inliers_subset` and `n_inliers_best` are both zero. My fix was to set the initial value for `n_inliers_best` to 1. Thus if `n_inliers_subset >= 1`, the code follows the normal path, and if `n_inliers_subset == 0`, the code progresses to the next iteration. This fixes the bug as described in this issue.

However, setting `n_inliers_best = 1` creates an issue again in the first iteration of this loop in the case when `n_inliers_subset == 1`. The subsequent comparison is made: `score_subset < score_best`. Since `score_best` is initialized to `np.inf`, the code will incorrectly skip to the next iteration, ignoring the fact that in the first iteration, we did find valid inliers. The fix for this is to change the initial value of `score_best` to `-np.inf`. In general, I think this is better practice when initializing these types of variables anyway.

Any other comments?
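To make the fix description above concrete, here is a self-contained toy sketch of the trial loop with the proposed initial values (simplified logic following the text above, not the actual scikit-learn source; the data, the minimal-sample fit and the scoring rule are made up for illustration):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X.ravel() + rng.normal(scale=0.1, size=100)
y[:10] += 20.0                      # a few gross outliers

max_trials = 100
min_samples = 2
residual_threshold = 1.0

n_inliers_best = 1                  # the fix: a 0-inlier draw now just skips ahead
score_best = -np.inf                # the fix: a 1-inlier first draw is not discarded
coef_best = None

for _ in range(max_trials):
    idx = rng.choice(len(y), size=min_samples, replace=False)
    coef = np.polyfit(X[idx, 0], y[idx], deg=1)        # fit the minimal sample
    residuals = np.abs(np.polyval(coef, X[:, 0]) - y)
    inliers = residuals < residual_threshold
    n_inliers_subset = inliers.sum()

    if n_inliers_subset < n_inliers_best:              # 0-inlier draws land here
        continue
    score_subset = -residuals[inliers].mean()          # toy score: higher is better
    if n_inliers_subset == n_inliers_best and score_subset < score_best:
        continue
    n_inliers_best, score_best, coef_best = n_inliers_subset, score_subset, coef

if coef_best is None:
    raise ValueError("could not find a valid consensus set")
print(n_inliers_best, "inliers; slope/intercept:", coef_best)
```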