-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FIX out of bound error in split_indices #21130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX out of bound error in split_indices #21130
Conversation
@ogrisel @thomasjpfan @NicolasHug @glemaitre @adrinjalali ping |
I agree this probably is a bug, but I'm not sure if we need to panic about it. We do have a large set of bugs already 😁 |
# if n_threads >= 2 one has right_counts[-1] = 0 and | ||
# right_offset[-1] = len(sample_indices) leading to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm likely missing something but I don't understand these 2 lines
in the run-down example above, we have n_threads >= 2 and yet right_counts[-1] != 0 and right_offset[-1] != len(sample_indices)
Also it's not clear to me why this issue only ever happens for the right node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't you write that code? 😏
The "last" thread is responsible for the most "right" samples. The "right" part of the "last" thread sorts the last indices. In can happen that right_counts[-1]=0
. Then the memcpy
is a no-op. But it also leads to potentially evaluating the first argument of memcpy
, which is &sample_indices[right_offset[thread_idx]] = &sample_indices[n_samples]
which accesses one element after the last one, i.e. out of bound.
Again, this is a no-op, but it should not be executed for potentially undefined behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the comment in the hope to make it clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying the comment.
I think this is a legit bug, good find! IIUC, it should be fairly rare, as it can only ever happen if we're splitting the rightmost node in the tree. For that node, we would be accessing partition[n_samples]
which is OOB. But for any other nodes, since sample_indices
is a view on this bigger partition
array, any access to sample_indices[n_samples_at_node]
is indeed out of the node's boudary, but it's still within partition
, so there wouldn't be any segfault.
A follow-up thought, I have after discovering this bug: Do we want to setup a CICD pipeline that compiles with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comment about comment, otherwise LGTM
# right_offset[-1] = len(sample_indices) leading to evaluating | ||
# &sample_indices[right_offset[-1]] = &samples_indices[n_samples_at_node] | ||
# = &partition[n_samples_in_tree] which produces | ||
# IndexError: Out of bounds on buffer access (with boundscheck=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this is clearer (without going through partition
)?
# right_offset[-1] = len(sample_indices) leading to evaluating | |
# &sample_indices[right_offset[-1]] = &samples_indices[n_samples_at_node] | |
# = &partition[n_samples_in_tree] which produces | |
# IndexError: Out of bounds on buffer access (with boundscheck=True) | |
# right_offset[-1] = len(sample_indices) leading to evaluating | |
# &sample_indices[len(sample_indices)] which is out of bounds. |
In general, sample_indices
is a view so sample_indices[len(sample_indices)]
can exist, but in the scope of this method, one should not access sample_indices
beyond it's bounds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if your suggestion is clearer. For the current comment, see #21130 (comment).
@NicolasHug might have an opinion?
And YES, even though sample_indices
is a view on partition
and the operation might be allowed with high probability, we should never ever access x[len(x)]
- nowhere ever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accessing sample_indices[len(sample_indices)]
is out of bound w.r.t. the logic, but not out of the allowed memory space. This is the latter that causes a segfault, not the former. So I'd still prefer to be specific here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should never ever access x[len(x)] - nowhere ever
That's fair @lorentzenchr, but by that logic, we should also be protecting the left child calls.
There's never a segfault on the left child calls for the same reason there's never a segfault if the parent isn't the right-most node. Yet these OOB accesses can happen on any node and actually on any thread. There's just one single case where this is problematic.
I'm fine with either as long as we're consistent :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand the implementation correctly (I am new to this one), I guess right_counts[thread_idx]
or left_counts[thread_idx]
being null do not cause any problem in this case even if we touch an OOB element as nothing will be copied by memset
.
Knowing this edge-case is useful (I think we can keep the comment), but do we want to add this conditional statement if it does not prevent any error but solely when boundscheck=True
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it clarifies that this also prevents segmentation faults?
# right_offset[-1] = len(sample_indices) leading to evaluating | |
# &sample_indices[right_offset[-1]] = &samples_indices[n_samples_at_node] | |
# = &partition[n_samples_in_tree] which produces | |
# IndexError: Out of bounds on buffer access (with boundscheck=True) | |
# right_offset[-1] = len(sample_indices) leading to evaluating | |
# &sample_indices[right_offset[-1]] = &samples_indices[n_samples_at_node] | |
# = &partition[n_samples_in_tree] which is out of bounds and can cause a | |
# segmentation fault. | |
# | |
# When boundscheck=True, this produces: | |
# IndexError: Out of bounds on buffer access |
EDIT: see the new suggestion bellow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the intent of the suggestion but the indentation of the suggested change in the comment makes it really hard to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarifying the comment while taking care of the readers' eyes. 👀
# right_offset[-1] = len(sample_indices) leading to evaluating | |
# &sample_indices[right_offset[-1]] = &samples_indices[n_samples_at_node] | |
# = &partition[n_samples_in_tree] which produces | |
# IndexError: Out of bounds on buffer access (with boundscheck=True) | |
# right_offset[-1] = len(sample_indices) | |
# | |
# leading to evaluating | |
# | |
# &sample_indices[right_offset[-1]] = &samples_indices[n_samples_at_node] | |
# = &partition[n_samples_in_tree] | |
# | |
# which is an out-of-bounds read access that can cause a segmentation fault. | |
# When boundscheck=True, removing this check produces this exception: | |
# | |
# IndexError: Out of bounds on buffer access | |
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the text above better? I think we should move forward with this to focus on #21020.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the PR with the @jjerphan suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I don't know how easy a CI with bounds checks enabled (as proposed in #21130 (comment)) is to set and maintain overtime. Shouldn't only the final packaged implementations be tested?
If we decide to set one, we can also enable other checks such as the one for memory-views initialization which are settable via initializedcheck
.
# right_offset[-1] = len(sample_indices) leading to evaluating | ||
# &sample_indices[right_offset[-1]] = &samples_indices[n_samples_at_node] | ||
# = &partition[n_samples_in_tree] which produces | ||
# IndexError: Out of bounds on buffer access (with boundscheck=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it clarifies that this also prevents segmentation faults?
# right_offset[-1] = len(sample_indices) leading to evaluating | |
# &sample_indices[right_offset[-1]] = &samples_indices[n_samples_at_node] | |
# = &partition[n_samples_in_tree] which produces | |
# IndexError: Out of bounds on buffer access (with boundscheck=True) | |
# right_offset[-1] = len(sample_indices) leading to evaluating | |
# &sample_indices[right_offset[-1]] = &samples_indices[n_samples_at_node] | |
# = &partition[n_samples_in_tree] which is out of bounds and can cause a | |
# segmentation fault. | |
# | |
# When boundscheck=True, this produces: | |
# IndexError: Out of bounds on buffer access |
EDIT: see the new suggestion bellow.
FYI, I won't be able to continue for the next month. In case some reviewer want's to take over, feel free. |
Interesting idea but probably for another PR as we might go down a deep rabbit hole following this suggestion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as well. Let's wait for the CI to be green after the last merge commit and let's merge.
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Thanks for finishing this one! |
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Reference Issues/PRs
This fixes an out of bound error in HGBT estimators that might lead to segfault.
Discovered in #21020 (comment).
How to produce the error?
In the current
main
branch, setboundscheck=True
insplitting.pyx
and run a tests likeThis gives: