-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
BUG: Fix invalid read in f2py string_from_pyobj
#18646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
string_from_pyobj
string_from_pyobj
@@ -488,9 +488,9 @@ | |||
char *_from = (from); \\ | |||
FAILNULL(_to); FAILNULL(_from); \\ | |||
(void)strncpy(_to, _from, sizeof(char)*_m); \\ | |||
_to[_m-1] = '\\0'; \\ | |||
_to[_m] = '\\0'; \\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks wrong to me, the call on line 627 results in a write to PyArray_DATA(arr)[PyArray_NBYTES(arr)]
which is beyond the end of the array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps line 627 is wrong instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it is possible that the small array cache is hiding an invalid write, or the tests don't cover 627? (if that is the case, need to set these to 0, but I can try that):
numpy/numpy/core/src/multiarray/alloc.c
Lines 34 to 36 in 55ffea9
#define NBUCKETS 1024 /* number of buckets for data*/ | |
#define NBUCKETS_DIM 16 /* number of buckets for dimensions/strides */ | |
#define NCACHE 7 /* number of cache entries per bucket */ |
Curious, the below replaces NULL termination with space, is that for copying in both directions (mainly curious)? Other than that, line avoiding the NULL padding when copying into an array is the only concern. I wouldn't be surprised if just using a plain strncpy
is correct there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think line 627 is never even emitted by f2py; the entire function that line is in seems suspect. Can we just remove it entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think line 627 is never even emitted by f2py; the entire function that line is in seems suspect. Can we just remove it entirely?
Not true, it is emitted from
Line 966 in 0fe69ae
\tf2py_success = try_pyarr_from_#ctype#(#varname#_capi,#varname#); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then it sounds like line 627 needs fixing, perhaps by calling strncopy
directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll cook up an example that touches this code and then look for a fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a MWE:
code = """
subroutine test_inout(a)
implicit none
character(len=4), intent(inout) :: a
a(1:1) = 'A'
end subroutine test_inout
"""
from numpy import array
import numpy.f2py
numpy.f2py.compile(code, modulename='sizeinout', extension='.f90', verbose=0, extra_args="--debug-capi")
import sizeinout
a = array(b'1234')
sizeinout.test_inout(a)
print(f"a = {a}")
@@ -659,7 +660,7 @@ | |||
if (*len == -1) | |||
*len = (PyArray_ITEMSIZE(arr))*PyArray_SIZE(arr); | |||
STRINGMALLOC(*str,*len); | |||
STRINGCOPYN(*str,PyArray_DATA(arr),*len+1); | |||
STRINGCOPYN(*str,PyArray_DATA(arr),*len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need two different versions of STRINGCOPYN
; one for copying into a null-terminated buffer (line 688), and one for copying into a non-null-terminated buffer (this line).
Sorry, I misread
{ STRINGCOPYN(PyArray_DATA(arr),str,PyArray_NBYTES(arr)); } | ||
if (PyArray_Check(obj) && (!((arr = (PyArrayObject *)obj) == NULL))) { | ||
STRINGCOPYN(PyArray_DATA(arr),str,PyArray_NBYTES(arr)); | ||
} | ||
return 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, not directly related, but it looks like this return should be inside the if statement? (so that the error can be reached)
@seberg, @eric-wieser Good to go? |
After giving this some thought, I don't know if I understand what would be the best way forward. @pearu would you mind taking a look and sharing your thoughts on this? |
@melissawm I think this PR needs a step back as the fix might be wrong. First, let's be clear about the situation. We have a Fortran function that takes a fixed-length character array as an input:
and the aim is to pass the string buffer of a Python string-like object (bytes, str, numpy string, etc) as an argument to such a function. While doing this, the following constraints must be taken into account:
The origin of the bug that this PR tries to fix is most likely constraint 2) where one must be careful in interpreting null values correctly as it affects the size of needed memory for the temporary buffer. Notice that the original code
is changed to
that likely will fix the valgrind issue but the results will be incorrect. For instance,
results in To me, the change to (untested warning)
would make more sense (but this might not be the only changeset needed to fix the correctness issue). |
Thanks @pearu - this is very helpful. In particular this
I think this is the main issue I was hitting. Because What I`m not clear on is if allocating one extra space in STRINGMALLOC is enough to solve that problem. |
No, apparently it is not enough. In fact, allocating extra space might be even wrong.. Let me investigate this a little further. |
I think it would help a lot if these macros had a comment describing what their expected contract is - because it sounds like we've ended up in a position where different callers are assuming different incompatible contracts. |
I agree. I'll add the docs to the macros. Btw, I have a fix to the issue but since the macros are used also elsewhere, I'll need to verify that the fix does not break anything else. |
I´ll close this PR in favor of @pearu´s fix - thanks everyone! |
My fix is in #18759 |
This is meant to solve the issues reported by valgrind (mentioned on #18431), and is consistent with tests added on #18427 (which also test for
string_from_pyobj
functionality).Closes #18431