BUG: Fix invalid read in f2py `string_from_pyobj` #18646

melissawm · 2021-03-19T02:29:27Z

This is meant to fix the invalid reads reported by valgrind (see #18431), and is consistent with the tests added in #18427 (which also exercise string_from_pyobj).

Closes #18431

eric-wieser · 2021-03-19T10:29:11Z

numpy/f2py/cfuncs.py

@@ -488,9 +488,9 @@
        char *_from = (from);                                   \\
        FAILNULL(_to); FAILNULL(_from);                         \\
        (void)strncpy(_to, _from, sizeof(char)*_m);             \\
-        _to[_m-1] = '\\0';                                      \\
+        _to[_m] = '\\0';                                        \\


This change looks wrong to me: the call on line 627 results in a write to PyArray_DATA(arr)[PyArray_NBYTES(arr)], which is beyond the end of the array.

Perhaps line 627 is wrong instead.
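To make the concern concrete, here is a minimal Python model of the two terminator placements. It is only a sketch under stated assumptions: `strncpy`, `stringcopyn_old`, `stringcopyn_new`, and `copy_back` are hypothetical helpers written for illustration, the rest of the real macro (anything after the terminator write) is left out, and a `bytearray` stands in for the array data so that an out-of-bounds write surfaces as an `IndexError` instead of silent memory corruption.

```python
def strncpy(dest, src, n):
    """Model C strncpy: copy up to the first NUL in src, pad with NULs to n bytes."""
    head = src[:n].split(b"\0", 1)[0]
    padded = head + b"\0" * (n - len(head))
    for i, byte in enumerate(padded):
        dest[i] = byte  # an IndexError here models an out-of-bounds write


def stringcopyn_old(dest, src, m):
    """Terminator placement before this PR: _to[_m-1] = '\\0'."""
    strncpy(dest, src, m)
    dest[m - 1] = 0


def stringcopyn_new(dest, src, m):
    """Terminator placement proposed in this PR: _to[_m] = '\\0'."""
    strncpy(dest, src, m)
    dest[m] = 0


def copy_back(copy, temp):
    """Model the copy-back on line 627 into a 4-byte array buffer."""
    data = bytearray(4)  # stands in for PyArray_DATA(arr) with NBYTES == 4
    try:
        copy(data, temp, len(data))
    except IndexError:
        return "out-of-bounds write"
    return bytes(data)


temp = b"A234\0"  # hypothetical temporary buffer after the Fortran call
print(copy_back(stringcopyn_old, temp))  # b'A23\x00'
print(copy_back(stringcopyn_new, temp))  # out-of-bounds write
```

In this model the old placement stays in bounds but overwrites the last character with the terminator, while the new placement keeps all four characters only by writing the terminator one byte past the buffer.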

Hmm, is it possible that the small array cache is hiding an invalid write, or that the tests don't cover line 627? If that is the case, these would need to be set to 0 (I can try that):

numpy/numpy/core/src/multiarray/alloc.c
Lines 34 to 36 in 55ffea9

#define NBUCKETS 1024 /* number of buckets for data*/
#define NBUCKETS_DIM 16 /* number of buckets for dimensions/strides */
#define NCACHE 7 /* number of cache entries per bucket */

Out of curiosity: the code below replaces NULL termination with spaces; is that meant for copying in both directions? Other than that, avoiding the NULL padding when copying into an array is the only concern. I wouldn't be surprised if a plain strncpy is correct there.

I think line 627 is never even emitted by f2py; the entire function that line is in seems suspect. Can we just remove it entirely?

Not true, it is emitted from

numpy/numpy/f2py/rules.py
Line 966 in 0fe69ae

\tf2py_success = try_pyarr_from_#ctype#(#varname#_capi,#varname#);

Then it sounds like line 627 needs fixing, perhaps by calling strncpy directly.

Yes, I'll cook up an example that touches this code and then look for a fix.

Here's a MWE:

code = """
subroutine test_inout(a)
  implicit none
  character(len=4), intent(inout) :: a
  a(1:1) = 'A'
end subroutine test_inout
"""
from numpy import array
import numpy.f2py
numpy.f2py.compile(code, modulename='sizeinout', extension='.f90', verbose=0, extra_args="--debug-capi")
import sizeinout
a = array(b'1234')
sizeinout.test_inout(a)
print(f"a = {a}")

eric-wieser · 2021-03-19T10:48:21Z

numpy/f2py/cfuncs.py

@@ -659,7 +660,7 @@
        if (*len == -1)
            *len = (PyArray_ITEMSIZE(arr))*PyArray_SIZE(arr);
        STRINGMALLOC(*str,*len);
-        STRINGCOPYN(*str,PyArray_DATA(arr),*len+1);
+        STRINGCOPYN(*str,PyArray_DATA(arr),*len);


~~I think you need two different versions of STRINGCOPYN; one for copying into a null-terminated buffer (line 688), and one for copying into a non-null-terminated buffer (this line).~~

Sorry, I misread

seberg · 2021-03-19T14:28:27Z

numpy/f2py/cfuncs.py

-        { STRINGCOPYN(PyArray_DATA(arr),str,PyArray_NBYTES(arr)); }
+    if (PyArray_Check(obj) && (!((arr = (PyArrayObject *)obj) == NULL))) {
+        STRINGCOPYN(PyArray_DATA(arr),str,PyArray_NBYTES(arr));
+    }
    return 1;


Sorry, not directly related, but it looks like this return should be inside the if statement? (so that the error can be reached)

charris · 2021-03-31T21:28:13Z

@seberg, @eric-wieser Good to go?

melissawm · 2021-04-09T15:24:33Z

After giving this some thought, I don't know if I understand what would be the best way forward. @pearu would you mind taking a look and sharing your thoughts on this?

pearu · 2021-04-12T10:01:50Z

@melissawm I think this PR needs a step back as the fix might be wrong.

First, let's be clear about the situation. We have a Fortran function that takes a fixed-length character array as an input:

character(len=4) :: a

and the aim is to pass the string buffer of a Python string-like object (bytes, str, numpy string, etc) as an argument to such a function. While doing this, the following constraints must be taken into account:

1) In general, Python strings are immutable (but numpy strings are mutable), so f2py always copies the string buffer of a Python string-like object to a temporary char* buffer that is used as the Fortran function argument. This is the use case of string_from_pyobj.
2) Some Python string-like objects are null-terminated and some are not. The Fortran fixed-length character array is null ignorant, that is, when \0 appears in the character array, this does not affect the length of the array (because it is fixed). However, I/O results, or a Python string created from the array, may be affected. So it is also important to interpret the test failures correctly.
3) When the argument is specified as intent(inout), only mutable string-like objects can be used as inputs (currently, only numpy strings are supported). In this case, after calling the Fortran function, the content of the temporary buffer is copied to the input string buffer. This is the use case of try_pyarr_from_string.
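The "null ignorant" point can be illustrated with a small Python sketch. The helper names `fortran_view` and `c_string_view` are hypothetical and only model the two interpretations of the same bytes; they are not part of f2py.

```python
def fortran_view(buf: bytes, length: int) -> bytes:
    """Fixed-length view: exactly `length` bytes, NULs included."""
    return buf[:length]


def c_string_view(buf: bytes) -> bytes:
    """C-string view: everything before the first NUL byte."""
    return buf.split(b"\0", 1)[0]


# A 4-byte field with an embedded NUL in the second position.
field = b"A\x00" + b"34"

print(len(fortran_view(field, 4)))  # 4
print(c_string_view(field))         # b'A'
```

The same four bytes have length 4 when treated as a fixed-length character array, but read as a one-character string under C conventions, which is why a NUL written into the buffer can change what a test observes without changing the array length.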

The origin of the bug that this PR tries to fix is most likely constraint 2) where one must be careful in interpreting null values correctly as it affects the size of needed memory for the temporary buffer.

Notice that the original code

STRINGMALLOC(*str,*len);
STRINGCOPYN(*str,PyArray_DATA(arr),*len+1);

is changed to

STRINGMALLOC(*str,*len);
STRINGCOPYN(*str,PyArray_DATA(arr),*len);

which will likely fix the valgrind issue, but the results will be incorrect. For instance,

a = np.array(b'1234')
test_inout(a)

results in array(b'A23', dtype='|S4') while the expected result is array(b'A234', dtype='|S4').

To me, changing it to (warning: untested)

STRINGMALLOC(*str,*len + 1);
STRINGCOPYN(*str,PyArray_DATA(arr),*len + 1);

would make more sense (but this might not be the only changeset needed to fix the correctness issue).
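The trade-off between these copy lengths can be sketched in Python. This is a model under explicit assumptions, not the f2py implementation: it assumes the terminator placement from before this PR (`_to[_m-1] = '\0'`), it assumes STRINGMALLOC reserves one byte more than requested, it ignores any further processing in the macro, and `strncpy_model` and `copy_in` are hypothetical helpers. Reads past the end of the source are flagged rather than performed, with 0xFF standing in for whatever byte would have been read.

```python
def strncpy_model(src: bytes, n: int):
    """Return (copied bytes, overread flag) for a C-style strncpy of n bytes."""
    out = bytearray()
    overread = False
    for i in range(n):
        if i >= len(src):
            overread = True   # C would read past the end of src here
            out.append(0xFF)  # placeholder for the unknown byte
            continue
        out.append(src[i])
        if src[i] == 0:
            break             # strncpy stops reading at the first NUL
    out.extend(b"\0" * (n - len(out)))  # strncpy pads the rest with NULs
    return out, overread


def copy_in(data: bytes, malloc_len: int, copy_len: int):
    """Model STRINGMALLOC(str, malloc_len); STRINGCOPYN(str, data, copy_len)."""
    temp = bytearray(malloc_len + 1)  # assumption: one extra byte is reserved
    copied, overread = strncpy_model(data, copy_len)
    temp[:copy_len] = copied
    temp[copy_len - 1] = 0            # assumed terminator: _to[_m-1] = '\0'
    return bytes(temp), overread


data = b"1234"  # array data: 4 bytes, no NUL terminator
print(copy_in(data, 4, 5))  # (b'1234\x00', True)
print(copy_in(data, 4, 4))  # (b'123\x00\x00', False)
print(copy_in(data, 5, 5))  # (b'1234\x00\x00', True)
```

Under these assumptions, copying `*len` bytes avoids the read past the array but loses the last character to the terminator, while copying `*len + 1` bytes keeps all four characters at the cost of reading one byte beyond the array data, regardless of how much is allocated.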

melissawm · 2021-04-12T12:26:20Z

Thanks @pearu - this is very helpful. In particular this

Some Python string-like objects are null-terminated and some are not.

I think this is the main issue I was hitting. Because STRINGMALLOC(*str,*len); actually allocates a buffer of len+1 bytes, I thought this would be enough to fit the null terminator. It seems it is not: even though this works internally, calling try_pyarr_from_string can end up writing one extra character to the Python object.

What I'm not clear on is whether allocating one extra byte in STRINGMALLOC is enough to solve that problem.

pearu · 2021-04-12T13:43:10Z

No, apparently it is not enough. In fact, allocating extra space might even be wrong. Let me investigate this a little further.

eric-wieser · 2021-04-12T13:46:01Z

I think it would help a lot if these macros had a comment describing what their expected contract is - because it sounds like we've ended up in a position where different callers are assuming different incompatible contracts.

pearu · 2021-04-12T14:00:45Z

I agree. I'll add the docs to the macros.

Btw, I have a fix for the issue, but since the macros are also used elsewhere, I'll need to verify that the fix does not break anything else.

melissawm · 2021-04-12T14:55:39Z

I'll close this PR in favor of @pearu's fix - thanks everyone!

pearu · 2021-04-12T20:27:11Z

My fix is in #18759

BUG: Fixed invalid read in f2py string_from_pyobj

13ad111

melissawm requested a review from seberg March 19, 2021 02:29

github-actions bot added the 00 - Bug label Mar 19, 2021

melissawm added the component: numpy.f2py label Mar 19, 2021

melissawm changed the title ~~BUG: Fixed invalid read in f2py string_from_pyobj~~ BUG: Fix invalid read in f2py string_from_pyobj Mar 19, 2021

eric-wieser reviewed Mar 19, 2021

seberg reviewed Mar 19, 2021

melissawm closed this Apr 12, 2021

melissawm deleted the invalid-read branch February 8, 2022 17:38
