Skip to content

BUG: Fix invalid read in f2py string_from_pyobj #18646

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

melissawm
Copy link
Member

This is meant to solve the issues reported by valgrind (mentioned on #18431), and is consistent with tests added on #18427 (which also test for string_from_pyobj functionality).

Closes #18431

@melissawm melissawm requested a review from seberg March 19, 2021 02:29
@melissawm melissawm changed the title BUG: Fixed invalid read in f2py string_from_pyobj BUG: Fix invalid read in f2py string_from_pyobj Mar 19, 2021
@@ -488,9 +488,9 @@
char *_from = (from); \\
FAILNULL(_to); FAILNULL(_from); \\
(void)strncpy(_to, _from, sizeof(char)*_m); \\
_to[_m-1] = '\\0'; \\
_to[_m] = '\\0'; \\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks wrong to me, the call on line 627 results in a write to PyArray_DATA(arr)[PyArray_NBYTES(arr)] which is beyond the end of the array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps line 627 is wrong instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it is possible that the small array cache is hiding an invalid write, or the tests don't cover 627? (if that is the case, need to set these to 0, but I can try that):

#define NBUCKETS 1024 /* number of buckets for data*/
#define NBUCKETS_DIM 16 /* number of buckets for dimensions/strides */
#define NCACHE 7 /* number of cache entries per bucket */
)

Curious, the below replaces NULL termination with space, is that for copying in both directions (mainly curious)? Other than that, line avoiding the NULL padding when copying into an array is the only concern. I wouldn't be surprised if just using a plain strncpy is correct there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think line 627 is never even emitted by f2py; the entire function that line is in seems suspect. Can we just remove it entirely?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think line 627 is never even emitted by f2py; the entire function that line is in seems suspect. Can we just remove it entirely?

Not true, it is emitted from

\tf2py_success = try_pyarr_from_#ctype#(#varname#_capi,#varname#);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it sounds like line 627 needs fixing, perhaps by calling strncopy directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll cook up an example that touches this code and then look for a fix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a MWE:

code = """
subroutine test_inout(a)
  implicit none
  character(len=4), intent(inout) :: a
  a(1:1) = 'A'
end subroutine test_inout
"""

from numpy import array
import numpy.f2py

numpy.f2py.compile(code, modulename='sizeinout', extension='.f90', verbose=0, extra_args="--debug-capi")

import sizeinout

a = array(b'1234')
sizeinout.test_inout(a)
print(f"a = {a}")

@@ -659,7 +660,7 @@
if (*len == -1)
*len = (PyArray_ITEMSIZE(arr))*PyArray_SIZE(arr);
STRINGMALLOC(*str,*len);
STRINGCOPYN(*str,PyArray_DATA(arr),*len+1);
STRINGCOPYN(*str,PyArray_DATA(arr),*len);
Copy link
Member

@eric-wieser eric-wieser Mar 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need two different versions of STRINGCOPYN; one for copying into a null-terminated buffer (line 688), and one for copying into a non-null-terminated buffer (this line).

Sorry, I misread

{ STRINGCOPYN(PyArray_DATA(arr),str,PyArray_NBYTES(arr)); }
if (PyArray_Check(obj) && (!((arr = (PyArrayObject *)obj) == NULL))) {
STRINGCOPYN(PyArray_DATA(arr),str,PyArray_NBYTES(arr));
}
return 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, not directly related, but it looks like this return should be inside the if statement? (so that the error can be reached)

@charris
Copy link
Member

charris commented Mar 31, 2021

@seberg, @eric-wieser Good to go?

@melissawm
Copy link
Member Author

After giving this some thought, I don't know if I understand what would be the best way forward. @pearu would you mind taking a look and sharing your thoughts on this?

@pearu
Copy link
Contributor

pearu commented Apr 12, 2021

@melissawm I think this PR needs a step back as the fix might be wrong.

First, let's be clear about the situation. We have a Fortran function that takes a fixed-length character array as an input:

character(len=4) :: a

and the aim is to pass the string buffer of a Python string-like object (bytes, str, numpy string, etc) as an argument to such a function. While doing this, the following constraints must be taken into account:

  1. In general, Python strings are immutable (but numpy strings are mutable), so f2py always copies the string buffer of a Python string-like object to a temporary char* buffer that is used as the Fortran function argument. This is the use case of string_from_pyobj.
  2. Some Python string-like objects are null-terminated and some are not. The Fortran fixed-length character array is null ignorant, that is, when \0 appears in the character array, this does not affect the length of the array (because it is fixed). However, the I/O result or when creating a Python string from it may be affected. So, it is also important to interpret the test failures correctly.
  3. When the argument is specified as intent(inout), only mutable string-like objects can be used as inputs (currently, only numpy strings are supported). In this case, after calling the Fortran function, the content of the temporary buffer is copied to the input string buffer. This is the use case of try_pyarr_from_string.

The origin of the bug that this PR tries to fix is most likely constraint 2) where one must be careful in interpreting null values correctly as it affects the size of needed memory for the temporary buffer.

Notice that the original code

STRINGMALLOC(*str,*len);
STRINGCOPYN(*str,PyArray_DATA(arr),*len+1);

is changed to

STRINGMALLOC(*str,*len);
STRINGCOPYN(*str,PyArray_DATA(arr),*len);

that likely will fix the valgrind issue but the results will be incorrect. For instance,

a = np.array(b'1234')
test_inout(a)

results in array(b'A23', dtype='|S4') while the expected result is array(b'A234', dtype='|S4').

To me, the change to (untested warning)

STRINGMALLOC(*str,*len + 1);
STRINGCOPYN(*str,PyArray_DATA(arr),*len + 1);

would make more sense (but this might not be the only changeset needed to fix the correctness issue).

@melissawm
Copy link
Member Author

Thanks @pearu - this is very helpful. In particular this

Some Python string-like objects are null-terminated and some are not.

I think this is the main issue I was hitting. Because STRINGMALLOC(*str,*len); actually allocates a len+1-sized string, I thought this would be enough to fit the null terminator, but it seems this is not enough because even though this works internally, when try_pyarr_from_string is called it results in (possibly) writing one extra character to the Python object.

What I`m not clear on is if allocating one extra space in STRINGMALLOC is enough to solve that problem.

@pearu
Copy link
Contributor

pearu commented Apr 12, 2021

No, apparently it is not enough. In fact, allocating extra space might be even wrong.. Let me investigate this a little further.

@eric-wieser
Copy link
Member

I think it would help a lot if these macros had a comment describing what their expected contract is - because it sounds like we've ended up in a position where different callers are assuming different incompatible contracts.

@pearu
Copy link
Contributor

pearu commented Apr 12, 2021

I agree. I'll add the docs to the macros.

Btw, I have a fix to the issue but since the macros are used also elsewhere, I'll need to verify that the fix does not break anything else.

@melissawm
Copy link
Member Author

I´ll close this PR in favor of @pearu´s fix - thanks everyone!

@melissawm melissawm closed this Apr 12, 2021
@pearu
Copy link
Contributor

pearu commented Apr 12, 2021

My fix is in #18759

@melissawm melissawm deleted the invalid-read branch February 8, 2022 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: F2Py Invalid read in string_from_pyobj
5 participants