BUG: Raise ValueError on strings with smaller than expected byte-size #18427

melissawm · 2021-02-16T20:59:46Z

This is an attempt at fixing #15311.

This raises a ValueError if the input array has byte-size smaller than expected, but the string is truncated if it is larger than expected (which is consistent with the current docs).

Feedback is appreciated!

Closes #15311

mattip · 2021-02-22T08:22:12Z

@pearu thoughts?

pearu · 2021-02-22T09:04:17Z

In general, when the Fortran function expects a string with a fixed number of bytes and users input contains fewer bytes than expected, then for a proper program behavior we have the following options:

raise a value error. Otherwise, the Fortran function may access out-of-bounds memory.
copy user input to a temporary variable with the correct size and that fills unspecified bytes, say, with nulls. When the Fortran function returns, copy possible changes back to user input.

Here, the chosen behavior depends on the input intent:

with intent(inout) or intent(cache) or intent(in overwrite), a ValueError should be raised
with intent(in) or intent(in, copy), use a proper size temporary variable
with intent(inplace), use temporary variable and copy (truncated) changes back to user's input
with intent(in, out), use a temporary variable and return it as output.

This PR implements a solution to intent(inout) use case only. @melissawm could you implement unit-tests for other intent cases as well in order to learn what is the overall state of the issue?

melissawm · 2021-03-18T20:06:44Z

Sorry for taking so long - there are some comments that I can remove later, but I wanted to check for behavior first. Let me know what you think @pearu - thanks!

charris · 2021-04-12T17:57:09Z

@melissawm Do you want to keep this open? The linter is complaining about some long lines in the test.

melissawm · 2021-04-12T18:03:50Z

@charris I'll fix that and I think this is close to done. Thanks!

pearu · 2021-04-12T18:48:42Z

Wait for my related PR re string_from_obj. I think there are cases where the input can have fewer bytes than expected, or at least, we should discuss this.

pearu · 2021-05-25T16:33:58Z

This PR is blocked by #18759 that needs to be merged first as it defines consistent handling of string arguments.

melissawm · 2021-05-28T14:28:18Z

Now that #18759 has been merged I'll pick up work on this.

melissawm · 2021-05-31T23:34:08Z

@pearu - now that #18759 has been merged, consider the following example:

subroutine string_size_inout(a)
    implicit none
    character(len=4), intent(inout) :: a
    a(4:4) = 'A'
end subroutine string_size_inout

Wrapping it with f2py and calling it from Python with a = np.array(b'123') should result in a ValueError, which will change the behavior of some of the tests added in #18759 -currently, the result would still be array(b'123', dtype='|S3'). (Just checking so I get the general direction right)

pearu · 2021-06-01T10:00:29Z

@pearu - now that #18759 has been merged, consider the following example:
subroutine string_size_inout(a)
    implicit none
    character(len=4), intent(inout) :: a
    a(4:4) = 'A'
end subroutine string_size_inout
Wrapping it with f2py and calling it from Python with a = np.array(b'123') should result in a ValueError, which will change the behavior of some of the tests added in #18759 -currently, the result would still be array(b'123', dtype='|S3'). (Just checking so I get the general direction right)

@melissawm , first, let me explain the overall meaning of intent(inout) as defined in https://numpy.org/doc/stable/f2py/python-usage.html#using-f2py-bindings-in-python. There are three basic argument types: scalars, strings, and arrays.

For intent(inout) array arguments, f2py passes array data pointer directly to Fortran function with the reasoning that arrays can be large and using any intermediate memory buffer would be inefficient (btw, in the intent(inplace) place, an intermediate memory buffer would be used). So, when the data pointer of an array cannot be passed directly to Fortran, a ValueError is raised in the case of intent(inout) array arguments.
For intent(inout) scalar arguments, f2py always uses an intermediate memory buffer with the reasoning that intent(inout) effect is achieved more generally by using scalar casting. For instance, passing array(123, dtype=int8) to integer*8 intent(inout) argument is allowed and effective in intent(inout) sense that input scalar-array is changed in place. So, a scalar value with a dtype being different from the intent(inout) scalar argument can be still be passed to Fortran and after return, be updated via scalar casting - although some information may be lost when values are out of the range of input scalar dtype.
String arguments are similar to both scalar and array arguments, in general. In f2py, the choice is made that string arguments behave similar to scalar arguments rather than to array arguments with the reasoning that in Python both scalars and strings are immutable. As a result, string arguments always use intermediate memory buffers when passing data to Fortran functions. So, f2py is able to support intent(inout) effect also on string inputs even when there may be mismatches of lengths. The loss of information via string truncation to fixed-width would be analogous to losing information in the scalar casting case.

I hope the above makes sense. But in short, yes, the current master behavior you experienced is expected: only the first 3 bytes are copied back to the input argument a and there is no ValueError raised. I would suggest keeping this behavior for consistency with scalars. The output chunk reported in the original issue #15311 is fixed by #18759 . About the "a and b are expected to be of length 5" part, I would categorize it as another approach to fix the output chunk and that is not needed anymore.

In retrospect, catching the silent loss of information from scalar casting or string truncation would be a nice feature. However, note that the loss of information cannot be always detected before calling the Fortran function, but one could only detect the possibility of a loss of information. The actual loss of information may or may not realize. Perhaps throwing a warning would be a more appropriate action rather than an exception? In any case, if one would be purchasing for this feature, it should be implemented for both scalars and strings at the same time and that would involve checking the changes in intermediate buffers to see if any change is out of range of input dtype and then throw an exception.

HaoZeke · 2023-11-20T00:23:22Z

The original issue this was meant to address has been closed as fixed, but the tests here are still good to have and nail down what we expect from string handling..

melissawm · 2023-11-20T11:50:31Z

Let me close this and follow with with a pr with just the tests then. Thanks!

BUG: Raise ValueError on strings with smaller than expected byte-size

f2ba5f9

melissawm requested a review from pearu February 16, 2021 20:59

github-actions bot added the 00 - Bug label Feb 16, 2021

pearu assigned melissawm Feb 22, 2021

pearu added the component: numpy.f2py label Feb 22, 2021

Base automatically changed from master to main March 4, 2021 02:05

Add tests for other intents

2209ee3

Add linter suggestions.

b5f3a35

melissawm mentioned this pull request Mar 19, 2021

BUG: Fix invalid read in f2py string_from_pyobj #18646

Closed

melissawm closed this Nov 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

BUG: Raise ValueError on strings with smaller than expected byte-size #18427

BUG: Raise ValueError on strings with smaller than expected byte-size #18427

Uh oh!

melissawm commented Feb 16, 2021

Uh oh!

mattip commented Feb 22, 2021

Uh oh!

pearu commented Feb 22, 2021 •

edited

Loading

Uh oh!

melissawm commented Mar 18, 2021

Uh oh!

charris commented Apr 12, 2021

Uh oh!

melissawm commented Apr 12, 2021

Uh oh!

pearu commented Apr 12, 2021

Uh oh!

pearu commented May 25, 2021

Uh oh!

melissawm commented May 28, 2021

Uh oh!

melissawm commented May 31, 2021

Uh oh!

pearu commented Jun 1, 2021 •

edited

Loading

Uh oh!

HaoZeke commented Nov 20, 2023

Uh oh!

melissawm commented Nov 20, 2023

Uh oh!

Uh oh!

Uh oh!

BUG: Raise ValueError on strings with smaller than expected byte-size #18427

BUG: Raise ValueError on strings with smaller than expected byte-size #18427

Uh oh!

Conversation

melissawm commented Feb 16, 2021

Uh oh!

mattip commented Feb 22, 2021

Uh oh!

pearu commented Feb 22, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

melissawm commented Mar 18, 2021

Uh oh!

charris commented Apr 12, 2021

Uh oh!

melissawm commented Apr 12, 2021

Uh oh!

pearu commented Apr 12, 2021

Uh oh!

pearu commented May 25, 2021

Uh oh!

melissawm commented May 28, 2021

Uh oh!

melissawm commented May 31, 2021

Uh oh!

pearu commented Jun 1, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

HaoZeke commented Nov 20, 2023

Uh oh!

melissawm commented Nov 20, 2023

Uh oh!

Uh oh!

pearu commented Feb 22, 2021 •

edited

Loading

pearu commented Jun 1, 2021 •

edited

Loading