Skip to content

BUG: Raise ValueError on strings with smaller than expected byte-size #18427

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

melissawm
Copy link
Member

This is an attempt at fixing #15311.

This raises a ValueError if the input array has byte-size smaller than expected, but the string is truncated if it is larger than expected (which is consistent with the current docs).

Feedback is appreciated!

Closes #15311

@melissawm melissawm requested a review from pearu February 16, 2021 20:59
@mattip
Copy link
Member

mattip commented Feb 22, 2021

@pearu thoughts?

@pearu
Copy link
Contributor

pearu commented Feb 22, 2021

In general, when the Fortran function expects a string with a fixed number of bytes and users input contains fewer bytes than expected, then for a proper program behavior we have the following options:

  1. raise a value error. Otherwise, the Fortran function may access out-of-bounds memory.
  2. copy user input to a temporary variable with the correct size and that fills unspecified bytes, say, with nulls. When the Fortran function returns, copy possible changes back to user input.

Here, the chosen behavior depends on the input intent:

  1. with intent(inout) or intent(cache) or intent(in overwrite), a ValueError should be raised
  2. with intent(in) or intent(in, copy), use a proper size temporary variable
  3. with intent(inplace), use temporary variable and copy (truncated) changes back to user's input
  4. with intent(in, out), use a temporary variable and return it as output.

This PR implements a solution to intent(inout) use case only. @melissawm could you implement unit-tests for other intent cases as well in order to learn what is the overall state of the issue?

Base automatically changed from master to main March 4, 2021 02:05
@melissawm
Copy link
Member Author

Sorry for taking so long - there are some comments that I can remove later, but I wanted to check for behavior first. Let me know what you think @pearu - thanks!

@charris
Copy link
Member

charris commented Apr 12, 2021

@melissawm Do you want to keep this open? The linter is complaining about some long lines in the test.

@melissawm
Copy link
Member Author

@charris I'll fix that and I think this is close to done. Thanks!

@pearu
Copy link
Contributor

pearu commented Apr 12, 2021

Wait for my related PR re string_from_obj. I think there are cases where the input can have fewer bytes than expected, or at least, we should discuss this.

@pearu
Copy link
Contributor

pearu commented May 25, 2021

This PR is blocked by #18759 that needs to be merged first as it defines consistent handling of string arguments.

@melissawm
Copy link
Member Author

Now that #18759 has been merged I'll pick up work on this.

@melissawm
Copy link
Member Author

@pearu - now that #18759 has been merged, consider the following example:

subroutine string_size_inout(a)
    implicit none
    character(len=4), intent(inout) :: a
    a(4:4) = 'A'
end subroutine string_size_inout

Wrapping it with f2py and calling it from Python with a = np.array(b'123') should result in a ValueError, which will change the behavior of some of the tests added in #18759 -currently, the result would still be array(b'123', dtype='|S3'). (Just checking so I get the general direction right)

@pearu
Copy link
Contributor

pearu commented Jun 1, 2021

@pearu - now that #18759 has been merged, consider the following example:

subroutine string_size_inout(a)
    implicit none
    character(len=4), intent(inout) :: a
    a(4:4) = 'A'
end subroutine string_size_inout

Wrapping it with f2py and calling it from Python with a = np.array(b'123') should result in a ValueError, which will change the behavior of some of the tests added in #18759 -currently, the result would still be array(b'123', dtype='|S3'). (Just checking so I get the general direction right)

@melissawm , first, let me explain the overall meaning of intent(inout) as defined in https://numpy.org/doc/stable/f2py/python-usage.html#using-f2py-bindings-in-python. There are three basic argument types: scalars, strings, and arrays.

  • For intent(inout) array arguments, f2py passes array data pointer directly to Fortran function with the reasoning that arrays can be large and using any intermediate memory buffer would be inefficient (btw, in the intent(inplace) place, an intermediate memory buffer would be used). So, when the data pointer of an array cannot be passed directly to Fortran, a ValueError is raised in the case of intent(inout) array arguments.
  • For intent(inout) scalar arguments, f2py always uses an intermediate memory buffer with the reasoning that intent(inout) effect is achieved more generally by using scalar casting. For instance, passing array(123, dtype=int8) to integer*8 intent(inout) argument is allowed and effective in intent(inout) sense that input scalar-array is changed in place. So, a scalar value with a dtype being different from the intent(inout) scalar argument can be still be passed to Fortran and after return, be updated via scalar casting - although some information may be lost when values are out of the range of input scalar dtype.
  • String arguments are similar to both scalar and array arguments, in general. In f2py, the choice is made that string arguments behave similar to scalar arguments rather than to array arguments with the reasoning that in Python both scalars and strings are immutable. As a result, string arguments always use intermediate memory buffers when passing data to Fortran functions. So, f2py is able to support intent(inout) effect also on string inputs even when there may be mismatches of lengths. The loss of information via string truncation to fixed-width would be analogous to losing information in the scalar casting case.

I hope the above makes sense. But in short, yes, the current master behavior you experienced is expected: only the first 3 bytes are copied back to the input argument a and there is no ValueError raised. I would suggest keeping this behavior for consistency with scalars. The output chunk reported in the original issue #15311 is fixed by #18759 . About the "a and b are expected to be of length 5" part, I would categorize it as another approach to fix the output chunk and that is not needed anymore.

In retrospect, catching the silent loss of information from scalar casting or string truncation would be a nice feature. However, note that the loss of information cannot be always detected before calling the Fortran function, but one could only detect the possibility of a loss of information. The actual loss of information may or may not realize. Perhaps throwing a warning would be a more appropriate action rather than an exception? In any case, if one would be purchasing for this feature, it should be implemented for both scalars and strings at the same time and that would involve checking the changes in intermediate buffers to see if any change is out of range of input dtype and then throw an exception.

@HaoZeke
Copy link
Member

HaoZeke commented Nov 20, 2023

The original issue this was meant to address has been closed as fixed, but the tests here are still good to have and nail down what we expect from string handling..

@melissawm
Copy link
Member Author

Let me close this and follow with with a pr with just the tests then. Thanks!

@melissawm melissawm closed this Nov 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

BUG: f2py not triggering ValueError with smaller-than-expected byte-size input in array
5 participants