Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyArg_ParseTupleAndKeywords() and non-ASCII keyword names #110815

Closed
serhiy-storchaka opened this issue Oct 13, 2023 · 4 comments
Closed

PyArg_ParseTupleAndKeywords() and non-ASCII keyword names #110815

serhiy-storchaka opened this issue Oct 13, 2023 · 4 comments
Labels
3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes topic-C-API topic-unicode type-bug An unexpected behavior, bug, or error type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@serhiy-storchaka
Copy link
Member

serhiy-storchaka commented Oct 13, 2023

Most of C strings in the C API are implied to be UTF-8 encoded. PyArg_ParseTupleAndKeywords() mostly works with non-ASCII keyword names as they are UTF-8 encoded. Except one case, when you pass argument by keyword with invalid non-ASCII name to a function that has optional parameter with non-ASCII UTF-8 encoded name. In this case you get a crash in the debug build.

It was caused by combination of f4934ea and a83a6a3 (bpo-28701, #72887). Before these changes you simply got inaccurate or even wrong error message.

Examples:

  1. Parameters: "ä"
    Keyword arguments: "ë"
    Old behavior: TypeError "'ë' is an invalid keyword argument for this function"
    Current behavior: crash
    Expected behavior: TypeError "'ë' is an invalid keyword argument for this function"

  2. Parameters: "ä"
    Keyword arguments: "ä"
    Old behavior: TypeError "invalid keyword argument for this function"
    Current behavior: crash
    Expected behavior: TypeError "'ä' is an invalid keyword argument for this function"

  3. Parameters: "ä", "ë"
    Keyword arguments: "ä", "ë"
    Old behavior: TypeError "'ë' is an invalid keyword argument for this function"
    Current behavior: crash
    Expected behavior: TypeError "'ä' is an invalid keyword argument for this function"

In case 1 the pre-bpo-28701 behavior was correct, in case 2 it was correct but not precise (it failed to find the name of invalid keyword argument), in case 3 it was wrong (it found wrong name). In all cases there is a crash currently.

Linked PRs

@serhiy-storchaka serhiy-storchaka added type-bug An unexpected behavior, bug, or error 3.11 only security fixes type-crash A hard crash of the interpreter, possibly with a core dump topic-C-API 3.12 bugs and security fixes 3.13 bugs and security fixes labels Oct 13, 2023
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 13, 2023
…ndKeywords()

It already mostly worked, except in the case when invalid keyword
argument with non-ASCII name was passed to function with non-ASCII
parameter names. Then it crashed in the debug mode.
@serhiy-storchaka
Copy link
Member Author

I am not sure how to classify this issue: as a bug or a new feature. On one hand, it was a regression in 3.7, crash is worse than wrong error message. On other hand, it was not explicitly documented that non-ASCII names are supported, there were no tests, and if it worked, it was only by accident. And it was a long time ago. So, official support of non-ASCII names can be a new feature.

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 13, 2023
…ndKeywords()

It already mostly worked, except in the case when invalid keyword
argument with non-ASCII name was passed to function with non-ASCII
parameter names. Then it crashed in the debug mode.
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 13, 2023
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 13, 2023
…honGH-110817)

(cherry picked from commit 548ce09)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this issue Oct 13, 2023
…-110817) (GH-110825)

(cherry picked from commit 548ce09)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@terryjreedy
Copy link
Member

Is the issue visible from Python code:

def f(ä=''): pass

f(ä="ë")
f(ä="ä")

works. If something like this were to crash, I would say definitely a bugfix to be backported.

@serhiy-storchaka
Copy link
Member Author

It is an issue with PyArg_ParseTupleAndKeywords(), a C API used for parsing keyword arguments in extension functions. Python defined functions do not use it. Also, it is an issue with argument names, not values. Non-ASCII argument names are forbidden in the core interpreter and the stdlib, but third parties may have reasons to use them.

serhiy-storchaka added a commit that referenced this issue Oct 14, 2023
…ords() (GH-110816)

It already mostly worked, except in the case when invalid keyword
argument with non-ASCII name was passed to function with non-ASCII
parameter names. Then it crashed in the debug mode.
@terryjreedy
Copy link
Member

Thanks. Makes sense now.

aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…ndKeywords() (pythonGH-110816)

It already mostly worked, except in the case when invalid keyword
argument with non-ASCII name was passed to function with non-ASCII
parameter names. Then it crashed in the debug mode.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…ndKeywords() (pythonGH-110816)

It already mostly worked, except in the case when invalid keyword
argument with non-ASCII name was passed to function with non-ASCII
parameter names. Then it crashed in the debug mode.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes topic-C-API topic-unicode type-bug An unexpected behavior, bug, or error type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

No branches or pull requests

2 participants