Skip to content

invaild assertion of _io__WindowConsoleIO_write_impl #124008

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
byundojin opened this issue Sep 12, 2024 · 2 comments
Closed

invaild assertion of _io__WindowConsoleIO_write_impl #124008

byundojin opened this issue Sep 12, 2024 · 2 comments
Labels
3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes extension-modules C modules in the Modules dir OS-windows topic-IO type-bug An unexpected behavior, bug, or error type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@byundojin
Copy link
Contributor

byundojin commented Sep 12, 2024

Feature or enhancement

Proposal:

If function _io__WindowsConsoleIO_write_impl does not produce as expected output from WriteConsoleW, it is the process of recalculating the existing len.

At this point, function WideCharToMultiByte finds the byte length of utf-8 and function MultiByteToWideChar finds the letter length of utf-16. And is this comparison valid for comparing the results of these two?

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

https://discuss.python.org/t/is-the-size-comparison-of-utf-8-and-utf-16-valid/63614

Linked PRs

@byundojin byundojin added the type-feature A feature request or enhancement label Sep 12, 2024
@sobolevn sobolevn added type-bug An unexpected behavior, bug, or error and removed type-feature A feature request or enhancement labels Sep 13, 2024
@picnixz picnixz added topic-IO extension-modules C modules in the Modules dir labels Sep 13, 2024
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Sep 13, 2024
…he Windows console

Since MultiByteToWideChar()/WideCharToMultiByte() is not reversible if
the data contains invalid UTF-8 sequences, use binary search to
calculate the number of written bytes from the number of written
characters.

Also fix handling of memory allocation failures.
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Sep 13, 2024
…he Windows console

Since MultiByteToWideChar()/WideCharToMultiByte() is not reversible if
the data contains invalid UTF-8 sequences, use binary search to
calculate the number of written bytes from the number of written
characters.

Also fix writing incomplete UTF-8 sequences.

Also fix handling of memory allocation failures.
@serhiy-storchaka
Copy link
Member

This issue is more complex than it looked.

This assertion is not simply redundant, it is wrong. It is not normally triggered because this code is not normally executed. Actually, I do not know how to trigger this situation. But if comment out && n < wlen, the assertion will fail when write non-ASCII data.

Using WideCharToMultiByte() to calculate the offset in bytes from the number of wchars that was written is not correct either. Invalid UTF-8 byte produces \ufffd which will be converted back to 3-bytes UTF-8 sequence \xef\xbf\xbd. So you can get wrong len, even larger than the initial length.

There are other bugs here, for example, no MemoryError is raised when PyMem_Malloc() or other memory allocation functions fail.

#124059 tries to fix this, but the problem is that it is difficult to test, because normally all characters are written. I used a trick -- in the debug build the binary search is triggered even when all characters are written, but this is not a good test. @eryksun, do you know how to make WriteConsoleW() writing less characters than provided?

@serhiy-storchaka serhiy-storchaka added OS-windows 3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes type-crash A hard crash of the interpreter, possibly with a core dump labels Sep 13, 2024
@serhiy-storchaka
Copy link
Member

Note also that you should run the new test without using libregrtest, because it is guarded by non-standard resource "console":

python.bat -m test.test_winconsoleio -v

serhiy-storchaka added a commit that referenced this issue Nov 27, 2024
…dows console (GH-124059)

Since MultiByteToWideChar()/WideCharToMultiByte() is not reversible if
the data contains invalid UTF-8 sequences, use binary search to
calculate the number of written bytes from the number of written
characters.

Also fix writing incomplete UTF-8 sequences.

Also fix handling of memory allocation failures.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 27, 2024
…he Windows console (pythonGH-124059)

Since MultiByteToWideChar()/WideCharToMultiByte() is not reversible if
the data contains invalid UTF-8 sequences, use binary search to
calculate the number of written bytes from the number of written
characters.

Also fix writing incomplete UTF-8 sequences.

Also fix handling of memory allocation failures.
(cherry picked from commit 3cf83d9)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 27, 2024
…he Windows console (pythonGH-124059)

Since MultiByteToWideChar()/WideCharToMultiByte() is not reversible if
the data contains invalid UTF-8 sequences, use binary search to
calculate the number of written bytes from the number of written
characters.

Also fix writing incomplete UTF-8 sequences.

Also fix handling of memory allocation failures.
(cherry picked from commit 3cf83d9)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this issue Nov 27, 2024
…the Windows console (GH-124059) (GH-127326)

Since MultiByteToWideChar()/WideCharToMultiByte() is not reversible if
the data contains invalid UTF-8 sequences, use binary search to
calculate the number of written bytes from the number of written
characters.

Also fix writing incomplete UTF-8 sequences.

Also fix handling of memory allocation failures.
(cherry picked from commit 3cf83d9)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this issue Nov 30, 2024
…the Windows console (GH-124059) (GH-127325)

Since MultiByteToWideChar()/WideCharToMultiByte() is not reversible if
the data contains invalid UTF-8 sequences, use binary search to
calculate the number of written bytes from the number of written
characters.

Also fix writing incomplete UTF-8 sequences.

Also fix handling of memory allocation failures.
(cherry picked from commit 3cf83d9)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
…he Windows console (pythonGH-124059)

Since MultiByteToWideChar()/WideCharToMultiByte() is not reversible if
the data contains invalid UTF-8 sequences, use binary search to
calculate the number of written bytes from the number of written
characters.

Also fix writing incomplete UTF-8 sequences.

Also fix handling of memory allocation failures.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes extension-modules C modules in the Modules dir OS-windows topic-IO type-bug An unexpected behavior, bug, or error type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

No branches or pull requests

4 participants