gh-124008: Fix calculation of the number of written bytes for the Windows console #124059

serhiy-storchaka · 2024-09-13T15:35:09Z

Since MultiByteToWideChar()/WideCharToMultiByte() is not reversible if the data contains invalid UTF-8 sequences, use binary search to calculate the number of written bytes from the number of written characters.
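To illustrate why the conversion cannot simply be reversed, here is a minimal Python sketch. It uses Python's own UTF-8 codec with `errors="replace"` as a stand-in for MultiByteToWideChar(); that substitution is an assumption for illustration only, and the exact number of replacement characters the Windows API produces for a given invalid sequence may differ.

```python
# An invalid byte (0xFF) is replaced by U+FFFD when decoding.
data = b"ab\xffcd"
text = data.decode("utf-8", errors="replace")

# Encoding the result back does not reproduce the original bytes:
# U+FFFD takes three bytes in UTF-8, while the invalid input was one byte.
round_trip = text.encode("utf-8")

print(len(data), len(round_trip))  # 5 7
```

Because the re-encoded length differs from the input length, converting the written characters back to UTF-8 and measuring the result would miscount the bytes consumed from the original buffer.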

Also fix handling of memory allocation failures.

Issue: invaild assertion of _io__WindowConsoleIO_write_impl #124008

…he Windows console Since MultiByteToWideChar()/WideCharToMultiByte() is not reversible if the data contains invalid UTF-8 sequences, use binary search to calculate the number of written bytes from the number of written characters. Also fix writing incomplete UTF-8 sequences. Also fix handling of memory allocation failures.

serhiy-storchaka · 2024-11-07T07:06:18Z

@zooba, @vstinner, would you mind taking a look at this PR?

zooba · 2024-11-13T22:30:40Z

I keep trying to review and I keep getting confused 😆

Perhaps we should revise this entire piece of functionality, do one big conversion (and hope that anyone writing more than 32KB to the console in one go has enough RAM), and loop over that? Then we're only dealing with UTF-16 pairs, which ought to be simpler.

cmaloney · 2024-11-13T22:47:28Z

The RAM constraint should be gone with Windows 8+ (gh-121940); what remains is just the "interactivity" / Ctrl-C interruptibility concern. Getting to WaitForMultipleObjectsEx would solve that, and it is where I've been trying to aim recently, but there are a lot of steps to get there.

zooba · 2024-11-13T23:33:54Z

> The RAM constraint should be gone with Windows 8+

I was referring to the memory we have to allocate to store a UTF-16 version of the string. If someone decides to write 2GB worth of ASCII in one go, we'll need another 4GB to store the UTF-16 encoded version, even before we get to passing it to the console APIs.
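The arithmetic behind that estimate can be checked with a short sketch. The helper name `utf16_size` is hypothetical, and it only covers the pure-ASCII case mentioned above, where every input byte becomes one 2-byte UTF-16 code unit.

```python
GIB = 1024 ** 3

def utf16_size(ascii_bytes: int) -> int:
    """Bytes needed for the UTF-16 form of pure-ASCII input of the given size."""
    return ascii_bytes * 2

# Sanity check on a small sample using Python's own codecs.
sample = b"x" * 1024
assert len(sample.decode("ascii").encode("utf-16-le")) == utf16_size(len(sample))

print(utf16_size(2 * GIB) // GIB)  # 4
```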

serhiy-storchaka · 2024-11-15T13:52:58Z

This PR consists of three parts:

fixed _find_last_utf8_boundary();
_wchar_to_utf8_count() to calculate the number of partially written bytes;
numerous other fixes, mainly missing PyErr_NoMemory() calls.
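For readers unfamiliar with the first item, the idea of finding the last UTF-8 boundary can be sketched in Python as below. This is not the C code from the PR: the function name mirrors the C helper only for readability, and the handling of invalid input (returning the length untrimmed) is an assumption made for this sketch.

```python
def find_last_utf8_boundary(buf: bytes, length: int) -> int:
    """Return the largest n <= length such that buf[:n] does not end
    inside an incomplete multi-byte UTF-8 sequence."""
    length = min(length, len(buf))
    # A UTF-8 sequence is at most 4 bytes long, so look back at most 4 bytes.
    for back in range(1, min(4, length) + 1):
        byte = buf[length - back]
        if byte < 0x80:
            # ASCII byte: nothing is left incomplete.
            return length
        if byte >= 0xC0:
            # Lead byte: work out how many bytes the sequence announces.
            if byte >= 0xF0:
                need = 4
            elif byte >= 0xE0:
                need = 3
            else:
                need = 2
            # Complete sequence: keep it. Incomplete: cut before the lead byte.
            return length if back >= need else length - back
        # Otherwise this is a continuation byte (0x80..0xBF); keep looking back.
    # No lead byte found nearby (invalid data or empty prefix): do not trim.
    return length

euro = "a€".encode("utf-8")  # b'a\xe2\x82\xac'
print(find_last_utf8_boundary(euro, 3))  # 1
```

In the usage line, cutting after three bytes would split the three-byte euro sign, so the boundary falls back to just after the `a`.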

Even if the 32KiB limit is gone, there is still the 2GiB or 4GiB limit, so removing it would not help: the code would stay almost the same. The alternative is to get rid of MultiByteToWideChar() and use a double conversion with Python's UTF-8 and UTF-16 codecs, but this would be a much larger change, and the code may be less efficient.

zooba · 2024-11-21T14:51:37Z

I think it's the binary search to find the converted length that upsets me the most.

Does it get any simpler if we arbitrarily cut at 32K bytes, count back to a complete UTF-8 character, and then write from there? Rather than trying to fit within 32KB worth of UTF-16, which is no longer a real limit on supported platforms.
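The suggestion above (cut at a fixed byte count, then step back to a complete character) could look roughly like the following Python sketch. The generator name `split_utf8`, the 3-byte look-back limit, and the fallback for data with no nearby boundary are all assumptions for illustration; none of this is taken from the PR.

```python
def split_utf8(data: bytes, limit: int = 32 * 1024):
    """Yield consecutive chunks of at most `limit` bytes, preferring cuts
    that do not fall inside a multi-byte UTF-8 sequence."""
    pos = 0
    while pos < len(data):
        end = min(pos + limit, len(data))
        if end < len(data):
            cut = end
            # A continuation byte (0b10xxxxxx) at the cut position means the
            # cut is mid-character; step back at most 3 bytes to the lead byte.
            for _ in range(3):
                if cut > pos and (data[cut] & 0xC0) == 0x80:
                    cut -= 1
                else:
                    break
            # Use the adjusted cut only if it is a real boundary and the chunk
            # is non-empty; otherwise cut at the limit so progress is made.
            if cut > pos and (data[cut] & 0xC0) != 0x80:
                end = cut
        yield data[pos:end]
        pos = end

payload = ("€" * 5).encode("utf-8")       # 15 bytes, 3 per character
chunks = list(split_utf8(payload, limit=4))
print(len(chunks))  # 5
```

With a 4-byte limit, each chunk ends up holding exactly one 3-byte character, and joining the chunks reproduces the input.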

serhiy-storchaka · 2024-11-22T15:14:03Z

No, it will not get much simpler.

Note that the code to find the initial length was already here. I only fixed some corner cases in it.

We still need binary search to map the number of actually written wchars to the number of UTF-8 bytes. Even if we do as you suggested, this will only save ~10 lines of code (relatively simple in comparison with the rest of this PR). We can do this in a follow-up PR. For now, keeping this code means fewer changes.
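The mapping described above can be sketched with a plain binary search over byte prefixes. This is a Python illustration, not the C implementation in the PR: the helper names are hypothetical, Python's incremental UTF-8 decoder with `errors="replace"` stands in for MultiByteToWideChar(), and every probe re-decodes the prefix, which the real code may well avoid.

```python
import codecs

def utf16_units(prefix: bytes) -> int:
    """UTF-16 code units produced by the complete characters in `prefix`.
    A trailing incomplete sequence is held back rather than replaced."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    text = decoder.decode(prefix, final=False)
    return len(text.encode("utf-16-le")) // 2

def bytes_consumed(data: bytes, written_units: int) -> int:
    """Smallest byte count n such that data[:n] accounts for at least
    `written_units` UTF-16 code units (lower-bound binary search).
    This works because utf16_units() never decreases as the prefix grows."""
    lo, hi = 0, len(data)
    while lo < hi:
        mid = (lo + hi) // 2
        if utf16_units(data[:mid]) >= written_units:
            hi = mid
        else:
            lo = mid + 1
    return lo

# 'a' (1 byte), '€' (3 bytes), an emoji (4 bytes, 2 code units),
# an invalid byte (becomes U+FFFD), then 'z'.
data = "a€😀".encode("utf-8") + b"\xff" + b"z"
print(bytes_consumed(data, 2))  # 4
```

In the usage line, two written code units correspond to `a` and `€`, which occupy the first four bytes of the buffer.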

zooba · 2024-11-25T14:55:54Z

Yeah, that's annoying... but you're right.

Okay, I have no further objections. Thanks for your work on this.

serhiy-storchaka · 2024-11-27T11:37:50Z

Thank you for your review @zooba.

miss-islington-app · 2024-11-27T11:38:18Z

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

…he Windows console (pythonGH-124059) Since MultiByteToWideChar()/WideCharToMultiByte() is not reversible if the data contains invalid UTF-8 sequences, use binary search to calculate the number of written bytes from the number of written characters. Also fix writing incomplete UTF-8 sequences. Also fix handling of memory allocation failures. (cherry picked from commit 3cf83d9) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

bedevere-app · 2024-11-27T11:38:45Z

GH-127325 is a backport of this pull request to the 3.13 branch.

bedevere-app · 2024-11-27T11:38:49Z

GH-127326 is a backport of this pull request to the 3.12 branch.

…the Windows console (GH-124059) (GH-127326) Since MultiByteToWideChar()/WideCharToMultiByte() is not reversible if the data contains invalid UTF-8 sequences, use binary search to calculate the number of written bytes from the number of written characters. Also fix writing incomplete UTF-8 sequences. Also fix handling of memory allocation failures. (cherry picked from commit 3cf83d9) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

bedevere-bot · 2024-11-27T15:20:26Z

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 RHEL8 Refleaks 3.12 has failed when building commit c3bb32d.

What do you need to do:

Don't panic.
Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1125/builds/735) and take a look at the build logs.
Check if the failure is related to this commit (c3bb32d) or if it is a false positive.
If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1125/builds/735

Failed tests:

test_complex

Failed subtests:

test_truediv - test.test_complex.ComplexTest.test_truediv

Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.12.cstratak-RHEL8-x86_64.refleak/build/Lib/test/test_complex.py", line 107, in test_truediv
    self.check_div(complex(random(), random()),
  File "/home/buildbot/buildarea/3.12.cstratak-RHEL8-x86_64.refleak/build/Lib/test/test_complex.py", line 84, in check_div
    self.assertClose(q, y)
  File "/home/buildbot/buildarea/3.12.cstratak-RHEL8-x86_64.refleak/build/Lib/test/test_complex.py", line 77, in assertClose
    self.assertCloseAbs(x.imag, y.imag, eps)
  File "/home/buildbot/buildarea/3.12.cstratak-RHEL8-x86_64.refleak/build/Lib/test/test_complex.py", line 72, in assertCloseAbs
    self.assertTrue(abs((x-y)/y) < eps)
AssertionError: False is not true

vstinner · 2024-11-27T15:33:09Z

> test_truediv - test.test_complex.ComplexTest.test_truediv

It's unrelated and I cannot reproduce the issue on the buildbot (nor on my laptop). I don't know what's going on.

…the Windows console (GH-124059) (GH-127325) Since MultiByteToWideChar()/WideCharToMultiByte() is not reversible if the data contains invalid UTF-8 sequences, use binary search to calculate the number of written bytes from the number of written characters. Also fix writing incomplete UTF-8 sequences. Also fix handling of memory allocation failures. (cherry picked from commit 3cf83d9) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

serhiy-storchaka added the OS-windows, topic-IO, needs backport to 3.12 (only security fixes), and needs backport to 3.13 (bugs and security fixes) labels Sep 13, 2024

bedevere-app bot added the awaiting core review label Sep 13, 2024

bedevere-app bot mentioned this pull request Sep 13, 2024

invaild assertion of _io__WindowConsoleIO_write_impl #124008

Closed

Merge branch 'main' into windows-console-write

a7a1ccb

serhiy-storchaka requested a review from a team November 5, 2024 08:46

serhiy-storchaka merged commit 3cf83d9 into python:main Nov 27, 2024
39 checks passed

bedevere-app bot removed the awaiting core review label Nov 27, 2024

serhiy-storchaka deleted the windows-console-write branch November 27, 2024 11:38

bedevere-app bot removed the needs backport to 3.13 (bugs and security fixes) label Nov 27, 2024

bedevere-app bot removed the needs backport to 3.12 (only security fixes) label Nov 27, 2024
