Skip to content

gh-121940: Reduce checking isatty on Windows write() #121941

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

cmaloney
Copy link
Contributor

@cmaloney cmaloney commented Jul 17, 2024

This check was added for bpo-11395 in 2011. Versions of Windows before Windows 8 required the check as write size to a tty was limited (#55604 (comment)).

Based on PEP 11, the last Windows 8 support ended in 2023, so this is no longer needed on any currently supported windows platforms.

That large writes to console succeed is tested by test_os.test_write_windows_console, which passes in my testing on Windows 11.

It's possible there are still objects on windows which error on a write too large, this leaves in place some code to handle (ex. Pipes and ENOSPC). Write can also still be interrupted and resumed (EAGAIN, etc).

If there are other Windows objects or platform files which limit write size, callers should wrap the os.write function.

EDIT: Updated title to reflect changes in the PR (Now caps size / changes limits but doesn't remove the isatty)

cmaloney added 3 commits July 16, 2024 23:07
This check was added for bpo-11395 in 2011. Versions of windows before
Windows 8 required the check as write size to a tty was limited (
python#55604 (comment)).

Based on PEP 11, the last Windows 8 support ended in 2023, so this is no
longer needed on any currently supported windows platforms.

This is tested by test_os.test_write_windows_console
If there are other windows objects which require a max write size, wrap the write function with code to cap the length.
This check was added for bpo-11395 in 2011. Versions of Windows before
Windows 8 required the check as write size to a tty was limited
(python#55604 (comment)).

Based on PEP 11, the last Windows 8 support ended in 2023, so this is no
longer needed on any currently supported windows platforms.

That large writes to console succeed is tested by
test_os.test_write_windows_console, which passes in my testing on
Windows 11.

It's possible there are still objects on windows which error on a write
too large, this leaves in place some code to handle (ex. Pipes and
ENOSPC). Write can also still be interrupted and resumed (EAGAIN, etc).

If there are other Windows objects or platform files which limit write
size, callers should wrap the `os.write` function.
@zooba
Copy link
Member

zooba commented Jul 18, 2024

This will need more thought and discussion than just "it works now", but I suspect this simplification will be okay.

One thing to consider is that writing a huge amount of output will become uninterruptible, whereas today there are opportunities to interrupt (Ctrl+C). It may also result in greater memory usage and apparent slowness for writing large buffers compared to before. I'm not sure how important this is going to be in practice (personally, I'd raise a PleaseDontError for anyone writing more than a few KB to the console at a time 🙃).

@cmaloney
Copy link
Contributor Author

Re: Thought and discussion just this PR + issue okay or need to start a thread on discuss.python.org?

Re: Uninterruptible, is the worry more for console applications, anything that does large writes, or both?

If it's primarily console applications, adding to the default stdout/stderr wrapper in Python on Windows (_WindowsConsoleIO) some form of regular interruptible state definitely feels reasonable to me. I don't have familiarity with it on Windows, but can work on hunting. Reading https://mail.python.org/pipermail/python-dev/2017-August/148800.html it takes some precision. My main aim at the moment is simplifying general I/O much more than console.

@zooba
Copy link
Member

zooba commented Jul 18, 2024

Here or on the issue is fine, though I'm about to be away for a while so won't be able to participate until I'm back. Create a discuss issue if you want to move this forward and nobody notices the PR.

@eryksun
Copy link
Contributor

eryksun commented Jul 20, 2024

One thing to consider is that writing a huge amount of output will become uninterruptible, whereas today there are opportunities to interrupt (Ctrl+C).

Yes, one can write large buffers that may take several seconds, or even longer, to copy into the console screen buffer via WriteConsoleW(). I tried modifying the C signal handler to cancel synchronous I/O on the main thread via CancelSynchronousIo(hMainThread). Unfortunately the console host synchronizes I/O operations such that it doesn't read the Ctrl+C from the input buffer and generate the CTRL_C_EVENT until after the write has completed. So it appears that writing smaller chunks of text is the only way we'll be able to cancel a large write.

@cmaloney
Copy link
Contributor Author

cmaloney commented Jul 20, 2024

To try and summarize, there are two high level code paths here which have distinct, but similar implementations

  1. General os.write used for most Windows file I/O
  2. Writing to a console _WinConsoleIO.write used for stdin, stdout, and stderr on Windows by default (although there is an escape hatch PYTHONLEGACYWINDOWSSTDIO to use os.write)

For 1, currently if isatty returns true, write size is capped. For 2, the size is always capped, and the code tries to find a utf-8 boundary to split at to avoid printing bad characters to the console. In both those cases, there is a desire for them to be cancellable (Ctrl-C in specific, which has some different technical details on Windows than general I/O Cancellation).

Additional surrounding context / history:

  • There are constants _PY_WRITE_MAX and _PY_READ_MAX to fix bpo-9015 and bpo-9611 (
    #if defined(MS_WINDOWS) || defined(__APPLE__)
    /* On Windows, the count parameter of read() is an int (bpo-9015, bpo-9611).
    On macOS 10.13, read() and write() with more than INT_MAX bytes
    fail with EINVAL (bpo-24658). */
    # define _PY_READ_MAX INT_MAX
    # define _PY_WRITE_MAX INT_MAX
    #else
    /* write() should truncate the input to PY_SSIZE_T_MAX bytes,
    but it's safer to do it ourself to have a portable behaviour */
    # define _PY_READ_MAX PY_SSIZE_T_MAX
    # define _PY_WRITE_MAX PY_SSIZE_T_MAX
    #endif
    ), that size is different than the isatty-only write size cap currently imposed on only Windows
  • Async I/O isn't supported on consoles, https://tim.mcnamara.nz/post/176613307022/iocp-and-stdio seems to go through options fairly well.

@eryksun
Copy link
Contributor

eryksun commented Jul 20, 2024

Yes and no. At its core, I/O in the Windows kernel is asynchronous. This applies to console I/O in Windows 8+. You can open "\\.\CONOUT$" in overlapped mode and queue multiple writes1. However, at the other end of the connection, the console host (i.e. "conhost.exe" or "openconsole.exe") synchronizes incoming I/O requests. If you cancel pending writes for the given file via CancelIo() or CancelIoEx(), the console host will still finish processing and displaying the first write. The behavior is the same for canceling synchronous I/O via CancelSynchronousIo(), in that the I/O request is canceled in the kernel and on the client side, but the console host will still finish processing and displaying the write. Similarly if "\\.\CONIN$" is opened asynchronously, canceling a request works as far as the I/O plumbing is concerned, but if it's a cooked read in line-input mode, the console host will remain in the read until the user presses enter, and anything they entered is simply thrown away since the associated I/O request was canceled.

The console host behaves this way -- without any real support for asynchronous I/O -- because the client-server connection was originally implemented via synchronous LPC back in Windows NT 3.1 through Windows 7 (1993-2012). Back then, console files weren't implemented as real File objects in the I/O system, They were just emulated files in the API. The heap limits that used to apply to console I/O were due to the 64k block of shared memory (managed as a heap) that was used for large LPC messages. In Windows 8+, console I/O uses real File objects instead of an LPC port, so reads and writes are only limited by available system memory.

Footnotes

  1. I'm talking about the generic I/O API functions WriteFile() and ReadFile(), and NT API NtWriteFile() and NtReadFile() -- as used to write and read encoded bytes to a console file. OTOH, the console API's I/O functions are based on an I/O control (IOCTL) routine via NtDeviceIoControlFile(), which the condrv.sys console driver only implements for a file that's opened in synchronous mode. This applies to WriteConsoleW(), ReadConsoleW(), WriteConsoleOutputW(), ReadConsoleInputW(), WriteConsoleInputW(), and ReadConsoleOutputW(). None of the latter work with a file that's opened in asynchronous mode.

@cmaloney
Copy link
Contributor Author

cmaloney commented Jul 21, 2024

I have a general pipe dream of moving a bunch of I/O generally to io_uring on linux and I/O Completion Ports or I/O Ring on Windows, but that is definitely not a viable single step. Currently just working on simplifying things and hopefully making cases simpler, as well as faster, which makes later changes easier because there's less gotchas :).

Based on the info in this thread, what I'd like to do here is:

  1. Increase the max chunk size in the _WinConsoleIO.write case. Ideally have some measure of how many bytes would give a human response time of interactivity. I will likely just experiment with different sizes interactively/locally and see what feels best to me. If there is a more definitive source happy to defer to that.
  2. Update the os.write case to call _WinConsoleIO.write write chunking code when the write is sufficiently large + it is targeting a tty. Note that os.write still will call write not WriteConsole. I would like to remove entirely, but I don't think that is viable given how common PYTHONLEGACYWINDOWSSTDIO currently is (https://github.com/search?q=PYTHONLEGACYWINDOWSSTDIO&type=code). Combine the cases because I suspect the existing code has latent issues splitting utf-8. I would like the base size for this to be 4 times what it is from the _WinConsoleIO.write size (theory: Keep it working, but nudge projects towards the newer _WinConsoleIO and also reduce cost in the "Writing to disk" path)
  3. Update comments to focus on interactivity at windows terminal rather than that it's a bugfix (Still mention bugs so people can get context easily)

Re: PYTHONLEGACYWINDOWSSTDIO, a big portion of its use seems to derive from people interpreting the PYTHONIOENCODING docs around it to mean they have to set PYTHONLEGACYWINDOWSSTDIO to get utf-8 behavior github search, which I don't think is the case. Might work on a doc change to make it more clear https://docs.python.org/3/using/cmdline.html#envvar-PYTHONIOENCODING

cmaloney added 3 commits July 25, 2024 16:57
 - Unify capping code to always respect character
   boundaries.
 - Make a soft cap rather than a hard limit
 - Make the cap for what _should_ be
   non-interactive / filesystem files 5 times that
   of interactive files (so big writes are faster)
 - Make write size for expected intractive to be
   1 MB (1024 * 1024), feels relatively responsive
   on my machines.
@cmaloney
Copy link
Contributor Author

cmaloney commented Jul 26, 2024

Updated so write size is limited larger

  • Unify capping code to always respect character boundaries.
  • Make a soft cap rather than a hard limit
  • Make the cap for what should be non-interactive / filesystem writes 5 times that of definitely consoles to reduce overheads
  • Make write size for expected interactive to be 1 MB (1024 * 1024), feels relatively responsive on my machines.
  • Update news to reflect these changes

@cmaloney cmaloney changed the title gh-121940: Stop checking isatty on Windows write() gh-121940: Reduce checking isatty on Windows write() Jul 26, 2024
depending on heap usage). */
/* isatty is guarded because don't want it in common case of
writing DEFAULT_BUFFER_SIZE to regular files (gh-121940). */
if (count > WRITE_LIMIT_INTERACTIVE) {
if (gil_held) {
Py_BEGIN_ALLOW_THREADS
if (isatty(fd)) {
Copy link
Contributor

@eryksun eryksun Jul 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C isatty() is true for a file opened for any character device. It excludes pipes and disk files, but it's actually true for most devices, including "NUL". I'd rather only apply this limit to console files.

_PyIO_get_console_type() could be moved from "Modules/_io/winconsoleio.c" to "Python/fileutils.c" and renamed _Py_get_console_type().

The helper function _get_console_type() could also get a minor improvement. It depends on WinAPI GetConsoleMode(), which requires read access. Unfortunately, "CON" can only be opened for writing if read access isn't requested. Fortunately, however, the device driver has to first dereference the handle to access the kernel File object, and it fails immediately if the file is opened for some other device. In this case, the error code is ERROR_INVALID_HANDLE. Otherwise, if it's a console file that simply lacks read access, the error code is ERROR_ACCESS_DENIED. We should check for the latter if GetConsoleMode() fails. For example:

char _get_console_type(HANDLE handle) {
    DWORD mode, peek_count;

    if (handle == INVALID_HANDLE_VALUE) {
        return '\0';
    }
    if (!GetConsoleMode(handle, &mode) &&
        GetLastError() != ERROR_ACCESS_DENIED) {
        return '\0';
    }
    /* Peek at the handle to see whether it is an input or output handle */
    if (GetNumberOfConsoleInputEvents(handle, &peek_count)) {
        return 'r';
    }
    return 'w';
}

I'd also prefer to first check isatty() in _Py_get_console_type(), before getting the handle and calling _get_console_type(). The isatty() call is a quick check for a valid file descriptor and the presence of the FDEV flag. If it's true, then it's worth doing the extra work to actually check whether it's a console file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that making the check a lot more complicated will increase the runtime cost more than the savings of not having a write split. This PR currently doesn't change the check that is made, just makes it less commonly called.

In general I would really like to eliminate the size + isatty check from write altogether, but I think to do that need to figure out why people are using PYTHONLEGACYWINDOWSSTDIO to avoid the newer WinConsoleIO and fix the issues underlying that. I'm potentially open to that line of work in the future, but is a much larger scope project than what this PR is focused on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think to do that need to figure out why people are using PYTHONLEGACYWINDOWSSTDIO

os.write() also calls _Py_write(). It still needs Ctrl+C interrupt support, regardless of "PYTHONLEGACYWINDOWSSTDIO".

As to why someone would use multibyte (legacy) console I/O, one reason would be that it actually works correctly in some cases nowadays1, which is overall simpler and more consistent than using io._WindowsConsoleIO.

I'm concerned that making the check a lot more complicated will increase the runtime cost more than the savings of not having a write split

All we really need to check here is isatty() and whether GetConsoleMode() mode succeeds or fails with ERROR_ACCESS_DENIED. The IOCTL for GetConsoleMode() poses no perceptible cost for interactive console I/O. The appreciable relative cost would be for other character devices, such as "NUL". But it's a fixed cost of a single system call, which fails immediately before doing any real work if it's not a console file2.

Note that the CRT's read() and write() functions themselves sometimes call isatty() and GetConsoleMode(), if the file is opened in text mode3. So it's not like calling GetConsoleMode() is unprecedented in terms of determining how a read or write is handled.

Footnotes

  1. The console host finally has functional UTF-8 (codepage 65001) support for both reads and writes -- or at least "openconsole.exe" does in recent releases of Windows Terminal. The system console host, "conhost.exe", still doesn't support UTF-8 reads correctly. But since "conhost.exe" is based on the same code as "openconsole.exe", I expect it will be fixed in the next release of Windows 11 later this year.

  2. Console API functions that aren't direct I/O requests are implemented by sending an IOCTL to a console connection file, opened on "\Device\ConDrv\Connect". A handle argument, if any, gets packed in the IOCTL input data. Thus, even for a file opened on another device such as "NUL", the device-type check for GetConsoleMode(hfile, &mode) is always handled by the "condrv.sys" driver, and not the driver for the actual device of hfile, such as "null.sys".

  3. The C runtime's text mode isn't used by builtin open(), but it's the default for os.open(), unless the O_BINARY flag is used. In some cases, such as an open in UTF-16 text mode, the C runtime switches to using ReadConsoleW() and WriteConsoleW() for a console file, instead of ReadFile() and WriteFile(). Note that this is just an in principle example since the CRT's UTF-8 and UTF-16 text modes aren't compatible with Python's I/O stack. They require reading and writing an even number of UTF-16 encoded bytes, which Python's raw and buffered I/O layers don't support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eryksun I am not okay making the changes you're requesting here.

I agree that Ctrl-C working well for all Python Windows I/O is a valuable feature. I believe this PR makes some cases better, and doesn't hurt more general I/O cases. I believe there is no change in behavior with this PR on I/O where isatty returns False other than it doesn't call isatty anymore if the write is sufficiently small (< 1,000,000 bytes). Large individual write calls to fds where isatty returns False were not split before this PR, and are not split after it. I agree those may not be really responsive to Ctrl-C. This PR keeps existing behavior when isatty would return False.

My motivation for this PR was that every write() on Windows over a relatively small threshold called isatty and that seemed like unnecessary work. Increasing the number of system calls / instructions being run by the processor to do both isatty and GetConsoleMode doesn't align with that initial motivation. I understand that it may be more precise, and that both those calls are relatively quick, but my motivation for this PR was removing an individual isatty call most the time. My personal perspective is that the fastest thing to do is less. Code still needs to meet API guarantees, and in this case splitting unnecessarily is okay and something the previous code did significantly more often. In CPython main today isatty seems to be sufficiently precise today.

When isatty returns True, there are two behavior changes with this PR:

  1. If the write is small (Under 1,000,000 bytes), isatty is never called, and the write is never split whereas in main it is always split. Testing under Windows Terminal, cmd.exe, PowerShell.exe, and the Visual Studio integrated shell on my Windows 11 box a single print() from python of a extremely long string Ctrl-C feels responsive with the 1,000,000 cutoff to me. The behavior change here is this cap is raised from 32,766 to 1,000,000 bytes.
  2. If the write is large (Over 1,000,000 bytes), isatty is called for general I/O but not for _WinConsoleIO (matching existing behavior, just the higher cap from change 1). The write() will always be capped to at most 1,000,000 bytes. The code will try to find the end of a utf-8 character by searching backwards up to 4 bytes now, which is a behavior change (Previously only _WinConsoleIO did the back search)

I'm open to a path forward but I don't currently see one I am comfortable implementing. If the changes you're requesting in this review comment are required, I'll close this PR as I'm not going to make those changes. In that case, if someone else would like to pick up these changes, adopt, and make the requested changes they're welcome to, I'm just not interested in that work.

@zooba
Copy link
Member

zooba commented Aug 6, 2024

to mean they have to set PYTHONLEGACYWINDOWSSTDIO to get utf-8 behavior

Yeah, this is the opposite of what it's for 😆 We should fix whatever docs are leading people to think this is a good or helpful idea.

It does however look like a significant number of those examples are then connecting non-console streams to the process, which means the variable is ignored and they're getting regular file objects anyway. _io.WindowsConsoleIO is really just for the interactive case, and as Eryk said, it's less important now that the default Windows console host can handle encodings properly (it may even be completely redundant with the new REPL implementation, I haven't checked).

@cmaloney
Copy link
Contributor Author

cmaloney commented Sep 9, 2024

I ran python.exe -bb -E -Wd -m test -r -w -M8g -uall test_largefile test_fileio test_io test_bufio and found ~1% speedup with this PR (full data below). Code which repeatedly reads/writes fixed size buffers likely has larger speedups. Python and PIP are looking at increasing default buffers over the 32k "must check isatty" point which will likely make more of a delta (gh-117151 and pypa/pip#12810).

If there is a path to move this PR without adding more system calls relative to main for the common case (ex. GetConsoleMode + isatty instead of just isatty), let me know what to do.

Test Details

Build: .\PCbuild\build.bat -c Release -p x64
Test: .\PCbuild\amd64\python.exe -bb -E -Wd -m test -r -w -M8g -uall test_largefile test_fileio test_io test_bufio

main: 93b61bc
isatty: this branch merged with 93b61bc / main

image
main 1 isatty 1 main 2 isatty 2
28.1 26.8 25.2 24.9
27.3 26.2 25.3 25.0
27.0 26.9 25.0 24.9
27.3 26.9 25.1 24.9
27.3 24.5 25.4 24.9
27.5 24.0 25.3 24.9
27.1 24.2 25.3 25.0
25.8 24.4 25.4 24.9
26.0 24.4 25.1 25.1
25.8 24.5 25.1 25.1

@cmaloney cmaloney closed this Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants