-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
GH-120754: Add a strace helper and test set of syscalls for open().read() #121143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-120754: Add a strace helper and test set of syscalls for open().read() #121143
Conversation
The test explicitly wants to check for `vfork` call _or_ flag passed to clone so keep doing the regex search rather than looking at particular events
Goal is that as changes to I/O code are made changes to the set of system calls made need to be intentional so that common fast-paths aren't accidentally changed.
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Running on my dev machine, ArchLinux, get fstat while Ubuntu CI machine finds newfstatat
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of comments, but please don't be discouraged. I like what this is trying to do overall.
Lib/test/test_fileio.py
Outdated
# system using a variant, allow that | ||
fstat = next((sc for sc in calls if 'fstat' in sc), 'fstat') | ||
|
||
readall_calls = ['openat', fstat, 'ioctl', 'lseek', 'read', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly testing system calls against this list is too specific, this is just a recorded existing behavior of an existing implementation the insides of which are somewhat beyond our control (these come from an underlying libc kernel interface). we don't want to guarantee that these stay the same.
assertEqual against a strict list isn't what is desired, it will be a maintenance burden. Make this a less fragile, more specific, regression test for specific behaviors. Such as not having multiple fstat calls, or having at most two read calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is a a potentially fragile point, and lots could lead to a test failure even though the cpython implementation is as desired (ex. cpython _io
, cpython _pyio
, glibc, and musl differ around Buffered I/O). There are a lot of system calls which could happen as part of the Python standard file opening process which I want to validate don't ideally. Just saying "one stat", "two reads" "no ioctl" doesn't quite feel like it could help catch these changes.
Motivating examples I would like to be able to write behavior tests around:
- CPython's
open()
builtin currently callsisatty
to check if the file it just opened is "interactive" to determine buffered mode. If buffered, aBufferedIO
is constructed which additionally wants to have an accurate position inside the file, and callslseek
, sometimes at__init__
(inside open), but sometimes when other I/O operations happen (read
,write
). The buffering behavior is visible, and documented, around stdout (https://docs.python.org/3/library/sys.html#sys.stdout) with a special environment variable to change itPYTHONUNBUFFERED
.stderr
has a different behavior than stdout. For this PR, I needed to do aflush=True
inprint()
because the testpython
has its output redirected to a file which changes the behavior of the standard library making there be onewrite()
system call at interpreter shutdown instead of calls when theprint()
statements are run. Theisatty
python check results in anioctl
system call (full stack at [0]). - gh-121313: Limit the reading size from pipes to their default buffer size on Unix systems #121315 in the initial implementation added a
fstat
to everyread
call to test "is this a pipe" (108e65b). That would fairly substantially reduce read performance. If I just test forat most one fstat
over lifetime, could catch. But what if it was an attribute whichioctl
is used to get? The performance slowdown was happening because of lots large allocation + de-allocation (Limit the reading size from pipes to their default buffer size on Unix systems #121313), which isn't a FD system call.
In some ways this feels like it interacts a bit with "what is a supported platform" (https://peps.python.org/pep-0011/). I can defer for now and count specific system calls (1 fstat over life of file, no ioctls, etc.) but worry it leaves some substantial gaps. I'm really not sure what the right set of tradeoffs is for long term maintenance + platform porting and needing to update tests like this one.
[0]
ioctl(TCGETS)
(linux)tcgettattr
(glibc)isatty
(glibc)io.isatty
(cpython)FileIO.isatty
(cpython)open()
(cpython)
Lib/test/test_fileio.py
Outdated
|
||
It's expected as bits of the I/O implementation change, this will need | ||
to change. The goal is to catch changes that unintentionally add | ||
additional systemcalls (ex. additional fstat calls has been an issue). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include a link to the specific gh cpython issue where additional calls were a problem.
Lib/test/test_subprocess.py
Outdated
# Do NOT check that fork() or other clones did not happen. | ||
# If the OS denys the vfork it'll fallback to plain fork(). | ||
|
||
# Test that each individual thing that would disable the use of vfork | ||
# actually disables it. | ||
for sub_name, preamble, sp_kwarg, expect_permission_error in ( | ||
("!use_vfork", "subprocess._USE_VFORK = False", "", False), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems wrong to remove this subTest as part of this PR which is focused on making the strace test helper concept generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, order of operations vs. pr builds finishing / passing. I suspect this PR will take longer than the _USE_VFORK
removal, but will separate out if we get close to it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gh-121383 / _USE_VFORK removal has been merged
Lib/test/support/strace_helper.py
Outdated
return unittest.skip("Linux only, requires strace.") | ||
# Moderately expensive (spawns a subprocess), so share results when possible. | ||
if _strace_working is None: | ||
_can_strace() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move the assignment to the global _strace_working into this function so that the is None test and code assigning it to something other than that all lives together in one scope. (have _can_strace() return a boolean to be assigned perhaps)
Lib/test/support/strace_helper.py
Outdated
class StraceResult: | ||
strace_returncode: int | ||
python_returncode: int | ||
_raw_events: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this gets used, so perhaps don't make it private. Add a comment describing what exactly it is for. for the most part it looks like it'll be the same as stderr.
Lib/test/support/strace_helper.py
Outdated
strace_returncode: int | ||
python_returncode: int | ||
_raw_events: str | ||
stdout: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use str
for child process output. The process emits binary output and no codec is declared or guaranteed.
It is okay to decode names once parsed out for high level use in StraceEvent.syscall, but the process itself shouldn't be expected to produce output matching any encoding.
Lib/test/support/strace_helper.py
Outdated
res.fail(cmd_line) | ||
|
||
# Get out program returncode | ||
decoded = res.err.decode().strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(1) Always specify a codec on decode. BUT... (2) Instead of decoding here, it'd be better to leave output as bytes. There will be processes that do not emit valid encoded data.
Lib/test/support/strace_helper.py
Outdated
from dataclasses import dataclass | ||
from test import support | ||
from test.support.script_helper import run_python_until_end | ||
from typing import Dict, List |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI - modern typing does not need these two. use dict[] and list[] directly. (but read below, we aren't ready to accept non-trivial annotations in test.support)
Lib/test/support/strace_helper.py
Outdated
match["returncode"]) for match in matches if match | ||
] | ||
|
||
def sections(self) -> Dict[str:List[StraceEvent]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a valid annotation. Use -> dict[str, list[StraceEvent]]
... But it'd better to just not annotate this file at all. Mention what gets returned in doc strings instead.
BUT This is a good example of why we generally do not have type annotations anywhere within CPython. We are not setup to run multiple type checkers over the stdlib to validate them. You can see what little is configured in .github/workflows/mypy.yml - but do not try to use this PR to enable it for test.support. it isn't ready.
PEP 649 being implemented is the only reason this was not a Python syntax error.
Lib/test/test_subprocess.py
Outdated
) | ||
# Match both vfork() and clone(..., flags=...|CLONE_VFORK|...) | ||
self.assertRegex(vfork_result.err, br"(?i)vfork") | ||
self.assertRegex(vfork_result._raw_events, r"(?i)vfork") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why ._raw_events rather than .stderr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My theory was trying to encourage people to use the parsed results / .events
for their test cases. It's useful to have the underlying un-parsed strace output (ex. vfork anywhere in args or function names should pass) but cases like #121315 mainly just care "these calls don't happen in this case", and for mine the overall set of calls are roughly what is expected.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Thanks for making the requested changes! @gpshead: please review the changes made to this pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the tests! I think gpshead's feedback is addressed?
(re #121143 (comment) , I think it's nice to have fairly specific tests. Maybe we could only run those if platform.libc_ver()[0] == glibc or something, personally I'm happy to risk a false positive until the test is proven fragile)
return [ev.syscall for ev in events] | ||
|
||
|
||
def _can_strace(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: i would just stick a functools.cache on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to functools.cache, definitely simplified a bit. Initially had been worried about adding a lot more dependencies to the I/O tests, but also realized when things like read
are broken, interpreter build breaks (can't load modules, etc.)
Yea, I think I've addressed all of @gpshead's review comments, I haven't clicked 'resolved conversation' though to keep the review visible for context |
Thank you again! (gpshead, if you come back to this and want some change, just let me know) |
|
… for open().read() (python#121143)" This reverts commit e38d0af.
Hmm, I'd need a few more logs to figure this out. Let's revert for now #123303 and fix in another PR? My bad, I should have thought to run the buildbots on this :-) |
Sounds good to me. Think i have all the bits to try that raspbian locally, need to figure out what its syscall name for fstat looks like |
Let me go strace on my Raspbian bot to see what it says. This type of test is also okay to just restrict to a known set of platforms if it comes down to that. |
Okay, I changed the new
|
|
NOTE: This needs a full buildbot test pass before merge, see: python#121143 (comment). 1. Added `statx` to set of allowed syscall forms (Should make Raspian bot pass). 2. Check that the `fd` returned from `open` is passed to all future calls. This helps ensure things like the `stat` call uses the file descriptor rather than the `filename` to avoid TOCTOU isuses. 3. Update the `Path().read_bytes()` test case to additionally validate the reduction in`isatty`/`ioctl` + `seek` calls from python#122111 4. Better diagnostic assertion messagess from @gpshead, so when the test fails have first information immediately available. Makes remote CI debugging much simpler.
General goal here is to be able to add tests that behavior isn't unintentionally changed around some common python code patterns, in particular Path("README.rst").read_text().
This started with the code that validated
subprocess
used vfork, building more general utilities from that core code. The work was motivated in #120755 for how to test that particular change.TODO
Figure out why the-- This is the documented behavior of stdout https://docs.python.org/3/library/sys.html#sys.stdout. Because they're files / not interactive it gets block buffered meaning only once a buffer is full does it flush rather than write() when a print() happens.print()
of markers needs flush=True to not buffer them. Likely add a test for two print statements not being merged using this infrastructure.Validate at least once CI bot actually runs and passes the test-- Ubuntu / Build and test found that glibc there uses newfstat which caused the test to fail. Updated the test to allow libc to use other fstat variants.