Skip to content

bpo-36670, regrtest: Fix WindowsLoadTracker() for partial line #16550

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 2, 2019
Merged

bpo-36670, regrtest: Fix WindowsLoadTracker() for partial line #16550

merged 1 commit into from
Oct 2, 2019

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Oct 2, 2019

WindowsLoadTracker.read_output() now uses a short buffer for
incomplete line.

https://bugs.python.org/issue36670

WindowsLoadTracker.read_output() now uses a short buffer for
incomplete line.
@vstinner
Copy link
Member Author

vstinner commented Oct 2, 2019

I managed to reproduce the bug https://bugs.python.org/issue36670#msg353768 using this change:

diff --git a/Lib/test/libregrtest/win_utils.py b/Lib/test/libregrtest/win_utils.py
index f0c17b906f..78429faa89 100644
--- a/Lib/test/libregrtest/win_utils.py
+++ b/Lib/test/libregrtest/win_utils.py
@@ -14,7 +14,7 @@ BUFSIZE = 8192
 LOAD_FACTOR_1 = 0.9200444146293232478931553241
 
 # Seconds per measurement
-SAMPLING_INTERVAL = 5
+SAMPLING_INTERVAL = 0
 # Windows registry subkey of HKEY_LOCAL_MACHINE where the counter names
 # of typeperf are registered
 COUNTER_REGISTRY_KEY = (r"SOFTWARE\Microsoft\Windows NT\CurrentVersion"

I tested manually that this change fix the bug.

@db3l
Copy link
Contributor

db3l commented Oct 2, 2019

Perhaps it's worth a comment in the code for future readers that due to when typeperf generates the new line (before each interval's output), waiting for it will always delay the current value by the sampling interval?

@vstinner vstinner merged commit 3e04cd2 into python:master Oct 2, 2019
@vstinner vstinner deleted the typeperf_fix branch October 2, 2019 23:04
@vstinner
Copy link
Member Author

vstinner commented Oct 2, 2019

@db3l: "Perhaps it's worth a comment in the code for future readers that due to when typeperf generates the new line (before each interval's output), waiting for it will always delay the current value by the sampling interval?"

Sorry, I didn't understand your comment.

"waiting for it": do you mean calling ReadFile() again would block?

My change fixed the issue for me. So I merged the PR. I understand that previously, the code dropped some samples depending if the line was fully written into the pipe or not.

@db3l
Copy link
Contributor

db3l commented Oct 2, 2019

No, I was just referring to the timing of processing each sample. Your change should definitely fix the original issue of reading a partial sample line, but it now also delays each sample by the sampling interval.

That is, in my testing, typeperf generates output at a given sampling interval of:
CRLF sample data (sampling delay)
CRLF sample data (sampling delay)
etc...

E.g., it outputs the newline at the beginning of each sample. So by waiting for the LF, you won't return the line with sample 'n' until sample 'n+1' starts being generated, thus introducing a sample time delay that wasn't there before.

Hardly a major issue (particularly with the current 5s sampling interval). But if I were just reading the code, I'd probably guess the CRLF immediately followed each sample, and wouldn't expect waiting for a complete line to introduce much latency. So it seemed potentially comment-worthy.

@vstinner
Copy link
Member Author

vstinner commented Oct 2, 2019

Oh, I see. I understand that typeperf calls write() 3 times:

write("\r\n")
write(timestamp)
write(value)

But there are also special cases for blank lines at header line at the beginning... Oh... I didn't expect that it would be so complex to parse a command output :-(

I see two options:

  • do nothing and keep the current sub-optimal code which "works" but with a delay of 5 seconds
  • experimental smarter workaround for incomplete lines: need to detect once we saw all header and blank lines, and then ensure that each line looks complete (contains ","? use a regex?)

I suggest to leave the code as it is. It's only used for debug purpose and it's only used in regrtest.

@db3l: If you come up with a comment, I can add it to the code :-)

@db3l
Copy link
Contributor

db3l commented Oct 3, 2019

Without access to typeperf's source we're just guessing, but yes, that's how I expect it's working (I suppose the initial "\r\n" might be combined with the timestamp into a single write() call at the start of each sample). As for blank lines this change actually generates more of them (one for every sample) due to the CRLF being at the front of the buffer, but they just get skipped.

I do think there's a third choice, which is just the older behavior (ignore parse issues without a warning). It feels wrong knowing there's a race condition, but it introduces no additional latency and ignoring the partial reads doesn't break anything. That's what's been happening prior to this point anyway.

However, for properly handling partial lines, your approach seems the simplest aside from the latency question (which seems minor as long as the sampling interval remains small). Trying to identify complete lines other than via newline feels like it would be fragile, or need to involve its own latency anyway, such as a timeout.

As for a comment, maybe just after your self._buffer comment, something like:

"typeperf generates CRLF before each sample, rather than after, so this will introduce a SAMPLING_INTERVAL delay"

@vstinner
Copy link
Member Author

vstinner commented Oct 3, 2019

experimental smarter workaround for incomplete lines: need to detect once we saw all header and blank lines, and then ensure that each line looks complete (contains ","? use a regex?)

Well, I found a simple solution: parse the last line. If it's valid, pass it the parser. Otherwise, put it in a buffer... No regex, no parser state. See my PR #16553.

vstinner added a commit that referenced this pull request Oct 3, 2019
…H-16550) (GH-16560)

* bpo-36670, regrtest: Fix WindowsLoadTracker() for partial line (GH-16550)

WindowsLoadTracker.read_output() now uses a short buffer for
incomplete line.

(cherry picked from commit 3e04cd2)

* bpo-36670: Enhance regrtest WindowsLoadTracker (GH-16553)

The last line is now passed to the parser even if it does not end
with a newline, but only if it's a valid value.

(cherry picked from commit c65119d)

* bpo-36670: Enhance regrtest (GH-16556)

* Add log() method: add timestamp and load average prefixes
  to main messages.
* WindowsLoadTracker:

  * LOAD_FACTOR_1 is now computed using SAMPLING_INTERVAL
  * Initialize the load to the arithmetic mean of the first 5 values
    of the Processor Queue Length value (so over 5 seconds), rather
    than 0.0.
  * Handle BrokenPipeError and when typeperf exit.

* format_duration(1.5) now returns '1.5 sec', rather than
  '1 sec 500 ms'

(cherry picked from commit 098e256)
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 3, 2019
…ythonGH-16550) (pythonGH-16560)

* bpo-36670, regrtest: Fix WindowsLoadTracker() for partial line (pythonGH-16550)

WindowsLoadTracker.read_output() now uses a short buffer for
incomplete line.

(cherry picked from commit 3e04cd2)

* bpo-36670: Enhance regrtest WindowsLoadTracker (pythonGH-16553)

The last line is now passed to the parser even if it does not end
with a newline, but only if it's a valid value.

(cherry picked from commit c65119d)

* bpo-36670: Enhance regrtest (pythonGH-16556)

* Add log() method: add timestamp and load average prefixes
  to main messages.
* WindowsLoadTracker:

  * LOAD_FACTOR_1 is now computed using SAMPLING_INTERVAL
  * Initialize the load to the arithmetic mean of the first 5 values
    of the Processor Queue Length value (so over 5 seconds), rather
    than 0.0.
  * Handle BrokenPipeError and when typeperf exit.

* format_duration(1.5) now returns '1.5 sec', rather than
  '1 sec 500 ms'

(cherry picked from commit 098e256)
(cherry picked from commit de3195c)

Co-authored-by: Victor Stinner <vstinner@python.org>
miss-islington added a commit that referenced this pull request Oct 3, 2019
…H-16550) (GH-16560)

* bpo-36670, regrtest: Fix WindowsLoadTracker() for partial line (GH-16550)

WindowsLoadTracker.read_output() now uses a short buffer for
incomplete line.

(cherry picked from commit 3e04cd2)

* bpo-36670: Enhance regrtest WindowsLoadTracker (GH-16553)

The last line is now passed to the parser even if it does not end
with a newline, but only if it's a valid value.

(cherry picked from commit c65119d)

* bpo-36670: Enhance regrtest (GH-16556)

* Add log() method: add timestamp and load average prefixes
  to main messages.
* WindowsLoadTracker:

  * LOAD_FACTOR_1 is now computed using SAMPLING_INTERVAL
  * Initialize the load to the arithmetic mean of the first 5 values
    of the Processor Queue Length value (so over 5 seconds), rather
    than 0.0.
  * Handle BrokenPipeError and when typeperf exit.

* format_duration(1.5) now returns '1.5 sec', rather than
  '1 sec 500 ms'

(cherry picked from commit 098e256)
(cherry picked from commit de3195c)

Co-authored-by: Victor Stinner <vstinner@python.org>
jacobneiltaylor pushed a commit to jacobneiltaylor/cpython that referenced this pull request Dec 5, 2019
…nGH-16550)

WindowsLoadTracker.read_output() now uses a short buffer for
incomplete line.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants