Optimize tests (parallel execution) #3019
Conversation
I will look closer in a while, but a first observation: use flake8 -j0 instead of -j1. The former doesn't even start a child worker process, which makes it faster than -j1.
d099749 to b28e8ad
A few quick notes (didn't do a full review).
mypy/waiter.py
Outdated
if not self.found_timing_file:
    try:
        with open(self.TIMING_FILENAME, 'w') as fp:
            json.dump(self.times, fp)
Can you change this to generate a pretty-printed, multi-line JSON file instead? It would be easier to read. For example, you could use sort_keys=True, indent=4.
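A minimal sketch of the suggested change, reusing the attributes from the snippet above:

    with open(self.TIMING_FILENAME, 'w') as fp:
        # stable key order and indentation make the file readable and diff-friendly
        json.dump(self.times, fp, sort_keys=True, indent=4)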
runtest_timing.json
Outdated
@@ -0,0 +1 @@
{"lint": 9.920947960765536, "run eval-test-A": 7.801836172253644, "run eval-test-B": 4.960789635451974, "run eval-test-C": 15.619698456569175, "run eval-test-D": 4.942676802394395, "run eval-test-E": 19.189689338522875, "run eval-test-F": 6.386842991035238, "run eval-test-G": 2.19908037312878, "run eval-test-H": 1.227226612685314, "run eval-test-I": 19.717739972914515, "run eval-test-L": 5.933065322821506, "run eval-test-M": 9.398179709265477, "run eval-test-N": 9.124606320054497, "run eval-test-O": 2.116921956160695, "run eval-test-P": 1.138871637819193, "run eval-test-R": 8.966922588260957, "run eval-test-S": 11.762846252672603, "run eval-test-T": 6.017162707348632, "run eval-test-U": 1.2191191366774774, "run eval-test-V": 3.01119568085889, "run eval-test-X": 0.22394048143914347, "run eval-test-Y": 0.24050901453989582, "run eval-test-s": 1.2029721354401488, "run cmdline-test": 28.51419420648503, "check file runtests.py": 1.644321132217982, "check legacy entry script": 6.834782028125318, "check legacy myunit script": 1.0219386199038922, "check fast-parse": 0.9087504009885947, "check package mypy": 8.92408001757238, "run unit-test mypy.test.testargs": 0.3363362046487737, "run unit-test mypy.test.testgraph": 0.3252464005840636, "run unit-test mypy.test.testinfer": 0.1808415150811129, "run unit-test mypy.test.testmoduleinfo": 0.08448336075068141, "run unit-test mypy.test.testparse": 0.5110368656676485, "run unit-test mypy.test.testreports": 0.3126996054114102, "run unit-test mypy.test.testsemanal": 5.180875490470299, "run unit-test mypy.test.testsolve": 0.1620294211067801, "run unit-test mypy.test.teststubgen": 0.578677914890875, "run unit-test mypy.test.testsubtypes": 0.18924212758625458, "run unit-test mypy.test.testtransform": 2.3843860963226007, "run unit-test mypy.test.testtypegen": 1.6086580353375552, "run unit-test mypy.test.testtypes": 0.29351970809608474, "run import mypy.api": 0.3575609463173066, "run import mypy.applytype": 0.1455391104136936, "run import mypy.binder": 0.14534078895593439, "run import mypy.build": 0.3166217875479447, "run import mypy.checker": 0.1709609978334754, "run import mypy.checkexpr": 0.17694975276978653, "run import mypy.checkmember": 0.1482002525240489, "run import mypy.checkstrformat": 0.1355957782023438, "run import mypy.constraints": 0.13701247291845675, "run import mypy.defaults": 0.05229128444352682, "run import mypy.erasetype": 0.1371199299632906, "run import mypy.errors": 0.06413960882474612, "run import mypy.expandtype": 0.12776721643587052, "run import mypy.experiments": 0.05509504417588573, "run import mypy.exprtotype": 0.14705812636000815, "run import mypy.fastparse": 0.14595590152512727, "run import mypy.fastparse2": 0.14839067272851025, "run import mypy.fixup": 0.13463893642810376, "run import mypy.git": 0.06504983320454016, "run import mypy.indirection": 0.1365198297754091, "run import mypy.infer": 0.14904134093748667, "run import mypy.join": 0.1473943246877809, "run import mypy.main": 0.33474449717215293, "run import mypy.maptype": 0.13427429358844734, "run import mypy.meet": 0.14524755416704238, "run import mypy.messages": 0.1365549903525789, "run import mypy.moduleinfo": 0.05397069583179359, "run import mypy.nodes": 0.12733896850718907, "run import mypy.options": 0.06343955778265808, "run import mypy.parse": 0.1272931412380558, "run import mypy.report": 0.29526707026269605, "run import mypy.sametypes": 0.13551558048135348, "run import mypy.semanal": 0.19040361182089782, "run import mypy.sharedparse": 0.06393338611371746, "run import mypy.solve": 0.15974161321469182, "run import mypy.stats": 0.13793494424086816, "run import mypy.strconv": 0.13846630352506395, "run import mypy.stubgen": 0.33819299917351486, "run import mypy.stubgenc": 0.05659865267827513, "run import mypy.stubutil": 0.06459867164130628, "run import mypy.subtypes": 0.1582281281456801, "run import mypy.traverser": 0.1380961298081047, "run import mypy.treetransform": 0.13880052653954067, "run import mypy.typeanal": 0.15936630368310034, "run import mypy.typefixture": 0.14723353418318652, "run import mypy.types": 0.13597859392456257, "run import mypy.typevars": 0.13790294416500615, "run import mypy.util": 0.11614249752295791, "run import mypy.version": 0.2009079330163388, "run import mypy.visitor": 0.05279380415322521, "run import mypy.waiter": 0.07502911611840091, "run import mypy": 0.04313728743653655, "run import mypy.myunit": 0.07410032379337395, "run import mypy.test.collect": 0.04308316385146327, "run import mypy.test.config": 0.055242402549879444, "run import mypy.test.data": 0.1498188242618994, "run import mypy.test.helpers": 0.1486087473194857, "run import mypy.test.testargs": 0.3352596588870824, "run import mypy.test.testcheck": 0.3891414656192751, "run import mypy.test.testcmdline": 0.28382882092756745, "run import mypy.test.testextensions": 0.08309313523309925, "run import mypy.test.testgraph": 0.33439249633778445, "run import mypy.test.testinfer": 0.18007864907531257, "run import mypy.test.testmoduleinfo": 0.0765414159994009, "run import mypy.test.testparse": 0.21122617969663793, "run import mypy.test.testpythoneval": 0.19029654983873456, "run import mypy.test.testreports": 0.3122863698639833, "run import mypy.test.testsemanal": 0.39830770956891115, "run import mypy.test.testsolve": 0.15927543927017496, "run import mypy.test.teststubgen": 0.4045279712267984, "run import mypy.test.testsubtypes": 0.14806632628068428, "run import mypy.test.testtransform": 0.39037880188556073, "run import mypy.test.testtypegen": 0.3912929768919753, "run import mypy.test.testtypes": 0.15004045441688163, "run import mypy.test.update": 0.04323526297741864, "run import mypy.test": 0.042489779728867916, "check stubs": 5.7442364307826494, "check stdlibsamples (3.2)": 5.183215841696807, "check file test-data\\samples\\bottles.py": 0.8908374696393935, "check file test-data\\samples\\class.py": 0.9207077626653017, "check file test-data\\samples\\cmdline.py": 0.9078413617968408, "check file test-data\\samples\\crawl.py": 1.6454806411392724, "check file test-data\\samples\\crawl2.py": 1.6100411497271807, "check file test-data\\samples\\dict.py": 0.9536208283358292, "check file test-data\\samples\\fib.py": 0.9493213613543503, "check file test-data\\samples\\files.py": 0.9579119990012828, "check file test-data\\samples\\for.py": 0.916309925080327, "check file test-data\\samples\\generators.py": 0.9264290601774974, "check file test-data\\samples\\greet.py": 0.9175915083650921, "check file test-data\\samples\\guess.py": 0.9301059083991845, "check file test-data\\samples\\hello.py": 0.9109362827131804, "check file test-data\\samples\\input.py": 0.9187209925238449, "check file test-data\\samples\\itertool.py": 0.9060019500292924, "check file test-data\\samples\\regexp.py": 0.9332988048564062}
Maybe use a file name starting with . for this, since it usually won't need to be manipulated manually? Also, can you regenerate it to use pretty-printed JSON (as discussed above)?
Also, Travis CI is now ~20% slower than before; do we know the platform it's running on? Edit: never mind... it's just fluctuating up and down randomly.
Currently, the timing file is only generated if it's not already present, so we need to delete it once in a while as new tests (especially long ones) are added to the suite. Also, it's best to run […]
It might be overkill, but we could store a list of the last N timings instead of just a single timing, then use an average of those timings to decide on ordering. That way random fluctuations don't influence ordering as much. Another idea is to record which tests have failed recently and run them first. That way, you're more likely to know quickly whether you fixed a test that was previously broken.
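A rough sketch of both ideas combined (hypothetical names, not code from this PR): keep the last N timings per task, schedule recently failed tasks first, then slower tasks, treating unknown tasks as slow.

    from typing import Dict, List, Tuple

    N = 5  # how many recent timings to keep per task

    def priority(name: str,
                 history: Dict[str, List[float]],
                 recently_failed: Dict[str, bool]) -> Tuple[bool, float]:
        times = history.get(name, [])[-N:]
        # a task with no history is assumed slow so it starts early
        avg_time = sum(times) / len(times) if times else float('inf')
        return (recently_failed.get(name, False), avg_time)

    # tasks.sort(key=lambda t: priority(t.name, history, failed), reverse=True)
    # -> failed tasks first; within each group, slowest (or unknown) first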
@JelleZijlstra I actually measured the times using a […]. I do like the idea of running last-failed tests first.
I changed the logic a bit. Now the only way to write (or overwrite) the timing file is to execute […].
I don't like the workflow with […]. I've just pushed a new commit for that (d7a436f).
@JukkaL, @gvanrossum LGTM.
@@ -161,12 +189,14 @@ def _record_time(self, name: str, elapsed_time: float) -> None:

     def _poll_current(self) -> Tuple[int, int]:
         while True:
-            time.sleep(.05)
+            time.sleep(.01)
Performance improvements by trimming the slowdown loops!
@ambv Actually, I'm only hoping so. On the positive side, with ~140 tasks, we're cutting down on an average delay of 0.05 s per task, or ~7 sec for the current test suite (well, for slower tasks it's actually half of that, since on average they finish in the middle of the wait interval). Spread across (say) 4 workers, it reduces total wall time by a quarter of that, or ~2 sec.
But OTOH, reducing the wait time also increases the number of context switches to the main process and the CPU load of the main process (i.e., the process that is waiting for the next task to finish); both would cause slowdown in the workers. I hope it's not enough to become noticeable at 10 ms wait time, given that a context switch is just a few microseconds, and the runtime of the loop seems to be pretty short too. But I'm too lazy to test it.
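Spelling out the back-of-the-envelope numbers from the comment above, using its own rough estimates:

    tasks, workers = 140, 4
    old_poll, new_poll = 0.05, 0.01       # seconds between status checks
    # summed over all tasks, the old poll interval costs up to ~7 s
    summed_delay = tasks * old_poll       # 7.0
    # spread across 4 workers, roughly a quarter of that is wall time
    wall_delay = summed_delay / workers   # 1.75, i.e. the "~2 sec" above
    # (halve these if tasks finish, on average, mid-interval)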
mypy/waiter.py
Outdated
except FileNotFoundError:
    test_log = []
except json.JSONDecodeError:
    print('corrupt test log file {}'.format(self.FULL_LOG_FILENAME))
nit: file=sys.stderr
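Applied to the snippet above, the warning would go to stderr rather than stdout:

    import sys

    print('corrupt test log file {}'.format(self.FULL_LOG_FILENAME),
          file=sys.stderr)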
# we don't know how long a new task takes;
# better to err by putting it in front in case it is slow:
# a fast task in front hurts performance less than a slow task at the back
return float('inf')
👍
But avg() doesn't need to be a closure within run(), does it?
No, should I move it out to module level? I just wasn't sure about polluting the module namespace with a somewhat customized averaging function (after all, it ignores None and returns infinity when the list is empty, which isn't the standard definition of an average).
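For reference, a sketch of an avg() with the behavior described above (illustrative, not necessarily the PR's exact code):

    from typing import Iterable, Optional

    def avg(values: Iterable[Optional[float]]) -> float:
        # ignore unknown (None) timings; an empty history means a new
        # task, so return infinity to schedule it first
        known = [v for v in values if v is not None]
        if not known:
            return float('inf')
        return sum(known) / len(known)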
mypy/waiter.py
Outdated
# => order: seq failed, parallel failed, parallel passed, seq passed
# => among failed tasks, sequential should go before parallel
# => among successful tasks, sequential should go after parallel
sequential = -sequential
Editorial nit: what do the => arrows signify? Did you mean an unordered list? Adding punctuation at the end of each element also improves reading flow. Compare:
avoid interrupting parallel tasks with sequential in between
- order: seq failed, parallel failed, parallel passed, seq passed;
- among failed tasks, sequential should go before parallel; and
- among successful tasks, sequential should go after parallel.
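One way to express that ordering as a sort key (a hypothetical helper, not the PR's code; it mirrors the sequential = -sequential trick above):

    from typing import Tuple

    def sort_key(failed: bool, sequential: bool) -> Tuple[int, int]:
        seq = 1 if sequential else 0
        if failed:
            seq = -seq  # among failures, sequential tasks come first
        # ascending sort yields: seq failed, parallel failed,
        # parallel passed, seq passed
        return (0 if failed else 1, seq)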
I added a few features to runtests.py:
- -p argument to pass arguments to pytest
- move pytest to the front, run before anything else
- combine all pytest tests into one shell command
- schedule longer tasks first
Pushed manually as 5bf9d18. Thanks for the PR! This is going to help us migrate more tests to pytest and generally improves the local testing workflow.
This PR is not ready for merging; I just wanted someone to start looking at it, and maybe try it on their machine to see how fast the tests run.