Optimize tests (parallel execution) #3019
Conversation
I will look closer in a while, but a first observation: use flake8 -j0 instead of -j1. The former doesn't even start a child worker process, which makes it faster than -j1.
d099749 to b28e8ad
A few quick notes (didn't do a full review).
mypy/waiter.py
Outdated
if not self.found_timing_file:
    try:
        with open(self.TIMING_FILENAME, 'w') as fp:
            json.dump(self.times, fp)
Can you change this to generate a pretty-printed, multi-line JSON file instead? It would be easier to read. For example, you could use sort_keys=True, indent=4.
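A minimal sketch of the suggested change, reusing the attributes from the snippet above:

    with open(self.TIMING_FILENAME, 'w') as fp:
        # stable key order and indentation make the file readable and diff-friendly
        json.dump(self.times, fp, sort_keys=True, indent=4)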
runtest_timing.json
Outdated
@@ -0,0 +1 @@
{"lint": 9.920947960765536, "run eval-test-A": 7.801836172253644, "run eval-test-B": 4.960789635451974, "run eval-test-C": 15.619698456569175, "run eval-test-D": 4.942676802394395, "run eval-test-E": 19.189689338522875, "run eval-test-F": 6.386842991035238, "run eval-test-G": 2.19908037312878, "run eval-test-H": 1.227226612685314, "run eval-test-I": 19.717739972914515, "run eval-test-L": 5.933065322821506, "run eval-test-M": 9.398179709265477, "run eval-test-N": 9.124606320054497, "run eval-test-O": 2.116921956160695, "run eval-test-P": 1.138871637819193, "run eval-test-R": 8.966922588260957, "run eval-test-S": 11.762846252672603, "run eval-test-T": 6.017162707348632, "run eval-test-U": 1.2191191366774774, "run eval-test-V": 3.01119568085889, "run eval-test-X": 0.22394048143914347, "run eval-test-Y": 0.24050901453989582, "run eval-test-s": 1.2029721354401488, "run cmdline-test": 28.51419420648503, "check file runtests.py": 1.644321132217982, "check legacy entry script": 6.834782028125318, "check legacy myunit script": 1.0219386199038922, "check fast-parse": 0.9087504009885947, "check package mypy": 8.92408001757238, "run unit-test mypy.test.testargs": 0.3363362046487737, "run unit-test mypy.test.testgraph": 0.3252464005840636, "run unit-test mypy.test.testinfer": 0.1808415150811129, "run unit-test mypy.test.testmoduleinfo": 0.08448336075068141, "run unit-test mypy.test.testparse": 0.5110368656676485, "run unit-test mypy.test.testreports": 0.3126996054114102, "run unit-test mypy.test.testsemanal": 5.180875490470299, "run unit-test mypy.test.testsolve": 0.1620294211067801, "run unit-test mypy.test.teststubgen": 0.578677914890875, "run unit-test mypy.test.testsubtypes": 0.18924212758625458, "run unit-test mypy.test.testtransform": 2.3843860963226007, "run unit-test mypy.test.testtypegen": 1.6086580353375552, "run unit-test mypy.test.testtypes": 0.29351970809608474, "run import mypy.api": 0.3575609463173066, "run import mypy.applytype": 0.1455391104136936, "run import mypy.binder": 0.14534078895593439, "run import mypy.build": 0.3166217875479447, "run import mypy.checker": 0.1709609978334754, "run import mypy.checkexpr": 0.17694975276978653, "run import mypy.checkmember": 0.1482002525240489, "run import mypy.checkstrformat": 0.1355957782023438, "run import mypy.constraints": 0.13701247291845675, "run import mypy.defaults": 0.05229128444352682, "run import mypy.erasetype": 0.1371199299632906, "run import mypy.errors": 0.06413960882474612, "run import mypy.expandtype": 0.12776721643587052, "run import mypy.experiments": 0.05509504417588573, "run import mypy.exprtotype": 0.14705812636000815, "run import mypy.fastparse": 0.14595590152512727, "run import mypy.fastparse2": 0.14839067272851025, "run import mypy.fixup": 0.13463893642810376, "run import mypy.git": 0.06504983320454016, "run import mypy.indirection": 0.1365198297754091, "run import mypy.infer": 0.14904134093748667, "run import mypy.join": 0.1473943246877809, "run import mypy.main": 0.33474449717215293, "run import mypy.maptype": 0.13427429358844734, "run import mypy.meet": 0.14524755416704238, "run import mypy.messages": 0.1365549903525789, "run import mypy.moduleinfo": 0.05397069583179359, "run import mypy.nodes": 0.12733896850718907, "run import mypy.options": 0.06343955778265808, "run import mypy.parse": 0.1272931412380558, "run import mypy.report": 0.29526707026269605, "run import mypy.sametypes": 0.13551558048135348, "run import mypy.semanal": 0.19040361182089782, "run import mypy.sharedparse": 0.06393338611371746, "run import mypy.solve": 0.15974161321469182, "run import mypy.stats": 0.13793494424086816, "run import mypy.strconv": 0.13846630352506395, "run import mypy.stubgen": 0.33819299917351486, "run import mypy.stubgenc": 0.05659865267827513, "run import mypy.stubutil": 0.06459867164130628, "run import mypy.subtypes": 0.1582281281456801, "run import mypy.traverser": 0.1380961298081047, "run import mypy.treetransform": 0.13880052653954067, "run import mypy.typeanal": 0.15936630368310034, "run import mypy.typefixture": 0.14723353418318652, "run import mypy.types": 0.13597859392456257, "run import mypy.typevars": 0.13790294416500615, "run import mypy.util": 0.11614249752295791, "run import mypy.version": 0.2009079330163388, "run import mypy.visitor": 0.05279380415322521, "run import mypy.waiter": 0.07502911611840091, "run import mypy": 0.04313728743653655, "run import mypy.myunit": 0.07410032379337395, "run import mypy.test.collect": 0.04308316385146327, "run import mypy.test.config": 0.055242402549879444, "run import mypy.test.data": 0.1498188242618994, "run import mypy.test.helpers": 0.1486087473194857, "run import mypy.test.testargs": 0.3352596588870824, "run import mypy.test.testcheck": 0.3891414656192751, "run import mypy.test.testcmdline": 0.28382882092756745, "run import mypy.test.testextensions": 0.08309313523309925, "run import mypy.test.testgraph": 0.33439249633778445, "run import mypy.test.testinfer": 0.18007864907531257, "run import mypy.test.testmoduleinfo": 0.0765414159994009, "run import mypy.test.testparse": 0.21122617969663793, "run import mypy.test.testpythoneval": 0.19029654983873456, "run import mypy.test.testreports": 0.3122863698639833, "run import mypy.test.testsemanal": 0.39830770956891115, "run import mypy.test.testsolve": 0.15927543927017496, "run import mypy.test.teststubgen": 0.4045279712267984, "run import mypy.test.testsubtypes": 0.14806632628068428, "run import mypy.test.testtransform": 0.39037880188556073, "run import mypy.test.testtypegen": 0.3912929768919753, "run import mypy.test.testtypes": 0.15004045441688163, "run import mypy.test.update": 0.04323526297741864, "run import mypy.test": 0.042489779728867916, "check stubs": 5.7442364307826494, "check stdlibsamples (3.2)": 5.183215841696807, "check file test-data\\samples\\bottles.py": 0.8908374696393935, "check file test-data\\samples\\class.py": 0.9207077626653017, "check file test-data\\samples\\cmdline.py": 0.9078413617968408, "check file test-data\\samples\\crawl.py": 1.6454806411392724, "check file test-data\\samples\\crawl2.py": 1.6100411497271807, "check file test-data\\samples\\dict.py": 0.9536208283358292, "check file test-data\\samples\\fib.py": 0.9493213613543503, "check file test-data\\samples\\files.py": 0.9579119990012828, "check file test-data\\samples\\for.py": 0.916309925080327, "check file test-data\\samples\\generators.py": 0.9264290601774974, "check file test-data\\samples\\greet.py": 0.9175915083650921, "check file test-data\\samples\\guess.py": 0.9301059083991845, "check file test-data\\samples\\hello.py": 0.9109362827131804, "check file test-data\\samples\\input.py": 0.9187209925238449, "check file test-data\\samples\\itertool.py": 0.9060019500292924, "check file test-data\\samples\\regexp.py": 0.9332988048564062}
Maybe use a file name starting with . for this, since it usually won't need to be manipulated manually? Also, can you regenerate it to use pretty-printed JSON (as discussed above)?
Also, Travis CI is now ~20% slower than before; do we know the platform it's running on? Edit: never mind... it's just fluctuating up and down randomly.
Currently, the timing file is only generated if it's not already present, so we need to delete it once in a while as new tests (especially long ones) are added to the suite. Also, it's best to run […]
It might be overkill, but we could store a list of the last N timings instead of just a single timing, then use an average of those timings to decide on ordering. That way random fluctuations don't influence ordering as much. Another idea is to record which tests have failed recently and run them first. That way, you're more likely to know quickly whether you fixed a test that was previously broken.
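A rough sketch of both ideas combined (hypothetical names, not code from this PR): keep the last N timings per task, schedule recently failed tasks first, then slower tasks, treating unknown tasks as slow.

    from typing import Dict, List, Tuple

    N = 5  # how many recent timings to keep per task

    def priority(name: str,
                 history: Dict[str, List[float]],
                 recently_failed: Dict[str, bool]) -> Tuple[bool, float]:
        times = history.get(name, [])[-N:]
        # a task with no history is assumed slow so it starts early
        avg_time = sum(times) / len(times) if times else float('inf')
        return (recently_failed.get(name, False), avg_time)

    # tasks.sort(key=lambda t: priority(t.name, history, failed), reverse=True)
    # -> failed tasks first; within each group, slowest (or unknown) first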
@JelleZijlstra I actually measured the times using a […]. I do like the idea of running last-failed tests first.
I changed the logic a bit. Now the only way to write (or overwrite) the timing file is to execute […].
I don't like the workflow with […]. I've just pushed a new commit for that (d7a436f).
@JukkaL, @gvanrossum LGTM.
@@ -161,12 +189,14 @@ def _record_time(self, name: str, elapsed_time: float) -> None:

     def _poll_current(self) -> Tuple[int, int]:
         while True:
-            time.sleep(.05)
+            time.sleep(.01)
Performance improvements by trimming the slowdown loops!
@ambv Actually, I'm only hoping so. On the positive side, with ~140 tasks, we're cutting down on an average delay of 0.05 s per task, or ~7 sec for the current test suite (well, for slower tasks it's actually half of that, since on average they finish in the middle of the wait interval). Spread across (say) 4 workers, it reduces total wall time by a quarter of that, or ~2 sec.
But OTOH, reducing the wait time also increases the number of context switches to the main process and the CPU load of the main process (i.e., the process that is waiting for the next task to finish); both would cause slowdown in the workers. I hope it's not enough to become noticeable at 10 ms wait time, given that a context switch is just a few microseconds, and the runtime of the loop seems to be pretty short too. But I'm too lazy to test it.
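Spelling out the back-of-the-envelope numbers from the comment above, using its own rough estimates:

    tasks, workers = 140, 4
    old_poll, new_poll = 0.05, 0.01       # seconds between status checks
    # summed over all tasks, the old poll interval costs up to ~7 s
    summed_delay = tasks * old_poll       # 7.0
    # spread across 4 workers, roughly a quarter of that is wall time
    wall_delay = summed_delay / workers   # 1.75, i.e. the "~2 sec" above
    # (halve these if tasks finish, on average, mid-interval)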
mypy/waiter.py
Outdated
except FileNotFoundError:
    test_log = []
except json.JSONDecodeError:
    print('corrupt test log file {}'.format(self.FULL_LOG_FILENAME))
nit: file=sys.stderr
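Applied to the snippet above, the warning would go to stderr rather than stdout:

    import sys

    print('corrupt test log file {}'.format(self.FULL_LOG_FILENAME),
          file=sys.stderr)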
# we don't know how long a new task takes;
# better to err by putting it in front in case it is slow:
# a fast task in front hurts performance less than a slow task at the back
return float('inf')
👍
But avg() doesn't need to be a closure within run(), does it?
No, should I move it out to module level? I just wasn't sure about polluting the module namespace with a somewhat customized averaging function (after all, it ignores None and returns infinity when the list is empty, which isn't the standard definition of an average).
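For reference, a sketch of an avg() with the behavior described above (illustrative, not necessarily the PR's exact code):

    from typing import Iterable, Optional

    def avg(values: Iterable[Optional[float]]) -> float:
        # ignore unknown (None) timings; an empty history means a new
        # task, so return infinity to schedule it first
        known = [v for v in values if v is not None]
        if not known:
            return float('inf')
        return sum(known) / len(known)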
mypy/waiter.py
Outdated
# => order: seq failed, parallel failed, parallel passed, seq passed
# => among failed tasks, sequential should go before parallel
# => among successful tasks, sequential should go after parallel
sequential = -sequential
Editorial nit: what do the => arrows signify? Did you mean an unordered list? Adding punctuation at the end of each element also improves reading flow. Compare:
avoid interrupting parallel tasks with sequential in between
- order: seq failed, parallel failed, parallel passed, seq passed;
- among failed tasks, sequential should go before parallel; and
- among successful tasks, sequential should go after parallel.
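One way to express that ordering as a sort key (a hypothetical helper, not the PR's code; it mirrors the sequential = -sequential trick above):

    from typing import Tuple

    def sort_key(failed: bool, sequential: bool) -> Tuple[int, int]:
        seq = 1 if sequential else 0
        if failed:
            seq = -seq  # among failures, sequential tasks come first
        # ascending sort yields: seq failed, parallel failed,
        # parallel passed, seq passed
        return (0 if failed else 1, seq)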
I added a few features to runtests.py:
- -p argument to pass arguments to pytest
- move pytest to the front, run before anything else
- combine all pytest tests into one shell command
- schedule longer tasks first
Pushed manually as 5bf9d18. Thanks for the PR! This is going to help us migrate more tests to pytest and generally improves the local testing workflow.
This PR is not ready for merging; I just wanted someone to start looking at it, and maybe try it on their machine to see how fast the tests run.