Optimize tests (parallel execution) #3019


Closed · wants to merge 10 commits from the optimize-tests branch

Conversation

@pkch (Contributor) commented Mar 18, 2017

I added a few features to runtests.py:

  • -p argument to pass arguments to pytest
  • move pytest to the front, run before anything else
  • combine all pytest tests into one shell command
  • schedule longer tasks first

This PR is not ready for merging; I just wanted someone to start looking at it, and maybe try it on their machine to see how fast the tests run.
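
As a rough illustration of the "schedule longer tasks first" idea (a hypothetical helper, not the PR's actual code; the real waiter keeps its own timing bookkeeping):

    from typing import Dict, List

    def order_tasks(names: List[str], times: Dict[str, float]) -> List[str]:
        # Longest expected tasks first; tasks with no recorded time are treated
        # as slowest so they get scheduled early rather than late.
        return sorted(names, key=lambda n: times.get(n, float('inf')), reverse=True)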

@ambv (Contributor) commented Mar 18, 2017

I will look closer in a while, but a first observation: use flake8 -j0 instead of -j1. The former doesn't even start a child worker process, which makes it faster than -j1.

@pkch pkch force-pushed the optimize-tests branch 2 times, most recently from d099749 to b28e8ad on March 18, 2017 20:30
@JukkaL (Collaborator) left a comment

A few quick notes (didn't do a full review).

mypy/waiter.py Outdated
if not self.found_timing_file:
    try:
        with open(self.TIMING_FILENAME, 'w') as fp:
            json.dump(self.times, fp)

Can you change this to generate a pretty-printed, multi-line JSON file instead? It would be easier to read. For example, you could use sort_keys=True, indent=4.
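
A minimal sketch of what that could look like (hypothetical standalone helper; in the PR the dump happens inline in waiter.py):

    import json
    from typing import Dict

    def write_timing_file(path: str, times: Dict[str, float]) -> None:
        # Pretty-printed, key-sorted output is easier to read and to diff.
        with open(path, 'w') as fp:
            json.dump(times, fp, sort_keys=True, indent=4)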

@@ -0,0 +1 @@
{"lint": 9.920947960765536, "run eval-test-A": 7.801836172253644, "run eval-test-B": 4.960789635451974, "run eval-test-C": 15.619698456569175, "run eval-test-D": 4.942676802394395, "run eval-test-E": 19.189689338522875, "run eval-test-F": 6.386842991035238, "run eval-test-G": 2.19908037312878, "run eval-test-H": 1.227226612685314, "run eval-test-I": 19.717739972914515, "run eval-test-L": 5.933065322821506, "run eval-test-M": 9.398179709265477, "run eval-test-N": 9.124606320054497, "run eval-test-O": 2.116921956160695, "run eval-test-P": 1.138871637819193, "run eval-test-R": 8.966922588260957, "run eval-test-S": 11.762846252672603, "run eval-test-T": 6.017162707348632, "run eval-test-U": 1.2191191366774774, "run eval-test-V": 3.01119568085889, "run eval-test-X": 0.22394048143914347, "run eval-test-Y": 0.24050901453989582, "run eval-test-s": 1.2029721354401488, "run cmdline-test": 28.51419420648503, "check file runtests.py": 1.644321132217982, "check legacy entry script": 6.834782028125318, "check legacy myunit script": 1.0219386199038922, "check fast-parse": 0.9087504009885947, "check package mypy": 8.92408001757238, "run unit-test mypy.test.testargs": 0.3363362046487737, "run unit-test mypy.test.testgraph": 0.3252464005840636, "run unit-test mypy.test.testinfer": 0.1808415150811129, "run unit-test mypy.test.testmoduleinfo": 0.08448336075068141, "run unit-test mypy.test.testparse": 0.5110368656676485, "run unit-test mypy.test.testreports": 0.3126996054114102, "run unit-test mypy.test.testsemanal": 5.180875490470299, "run unit-test mypy.test.testsolve": 0.1620294211067801, "run unit-test mypy.test.teststubgen": 0.578677914890875, "run unit-test mypy.test.testsubtypes": 0.18924212758625458, "run unit-test mypy.test.testtransform": 2.3843860963226007, "run unit-test mypy.test.testtypegen": 1.6086580353375552, "run unit-test mypy.test.testtypes": 0.29351970809608474, "run import mypy.api": 0.3575609463173066, "run import mypy.applytype": 0.1455391104136936, "run import mypy.binder": 0.14534078895593439, "run import mypy.build": 0.3166217875479447, "run import mypy.checker": 0.1709609978334754, "run import mypy.checkexpr": 0.17694975276978653, "run import mypy.checkmember": 0.1482002525240489, "run import mypy.checkstrformat": 0.1355957782023438, "run import mypy.constraints": 0.13701247291845675, "run import mypy.defaults": 0.05229128444352682, "run import mypy.erasetype": 0.1371199299632906, "run import mypy.errors": 0.06413960882474612, "run import mypy.expandtype": 0.12776721643587052, "run import mypy.experiments": 0.05509504417588573, "run import mypy.exprtotype": 0.14705812636000815, "run import mypy.fastparse": 0.14595590152512727, "run import mypy.fastparse2": 0.14839067272851025, "run import mypy.fixup": 0.13463893642810376, "run import mypy.git": 0.06504983320454016, "run import mypy.indirection": 0.1365198297754091, "run import mypy.infer": 0.14904134093748667, "run import mypy.join": 0.1473943246877809, "run import mypy.main": 0.33474449717215293, "run import mypy.maptype": 0.13427429358844734, "run import mypy.meet": 0.14524755416704238, "run import mypy.messages": 0.1365549903525789, "run import mypy.moduleinfo": 0.05397069583179359, "run import mypy.nodes": 0.12733896850718907, "run import mypy.options": 0.06343955778265808, "run import mypy.parse": 0.1272931412380558, "run import mypy.report": 0.29526707026269605, "run import mypy.sametypes": 0.13551558048135348, "run import mypy.semanal": 0.19040361182089782, "run import mypy.sharedparse": 0.06393338611371746, "run import 
mypy.solve": 0.15974161321469182, "run import mypy.stats": 0.13793494424086816, "run import mypy.strconv": 0.13846630352506395, "run import mypy.stubgen": 0.33819299917351486, "run import mypy.stubgenc": 0.05659865267827513, "run import mypy.stubutil": 0.06459867164130628, "run import mypy.subtypes": 0.1582281281456801, "run import mypy.traverser": 0.1380961298081047, "run import mypy.treetransform": 0.13880052653954067, "run import mypy.typeanal": 0.15936630368310034, "run import mypy.typefixture": 0.14723353418318652, "run import mypy.types": 0.13597859392456257, "run import mypy.typevars": 0.13790294416500615, "run import mypy.util": 0.11614249752295791, "run import mypy.version": 0.2009079330163388, "run import mypy.visitor": 0.05279380415322521, "run import mypy.waiter": 0.07502911611840091, "run import mypy": 0.04313728743653655, "run import mypy.myunit": 0.07410032379337395, "run import mypy.test.collect": 0.04308316385146327, "run import mypy.test.config": 0.055242402549879444, "run import mypy.test.data": 0.1498188242618994, "run import mypy.test.helpers": 0.1486087473194857, "run import mypy.test.testargs": 0.3352596588870824, "run import mypy.test.testcheck": 0.3891414656192751, "run import mypy.test.testcmdline": 0.28382882092756745, "run import mypy.test.testextensions": 0.08309313523309925, "run import mypy.test.testgraph": 0.33439249633778445, "run import mypy.test.testinfer": 0.18007864907531257, "run import mypy.test.testmoduleinfo": 0.0765414159994009, "run import mypy.test.testparse": 0.21122617969663793, "run import mypy.test.testpythoneval": 0.19029654983873456, "run import mypy.test.testreports": 0.3122863698639833, "run import mypy.test.testsemanal": 0.39830770956891115, "run import mypy.test.testsolve": 0.15927543927017496, "run import mypy.test.teststubgen": 0.4045279712267984, "run import mypy.test.testsubtypes": 0.14806632628068428, "run import mypy.test.testtransform": 0.39037880188556073, "run import mypy.test.testtypegen": 0.3912929768919753, "run import mypy.test.testtypes": 0.15004045441688163, "run import mypy.test.update": 0.04323526297741864, "run import mypy.test": 0.042489779728867916, "check stubs": 5.7442364307826494, "check stdlibsamples (3.2)": 5.183215841696807, "check file test-data\\samples\\bottles.py": 0.8908374696393935, "check file test-data\\samples\\class.py": 0.9207077626653017, "check file test-data\\samples\\cmdline.py": 0.9078413617968408, "check file test-data\\samples\\crawl.py": 1.6454806411392724, "check file test-data\\samples\\crawl2.py": 1.6100411497271807, "check file test-data\\samples\\dict.py": 0.9536208283358292, "check file test-data\\samples\\fib.py": 0.9493213613543503, "check file test-data\\samples\\files.py": 0.9579119990012828, "check file test-data\\samples\\for.py": 0.916309925080327, "check file test-data\\samples\\generators.py": 0.9264290601774974, "check file test-data\\samples\\greet.py": 0.9175915083650921, "check file test-data\\samples\\guess.py": 0.9301059083991845, "check file test-data\\samples\\hello.py": 0.9109362827131804, "check file test-data\\samples\\input.py": 0.9187209925238449, "check file test-data\\samples\\itertool.py": 0.9060019500292924, "check file test-data\\samples\\regexp.py": 0.9332988048564062}

Maybe use a file name starting with . for this, since it usually won't need to be manipulated manually? Also, can you regenerate it with pretty-printed JSON (as discussed above)?

@pkch (Contributor, Author) commented Mar 19, 2017

Also, Travis CI is now ~20% slower than before; do we know the platform it's running on?

Edit: never mind... it's just fluctuating up and down randomly.

@pkch (Contributor, Author) commented Mar 20, 2017

Currently, the timing file is only generated if it's not already present, so we need to delete it once in a while as new tests (especially long ones) are added to the suite. Also, it's best to run with -j1 when recreating the file, since the wall time then corresponds more closely to actual CPU time.

@JelleZijlstra (Member) commented

It might be overkill, but we could store a list of the last N timings instead of just a single timing, then use an average of those timings to decide on ordering. That way random fluctuations don't influence ordering as much.

Another idea is to record which tests have failed recently and run them first. That way, you're more likely to know quickly whether you fixed a test that was previously broken.
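
A rough sketch of how the "last N timings" idea could look (illustrative names and structure, not the PR's actual code):

    from collections import defaultdict, deque
    from typing import Deque, Dict

    class TimingHistory:
        """Keep the last few wall-clock timings per task and average them."""

        def __init__(self, keep: int = 5) -> None:
            self.history: Dict[str, Deque[float]] = defaultdict(lambda: deque(maxlen=keep))

        def record(self, name: str, elapsed: float) -> None:
            self.history[name].append(elapsed)

        def expected_time(self, name: str) -> float:
            times = self.history[name]
            if not times:
                # No data yet: treat the task as slowest so it is scheduled early.
                return float('inf')
            return sum(times) / len(times)

Averaging a few recent runs smooths out random fluctuations without needing the full history.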

@pkch (Contributor, Author) commented Mar 20, 2017

@JelleZijlstra I actually measured the times using a -j1 run, because with parallel execution wall times are far too variable (primarily because of interference from other processes on the user's machine). I'd say that with -j1 the variability is small enough that averaging the last several timings isn't really needed, but I can do that.

I do like the idea of running last-failed tests first. pytest can already do that with its --ff option (it uses its own cache). I could add this functionality to runtests.py and combine it with pytest in a reasonable way. But I think that should be a separate PR, since it's only tangentially related to this one, and I want this PR to be either accepted or rejected first to avoid merge hell.

@pkch (Contributor, Author) commented Mar 21, 2017

I changed the logic a bit. Now the only way to write (or overwrite) the timing file is to execute python runtests.py -r; a warning to do so is issued whenever the file doesn't exist, or whenever there are long tasks that aren't included in the existing file.

@pkch (Contributor, Author) commented Mar 23, 2017

I don't like the workflow with -r any more. It requires someone to think about it, and maybe even read the docs (which I'd then have to write). I want to go back to doing everything implicitly. I've also incorporated @JelleZijlstra's suggestion to use the last several test times, and added options to run only the last failed tests or to run failed tests first (using the same options as pytest: --lf and --ff).

I've just pushed a new commit for that (d7a436f).

@ambv (Contributor) left a comment

@@ -161,12 +189,14 @@ def _record_time(self, name: str, elapsed_time: float) -> None:

     def _poll_current(self) -> Tuple[int, int]:
         while True:
-            time.sleep(.05)
+            time.sleep(.01)

Performance improvements by trimming the slowdown loops!

@pkch (Contributor, Author) commented Mar 26, 2017

@ambv Actually, I'm only hoping so. On the positive side, with ~140 tasks we're cutting an average delay of 0.05 s per task, or ~7 s for the current test suite (well, for slower tasks it's actually half of that, since on average they finish in the middle of the wait interval). Spread across (say) 4 workers, that reduces total wall time by a quarter of that, or ~2 s.

But OTOH, reducing the wait time also increases the number of context switches to the main process and the CPU load of the main process (i.e., the process that is waiting for the next task to finish); both would slow down the workers. I hope it's not enough to become noticeable at a 10 ms wait time, given that a context switch takes just a few microseconds and the loop itself seems to run quickly, but I'm too lazy to test it.
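
For context, a simplified sketch of the kind of polling loop being discussed (hypothetical; the real _poll_current in waiter.py returns a Tuple[int, int] and does more bookkeeping):

    import subprocess
    import time
    from typing import Dict

    def poll_current(procs: Dict[int, subprocess.Popen], interval: float = 0.01) -> int:
        # Busy-wait until one of the running processes exits and return its index.
        # A smaller interval trims the average half-interval delay per finished task,
        # at the cost of waking the main process more often.
        while True:
            time.sleep(interval)
            for index, proc in procs.items():
                if proc.poll() is not None:  # the process has exited
                    return index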

mypy/waiter.py Outdated
except FileNotFoundError:
    test_log = []
except json.JSONDecodeError:
    print('corrupt test log file {}'.format(self.FULL_LOG_FILENAME))

nit: file=sys.stderr
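
In other words (sketch with a hypothetical helper; the PR does this inline):

    import sys

    def report_corrupt_log(path: str) -> None:
        # Error messages belong on stderr, not stdout.
        print('corrupt test log file {}'.format(path), file=sys.stderr)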

# we don't know how long a new task takes
# better err by putting it in front in case it is slow:
# a fast task in front hurts performance less than a slow task in the back
return float('inf')

👍

But avg() doesn't need to be a closure within run(), does it?

@pkch (Contributor, Author) replied

No, should I move it out to module level? I just wasn't sure about polluting the module namespace with a somewhat customized averaging function (after all, it ignores None values and returns infinity when the list is empty, which isn't a standard definition of average).
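
For reference, a sketch of an avg() with the behavior described above (ignores None entries, returns infinity for an empty list); illustrative, not necessarily the PR's exact code:

    from typing import Iterable, Optional

    def avg(times: Iterable[Optional[float]]) -> float:
        # Average the known timings; a task with no data sorts as slowest.
        known = [t for t in times if t is not None]
        if not known:
            return float('inf')
        return sum(known) / len(known)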

mypy/waiter.py Outdated
# => order: seq failed, parallel failed, parallel passed, seq passed
# => among failed tasks, sequential should go before parallel
# => among successful tasks, sequential should go after parallel
sequential = -sequential

Editorial nit: what do the => arrows signify? Did you mean an unordered list? Adding punctuation at the end of an element also improves reading flow. Compare:

avoid interrupting parallel tasks with sequential in between

  • order: seq failed, parallel failed, parallel passed, seq passed;
  • among failed tasks, sequential should go before parallel; and
  • among successful tasks, sequential should go after parallel.

JukkaL pushed a commit that referenced this pull request Apr 3, 2017
I added a few features to runtests.py:

* -p argument to pass arguments to pytest
* move pytest to the front, run before anything else
* combine all pytest tests into one shell command
* schedule longer tasks first
@JukkaL (Collaborator) commented Apr 3, 2017

Pushed manually as 5bf9d18. Thanks for the PR! This is going to help us migrate more tests to pytest, and it generally improves the local testing workflow.
