Make tests faster #778
Conversation
This sounds great. A few notes:

1. A year ago we made the tests run faster, but since then I think the fine-grained nature of the text/mathtext tests has made them pretty slow again (most of the time is spent in …).
2. There's also the issue (related to 1.) of calling inkscape multiple times. I think we've tried looking into this before, about leaving a process running and batching those calls together.
3. It would also be nice to limit the testing to one of PDF, SVG, or PNG; I do this by hand to speed things up.
If (and this is a big if) we can make freetype behave predictably (as we've started to in #779, though more work needs to be done), then would one not expect the SVG files to be identical? In that case we could just compare the contents of the files and not convert them to PNG. (Same for PDF.)
The freetype issue is totally orthogonal to SVG and PDF, since the freetype renderer is not used by the vector backends. The SVG backend only uses the freetype TrueType file parser to get the curves, which come directly from the file and shouldn't be impacted by the version of freetype. The PDF backend doesn't even use that; it uses a library called ttconv.

The conversion of SVG and PDF to PNG will be necessary for the foreseeable future. Since both PDF and SVG use dictionaries and hash-based ids to some extent, the ordering of the elements changes between runs. The backends are constantly tweaked to fix bugs, improve efficiency, etc., which can change the structure of the file in radical ways while (if all goes well) not impacting the final result. The only way to look at the final result in an objective way is to render it to a bitmap and compare it in that space.
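For illustration, a minimal sketch of that kind of bitmap-space comparison, assuming both files have already been rasterized to PNG. The Pillow/NumPy approach and the RMS threshold here are assumptions for the sketch, not matplotlib's actual `compare_images` implementation:

```python
import numpy as np
from PIL import Image

def rms_difference(expected_png, actual_png):
    """Load two PNGs and return the RMS difference between their pixels."""
    expected = np.asarray(Image.open(expected_png).convert("RGB"), dtype=float)
    actual = np.asarray(Image.open(actual_png).convert("RGB"), dtype=float)
    if expected.shape != actual.shape:
        raise ValueError("images differ in size, cannot compare")
    return np.sqrt(np.mean((expected - actual) ** 2))

# The images are "the same" if the RMS difference is below some tolerance,
# which sidesteps byte-level reordering in the source PDF/SVG entirely.
# e.g. assert rms_difference("baseline.png", "result.png") < 10
```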
Here's the first chunk of changes to make the tests faster. As these things tend to be finicky across systems, I'd appreciate as much testing as possible.
It might be useful to leave ghostscript and inkscape running as separate processes. For ghostscript, you'd start it with a command line like `gs -sDEVICE=png16m -dNOPAUSE -sOutputFile=page%d.png` and send it input lines like `(foo/bar/baz.pdf) run`. Trying this from the command line, it seems to work, but I don't know if there are subtle problems where e.g. one PDF file causes a font to be loaded that affects the rendering of the next file. Error recovery is another big question. I think you'd want to use something like pexpect to wait for gs prompts to know that the latest file has been completely written. Inkscape has a `--shell` option that allows you to send it input lines like `input.svg --export-png=output.png`.
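A rough sketch of what driving a single long-lived ghostscript process with pexpect might look like. The prompt string, file names, and the lack of error recovery are simplifications; treat this as an illustration of the batching idea, not tested code:

```python
import pexpect

# Start one persistent ghostscript process that renders every PDF we send it.
gs = pexpect.spawn("gs -sDEVICE=png16m -dNOPAUSE -sOutputFile=page%d.png")
gs.expect("GS>")  # wait for the interactive prompt before sending work

for pdf in ["a.pdf", "b.pdf", "c.pdf"]:  # hypothetical inputs
    gs.sendline("(%s) run" % pdf)
    gs.expect("GS>")  # the prompt returning signals the file was processed

gs.sendline("quit")
gs.close()
```

This avoids paying process-startup cost once per file, which is the whole point of keeping the converter resident.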
Our tests are mainly end-to-end regression tests: you run a script and check that the output stays the same as it was before. That's useful for noticing regressions, but tests of a smaller scope would be faster to run. We could, for example, set up a test suite that includes predefined sequences of backend function calls and checks that the output for each sequence stays the same. That would allow skipping the frontend code when you are working on a specific backend. Likewise, we could have a stub backend that only records the calls made by the frontend code and compares that sequence to a predefined "correct" sequence for a given test case.

A related thought: if the freetype version differences are problematic, we could make most test cases not include any text output (disable tick labels, etc.), and have a smaller number of cases that specifically test text rendering, with a larger set of correct answers for just those tests.
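A hypothetical sketch of such a recording stub, just to illustrate the idea; the class and method names are invented here, and a real version would presumably subclass matplotlib's `RendererBase`:

```python
class RecordingRenderer:
    """Stub renderer that records drawing calls instead of drawing anything."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Return a recorder for whatever method the frontend invokes,
        # so no actual rendering work is done.
        def record(*args, **kwargs):
            self.calls.append((name, args, kwargs))
        return record

# A test would then compare renderer.calls against a stored "correct"
# sequence of calls for that test case, with no rasterization at all.
```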
Hey, it shouldn't be too hard to make a checksummer that's aware of PDF syntax, tolerates trivial changes like reordering of dictionary elements, and ignores things like the CreationDate timestamp. That might speed the usual case up quite a bit.
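A very rough sketch of that normalization idea, assuming we only need to drop the CreationDate and tolerate reordering at the textual level. This is crude (real PDF objects span multiple lines, so a proper version would need an actual PDF parser), but it shows the shape of the approach:

```python
import hashlib
import re

def normalized_pdf_checksum(path):
    """Checksum a PDF while ignoring volatile content like CreationDate."""
    with open(path, "rb") as f:
        data = f.read()
    # Drop the timestamp, which changes on every run.
    data = re.sub(rb"/CreationDate\s*\(.*?\)", b"", data)
    # Hash the lines in sorted order so trivial reordering doesn't matter.
    return hashlib.sha1(b"\n".join(sorted(data.splitlines()))).hexdigest()

# Two runs producing semantically identical PDFs would then compare equal
# without any conversion to PNG.
```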
@jkseppan: These are some good ideas. I had implemented using Inkscape around a year ago, but ultimately reverted it. Inkscape leaked memory with each render, so it needed to be restarted periodically, and I wasn't terribly comfortable with coming up with a heuristic for how often to do that. See 9086ef1 for the original kick at that can. It may be that current versions of Inkscape are less leaky, and it's also possible that my implementation could be improved. I can't find my old numbers, and I don't recall it making a huge improvement in runtime, but that may have been because the memory leaks were causing problems on a memory-constrained machine. I haven't attempted a similar approach with ghostscript, but it's nice to know it can be done.

I think removing text except where it is necessary is a good idea. There are cases where the ticking itself is being tested and we want to make sure the numbers and layout of the ticks are correct, but for most tests the ticking is superfluous.
I have an initial commit that converts all of the PDFs to PNGs using the same gs process. The speedup isn't terribly dramatic: it went from 4:30 to 4:15 wall-clock time on my machine. It would be great if this could be tested; I have no idea how robust this is to gs crashing, etc.
Rebased on current master |
Tests pass on a MacBook Pro running OS X 10.6.8, Python 2.7.2, GPL Ghostscript 8.71: `Ran 1090 tests in 615.499s OK (KNOWNFAIL=2)`
I tried this out on Amazon EC2, comparing the runtime of the test suite before and after this change. On a small instance (single core, pretty slow), "before" took 702 seconds and "after" took 706 seconds, so it was slightly slower but probably within measurement noise. On a high-CPU medium instance (dual core, each supposedly 2.5 times faster than the small instance), "before" took 452 seconds and "after" took 357 seconds. Just to be sure, I ran "before" again, and it took 446 seconds, so I don't think it was a fluke. Based on this, it looks like this makes sense on dual-core but doesn't help on single-core. Perhaps the overhead from starting a new process is similar to that from context switches on single-core.
Depending on how much of an overhead the conversion/storage of PDF & SVG -> PNG is, we could always use an OrderedDict for Python >= 2.7 (there are loads of OrderedDict recipes for earlier versions) to give the SVG/PDF output a consistent ordering.
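For illustration, the kind of thing that suggestion amounts to. The attribute names here are invented; the point is only that insertion-ordered output is byte-for-byte reproducible (note that on modern Python 3.7+, plain dicts already preserve insertion order):

```python
from collections import OrderedDict

# With a plain dict (pre-3.7), these attributes could be emitted in any
# order, producing spurious byte-level diffs between identical SVGs.
attrs = OrderedDict([("width", "640"), ("height", "480"),
                     ("viewBox", "0 0 640 480")])
svg_tag = "<svg %s>" % " ".join('%s="%s"' % kv for kv in attrs.items())
# -> '<svg width="640" height="480" viewBox="0 0 640 480">' on every run
```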
…ff" toward the end of the filename so files are more appropriately grouped when viewed in file managers and thumbnail tools (eg. gthumb)
….6. Continue to use histogram for earlier Numpy versions.
…ache the baseline images, since the test results are never the same twice given different dictionary orderings in the file.
… removes titles and ticks from a figure before comparison. Use this throughout to make more of the tests less dependent on text differences.
…oesn't seem to offer any speed advantage.
@mdboom: What kind of state is this in? I'm prepared to go through these changes if it's going to save us all time in the long run.
@jkseppan: The real advantage of this PR is on subsequent runs, since it caches the results of conversions. For a real speed comparison, you should run each test run twice (cleaning ~/.matplotlib before each pair of runs).

@pelson: I think this is good to merge. It would be nice to be able to get some predictable ordering in SVG and PDF such that the PNG conversions of the test results (rather than just the baseline images) could also be done. But there's no reason that can't be done as a follow-up in a separate PR.
```diff
@@ -127,7 +172,7 @@ def comparable_formats():
        on this system.'''
     return ['png'] + converter.keys()

-def convert(filename):
+def convert(filename, cache):
```
Some high-level comments about `cache` wouldn't go amiss here.
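For instance, something along these lines; the wording is hypothetical, just to indicate the kind of docstring being asked for:

```python
def convert(filename, cache):
    """Convert `filename` (a PDF or SVG baseline image) to PNG.

    If `cache` is True, look for a previously converted copy in the
    cache directory and reuse it, storing newly converted files there,
    so each baseline image is only converted once across test runs.
    """
```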
On the whole I agree with removing the text on some of the test results. However, I wonder if the text in the …
I've got to the stage of reviewing the actual …
@mdboom: I'm going to merge this while it is still fresh (and mergeable). For some reason GitHub isn't showing all the diffs (the test files, for instance), so I had to read the diffs on my clone. There are no show-stoppers, but I do think it is worth addressing some of my comments, either on this PR or in a separate one (if they need code). Good stuff.
This is just a placeholder for some work that would be nice to get done:

- The baseline images are converted from PDF and SVG to PNG every run. These could be cached between runs to save a good portion of the runtime.
- Look into running tests in parallel. Nose supports this, but it wasn't working well last time I tried.
- Maybe we could use lower-resolution PNGs and convert the PDFs and SVGs at lower resolution to make things faster. This would also hopefully gloss over some of the text-related false positives we see on a regular basis.

Any other ideas? Let's brainstorm!