Fix failing tests on maintenance branch #779

mdboom · 2012-03-20T15:12:21Z

Fix failing tests -- many of them were broken due to changes in the snapping algorithm that made the axes off by one pixel. Others were broken due to changes in text alignment. Also fixes a bug in the SVG backend's Gouraud shading.

jdh2358 · 2012-03-20T15:38:49Z

I just accidentally ran the tests on a remote box w/ no X11 connection and got a ton of failures because the test import triggered my backend import (which was GTKAgg). The traceback is below. The question is: would it do any harm to call:

use('agg')

as the first line of matplotlib.test, eg before importing nose here: https://github.com/matplotlib/matplotlib/blob/v1.1.x/lib/matplotlib/__init__.py#L995

Michael, if you think this is a good idea maybe we should just tack it on to this PR

here is the traceback:
johnh@lettuce:mpl_test> python -c 'import matplotlib as m; m.test()'
/usr/lib64/python2.7/site-packages/gtk-2.0/gtk/init.py:57: GtkWarning: could not open display
warnings.warn(str(e), _gtk.Warning)
/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/backends/backend_gtk.py:49: GtkWarning: IA__gdk_cursor_new_for_display: assertion `GDK_IS_DISPLAY (display)' failed

ERROR: Failure: RuntimeError (could not create GdkCursor object)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/nose/loader.py", line 379, in loadTestsFromName
module = resolve_name(addr.module)
File "/usr/lib/python2.7/site-packages/nose/util.py", line 321, in resolve_name
module = __import__('.'.join(parts_copy))
File "/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/tests/test_dates.py", line 3, in <module>
from matplotlib.testing.decorators import image_comparison, knownfailureif, cleanup
File "/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/testing/decorators.py", line 8, in <module>
from matplotlib import pyplot as plt
File "/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/pyplot.py", line 95, in <module>
new_figure_manager, draw_if_interactive, _show = pylab_setup()
File "/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/backends/__init__.py", line 25, in pylab_setup
globals(),locals(),[backend_name])
File "/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/backends/backend_gtkagg.py", line 10, in <module>
from matplotlib.backends.backend_gtk import gtk, FigureManagerGTK, FigureCanvasGTK,\
File "/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/backends/backend_gtk.py", line 49, in <module>
cursors.MOVE          : gdk.Cursor(gdk.FLEUR),
RuntimeError: could not create GdkCursor object

mdboom · 2012-03-20T15:43:25Z

@jdh2358: Sure -- that works for me. Added to this PR.

astrofrog · 2012-03-20T15:43:35Z

I also had issues where the tests would try and open many interactive windows, so I'd be in favor of explicitly setting the backend for the tests to 'Agg'.

jdh2358 · 2012-03-20T15:44:32Z

I'm still getting 50 test failures and I haven't checked all of them but the ones I've spot checked appear to be the dreaded font differences. This can be ignored for now, but we need to put our heads together and come up with a system where we are not overriding each other's baseline images because of local configuration differences. One approach would be to get some racked hosting and give all developers access to a common build environment for testing. Another would be to standardize on a freetype version and config.

mdboom · 2012-03-20T15:50:01Z

@jdh2358: I'd like to experiment with this font issue a little bit. Can you send me one image with lots of mismatched text? I'm going to see if I can somehow reproduce your output by playing with Freetype's various flags.

astrofrog · 2012-03-20T15:57:08Z

@mdboom - I'm getting lots of test failures because PIL isn't installed, but I wasn't getting this with the master branch, because I think the PIL dependence was removed in the mean time. I'll create a new environment with PIL to test this pull request.

astrofrog · 2012-03-20T16:00:49Z

@jdh2358 - regarding font differences, I like the option to explicitly state that the tests should pass with a specific version of freetype. If you know which versions of freetype cause issues, one could even mark the relevant tests as known fails (for those 'bad' freetype versions). This requires knowing the freetype version in the tests though.

mdboom · 2012-03-20T16:12:20Z

@astrofrog: Yes, PIL is required for 1.1.x, but not for master.

The problem with relying on a particular version of freetype is that it affects virtually every test -- we'll have a basically useless
test suite if that's the case. And I don't want to force all developers to build their own freetype if I can avoid it. I'm still looking into ways that the freetype renderer may be forced to behave a certain way.

astrofrog · 2012-03-20T17:05:10Z

I've tested the branch in this PR with various Python and Numpy versions, and it does fix some of the test_axes tests:

matplotlib.tests.test_axes.test_marker_edges.test
matplotlib.tests.test_axes.test_markevery.test
matplotlib.tests.test_axes.test_polar_annotations.test
matplotlib.tests.test_axes.test_polar_rmin.test
matplotlib.tests.test_axes.test_polar_theta_position.test
matplotlib.tests.test_axes.test_polar_wrap.test
matplotlib.tests.test_axes.test_polar_wrap.test
matplotlib.tests.test_axes.test_symlog.test

There are still some tests failing, but looking at the diff images, this seems to be due to fonts (the residuals for the other test_axes tests now don't have a frame).

The log for the build/install/tests is here: http://db.tt/iNBIScaA

The new test images including residuals is here: http://db.tt/tvbk02A0

So the bottom line is that this is an improvement (67 instead of 77 failures), and I think most of the remaining issues are font-based (but if you look at the residuals, you'll see some exceptions).

I agree that it would be better to figure out how to force freetype to render a certain way, rather than requiring a specific version.

mdboom · 2012-03-20T17:16:08Z

I'm not finding anything useful to fix the freetype version difference problem. There are some other more blunt options, like converting text to rectangles when in test mode or something, but that brings its own set of problems.

I'd still lean toward merging this before the release, despite its problems, with a note that the baseline images were generated with freetype 13.1.7 and any other versions may cause problems with the tests.

jdh2358 · 2012-03-20T17:23:55Z

Yes, I'm in favor of merging too rather than worrying about this long-standing issue right now. This is something we can work on ahead of the next release. @astrofrog, could you tell us which test failures you are seeing that don't appear font related?

astrofrog · 2012-03-20T18:59:21Z

I agree that this should be merged without waiting for a font-related fix. If we know which freetype version should work, we can at least set up the continuous integration with that version. I'll try and do that on my side.

Tests that look like they are not just font related:

matplotlib.tests.test_axes.test_canonical.test (frame issue)
matplotlib.tests.test_text.test_font_styles.test (frame issue)
matplotlib.tests.test_axes.test_basic_annotate.test (frame issue)
matplotlib.tests.test_axes.test_shaped_data.test (frame and markers issues)
matplotlib.tests.test_axes.test_single_point.test (frame and markers issues)

The images are all in the tar file I linked to previously.

jdh2358 · 2012-03-20T19:10:42Z

I can confirm that canonical, shaped_data, font_styles and single_point have changed (I am not seeing any images for test_basic_annotate at all). They did not trigger an error on my system because the diff was too small. They look like the difference is from #695 because there is a 1 pixel shift in the entire axes bounding box and tick markers. My suggestion is to add the new baseline image for these tests after you have had a chance to confirm that the new look correct to you Michael.

mdboom · 2012-03-20T19:13:20Z

The version of freetype I'm using to generate the images is 13.1.7. The version John is using (which has different results) is 12.2.6. Perhaps anything in the 13 series is close enough... I'm not sure how to determine except by experimentation.

This is messy. In this pull request, I only updated the images that were failing for me -- which were primarily related to the frame snapping offset differences. There are some bonafide frame snapping offset differences still in there, but they don't cause the threshold to be tripped unless, as on @astrofrog's machine, a different version of freetype is being used. I'm thinking the best thing to do may be to update all the pngs to my output set (I've already manually verified that everything is close enough to correct). It would then be nice to lower the threshold to be more sensitive to changes once we can assume that freetype differences don't need to be accounted for. Better yet, to not have a threshold but to expect a perfect match would be even better...

jdh2358 · 2012-03-20T19:16:19Z

Agree on pushing all of your result images as the new baseline since you have manually verified them and the diffs we are seeing are wither font related or pixel shifting changes where the new result images are the correct ones.

astrofrog · 2012-03-20T19:33:22Z

I had freetype 14.1.8. I'm just setting up my Jenkins environment with 13.1.7 so I can get rid of the font errors and hopefully be able to focus on the real failures. I should be able to report back in a couple of hours.

By the way, out of interest, what are all the known failures? Are they all font-related?

astrofrog · 2012-03-20T19:53:15Z

Side note: did you see that Github allows you to show the actual difference image between the old and new images in your commit? (default is just side by side, but difference is an option) Very cool!

WeatherGod · 2012-03-20T20:05:23Z

I performed the tests on Linux with freetype 13.1.7 (before commit 152fb09), and I had two Known fails and 1 other failure. The failure was with test_axes.test_pcolormesh with the svg output.

astrofrog · 2012-03-20T20:08:28Z

Is there any way that the libpng version could affect things? Which version of libpng are you using? I tried running the tests building Matplotlib with freetype 13.1.7 (instead of 14.1.8), and I'm still getting the same label-related failures. It almost looks more like an anti-aliasing difference than differences in actual font/characters. In most cases it seems the characters are the same, but the differences are in the faint anti-aliased part of the font, so the difference images show a 'fuzz' around each character.

WeatherGod · 2012-03-20T20:09:08Z

Looks like the baseline image for pcolormesh_svg.png is missing the gourond shading example (it is blank) meanwhile, the generated gourond shading for svg does not look as smooth as it is for the png output

WeatherGod · 2012-03-20T20:10:55Z

obviously, I can't spell.... gouraud shading

astrofrog · 2012-03-20T20:17:03Z

By the way, is it normal there are so many known failures on MacOS X?

FAILED (KNOWNFAIL=526, failures=67)

(this is with freetype 13.1.7)

jdh2358 · 2012-03-20T20:20:15Z

@astrofrog , you have so many known fails because you don't have the requirements to test one or more of the PDF, PS or SVG backends. the test infrastructure will check for a dependency for a given image type, and if it is not there, register it as a knownfail.

http://matplotlib.sourceforge.net/devel/coding_guide.html#running-the-tests

astrofrog · 2012-03-20T20:27:40Z

@jdh2358 - ah, that makes sense, thanks! I'm running these in a clean environment, so I'll try and add the requirements for the other tests.

Regarding the font issue, I'm fairly convinced it's an anti-aliasing issue (with the fonts) - is there any way that freetype on Mac might include a different anti-aliasing algorithm than on Linux?

mdboom · 2012-03-20T20:44:35Z

@WeatherGod: I neglected to include the fixed pcolormesh.svg baseline image. It's normal that the gouraud shading in SVG is not as smooth as the other backends -- it has to be "faked" since it isn't native to the format.

As for differences in freetype, it may be that it is configured differently. This is the ftconfig.h for my machine (it is the prebuilt package for Fedora 16):

http://db.tt/I5hfjXke

Seeing how that compares with your ftconfig.h may offer some clues.

See also this:

http://www.freetype.org/freetype2/docs/ft2faq.html#builds-differences

astrofrog · 2012-03-20T21:08:18Z

My ftconfig.h file is pretty long... http://db.tt/9OPuGJeH

Though according to the second link you sent - maybe we should compare ftoption.h? Mine is: http://db.tt/6Jo7a8wH

mdboom · 2012-03-20T21:30:44Z

Ah, indeed. The correct file to compare is ftoption.h. Mine is here:

http://db.tt/pZgy6q52

mdboom · 2012-03-20T21:37:09Z

Hmmm... and they're identical. Perhaps there is some sort of gamma correction or colorspace conversion going on the PNG reading/writing. (It should otherwise be lossless). As an experiment -- can you run the simple_plot.py example and save it in rgba format (this is just a raw dump of the image buffer). I've put mine up here.

http://db.tt/t8ma7qVt

If there are no differences here, but there are when saved as png we can rule out freetype and point the finger at png (though it seems like a longshot).

astrofrog · 2012-03-22T19:59:03Z

Ok, so I did some further tests with freetype_test, and it seems some other characters do have changes in the monochromatic version at other versions of freetype. For example, character 'y' (which I think is causing some of the above failures) changes at 2.3.0, 2.3.9, 2.4.4, and 2.4.5. This is why we are still not getting consistent results across all versions.

astrofrog · 2012-03-22T20:02:33Z

Anyway, it seems like the current pull request already provides significant improvement compared to before, so maybe we should just merge this (after marking the above tests as known fails) for this candidate release, and we can do a more exhaustive test of freetype varying the characters to see where all the transitions are, and whether we can fix any of them?

jdh2358 · 2012-03-22T20:36:32Z

This is the last issue to close before I burn the 1.1.1 release candidate and put a tarball up for building and testing, so Michael, fire at will when you are ready to merge this sucker. If you want to wait another day and try stamping out the remaining brush fires, I have no problem delaying.

mdboom · 2012-03-22T21:23:41Z

I can hopefully polish this a little more tonight... save the bigger ideas for another time.

astrofrog · 2012-03-22T21:56:41Z

@mdboom - should we start a new issue to keep track of more findings? I've done some more research into which freetype versions are causing issues, but I don't want to continue flooding this PR with information at this stage.

astrofrog · 2012-03-22T22:07:27Z

I won't go into much detail here, but basically 2.4.5 is the freetype version where the most characters changed. Other versions where things change are 2.3.0, 2.3.10, 2.4.0, 2.4.4, and 2.4.5. But the bottom line of my testing is that things are stable from 2.4.5 onwards (for monochromatic font rendering), which is the most important for now.

Once there is a new issue to keep track of future progress on this topic, I will post more detailed findings, and we can then see if any of the changes can be dealt with in the calls to the freetype library.

astrofrog · 2012-03-22T22:27:23Z

Final comment from me: on MacOS 10.7, all the tests pass (with freetype 2.4.9), apart from the failures in #788, but I think those must be Jenkins-specific.

mdboom · 2012-03-22T22:54:02Z

@jenshnielsen: That missing 'i' is troubling, but I wonder if it's due to mono rendering rounding error or in fact not reading the character at all. Does the 'i' appear in the corresponding SVG and/or PDF in your output?

@jenshnielsen: How are the delaunay tests failing? Font-only or something else?

@astrofrog: I've marked those 4 failing tests as needing freetype version 2.4.5 through 2.4.9. That should resolve this for now and then (barring jenshnielsen's issues) should resolve this PR.

astrofrog · 2012-03-22T23:18:34Z

All right, now everything seems fine!

Linux with freetype 2.3.9:

OK (KNOWNFAIL=11)

Linux with freetype 2.2.1:

FAILED (KNOWNFAIL=273, errors=1)

but the one error is just a gs error to do with pcolormesh, but this might just be due to the version of gs.

MacOS 10.7 with freetype 2.4.9:

FAILED (KNOWNFAIL=2, errors=2)

but these are just the inkscape errors in #787 (I think that I managed to fix the font_style errors from #788 - more later in that issue)

So this PR looks good to me!

mdboom · 2012-03-22T23:22:07Z

I missed earlier on that @jenshnielsen pointed out that test_delaunay and test_legend are not part of the standard set of tests. I usually test directly with nose at the commandline (for various reasons) so I didn't notice that. How do those test sets fare font-wise? @astrofrog: Would you mind testing those with one of your ancient freetypes?

mdboom · 2012-03-22T23:23:36Z

I'm going to mark the 4 delaunay tests @jenshnielsen as known fail for old freetypes.

astrofrog · 2012-03-22T23:24:55Z

@mdboom - how do I activate these tests?

mdboom · 2012-03-22T23:25:53Z

@astrofrog: They should be activated by commit f22547c

mdboom · 2012-03-22T23:46:49Z

The problem running tests remotely should now also be fixed.

astrofrog · 2012-03-22T23:54:02Z

Linux with freetype 2.2.1:

FAILED (KNOWNFAIL=280, errors=1)

(no inkscape, and 1 error due to a ghostscript error, which does seem like a genuine error, but I don't think it's related to this pull request, so will open a new issue for that if it persists)

Linux with freetype 2.3.9:

OK (KNOWNFAIL=16)

MacOS 10.7:

FAILED (KNOWNFAIL=2, errors=3)

(errors are inkscape as usual)

astrofrog · 2012-03-22T23:54:41Z

Looks ready to merge!

Fix failing tests on maintenance branch

astrofrog · 2012-03-22T23:56:26Z

(forgot to mention - the tests do work remotely now!)

ivanov · 2012-03-23T05:43:14Z

I just want to reiterate: EPIC EFFORT @mdboom & @astrofrog!!! job well done, I can't believe all these tests are passing now!

astrofrog · 2012-03-23T08:00:15Z

For info, I've opened a new ticket (#792) to keep track of progress with the freetype issue moving forward (though no immediate action is required - just thought I'd let you know in case you want to sign up for notifications on that issue).

This was referenced Mar 20, 2012

Failing test_mathtext tests on MacOS 10.7 #744

Closed

Failing test_axes tests on MacOS 10.7 #743

Closed

astrofrog mentioned this pull request Mar 22, 2012

Make tests faster #778

Merged

Mark 4 failing tests with old freetype versions

8f111bd

Add test_delaunay and test_legend to the set of tests.

f22547c

Mark delaunay tests as needing a specific range of freetype versions.

e27c7c4

Make tests work remotely

45f42d2

mdboom added a commit that referenced this pull request Mar 22, 2012

Merge pull request #779 from mdboom/test-triage

5846304

Fix failing tests on maintenance branch

mdboom merged commit 5846304 into matplotlib:v1.1.x Mar 22, 2012

ivanov mentioned this pull request Mar 23, 2012

v1.1.1 release candidate testing #791

Closed

astrofrog mentioned this pull request Mar 23, 2012

Make tests pass under *all* freetype versions #792

Closed

tacaswell mentioned this pull request Sep 11, 2013

FAIL: matplotlib.tests.test_axes.test_single_point.test #2412

Closed

mdboom deleted the test-triage branch March 3, 2015 18:44

Uh oh!

Fix failing tests on maintenance branch #779

Fix failing tests on maintenance branch #779

Uh oh!

Conversation

mdboom commented Mar 20, 2012

Uh oh!

jdh2358 commented Mar 20, 2012

Uh oh!

mdboom commented Mar 20, 2012

Uh oh!

astrofrog commented Mar 20, 2012

Uh oh!

jdh2358 commented Mar 20, 2012

Uh oh!

mdboom commented Mar 20, 2012

Uh oh!

astrofrog commented Mar 20, 2012

Uh oh!

astrofrog commented Mar 20, 2012

Uh oh!

mdboom commented Mar 20, 2012

Uh oh!

astrofrog commented Mar 20, 2012

Uh oh!

mdboom commented Mar 20, 2012

Uh oh!

jdh2358 commented Mar 20, 2012

Uh oh!

astrofrog commented Mar 20, 2012

Uh oh!

jdh2358 commented Mar 20, 2012

Uh oh!

mdboom commented Mar 20, 2012

Uh oh!

jdh2358 commented Mar 20, 2012

Uh oh!

astrofrog commented Mar 20, 2012

Uh oh!

astrofrog commented Mar 20, 2012

Uh oh!

WeatherGod commented Mar 20, 2012

Uh oh!

astrofrog commented Mar 20, 2012

Uh oh!

WeatherGod commented Mar 20, 2012

Uh oh!

WeatherGod commented Mar 20, 2012

Uh oh!

astrofrog commented Mar 20, 2012

Uh oh!

jdh2358 commented Mar 20, 2012

Uh oh!

astrofrog commented Mar 20, 2012

Uh oh!

mdboom commented Mar 20, 2012

Uh oh!

astrofrog commented Mar 20, 2012

Uh oh!

mdboom commented Mar 20, 2012

Uh oh!

mdboom commented Mar 20, 2012

Uh oh!

astrofrog commented Mar 22, 2012

Uh oh!

astrofrog commented Mar 22, 2012

Uh oh!

jdh2358 commented Mar 22, 2012

Uh oh!

mdboom commented Mar 22, 2012

Uh oh!

astrofrog commented Mar 22, 2012

Uh oh!

astrofrog commented Mar 22, 2012

Uh oh!

astrofrog commented Mar 22, 2012

Uh oh!

mdboom commented Mar 22, 2012

Uh oh!

astrofrog commented Mar 22, 2012