-
-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Fix failing tests on maintenance branch #779
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I just accidentally ran the tests on a remote box w/ no X11 connection and got a ton of failures because the test import triggered my backend import (which was GTKAgg). The traceback is below. The question is: would it do any harm to call:
as the first line of matplotlib.test, eg before importing nose here: https://github.com/matplotlib/matplotlib/blob/v1.1.x/lib/matplotlib/__init__.py#L995 Michael, if you think this is a good idea maybe we should just tack it on to this PR here is the traceback:
|
@jdh2358: Sure -- that works for me. Added to this PR. |
I also had issues where the tests would try and open many interactive windows, so I'd be in favor of explicitly setting the backend for the tests to 'Agg'. |
I'm still getting 50 test failures and I haven't checked all of them but the ones I've spot checked appear to be the dreaded font differences. This can be ignored for now, but we need to put our heads together and come up with a system where we are not overriding each other's baseline images because of local configuration differences. One approach would be to get some racked hosting and give all developers access to a common build environment for testing. Another would be to standardize on a freetype version and config. |
@jdh2358: I'd like to experiment with this font issue a little bit. Can you send me one image with lots of mismatched text? I'm going to see if I can somehow reproduce your output by playing with Freetype's various flags. |
@mdboom - I'm getting lots of test failures because PIL isn't installed, but I wasn't getting this with the master branch, because I think the PIL dependence was removed in the mean time. I'll create a new environment with PIL to test this pull request. |
@jdh2358 - regarding font differences, I like the option to explicitly state that the tests should pass with a specific version of freetype. If you know which versions of freetype cause issues, one could even mark the relevant tests as known fails (for those 'bad' freetype versions). This requires knowing the freetype version in the tests though. |
@astrofrog: Yes, PIL is required for 1.1.x, but not for master. The problem with relying on a particular version of freetype is that it affects virtually every test -- we'll have a basically useless |
I've tested the branch in this PR with various Python and Numpy versions, and it does fix some of the test_axes tests:
There are still some tests failing, but looking at the diff images, this seems to be due to fonts (the residuals for the other test_axes tests now don't have a frame). The log for the build/install/tests is here: http://db.tt/iNBIScaA The new test images including residuals is here: http://db.tt/tvbk02A0 So the bottom line is that this is an improvement (67 instead of 77 failures), and I think most of the remaining issues are font-based (but if you look at the residuals, you'll see some exceptions). I agree that it would be better to figure out how to force freetype to render a certain way, rather than requiring a specific version. |
I'm not finding anything useful to fix the freetype version difference problem. There are some other more blunt options, like converting text to rectangles when in test mode or something, but that brings its own set of problems. I'd still lean toward merging this before the release, despite its problems, with a note that the baseline images were generated with freetype 13.1.7 and any other versions may cause problems with the tests. |
Yes, I'm in favor of merging too rather than worrying about this long-standing issue right now. This is something we can work on ahead of the next release. @astrofrog, could you tell us which test failures you are seeing that don't appear font related? |
I agree that this should be merged without waiting for a font-related fix. If we know which freetype version should work, we can at least set up the continuous integration with that version. I'll try and do that on my side. Tests that look like they are not just font related:
The images are all in the tar file I linked to previously. |
I can confirm that canonical, shaped_data, font_styles and single_point have changed (I am not seeing any images for test_basic_annotate at all). They did not trigger an error on my system because the diff was too small. They look like the difference is from #695 because there is a 1 pixel shift in the entire axes bounding box and tick markers. My suggestion is to add the new baseline image for these tests after you have had a chance to confirm that the new look correct to you Michael. |
The version of freetype I'm using to generate the images is 13.1.7. The version John is using (which has different results) is 12.2.6. Perhaps anything in the 13 series is close enough... I'm not sure how to determine except by experimentation. This is messy. In this pull request, I only updated the images that were failing for me -- which were primarily related to the frame snapping offset differences. There are some bonafide frame snapping offset differences still in there, but they don't cause the threshold to be tripped unless, as on @astrofrog's machine, a different version of freetype is being used. I'm thinking the best thing to do may be to update all the pngs to my output set (I've already manually verified that everything is close enough to correct). It would then be nice to lower the threshold to be more sensitive to changes once we can assume that freetype differences don't need to be accounted for. Better yet, to not have a threshold but to expect a perfect match would be even better... |
Agree on pushing all of your result images as the new baseline since you have manually verified them and the diffs we are seeing are wither font related or pixel shifting changes where the new result images are the correct ones. |
I had freetype 14.1.8. I'm just setting up my Jenkins environment with 13.1.7 so I can get rid of the font errors and hopefully be able to focus on the real failures. I should be able to report back in a couple of hours. By the way, out of interest, what are all the known failures? Are they all font-related? |
Side note: did you see that Github allows you to show the actual difference image between the old and new images in your commit? (default is just side by side, but difference is an option) Very cool! |
I performed the tests on Linux with freetype 13.1.7 (before commit 152fb09), and I had two Known fails and 1 other failure. The failure was with test_axes.test_pcolormesh with the svg output. |
Is there any way that the libpng version could affect things? Which version of libpng are you using? I tried running the tests building Matplotlib with freetype 13.1.7 (instead of 14.1.8), and I'm still getting the same label-related failures. It almost looks more like an anti-aliasing difference than differences in actual font/characters. In most cases it seems the characters are the same, but the differences are in the faint anti-aliased part of the font, so the difference images show a 'fuzz' around each character. |
Looks like the baseline image for pcolormesh_svg.png is missing the gourond shading example (it is blank) meanwhile, the generated gourond shading for svg does not look as smooth as it is for the png output |
obviously, I can't spell.... gouraud shading |
By the way, is it normal there are so many known failures on MacOS X?
(this is with freetype 13.1.7) |
@astrofrog , you have so many known fails because you don't have the requirements to test one or more of the PDF, PS or SVG backends. the test infrastructure will check for a dependency for a given image type, and if it is not there, register it as a knownfail. http://matplotlib.sourceforge.net/devel/coding_guide.html#running-the-tests |
@jdh2358 - ah, that makes sense, thanks! I'm running these in a clean environment, so I'll try and add the requirements for the other tests. Regarding the font issue, I'm fairly convinced it's an anti-aliasing issue (with the fonts) - is there any way that freetype on Mac might include a different anti-aliasing algorithm than on Linux? |
@WeatherGod: I neglected to include the fixed As for differences in freetype, it may be that it is configured differently. This is the Seeing how that compares with your See also this: http://www.freetype.org/freetype2/docs/ft2faq.html#builds-differences |
My ftconfig.h file is pretty long... http://db.tt/9OPuGJeH Though according to the second link you sent - maybe we should compare ftoption.h? Mine is: http://db.tt/6Jo7a8wH |
Ah, indeed. The correct file to compare is |
Hmmm... and they're identical. Perhaps there is some sort of gamma correction or colorspace conversion going on the PNG reading/writing. (It should otherwise be lossless). As an experiment -- can you run the If there are no differences here, but there are when saved as png we can rule out freetype and point the finger at png (though it seems like a longshot). |
Ok, so I did some further tests with freetype_test, and it seems some other characters do have changes in the monochromatic version at other versions of freetype. For example, character 'y' (which I think is causing some of the above failures) changes at 2.3.0, 2.3.9, 2.4.4, and 2.4.5. This is why we are still not getting consistent results across all versions. |
Anyway, it seems like the current pull request already provides significant improvement compared to before, so maybe we should just merge this (after marking the above tests as known fails) for this candidate release, and we can do a more exhaustive test of freetype varying the characters to see where all the transitions are, and whether we can fix any of them? |
This is the last issue to close before I burn the 1.1.1 release candidate and put a tarball up for building and testing, so Michael, fire at will when you are ready to merge this sucker. If you want to wait another day and try stamping out the remaining brush fires, I have no problem delaying. |
I can hopefully polish this a little more tonight... save the bigger ideas for another time. |
@mdboom - should we start a new issue to keep track of more findings? I've done some more research into which freetype versions are causing issues, but I don't want to continue flooding this PR with information at this stage. |
I won't go into much detail here, but basically 2.4.5 is the freetype version where the most characters changed. Other versions where things change are 2.3.0, 2.3.10, 2.4.0, 2.4.4, and 2.4.5. But the bottom line of my testing is that things are stable from 2.4.5 onwards (for monochromatic font rendering), which is the most important for now. Once there is a new issue to keep track of future progress on this topic, I will post more detailed findings, and we can then see if any of the changes can be dealt with in the calls to the freetype library. |
Final comment from me: on MacOS 10.7, all the tests pass (with freetype 2.4.9), apart from the failures in #788, but I think those must be Jenkins-specific. |
@jenshnielsen: That missing 'i' is troubling, but I wonder if it's due to mono rendering rounding error or in fact not reading the character at all. Does the 'i' appear in the corresponding SVG and/or PDF in your output? @jenshnielsen: How are the delaunay tests failing? Font-only or something else? @astrofrog: I've marked those 4 failing tests as needing freetype version 2.4.5 through 2.4.9. That should resolve this for now and then (barring jenshnielsen's issues) should resolve this PR. |
All right, now everything seems fine! Linux with freetype 2.3.9:
Linux with freetype 2.2.1:
but the one error is just a gs error to do with pcolormesh, but this might just be due to the version of gs. MacOS 10.7 with freetype 2.4.9:
but these are just the inkscape errors in #787 (I think that I managed to fix the font_style errors from #788 - more later in that issue) So this PR looks good to me! |
I missed earlier on that @jenshnielsen pointed out that test_delaunay and test_legend are not part of the standard set of tests. I usually test directly with nose at the commandline (for various reasons) so I didn't notice that. How do those test sets fare font-wise? @astrofrog: Would you mind testing those with one of your ancient freetypes? |
I'm going to mark the 4 delaunay tests @jenshnielsen as known fail for old freetypes. |
@mdboom - how do I activate these tests? |
@astrofrog: They should be activated by commit f22547c |
The problem running tests remotely should now also be fixed. |
Linux with freetype 2.2.1:
(no inkscape, and 1 error due to a ghostscript error, which does seem like a genuine error, but I don't think it's related to this pull request, so will open a new issue for that if it persists) Linux with freetype 2.3.9:
MacOS 10.7:
(errors are inkscape as usual) |
Looks ready to merge! |
Fix failing tests on maintenance branch
(forgot to mention - the tests do work remotely now!) |
I just want to reiterate: EPIC EFFORT @mdboom & @astrofrog!!! job well done, I can't believe all these tests are passing now! |
For info, I've opened a new ticket (#792) to keep track of progress with the freetype issue moving forward (though no immediate action is required - just thought I'd let you know in case you want to sign up for notifications on that issue). |
Fix failing tests -- many of them were broken due to changes in the snapping algorithm that made the axes off by one pixel. Others were broken due to changes in text alignment. Also fixes a bug in the SVG backend's Gouraud shading.