TST: Lower tolerance of mathtext tests, update baseline images to master #5305

zblz · 2015-10-23T11:07:51Z

As mentioned in #4887, #5301, a lot of the mathtext tests are only passing because the tolerance is set quite high. In a couple of recent PRs only the failing images were replaced in the baseline even though a few other were modified by the PR and were not failing the tests because of the tolerance.

In this PR the tolerance is lowered to 1%, and the baseline images updated to the master rendering for the following mathtext tests (comment from #5301):

00: baseline image is wrong, symbols are not properly spaced.
07: baseline image is wrong, symbols are not properly spaced.
09: spacing is wildly different, I imagine the baseline is wrong since you recently did a PR on mathtext spacing to match TeX?
18: baseline is wrong (symbol spacing)
19: idem
22: idem
25: idem
26: accents are shifted, the baseline looks worse to me
28: baseline is wrong (symbol spacing)
33: idem
35: idem
38: idem
39: accent location, same as 26
44: baseline is wrong (symbol spacing)
45: idem
47: idem
48: idem
52: idem
60: idem
63: idem
64: another spacing issue with \, as in 09. baseline looks wrong to me: 2 \, a should render to a very small space, and the baseline spacing is pretty big.
65: spacing difference after ! (baseline has no space, master has it as per the _punctuation_symbols spacing)
67: baseline is wrong: symbol spacing and spacing after comma
69: accent position, as in 26, 39
71: different spacing between symbols
72: idem

mdboom · 2015-10-23T11:43:54Z

@zblz: What platform did you generate the pngs on? The testing is very sensitive to versions and settings in freetype, and if it doesn't match what Travis is doing, there will be problems. For example, if I generate images on a Mac, they are completely unsuitable. All this is the basis for doing imperfect comparisons...

Also a heads up -- when the default styles change in 2.0, we will probably do a mass regeneration of all test images. It's probably fine to do just these math ones now, but of course when the default math font changes they will need to be done again.

zblz · 2015-10-23T11:54:12Z

I'm using Debian testing, and wanted to do this PR mostly to check how Travis would behave with respect to the images I generated. I see now that there a lot of failures, some even at a 20% level. A 1% threshold was definitely optimistic, but I set it because a lot of the accent tests were failing at the 4-5% level, so we should have a method to catch those. Could we use the images generated by travis as baseline in the repo? Are they stored somewhere we can access? However, that might mean having to update them every time there are changes in the Travis environment, which is not ideal as it is not something we can control completely.

Regarding the default mathtext fontset, the dejavusans and dejavuserif fontsets added in #5214 have their own set of tests, so these here would not need to be changed when the default is changed.

mdboom · 2015-10-23T12:04:21Z

Could we use the images generated by travis as baseline in the repo? Are they stored somewhere we can access?

That's a good idea. You can get the images from Travis-CI -- they are uploaded to an Amazon S3 account -- but only when building a branch in the main matplotlib repo, not when building a random pull request. This is for security reasons and more-or-less enforced by the Travis security model.

Since you have commit rights, you can push your branch to the matplotlib repo with something like:

git push upstream test-tol zblz-staging-tmp

Travis should build that branch and provide a URL to a downloadable tarball of images at the bottom of the output.

My long term plan for fixing this -- and maybe I'll even get to this today -- is to add a build configuration option that will download and build against a specific version of freetype, with particular settings. The idea is that all matplotlib developers and our .travis.yml would turn this on and should get identical results. Then we can turn the tolerance all the way down to zero. That's the idea anyway. This problem has gone on a long time, and we're finding more and more cases where things are being missed due to the thresholding.

Regarding the default mathtext fontset, the dejavusans and dejavuserif fontsets added in #5214 have their own set of tests, so these here would not need to be changed when the default is changed.

Oh, of course that's right. No risk of duplicating effort here then. There is some math scattered here and there throughout the test suite that will be affected, but that's much less stuff.

zblz · 2015-10-23T13:15:17Z

I just checked and Travis used freetype 2.4.8 whereas I have 2.6.0 installed... Having the same freetype everywhere would definitely help here. If you can add this option in the near future we should wait to replace the images with a self-built freetype, otherwise I will replace the images here with the Travis-generated ones.

jenshnielsen · 2015-10-23T13:15:26Z

We might consider switching Travis over to the "new" Ubuntu 14.04 Trusty images which have a more resent version of freetype and TeX which is hopefully closer to what we are using for development. Update 14.04 has 2.5.2 where as 12.04 has 2.4.8 https://launchpad.net/ubuntu/%2Bsource/freetype

WeatherGod · 2015-10-23T13:30:41Z

re: 2.0 style changes
@mdboom, the test suite have been set up to apply the "classic" style by
default to all plots for image comparisons, so we don't have to do a mass
regeneration of baseline images. Instead, we can make sure that all new
tests and baseline images use the "default" style explicitly, and we can
update certain baseline images as we feel necessary.

The problem I have with a mass regeneration of baseline images is that it
would be extremely difficult to properly verify that each and every single
image is correct. Especially with a brand new style that we aren't totally
familiar with yet.

On Fri, Oct 23, 2015 at 9:15 AM, Jens Hedegaard Nielsen <
notifications@github.com> wrote:

We might consider switching Travis over to the "new" Ubuntu 14.04 Trusty
images
http://blog.travis-ci.com/2015-10-14-opening-up-ubuntu-trusty-beta
which have a more resent version of freetype and TeX which is hopefully
closer to what we are using for development.

—
Reply to this email directly or view it on GitHub
#5305 (comment)
.

zblz · 2015-10-23T14:13:21Z

@jenshnielsen -- in the last commit I changed .travis.yml to use the trusty images and it the images generated in Travis still vary significantly from my images (freetype 2.5.2 vs 2.6.0). However, an advantage of using the dist: trusty keyword is that the freetype version will be stable over time. If we use the baseline images from the Travis-generated images, we can reduce the tolerance and be more confident that the freetype version will not changer arbitrarily over time unless we change the dist keyword (or they discontinue the trusty images).

mdboom · 2015-10-23T14:31:51Z

I think the predictable freetype version is a pretty high priority, so I plan to start on it today (though may not finish, and who knows what surprises it will bring). I think we might as well hold off on merging this until that's in.

I like the idea of pegging the Travis environment. I didn't realize one could do that. I think it's still worthwhile doing a local freetype build so that developers can run the tests on their own machines without requiring a specific version of the OS etc.

mdboom · 2015-10-23T14:36:34Z

@WeatherGod: Good point about the classic style. That makes sense.

mdboom · 2015-10-23T14:53:37Z

And just a note for completeness: The images depend not just on the version of freetype, but how it was configured -- whether bytecode hinter or not etc. Different Linux distros build it in different ways, even if they have the same exact version.

mdboom · 2015-10-23T19:00:54Z

See #5306, which should now (fingers crossed) be passing...

zblz · 2015-10-27T12:43:17Z

This PR is now redundant as #5306 is definitely a better long term option. Closing.

update baseline images, lower tolerance of test

115fc87

tacaswell added the status: needs review label Oct 23, 2015

zblz changed the title ~~TST: Lower toleance of mathtext tests, update baseline images to master~~ TST: Lower tolerance of mathtext tests, update baseline images to master Oct 23, 2015

try travis with trusty images

9e8269a

mdboom mentioned this pull request Oct 27, 2015

Use a specific version of Freetype for testing #5306

Merged

zblz closed this Oct 27, 2015

tacaswell removed the status: needs review label Oct 27, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

TST: Lower tolerance of mathtext tests, update baseline images to master #5305

TST: Lower tolerance of mathtext tests, update baseline images to master #5305

Uh oh!

zblz commented Oct 23, 2015

Uh oh!

mdboom commented Oct 23, 2015

Uh oh!

zblz commented Oct 23, 2015

Uh oh!

mdboom commented Oct 23, 2015

Uh oh!

zblz commented Oct 23, 2015

Uh oh!

jenshnielsen commented Oct 23, 2015

Uh oh!

WeatherGod commented Oct 23, 2015

Uh oh!

zblz commented Oct 23, 2015

Uh oh!

mdboom commented Oct 23, 2015

Uh oh!

mdboom commented Oct 23, 2015

Uh oh!

mdboom commented Oct 23, 2015

Uh oh!

mdboom commented Oct 23, 2015

Uh oh!

zblz commented Oct 27, 2015

Uh oh!

Uh oh!

Uh oh!

TST: Lower tolerance of mathtext tests, update baseline images to master #5305

TST: Lower tolerance of mathtext tests, update baseline images to master #5305

Uh oh!

Conversation

zblz commented Oct 23, 2015

Uh oh!

mdboom commented Oct 23, 2015

Uh oh!

zblz commented Oct 23, 2015

Uh oh!

mdboom commented Oct 23, 2015

Uh oh!

zblz commented Oct 23, 2015

Uh oh!

jenshnielsen commented Oct 23, 2015

Uh oh!

WeatherGod commented Oct 23, 2015

Uh oh!

zblz commented Oct 23, 2015

Uh oh!

mdboom commented Oct 23, 2015

Uh oh!

mdboom commented Oct 23, 2015

Uh oh!

mdboom commented Oct 23, 2015

Uh oh!

mdboom commented Oct 23, 2015

Uh oh!

zblz commented Oct 27, 2015

Uh oh!

Uh oh!