Skip to content

TST: Lower tolerance of mathtext tests, update baseline images to master #5305

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

zblz
Copy link
Member

@zblz zblz commented Oct 23, 2015

As mentioned in #4887, #5301, a lot of the mathtext tests are only passing because the tolerance is set quite high. In a couple of recent PRs only the failing images were replaced in the baseline even though a few other were modified by the PR and were not failing the tests because of the tolerance.

In this PR the tolerance is lowered to 1%, and the baseline images updated to the master rendering for the following mathtext tests (comment from #5301):

  • 00: baseline image is wrong, symbols are not properly spaced.
  • 07: baseline image is wrong, symbols are not properly spaced.
  • 09: spacing is wildly different, I imagine the baseline is wrong since you recently did a PR on mathtext spacing to match TeX?
  • 18: baseline is wrong (symbol spacing)
  • 19: idem
  • 22: idem
  • 25: idem
  • 26: accents are shifted, the baseline looks worse to me
  • 28: baseline is wrong (symbol spacing)
  • 33: idem
  • 35: idem
  • 38: idem
  • 39: accent location, same as 26
  • 44: baseline is wrong (symbol spacing)
  • 45: idem
  • 47: idem
  • 48: idem
  • 52: idem
  • 60: idem
  • 63: idem
  • 64: another spacing issue with \, as in 09. baseline looks wrong to me: 2 \, a should render to a very small space, and the baseline spacing is pretty big.
  • 65: spacing difference after ! (baseline has no space, master has it as per the _punctuation_symbols spacing)
  • 67: baseline is wrong: symbol spacing and spacing after comma
  • 69: accent position, as in 26, 39
  • 71: different spacing between symbols
  • 72: idem

@zblz zblz changed the title TST: Lower toleance of mathtext tests, update baseline images to master TST: Lower tolerance of mathtext tests, update baseline images to master Oct 23, 2015
@mdboom
Copy link
Member

mdboom commented Oct 23, 2015

@zblz: What platform did you generate the pngs on? The testing is very sensitive to versions and settings in freetype, and if it doesn't match what Travis is doing, there will be problems. For example, if I generate images on a Mac, they are completely unsuitable. All this is the basis for doing imperfect comparisons...

Also a heads up -- when the default styles change in 2.0, we will probably do a mass regeneration of all test images. It's probably fine to do just these math ones now, but of course when the default math font changes they will need to be done again.

@zblz
Copy link
Member Author

zblz commented Oct 23, 2015

I'm using Debian testing, and wanted to do this PR mostly to check how Travis would behave with respect to the images I generated. I see now that there a lot of failures, some even at a 20% level. A 1% threshold was definitely optimistic, but I set it because a lot of the accent tests were failing at the 4-5% level, so we should have a method to catch those. Could we use the images generated by travis as baseline in the repo? Are they stored somewhere we can access? However, that might mean having to update them every time there are changes in the Travis environment, which is not ideal as it is not something we can control completely.

Regarding the default mathtext fontset, the dejavusans and dejavuserif fontsets added in #5214 have their own set of tests, so these here would not need to be changed when the default is changed.

@mdboom
Copy link
Member

mdboom commented Oct 23, 2015

Could we use the images generated by travis as baseline in the repo? Are they stored somewhere we can access?

That's a good idea. You can get the images from Travis-CI -- they are uploaded to an Amazon S3 account -- but only when building a branch in the main matplotlib repo, not when building a random pull request. This is for security reasons and more-or-less enforced by the Travis security model.

Since you have commit rights, you can push your branch to the matplotlib repo with something like:

git push upstream test-tol zblz-staging-tmp

Travis should build that branch and provide a URL to a downloadable tarball of images at the bottom of the output.

My long term plan for fixing this -- and maybe I'll even get to this today -- is to add a build configuration option that will download and build against a specific version of freetype, with particular settings. The idea is that all matplotlib developers and our .travis.yml would turn this on and should get identical results. Then we can turn the tolerance all the way down to zero. That's the idea anyway. This problem has gone on a long time, and we're finding more and more cases where things are being missed due to the thresholding.

Regarding the default mathtext fontset, the dejavusans and dejavuserif fontsets added in #5214 have their own set of tests, so these here would not need to be changed when the default is changed.

Oh, of course that's right. No risk of duplicating effort here then. There is some math scattered here and there throughout the test suite that will be affected, but that's much less stuff.

@zblz
Copy link
Member Author

zblz commented Oct 23, 2015

I just checked and Travis used freetype 2.4.8 whereas I have 2.6.0 installed... Having the same freetype everywhere would definitely help here. If you can add this option in the near future we should wait to replace the images with a self-built freetype, otherwise I will replace the images here with the Travis-generated ones.

@jenshnielsen
Copy link
Member

We might consider switching Travis over to the "new" Ubuntu 14.04 Trusty images which have a more resent version of freetype and TeX which is hopefully closer to what we are using for development. Update 14.04 has 2.5.2 where as 12.04 has 2.4.8 https://launchpad.net/ubuntu/%2Bsource/freetype

@WeatherGod
Copy link
Member

re: 2.0 style changes
@mdboom, the test suite have been set up to apply the "classic" style by
default to all plots for image comparisons, so we don't have to do a mass
regeneration of baseline images. Instead, we can make sure that all new
tests and baseline images use the "default" style explicitly, and we can
update certain baseline images as we feel necessary.

The problem I have with a mass regeneration of baseline images is that it
would be extremely difficult to properly verify that each and every single
image is correct. Especially with a brand new style that we aren't totally
familiar with yet.

On Fri, Oct 23, 2015 at 9:15 AM, Jens Hedegaard Nielsen <
notifications@github.com> wrote:

We might consider switching Travis over to the "new" Ubuntu 14.04 Trusty
images
http://blog.travis-ci.com/2015-10-14-opening-up-ubuntu-trusty-beta
which have a more resent version of freetype and TeX which is hopefully
closer to what we are using for development.


Reply to this email directly or view it on GitHub
#5305 (comment)
.

@zblz
Copy link
Member Author

zblz commented Oct 23, 2015

@jenshnielsen -- in the last commit I changed .travis.yml to use the trusty images and it the images generated in Travis still vary significantly from my images (freetype 2.5.2 vs 2.6.0). However, an advantage of using the dist: trusty keyword is that the freetype version will be stable over time. If we use the baseline images from the Travis-generated images, we can reduce the tolerance and be more confident that the freetype version will not changer arbitrarily over time unless we change the dist keyword (or they discontinue the trusty images).

@mdboom
Copy link
Member

mdboom commented Oct 23, 2015

I think the predictable freetype version is a pretty high priority, so I plan to start on it today (though may not finish, and who knows what surprises it will bring). I think we might as well hold off on merging this until that's in.

I like the idea of pegging the Travis environment. I didn't realize one could do that. I think it's still worthwhile doing a local freetype build so that developers can run the tests on their own machines without requiring a specific version of the OS etc.

@mdboom
Copy link
Member

mdboom commented Oct 23, 2015

@WeatherGod: Good point about the classic style. That makes sense.

@mdboom
Copy link
Member

mdboom commented Oct 23, 2015

And just a note for completeness: The images depend not just on the version of freetype, but how it was configured -- whether bytecode hinter or not etc. Different Linux distros build it in different ways, even if they have the same exact version.

@mdboom
Copy link
Member

mdboom commented Oct 23, 2015

See #5306, which should now (fingers crossed) be passing...

@zblz
Copy link
Member Author

zblz commented Oct 27, 2015

This PR is now redundant as #5306 is definitely a better long term option. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants