
Some tests require non-zero tolerance #5647


Closed
20 tasks
mdboom opened this issue Dec 10, 2015 · 11 comments
Labels
status: inactive (Marked by the “Stale” GitHub Action)

Comments

@mdboom
Member

mdboom commented Dec 10, 2015

Some tests are producing ever-so-slightly nondeterministic results, and others demonstrate a Python 2 vs. 3 issue. While #5307 fixed most of these, the tests below were punted on because obvious solutions haven't been found yet.

  • test_axes_grid1:test_twin_axes_empty_and_removed
  • test_axes:test_boxplot
  • test_axes:test_boxplot_rc_parameters
  • test_axes:test_specgram_freqs
  • test_axes:test_specgram_noise
  • test_axes:test_specgram_magnitude_freqs
  • test_axes:test_hist_steplog

appveyor/windows: as part of #5922, tolerance was added or increased for failures on:

  • matplotlib.tests.test_axes.test_specgram_freqs.test (RMS 0.042) (x64,35)
  • matplotlib.tests.test_axes.test_specgram_freqs.test (RMS 0.042) (x64,35)
  • matplotlib.tests.test_axes.test_specgram_magnitude_freqs.test (RMS 0.042) (x64,35)
  • matplotlib.tests.test_axes.test_specgram_magnitude_freqs.test (RMS 0.042) (x64,35)

-> currently set to 0.03; set the tolerance to 0.05 on Windows (one way to wire up a platform-dependent tolerance is sketched below)
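For illustration, a hedged sketch of how such a platform-dependent tolerance could be applied, assuming the nose-era `image_comparison` decorator and its `tol` keyword (the test body here is a placeholder, not the real test):

```python
import sys

from matplotlib.testing.decorators import image_comparison

# Assumption: looser tolerance on Windows, where compiler/FreeType
# differences produce slightly different rasterization.
TOL = 0.05 if sys.platform == 'win32' else 0.03

@image_comparison(baseline_images=['specgram_freqs', 'specgram_freqs_linear'],
                  remove_text=True, tol=TOL)
def test_specgram_freqs():
    ...  # placeholder: build the spectrogram test figures here
```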

  • matplotlib.tests.test_patheffects.test_collection.test (RMS 0.006) (x64,35)
  • matplotlib.tests.test_patheffects.test_collection.test (RMS 0.008) (x86,27)
  • matplotlib.tests.test_patheffects.test_collection.test (RMS 0.012) (x64,27)
  • matplotlib.tests.test_patheffects.test_collection.test (RMS 0.012) (x64,34)

This has an (almost) black diff image, so up the tolerance on Windows to 0.013

  • matplotlib.tests.test_patches.test_wedge_range.test (RMS 0.059) (x64,27)
  • matplotlib.tests.test_patches.test_wedge_range.test (RMS 0.059) (x64,34)
  • matplotlib.tests.test_patches.test_wedge_range.test (RMS 0.059) (x86,27)

This one actually looks interesting: it seems that only the middle figure in the last row differs

  • matplotlib.tests.test_axes.test_specgram_angle_freqs.test (RMS 0.002) (x86,27)

Also looks black, but only on py27/x86...?

  • matplotlib.tests.test_triangulation.test_tri_smooth_gradient.test (RMS 0.014) (x64,35)
@jenshnielsen
Member

I am seeing the following failures with tiny differences when running locally on my Mac with the local FreeType.

Some of them are listed above, so it may just be a matter of tweaking the tolerance values (a way to recompute the RMS numbers locally is sketched after the tracebacks below).

======================================================================
FAIL: matplotlib.tests.test_axes.test_specgram_freqs.test
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/matplotlib/testing/decorators.py", line 54, in failer
    result = f(*args, **kwargs)
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/matplotlib/testing/decorators.py", line 245, in do_test
    '(RMS %(rms).3f)'%err)
matplotlib.testing.exceptions.ImageComparisonFailure: images not close: /Users/jhn/src/python/matplotlib/result_images/test_axes/specgram_freqs.png vs. /Users/jhn/src/python/matplotlib/result_images/test_axes/specgram_freqs-expected.png (RMS 0.027)

======================================================================
FAIL: matplotlib.tests.test_axes.test_specgram_freqs.test
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/matplotlib/testing/decorators.py", line 54, in failer
    result = f(*args, **kwargs)
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/matplotlib/testing/decorators.py", line 245, in do_test
    '(RMS %(rms).3f)'%err)
matplotlib.testing.exceptions.ImageComparisonFailure: images not close: /Users/jhn/src/python/matplotlib/result_images/test_axes/specgram_freqs_linear.png vs. /Users/jhn/src/python/matplotlib/result_images/test_axes/specgram_freqs_linear-expected.png (RMS 0.027)

======================================================================
FAIL: matplotlib.tests.test_axes.test_specgram_magnitude_freqs.test
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/matplotlib/testing/decorators.py", line 54, in failer
    result = f(*args, **kwargs)
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/matplotlib/testing/decorators.py", line 245, in do_test
    '(RMS %(rms).3f)'%err)
matplotlib.testing.exceptions.ImageComparisonFailure: images not close: /Users/jhn/src/python/matplotlib/result_images/test_axes/specgram_magnitude_freqs.png vs. /Users/jhn/src/python/matplotlib/result_images/test_axes/specgram_magnitude_freqs-expected.png (RMS 0.027)

======================================================================
FAIL: matplotlib.tests.test_axes.test_specgram_magnitude_freqs.test
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/matplotlib/testing/decorators.py", line 54, in failer
    result = f(*args, **kwargs)
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/matplotlib/testing/decorators.py", line 245, in do_test
    '(RMS %(rms).3f)'%err)
matplotlib.testing.exceptions.ImageComparisonFailure: images not close: /Users/jhn/src/python/matplotlib/result_images/test_axes/specgram_magnitude_freqs_linear.png vs. /Users/jhn/src/python/matplotlib/result_images/test_axes/specgram_magnitude_freqs_linear-expected.png (RMS 0.027)

======================================================================
FAIL: matplotlib.tests.test_patheffects.test_collection.test
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/matplotlib/testing/decorators.py", line 54, in failer
    result = f(*args, **kwargs)
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/matplotlib/testing/decorators.py", line 245, in do_test
    '(RMS %(rms).3f)'%err)
matplotlib.testing.exceptions.ImageComparisonFailure: images not close: /Users/jhn/src/python/matplotlib/result_images/test_patheffects/collection.png vs. /Users/jhn/src/python/matplotlib/result_images/test_patheffects/collection-expected.png (RMS 0.015)

======================================================================
FAIL: mpl_toolkits.tests.test_mplot3d.test_trisurf3d.test
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/matplotlib/testing/decorators.py", line 54, in failer
    result = f(*args, **kwargs)
  File "/Users/jhn/Envs/mplmasterpy35/lib/python3.5/site-packages/matplotlib/testing/decorators.py", line 245, in do_test
    '(RMS %(rms).3f)'%err)
matplotlib.testing.exceptions.ImageComparisonFailure: images not close: /Users/jhn/src/python/matplotlib/result_images/test_mplot3d/trisurf3d_svg.png vs. /Users/jhn/src/python/matplotlib/result_images/test_mplot3d/trisurf3d-expected_svg.png (RMS 0.024)

----------------------------------------------------------------------
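For anyone reproducing these RMS numbers outside the test runner, a minimal sketch using matplotlib's `compare_images` helper; the paths are examples taken from the output above:

```python
from matplotlib.testing.compare import compare_images

# tol=0 means any pixel difference fails; the returned message
# includes the RMS value quoted in the failures above.
err = compare_images(
    'result_images/test_axes/specgram_freqs-expected.png',  # baseline image
    'result_images/test_axes/specgram_freqs.png',           # freshly rendered
    tol=0)
print(err or 'images are identical')
```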

@jenshnielsen
Member

I added PR #5734, which adjusts the tolerances slightly so that the spectrogram tests pass for me

@jenshnielsen
Member

In addition, it seems that test_collections.test__EventCollection__set_linestyle sometimes fails on Travis with a tiny RMS diff.

In one example this happened even in the Python 3.5 job, which does not use multiprocessing, so it's probably not related to that.

FAIL: matplotlib.tests.test_collections.test__EventCollection__set_linestyle.test
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/matplotlib/matplotlib/venv/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/travis/build/matplotlib/matplotlib/lib/matplotlib/testing/decorators.py", line 54, in failer
    result = f(*args, **kwargs)
  File "/home/travis/build/matplotlib/matplotlib/lib/matplotlib/testing/decorators.py", line 245, in do_test
    '(RMS %(rms).3f)'%err)
matplotlib.testing.exceptions.ImageComparisonFailure: images not close: /home/travis/build/matplotlib/matplotlib/result_images/test_collections/EventCollection_plot__set_linestyle_svg.png vs. /home/travis/build/matplotlib/matplotlib/result_images/test_collections/EventCollection_plot__set_linestyle-expected_svg.png (RMS 0.080)

@jenshnielsen
Member

Sorry, I restarted the job, so the log is gone.

@mdboom
Member Author

mdboom commented Dec 29, 2015

I've also seen the *specgram* issues on #5718, and I was thinking of the same solution. I think the differences actually come from different versions of NumPy's spec code, not from matplotlib itself. So I think this is reasonable, unless we want to pin a particular version of NumPy for testing (which I'd like to avoid).

I think the other tests might have a different root cause -- elsewhere I've seen this happen because a dictionary is being iterated over, resulting in an unpredictable drawing order (sketched below). I haven't looked into this deeply enough to determine whether that's what's going on here, though.
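As an illustration of that failure mode (a hypothetical sketch, not matplotlib's actual drawing code): before Python 3.7, dict iteration order was not guaranteed, so a draw loop over a plain dict could paint artists in a varying order between runs; sorting the keys pins it down.

```python
# Hypothetical sketch: the names and structure here are illustrative only.
def draw_all(renderer, artists):
    """Draw a dict of {name: artist} in a deterministic order."""
    # Iterating the dict directly could yield a different order per run
    # on older Pythons; sorted() makes the paint order reproducible.
    for name in sorted(artists):
        artists[name].draw(renderer)
```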

@jankatins
Contributor

See also #5922 for additional test failures on Windows for which the tolerance had to be changed.

@jankatins
Contributor

Added the test failures on appveyor/windows from #5922 directly to the todo list in the first comment

@jkseppan
Member

jkseppan commented Jan 4, 2017

Is this just Python 2/3 and different versions of NumPy? If so, there would probably be only a few possible correct results for each test case. We could modify the comparison tests to allow several alternative correct results and compare against each of them with zero tolerance (a sketch of the idea follows). It would be somewhat more work to update the test cases for any other changes, but we could probably just pick the images from failing Travis tests.
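A minimal sketch of that idea, built on matplotlib's existing `compare_images` helper; `matches_any_baseline` is a hypothetical name:

```python
from matplotlib.testing.compare import compare_images

def matches_any_baseline(actual, baselines):
    """Return True if `actual` is pixel-identical to any accepted baseline.

    `baselines` is a list of paths to alternative expected images
    (e.g. one per NumPy version that renders differently).
    """
    # compare_images returns None when the images match within tol.
    return any(compare_images(expected, actual, tol=0) is None
               for expected in baselines)
```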

@dopplershift
Contributor

#7573 should have helped deal with the 2 vs. 3 issues. IMO, from matplotlib's perspective, we're doing something wrong in general if test results depend on Python 2 vs. 3 independently of the versions of other things.

@github-actions

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

@github-actions github-actions bot added the status: inactive label (Marked by the “Stale” GitHub Action) on Mar 20, 2023
@tacaswell
Member

I am going to close this as:

  1. We have (at some point) come to the conclusion that the spectrogram tests are sensitive to the CPU instruction set used.
  2. This is a very old list.
  3. There is ongoing work to support "rolling" image comparisons, which will make this moot.
  4. FreeType changes are more pressing.
