
[Bug]: Possible issue with Matplotlib 3.9.1 wheel on Windows only #28551

Closed
ianthomas23 opened this issue Jul 12, 2024 · 68 comments · Fixed by howsoai/howso-engine-recipes#153
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. status: confirmed bug third-party integration: contourpy

Comments

@ianthomas23
Member

ianthomas23 commented Jul 12, 2024

Bug summary

Since the release of Matplotlib 3.9.1 I have been experiencing CI failures on ContourPy on Windows only that did not occur with Matplotlib 3.9.0. There is some info in my PR contourpy/contourpy#406 that works around it by pinning matplotlib<3.9.1 on Windows. I am not really sure what the problem is, but I now have a simple reproducer.

Code for reproduction

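from contourpy import contour_generator
import matplotlib.pyplot as plt  # imported but never used; removing this line avoids the crash

print("START")
cont_gen = contour_generator(z=[[0, 1], [2, 3]])
try:
    # lower_level (2.0) > upper_level (1.0), so the C++ code throws and the
    # error should reach Python as an ordinary exception
    cont_gen.filled(2.0, 1.0)
except Exception as e:
    print("EXCEPTION HANDLER", e)
print("END")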

Actual outcome

If you run the above code with matplotlib 3.9.1 and contourpy 1.2.1 (the latest releases) it works OK with correct output of

START
EXCEPTION HANDLER upper_level must be larger than lower_level
END

Note the code imports some of matplotlib but does not use it, and the contourpy code here does not use it either.

If you use matplotlib 3.9.1 or nightly wheel and contourpy nightly wheel then it silently crashes for me with just

START

But if you comment out the import matplotlib... line it works OK again. Use of matplotlib 3.9.0 and any contourpy (1.2.1 or nightly wheel) is always OK.
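Because the failure is a silent process exit rather than a traceback, one way to compare the two variants (with and without the matplotlib import) is to run each in a child interpreter and check how far it got. This is a minimal sketch, not part of either project: `run_variant`, `classify` and the status labels are hypothetical names, and it assumes a healthy run prints `END`. The `flush=True` calls matter because output still buffered in the child is lost if the child dies.

```python
import subprocess
import sys

# The reproducer from above, with the matplotlib import made switchable.
REPRODUCER = """
from contourpy import contour_generator
{mpl_import}
print("START", flush=True)
cont_gen = contour_generator(z=[[0, 1], [2, 3]])
try:
    cont_gen.filled(2.0, 1.0)
except Exception as e:
    print("EXCEPTION HANDLER", e, flush=True)
print("END", flush=True)
"""


def run_variant(code, timeout=60.0):
    """Run `code` in a fresh interpreter so a hard crash cannot take down the caller."""
    return subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, timeout=timeout
    )


def classify(result):
    """Summarise how far a reproducer run got, based on its stdout markers."""
    lines = result.stdout.splitlines()
    if result.returncode == 0 and "END" in lines:
        return "completed"
    if "START" in lines:
        return "died after START"
    return "failed before START"


if __name__ == "__main__":
    for mpl_import in ("import matplotlib.pyplot as plt", "# matplotlib not imported"):
        result = run_variant(REPRODUCER.format(mpl_import=mpl_import))
        print(f"{mpl_import!r}: returncode={result.returncode}, {classify(result)}")
```

On an affected setup one would expect the first variant to report "died after START" and the second "completed".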

For my installations here I am pip installing into python-only conda environments, i.e.

conda create -n temp python=3.12
conda activate temp
pip install matplotlib contourpy

Expected outcome

See above.

Additional information

There is a relevant post by @BertJorissen on the pybind11 gitter which has some in-depth analysis and leads me to believe that this is not just a ContourPy problem.

Operating system

Windows only

Matplotlib Version

>= 3.9.1

Matplotlib Backend

No response

Python version

I see the same results for Python 3.8 to 3.12.

Jupyter version

No response

Installation

pip

@ianthomas23
Member Author

I suppose the first stage here is to confirm if any other Matplotlib devs can reproduce this on Windows.

@BertJorissen

Hi Ian
my problem originated from subclassing a pybind11 class that has a function which throws an error.
I see that your function above calls SerialContourGenerator, which is a child class of the pybind11 ContourGenerator and throws a std::invalid_argument exception. So it does seem to be the same issue.
The error comes from matplotlib, as both my test program and contourpy work happily next to each other, but they crash when matplotlib has been loaded (but not necessarily used).
From comparing the releases 3.9.0 and 3.9.1, there does not seem to be any problem that I can point out immediately.
I do see, however, that the wheels are also repaired on Windows. I'm not sure this is necessary; when I turned repairing on for my own package before, it seemed to behave weirdly.
Best
Bert

@story645
Member

I can't reproduce because I keep getting this error when trying to install the nightly wheel:

ERROR: Could not find a version that satisfies the requirement numpy>=1.23 (from contourpy) (from versions: 2.1.0.dev0)
ERROR: No matching distribution found for numpy>=1.23

@ianthomas23
Member Author

@story645 I use something like this:

python -m pip install "matplotlib==3.9.1"
python -m pip install --only-binary=:all: --pre --upgrade --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple contourpy

to give matplotlib 3.9.1, numpy 2.0.0 and contourpy 1.3.0.dev1
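When mixing release and nightly wheels it is easy to end up with a different combination than intended, so it can be worth checking what pip actually installed before running the reproducer. A minimal sketch using the standard library's `importlib.metadata`; the `installed_versions` helper is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Return {distribution name: installed version string, or None if missing}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # With the two commands above, this should list matplotlib 3.9.1,
    # numpy 2.0.0 and a contourpy 1.3.0 dev release.
    print(installed_versions(["matplotlib", "numpy", "contourpy"]))
```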

@story645
Member

Thanks @ianthomas23, I ran your code and can reproduce it:

  • w/ matplotlib: crashes after START
  • w/o matplotlib: no crash

@ianthomas23
Member Author

I've created a repo to demonstrate this at https://github.com/ianthomas23/mpl-test. If you look at the github action run https://github.com/ianthomas23/mpl-test/actions/runs/10079751371 you'll see that it uses the reproducer code from above both with and without the matplotlib import for various combinations of mpl and contourpy versions. The interesting addition is a run that builds mpl 3.9.1 from source using MSVC and this passes the tests, compared to the official wheel (built using the cibuildwheel machinery in the mpl repo) which fails the tests.

@nquetschlich

nquetschlich commented Jul 24, 2024

I am very happy that I stumbled over this issue, because for two weeks the CI pipeline for one of our open-source projects has also been failing on Windows without any (useful) error message (see, e.g., https://github.com/cda-tum/mqt-predictor/actions/runs/9824223458/job/27126370262).
Unfortunately I do not have the time to create a minimal working example, but downgrading Matplotlib to <v3.9.1 solved the issue (see cda-tum/mqt-predictor#258).

I just wanted to mention it; it seems we ran into the same issue as you did.

@ianthomas23 ianthomas23 added the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label Aug 1, 2024
@ianthomas23
Member Author

I have labelled this "release critical" as there is evidence of 3 downstream libraries suffering crashes on Windows using Matplotlib 3.9.1, so it is not specific to ContourPy. I need help with this.

@tacaswell tacaswell added this to the v3.9.2 milestone Aug 1, 2024
@tacaswell
Member

So to summarize:

  • 3.9.0 wheels work
  • 3.9.1 + nightly wheels do not work
  • locally built wheels work

Given that locally built wheels work, I suspect that this is not a problem in the Matplotlib source but something in the build process. My two suspicions are:

  • a delvewheel bug
  • a pybind11 bug

It looks like delvewheel had releases on April 17 (1.6.0), Jun 20 (1.7.0) and July 3 (1.7.1). We did 3.9.1 on July 4 and 3.9.0 on May 15 so that means we likely got delvewheel 1.6.0 for mpl3.9.0 and 1.7.1 for mpl3.9.1.

It looks like pybind11 had releases on Mar 27 (2.12.0), Jun 25 (2.13.0) and Jun 26 (2.13.1), which means we likely got 2.12.0 for mpl3.9.0 and 2.13.1 for mpl3.9.1. However, given that locally built wheels work (and CI has kept working), I am a bit skeptical that it is pybind11 alone. I do have some concern that C extensions built with different versions of pybind11 could interact and conflict with each other. I have never seen any discussion of this being a concern, and I trust the pybind11 devs to make it well known if it were a known issue, but I do not understand how pybind11 works well enough to explain why it is not a possible problem, so I would like my assumption checked.
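The date reasoning above amounts to "take the newest release on or before the build day". A minimal sketch that reproduces it from the dates quoted in the comment (all in 2024), assuming an unpinned build always picked up the newest release on PyPI and ignoring any caching or pins in the build environment; the `latest_release_on` helper is hypothetical:

```python
from bisect import bisect_right
from datetime import date

# (release day, version) pairs as quoted above, sorted by day.
DELVEWHEEL = [(date(2024, 4, 17), "1.6.0"), (date(2024, 6, 20), "1.7.0"), (date(2024, 7, 3), "1.7.1")]
PYBIND11 = [(date(2024, 3, 27), "2.12.0"), (date(2024, 6, 25), "2.13.0"), (date(2024, 6, 26), "2.13.1")]


def latest_release_on(releases, build_day):
    """Return the newest version released on or before build_day."""
    days = [day for day, _ in releases]
    i = bisect_right(days, build_day)  # number of releases with day <= build_day
    if i == 0:
        raise ValueError("no release on or before the build day")
    return releases[i - 1][1]


for label, built in (("mpl 3.9.0", date(2024, 5, 15)), ("mpl 3.9.1", date(2024, 7, 4))):
    dw = latest_release_on(DELVEWHEEL, built)
    pb = latest_release_on(PYBIND11, built)
    print(f"{label}: delvewheel {dw}, pybind11 {pb}")
# mpl 3.9.0: delvewheel 1.6.0, pybind11 2.12.0
# mpl 3.9.1: delvewheel 1.7.1, pybind11 2.13.1
```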

Open technical question:

  • do we see this problem with 3.9.1 from conda-forge?
  • if a working locally built wheel is run through delvewheel does it then break? If so, can you try multiple versions to see if that affects it.
  • does changing the version of pybind11 used locally have any effect?
  • if we put the minimal example in our test suite, do our windows builds start failing? I suspect not, but we should check (I'll do that right after I post this comment)

Open policy question:

  • should we remove the windows 3.9.1 wheels from pypi?

I'm going to put this on our meeting agenda for the call today.

@ianthomas23
Member Author

do we see this problem with 3.9.1 from conda-forge?

No, matplotlib 3.9.1 from conda-forge and contourpy 1.3.0.dev1 nightly wheel (a conda/pip combination I wouldn't normally recommend) works for me locally.

@ianthomas23
Member Author

Some other thoughts:

  • version of MSVC probably changed between builds of 3.9.0 and 3.9.1
  • Matplotlib is unusual in having some extensions that use pybind11 and some that don't.

tacaswell added a commit to tacaswell/matplotlib that referenced this issue Aug 1, 2024
@ksunden
Member

ksunden commented Aug 1, 2024

The DLL added by delvewheel is identical between mpl 3.9.0 and mpl 3.9.1, which makes it hard to point to it as the source of the problem. The patch inserted in __init__.py is also identical (aside from the function naming, which references the delvewheel version; those versions were indeed different: 1.7.1 for mpl 3.9.1 and 1.6.0 for 3.9.0).

Looking at the commits to delvewheel between the two versions shows the addition/renaming of two CLI args (which we do not use and which are not turned on by default), testing/doc changes, and support for 3.13 (which just adds the build tags to a list of available tags). None of these seem likely to me to cause any difference in behavior that the matching DLL hashes would not already rule out.

Therefore I am reasonably convinced that delvewheel is not the culprit, at least not on its own, nor as the build-chain change that directly caused the problem.

However, delvewheel repair is the biggest difference between the wheels as uploaded to PyPI and the wheel you get by building by default, so that leaves some room for pybind11 (or for mixing pybind11 and non-pybind11 extensions), but not a whole lot of room.
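The DLL comparison can be repeated without installing anything, since a wheel is a zip archive. A minimal sketch, assuming delvewheel's default layout where bundled DLLs sit under a top-level `matplotlib.libs/` directory in the wheel (the same directory appears in the DLL load list later in this thread); the helper names and wheel filenames are hypothetical. It compares sizes and SHA-256 digests rather than filenames, so delvewheel's name mangling does not matter.

```python
import hashlib
import zipfile


def vendored_dll_digests(wheel_path, libs_dir="matplotlib.libs/"):
    """Map each DLL under `libs_dir` inside a wheel (a zip file) to (size, sha256)."""
    digests = {}
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            if name.startswith(libs_dir) and name.lower().endswith(".dll"):
                data = wheel.read(name)
                digests[name] = (len(data), hashlib.sha256(data).hexdigest())
    return digests


def same_vendored_dlls(wheel_a, wheel_b, libs_dir="matplotlib.libs/"):
    """True if both wheels bundle byte-identical DLLs, ignoring mangled file names."""
    a = sorted(vendored_dll_digests(wheel_a, libs_dir).values())
    b = sorted(vendored_dll_digests(wheel_b, libs_dir).values())
    return a == b


# Usage with hypothetical local copies of the two Windows wheels:
# same_vendored_dlls("matplotlib-3.9.0-cp312-cp312-win_amd64.whl",
#                    "matplotlib-3.9.1-cp312-cp312-win_amd64.whl")
```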

@QuLogic
Member

QuLogic commented Aug 1, 2024

On a completely fresh Windows 10 VM with only Python 3.12 installed, trying to import contourpy gives me ImportError: DLL load failed while importing _contourpy: The specified module could not be found. This is likely to be the MSVC runtime; perhaps contourpy should run delvewheel as well so it has its own copy that won't cause these weird conflicts?

@QuLogic
Member

QuLogic commented Aug 1, 2024

Looking at the file in Dependency Walker, it appears that the _contourpy extension is missing msvcp140.dll, so I installed the latest VC redistributable (14.40.33810.0), but then I could not reproduce the problem. Perhaps it has to do with having an older version of the redistributable already installed on your system? Or maybe it's only to do with conda's Python? I will try the latter next.

@HDembinski

HDembinski commented Aug 1, 2024

I guess there is not much to add by saying this, but iminuit is also affected, and it is in turn used by many other libraries: scikit-hep/iminuit#1018
The workaround of pinning to matplotlib==3.9.0 works.
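Besides pinning, a downstream test suite could fail fast with a clear message instead of crashing silently. A minimal sketch, assuming only the 3.9.1 wheels on Windows are affected (per this thread, 3.9.0 works and locally built wheels work, so this check is deliberately conservative); the helper names are hypothetical:

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def is_affected_windows_wheel(mpl_version, platform=sys.platform):
    """True for the combination reported in this issue: matplotlib 3.9.1 on Windows."""
    return platform == "win32" and mpl_version == "3.9.1"


def check_matplotlib():
    """Raise early (e.g. from a conftest.py) instead of crashing silently later."""
    try:
        installed = version("matplotlib")
    except PackageNotFoundError:
        return
    if is_affected_windows_wheel(installed):
        raise RuntimeError(
            "matplotlib 3.9.1 on Windows can make other compiled extensions crash; "
            "pin matplotlib==3.9.0 or upgrade to a fixed release"
        )
```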

@tacaswell
Member

tacaswell commented Aug 2, 2024

Can someone test the wheels generated by https://github.com/matplotlib/matplotlib/actions/runs/10204731582?pr=28635 and https://github.com/matplotlib/matplotlib/actions/runs/10204565050?pr=28637 ?

We earlier confirmed that pinning delvewheel back does not fix the problem.

@ianthomas23
Copy link
Member Author

On a completely fresh Windows 10 VM with only Python 3.12 installed, trying to import contourpy gives me ImportError: DLL load failed while importing _contourpy: The specified module could not be found. This is likely to be the MSVC runtime; perhaps contourpy should run delvewheel as well so it has its own copy that won't cause these weird conflicts?

This is worth investigating, but it is a distraction from the actual problem:

  1. The problem is in Matplotlib not ContourPy, so no change to ContourPy can fix this.
  2. Changes to wheel building of any downstream packages cannot fix the problem either, they need to be able to install from source (so never going near delvewheel) in the presence of the faulty Matplotlib install.

@ianthomas23
Copy link
Member Author

Can someone test the wheels generated by https://github.com/matplotlib/matplotlib/actions/runs/10204731582?pr=28635 and https://github.com/matplotlib/matplotlib/actions/runs/10204565050?pr=28637 ?

We earlier confirmed that pinning delvewheel back does not fix the problem.

It didn't fix the problem.

@ianthomas23
Member Author

ianthomas23 commented Aug 2, 2024

There is progress to report here, albeit accidental. Reproducers that failed for me 15 hours ago are now passing. If you look at the last 2 GHA runs of my mpl-test repo (~15 hours ago https://github.com/ianthomas23/mpl-test/actions/runs/10195870963/job/28205423852 and ~1 hour ago https://github.com/ianthomas23/mpl-test/actions/runs/10212785471/job/28256840836), both fail when using Matplotlib 3.9.1 and the ContourPy nightly wheel, but the run using both the Matplotlib and ContourPy nightly wheels failed 15 hours ago and passed 1 hour ago. There is no change to the ContourPy nightly wheel in that time, but there is a new Matplotlib nightly wheel. They use different windows-2022 runners: 20240721.1.0 and 20240729.2.0. It would be good to confirm the runner version that the new Matplotlib nightly wheel was created on.

So as of now I can use the Matplotlib nightly wheel without problems on Windows, but not 3.9.1. I've run the full ContourPy test suite on this (contourpy/contourpy#413) to confirm this. If I were to exclude use of Matplotlib 3.9.1 on Windows, all CI runs would pass.

What is the difference between the 3.9.1 and nightly wheels? The msvcp140.dll added by delvewheel has changed: it has gone from 621960 bytes to 618728 bytes, so it is definitely a different file.
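To check which bundled runtime an installed environment actually has, one can list the `<package>.libs` directory that delvewheel creates next to the package in site-packages (the `matplotlib.libs` path in the DLL load list later in this thread is an example). A minimal sketch, assuming delvewheel's default directory layout; the helper name is hypothetical:

```python
import importlib.util
from pathlib import Path


def installed_vendored_dlls(package="matplotlib"):
    """List (file name, size in bytes) for DLLs in '<package>.libs' beside an installed package."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return []  # not installed, or a namespace package without a single origin
    # spec.origin is .../site-packages/<package>/__init__.py, so go up two levels.
    libs = Path(spec.origin).parent.parent / f"{package}.libs"
    if not libs.is_dir():
        return []  # e.g. a non-Windows or locally built install without vendored DLLs
    return sorted((p.name, p.stat().st_size) for p in libs.glob("*.dll"))


if __name__ == "__main__":
    for name, size in installed_vendored_dlls("matplotlib"):
        print(f"{name}: {size} bytes")
```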

I don't know the cause of the problem, but I suspect it relates to a problem that consumed lots of developer time a couple of months ago when the Windows github runner shipped MSVC runtimes that weren't fully back/forward compatible. There are many issues about this e.g. actions/runner-images#10055. I suspect that Matplotlib 3.9.1 was shipped after being built on one of these less-than-ideal images. The problem was supposed to be fixed but given the amount of ongoing chat about this I suspect it wasn't. Perhaps we were just unlucky in timing then, and are now lucky that the runner images have been updated.

I propose the following plan:

  1. Yank the 3.9.1 Windows wheels from PyPI to minimise the damage and loss of goodwill in downstream projects.
  2. Confirm that the new nightly wheels do work for other downstream projects.
  3. Release 3.9.2.

For item 2 for downstream projects (possibly @BertJorissen, @nquetschlich and @HDembinski) you can try out the nightly wheels instead of Matplotlib 3.9.1 by using the following in your CI runs between your build and test stages:

python -m pip install --only-binary=:all: --upgrade --pre --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple matplotlib

@igurin-invn
Contributor

Thank you for fixing this so quickly! I did have matplotlib>=3.9.1 because of a bug in 3.9.0.

@QuLogic
Member

QuLogic commented Aug 8, 2024

sort out why the failure users are reporting is importing _c_internal_utils: The specified module could not be found as a) building failing on CI should have failed a lot earlier b) we should not be installing a partially built version of Matplotlib. I would believe that the issue is in our config, meson/meson-python, or pip

This is a combination of the GitHub runner environment, and security changes in Python. On the runners, the mingw compilers are available on the PATH, and so Meson defaults to using them. They link dynamically to their standard C and C++ libraries, which are in the same directory. However, as of Python 3.8, PATH is disabled for DLL searching, and so their libraries are not found. You could work around this (if building from source) by adding the mingw directory to the DLL search path. Alternatively, you could pass --vsenv to have Meson prefer MSVC. However, since installing from source is not how we expect users to install Matplotlib, this is not likely something to recommend in general.
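The workaround of adding the MinGW directory to the DLL search path can be done with `os.add_dll_directory`, which exists only on Windows and only on Python 3.8+. A minimal sketch: the helper name and the `C:\mingw64\bin` location are assumptions, and the call has to happen before the locally built extension modules are imported.

```python
import os


def add_dll_dir_if_supported(path):
    """Register `path` for extension-module DLL lookup on Windows; no-op elsewhere.

    Returns the handle from os.add_dll_directory (call .close() on it to undo),
    or None if the function is unavailable or the directory does not exist.
    """
    add = getattr(os, "add_dll_directory", None)  # only defined on Windows, 3.8+
    if add is None or not os.path.isdir(path):
        return None
    return add(path)


if __name__ == "__main__":
    # Hypothetical location of the MinGW runtime DLLs on the build machine.
    handle = add_dll_dir_if_supported(r"C:\mingw64\bin")
    # ... import the locally built matplotlib here ...
```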

I did manage to find and fix a few warnings while debugging this, which are in #28682.

I also prepared a job with extra debugging; from there I see we load:

  • C:\hostedtoolcache\windows\Python\3.12.4\x64\VCRUNTIME140.dll
  • C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\site-packages\_pyerror.cp312-win_amd64.pyd
  • C:\hostedtoolcache\windows\Python\3.12.4\x64\VCRUNTIME140_1.dll
  • C:\Windows\SYSTEM32\MSVCP140.dll
  • C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\site-packages\matplotlib\_c_internal_utils.cp312-win_amd64.pyd
  • C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\site-packages\matplotlib.libs\msvcp140-cb1364a8f14ec1d3687d6faef0fd327e.dll
  • C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\site-packages\matplotlib\_path.cp312-win_amd64.pyd
  • C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\site-packages\matplotlib\ft2font.cp312-win_amd64.pyd
  • C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\site-packages\kiwisolver\_cext.cp312-win_amd64.pyd

And surprisingly, this is where we crash, sometime after locating PyInit__cext from kiwisolver. Unfortunately, the debugger does not stop properly and I couldn't find how to easily print a traceback automatically.
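One possible way to get a Python traceback automatically (not something tried in this thread) is the standard library's `faulthandler`, which dumps the Python stack on fatal signals and, on Windows, on fatal Windows exceptions such as access violations. It may not catch every failure mode, so treat this as a sketch to try; it can also be enabled with the `PYTHONFAULTHANDLER=1` environment variable instead of `-X faulthandler`. The helper name is hypothetical:

```python
import subprocess
import sys


def run_with_faulthandler(code, timeout=120.0):
    """Run `code` in a child interpreter with faulthandler on; return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    rc, err = run_with_faulthandler("import matplotlib.pyplot")
    print("return code:", rc)
    # On a fatal crash, stderr should contain the faulthandler dump with the
    # Python stack of each thread at the point of the crash.
    print(err)
```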

@Sedeniono

Sedeniono commented Aug 8, 2024

In case it helps, I originally debugged the crashes on the GitHub runners due to the Microsoft STL update in my own application (unrelated to Python/Matplotlib) by creating crash dumps via procmon and uploading them as artifacts. See here for an example. The crash dumps can be opened in Visual Studio or WinDbg.

@QuLogic
Member

QuLogic commented Aug 13, 2024

3.9.2 is tagged now, and I confirmed that the wheels worked on Windows again via @BertJorissen's workflow, so I pushed them to PyPI. As #28687 was also confirmed to work by @ianthomas23, I'm going to close this issue, hopefully for good this time.

@jfuruness

On github actions, on windows, using pypy3 and matplotlib 3.9.2, I get what seems to be the same error. Oddly though, I only get the error on certain repos of mine that use matplotlib, and not others:

  .tox\pypy3\lib\site-packages\matplotlib\__init__.py:159: in <module>
      from . import _api, _version, cbook, _docstring, rcsetup
  .tox\pypy3\lib\site-packages\matplotlib\cbook.py:32: in <module>
      from matplotlib import _api, _c_internal_utils
  E   ImportError: The specified module could not be found

@tacaswell
Member

@jfuruness I suspect your issue is that you are using pypy310, for which we do not provide wheels (yet), and something is going wrong with your local build. Please see the install docs on building from source on Windows.
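A quick way to see which wheel tag the current interpreter needs is to build its interpreter tag: a wheel tagged `cp312` will not be used on PyPy, so pip falls back to building from source. A minimal sketch; the helper name is hypothetical and the mapping only covers CPython and PyPy:

```python
import sys


def interpreter_tag():
    """Return the wheel interpreter tag, e.g. 'cp312' for CPython 3.12 or 'pp310' for PyPy 3.10."""
    prefixes = {"cpython": "cp", "pypy": "pp"}
    impl = sys.implementation.name
    abbrev = prefixes.get(impl, impl)  # fall back to the raw name for other implementations
    return f"{abbrev}{sys.version_info.major}{sys.version_info.minor}"


if __name__ == "__main__":
    print(interpreter_tag())
```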

@jfuruness

@tacaswell Ah, I didn't realize; thank you for the explanation. Looking at a few past releases, I don't see wheels for pypy310. Is there a plan to add these in the future (and perhaps a GitHub issue I could follow)?

@QuLogic
Member

QuLogic commented Sep 13, 2024

We were waiting for NumPy wheels numpy/numpy#24728; it looks like those are out, but they didn't close the issue, so I wasn't notified.

@jfuruness

@QuLogic Ah, makes sense, sounds good! Thank you both for your support of matplotlib!
