possible race condition stops build with python3.4 #3738
Comments
This may be due to trying to import PyQt4 after PyQt5. You can control which GUI backends are checked using the setup.cfg file. Try adding |
Since this is a timeout I guess it is likely to be using the other code path and forking a process Perhaps we should change |
Good point, @jenshnielsen: I missed that multiprocessing path. Yes, I think having a timeout (and ideally getting a traceback from the child process) would be better. |
Yeah, I was looking suspiciously at the mp code too, but I only had a glance and didn't want to point fingers :) If you come up with a patch to test, give me a shout and I will try to upload the new package ASAP. |
@sandrotosi What is the deadline for the Jessie freeze? |
@tacaswell ehm... a week ago :) but I will try to get an unblock for mpl |
@sandrotosi Do you have any chance to test the fixes in #3741? This puts a 5 second timeout on getting the result. But it might still wait at the p.close() stage, so alternatively we might have to call p.terminate() if a timeout error is raised? |
@jenshnielsen my "test" would be to upload it to Debian and see if it really fixes the problem. Since that requires a bit of time on my side and quite a lot of resources on the Debian build machines, I would rather not rush the fix (even if it's important) and would first like to get an answer to your last question. |
@jenshnielsen I bet this is the same issue we were having with 3.2, could you reproduce that on any of your machines? |
@sandrotosi Sounds fair. The problem is that I can't reproduce the issue, so I am working in the blind. @tacaswell No, I can't reproduce it right now. Good point about 3.2; I will try testing it on 3.2 in an Ubuntu 12.04 VM. That is probably the closest I can get to the Travis issue locally. |
You can also turn 3.2 back on on Travis. |
@jenshnielsen I have never faced this problem on my amd64 box either; it was only spotted on our build nodes. If you're confident that's the way to go, I will upload it and see the results. |
@sandrotosi Let me do a few more experiments with VMs and Travis and see if I can reproduce the issue. |
I could not reproduce the issue (either locally or on Travis), but I have changed the code a bit to forcefully terminate if the process has not returned within 5 sec. With my current understanding of multiprocessing I think this is the right thing to do, but I could be wrong. |
To my surprise, I found a couple[1][2] of patches that already disable multiprocessing checks in Debian for GTK3Agg and GTK3Cairo (so I adapted the proposed patch a bit). If the test with QT is successful, we might also have fixed the same problems with GTK3* (hence dropping a bit of delta between vanilla mpl and the Debian release). During the night (EU TZ) I will build the package on my machine, and tomorrow morning I will upload the results; hopefully we will know the outcome during the day. |
Great. It would be good to get to the bottom of this. |
BTW: I did have a similar issue with the GTK3 backend on OSX and homebrew GTK. This seemed to be caused by a broken pygobject build which would segfault the subprocess. |
The package built fine on my machine, has been uploaded to Debian, and it's now building on all the relevant architectures. I'll monitor it and let you know of any developments. |
While s390x succeeded, kfreebsd-i386 didn't :( https://buildd.debian.org/status/package.php?p=matplotlib&suite=sid |
Sorry, I messed up the fix that I wrote last night. The new version should function correctly. If the subprocess runs for more than 10 sec it will be killed and the backend will be skipped. This is still not optimal for the GTK backends since they will not be built. It doesn't matter for the QT backends since these are runtime-only dependencies. |
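For readers following along, here is a minimal sketch of the "wait with a timeout, then terminate" pattern discussed above. It is not the code from #3741: the helper name `run_check_with_timeout` and the choice of the POSIX-only "fork" start method are assumptions made for illustration.

```python
import multiprocessing
import time


def run_check_with_timeout(func, args=(), timeout=10.0):
    """Run func(*args) in a child process and give up after `timeout` seconds.

    Hypothetical helper for illustration only. Returns (True, value) when
    the check finishes in time and (False, "Check timed out") otherwise.
    """
    # Assumption: "fork" is used because the thread talks about forking a
    # process; this start method is only available on POSIX systems.
    ctx = multiprocessing.get_context("fork")
    pool = ctx.Pool(processes=1)
    try:
        pending = pool.apply_async(func, args)
        # get() raises multiprocessing.TimeoutError if no result arrives in time.
        return True, pending.get(timeout=timeout)
    except multiprocessing.TimeoutError:
        return False, "Check timed out"
    finally:
        # terminate() kills the worker right away, so a hung check cannot
        # block here the way waiting on close() could.
        pool.terminate()
        pool.join()


if __name__ == "__main__":
    print(run_check_with_timeout(abs, (-3,), timeout=5.0))
    started = time.monotonic()
    print(run_check_with_timeout(time.sleep, (30,), timeout=0.5))
    print("gave up after %.1f s" % (time.monotonic() - started))
```

The point of calling `terminate()` unconditionally is that the result has already been collected (or abandoned) by then, so nothing is lost by killing the worker.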
@jenshnielsen where can I find the new version? Anyway, incredibly enough, just as I was about to set up a loop to see when it would get stuck, I reproduced the problem on my laptop at the first iteration. What tests do you want me to perform? Of course I just ran a simple "python3.4 setup.py build"... |
I have pushed a new commit on the pull request which should terminate the right way |
OK, I'm doing the same dance as yesterday and will let you know after the buildd picks up the new upload. |
Good news!!! Most of the architectures have finished building successfully \o/ https://buildd.debian.org/status/package.php?p=matplotlib I'll keep an eye on the others but I don't expect (hopefully...) surprises - THANKS A LOT for the quick support on this |
closed by #3741 |
Sorry for commenting on an old bug, but I am trying to install matplotlib-2.2.4 for Python 3.6 and 3.7 on a Gentoo system. It had worked previously, but I had to rebuild it because I wanted to remove Python 2.7 support. So I had it installed for Python 2.7, 3.6 and 3.7. Then I decided to throw away 2.7. This time compilation for Python 3.6 went smoothly, but for 3.7 it could not find gtk3agg due to 'Check timed out'. I thought it was a 3.7 problem, so I disabled 3.7 too. Now it cannot find gtk3agg for 3.6 for the same timeout reason. I decided to re-enable building for both Python 3.6 and 3.7 again - and again, gtk3agg is built for 3.6 but not for 3.7 due to the timeout.

Now, a word about the load on the system while this is going on: the load is anywhere between 17 and 44(!). Yes, forty-four. Who cares? There are 8 virtual CPUs at 3GHz each and 50 GB of RAM waiting to be used - so I use them. Since when does the load on a system have any effect on the result of a test like 'do you have GTK3 for Python installed?'?

As you can see, just setting a 5 sec. timeout on a test will not cut it. The result is clearly dependent on the load - and you should not impose a maximum load on your users. What you could do is query the current load and set a timeout that depends on it. That is, if you set a timeout of 5 sec. for a load of 0.2, then how high should it be for a load of 17 - or even 44? You have to think more deeply about this... |
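To make the suggestion above concrete, here is one possible way a load-dependent timeout could be computed. This is only a sketch of the idea, not something matplotlib does: the function name `scaled_timeout`, the scaling rule (load per CPU, never below the base timeout) and the 300 second cap are all assumptions.

```python
import os


def scaled_timeout(base_timeout=5.0, load=None, cpus=None, max_timeout=300.0):
    """Stretch a base timeout according to the system load per CPU.

    Hypothetical policy for illustration: the base timeout is multiplied
    by load/cpus whenever that ratio exceeds 1, and capped at max_timeout.
    """
    if load is None:
        try:
            # 1-minute load average; os.getloadavg() is not available on
            # every platform, so fall back to "no load".
            load = os.getloadavg()[0]
        except (AttributeError, OSError):
            load = 0.0
    if cpus is None:
        cpus = os.cpu_count() or 1
    factor = max(1.0, load / cpus)
    return min(max_timeout, base_timeout * factor)


if __name__ == "__main__":
    # The numbers from the comment above: 8 CPUs, loads of 0.2, 17 and 44.
    for load in (0.2, 17.0, 44.0):
        print(load, scaled_timeout(5.0, load=load, cpus=8))
```

Whether a linear rule like this is adequate is an open question; a heavily loaded machine can slow an import by far more than the load ratio suggests.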
We cannot do the checks in-process due to conflicts when importing more than one GUI toolkit into the same process, so we do them in a sub-process. If we do that without a timeout, we run the risk of hanging the build forever if things go extremely sideways (as this issue reported).

This seems like a rather extreme edge case where you have loaded your system to the point where it is essentially non-responsive. I am disinclined to pick up the complexity of a load-dependent timeout for this case, but if you open a PR we can discuss what the complexity trade-off actually looks like.

The fastest path forward for you is to inject a patch into the Gentoo build process (https://wiki.gentoo.org/wiki//etc/portage/patches; I think that this falls under the category of "site-specific patches") that removes the timeout. It is between you and the Gentoo packagers whether they take that upstream.

In the future, please open a new issue that refers to old ones rather than commenting on old ones.

As a side point, I strongly suspect that you are actually taking longer in wall time to get things done (due to the cost of context switching, cache misses, etc.) by overloading your system that way. |
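As an illustration of why the check runs out of process, the sketch below tries an import in a fresh interpreter so that nothing is loaded into the calling process, and bounds the wait with a timeout. It uses `subprocess` rather than the multiprocessing approach discussed in this thread, and the helper name `can_import` is an assumption, not matplotlib's API.

```python
import subprocess
import sys


def can_import(module_name, timeout=10.0):
    """Try to import module_name in a separate Python interpreter.

    Hypothetical helper for illustration. Returns (True, "ok") on success,
    otherwise (False, reason). The parent process never imports the module,
    so two conflicting toolkits can be probed one after the other.
    """
    cmd = [
        sys.executable,
        "-c",
        "import importlib, sys; importlib.import_module(sys.argv[1])",
        module_name,
    ]
    try:
        # subprocess.run() kills the child and raises TimeoutExpired if the
        # import does not finish within `timeout` seconds.
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "Check timed out"
    if proc.returncode != 0:
        return False, "import failed (exit code %d)" % proc.returncode
    return True, "ok"


if __name__ == "__main__":
    print(can_import("json"))
    print(can_import("PyQt5"))
```

A crash in the child (for example a segfault in a broken binding) shows up here as a non-zero exit code instead of taking the build down with it.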
Hello,
we noticed a problem on the Debian buildd machines, where the build of mpl with python3.4 sometimes gets stuck at:
(which got killed after 150 mins).
So far this has happened on kfreebsd-i386[1], s390x[2], and i386[3], starting with the early 1.4.1RC releases[4] (the 'maybe-failed' entries).
[1] https://buildd.debian.org/status/fetch.php?pkg=matplotlib&arch=kfreebsd-i386&ver=1.4.2-1&stamp=1414402804
[2] https://buildd.debian.org/status/fetch.php?pkg=matplotlib&arch=s390x&ver=1.4.2-1&stamp=1414192251
[3] https://buildd.debian.org/status/fetch.php?pkg=matplotlib&arch=i386&ver=1.4.1~rc1-1&stamp=1413455146
[4] https://buildd.debian.org/status/logs.php?pkg=matplotlib
This is currently preventing matplotlib from being built on all Debian release architectures, and thus from reaching testing and Jessie in time for the freeze.