os.cpu_count is problematic on sparc/solaris #73451
Comments
I'm attempting to build Python 3.6.0 on SPARC/Solaris 10. After the initial configure/compile completed, I ran "make test" and I see:

$ make test
running build
running build_ext
(...)
running build_scripts
copying and adjusting (...)
changing mode of (...)
renaming (...)
(...)
Run tests in parallel using 258 child processes

I'm fairly sure the issue stems from the fact that each core on the machine has 8 "threads" and there are 32 cores (for a total of 256 virtual cores). Each core can execute 8 parallel tasks only in very specific circumstances; it's intended for use by things like LAPACK/ATLAS where you might be doing many computations on the same set of data. Outside of these more restricted circumstances each core can only handle 2 parallel tasks (or so I gathered from the documentation), so at best this machine could handle 64 backgrounded jobs, though I normally restrict my builds to the actual core count or less.

The most common way to get a "realistic" core count on these machines from shell scripts is:

$ core_count=`kstat -m cpu_info | grep core_id | sort -u | wc -l`

... though I'm not sure how the test suite is determining the core count. I didn't see any mention of "kstat" anywhere.
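For illustration, here is a rough Python equivalent of that shell pipeline (a sketch only, assuming a Solaris host with the kstat utility on the PATH; the function name is mine, not anything in the stdlib):

```python
# Sketch: approximate the physical core count on Solaris by counting
# distinct core_id values in `kstat -m cpu_info` output, mirroring the
# shell pipeline above. Assumes kstat is installed and on the PATH.
import subprocess

def solaris_physical_core_count():
    out = subprocess.run(
        ["kstat", "-m", "cpu_info"],
        capture_output=True, text=True, check=True,
    ).stdout
    core_ids = {
        line.split()[-1]              # the core_id value is the last field
        for line in out.splitlines()
        if line.strip().startswith("core_id")
    }
    return len(core_ids)
```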
I forgot to mention, this wasn't an issue in 3.5.1, though I never did check how many jobs it was using. I ran into other issues building that version and moved to a newer version because at least one of them (a logging test race condition) was fixed after 3.5.1.
This is odd. I just went back and re-ran 3.5.1 to see how many cores it uses, and it's having the same problem now. So, scratch that last comment.
You don't know it, but you are actually reporting a possible SPARC/Solaris-specific (I think) bug against os.cpu_count. There was some discussion about how cpu_count might be problematic in this regard. It doesn't cause any real problem with the tests, though. I routinely run with -j40 on my 2-CPU test box because the test run completes faster that way, due to the way many tests spend time waiting for various things.
In my case it did, because it caused enough file descriptors to be allocated that it hit the cap for max open file handles.
As per the current documentation this is expected behaviour. We've previously hit the issue of hitting the cap on max open file handles as well, and solved it by running the test suite with an explicit -jN value.
The issue title is misleading: os.cpu_count() is correct. The problem is more that the default file descriptor limit is too low for a machine with 256 logical CPUs ("threads") when running the test suite in parallel. You can specify the number of worker processes explicitly with the -jN option.

Is there anything to do on the Python side? Should regrtest limit the number of worker processes based on the current file descriptor limit? I have no idea how to compute a number of processes based on a maximum number of file descriptors.
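For what it's worth, a minimal sketch of how such a cap could be computed (this is not what regrtest does; the per-worker descriptor figure, the headroom value, and the function name are all assumptions, and the resource module is Unix-only):

```python
# Sketch: cap a requested -jN worker count using the soft RLIMIT_NOFILE.
# FDS_PER_WORKER is a guessed average, not a measured value.
import resource

FDS_PER_WORKER = 10   # assumed descriptors consumed per worker process
RESERVED_FDS = 64     # headroom for the main test process itself

def cap_workers_by_fd_limit(requested_workers):
    soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    affordable = max(1, (soft_limit - RESERVED_FDS) // FDS_PER_WORKER)
    return min(requested_workers, affordable)
```

With a soft limit of 256, for example, this would allow at most (256 - 64) // 10 = 19 workers instead of 258.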
Reading it again, I see what you mean - SPARC CPUs are not able to execute all those 8 threads at the same time. That said, Intel hyperthreaded cores are also presented as 2 logical cores, and they cannot do certain operations simultaneously (although SPARC is definitely more restricted in this regard).
That would be very hard to do, considering that tests can then open additional files. I don't think that is really feasible when simply adjusting the -jN value.
The bare minimum which can be easily done is to emit a warning if the FD limit looks too small compared to the -jN argument. Maybe with instructions on how to fix the issue (reduce -jN value or increase FD limit). |
That is true. In our case, IIRC, we saw occasional issues when building with an fd limit of 4096 and 514 worker processes. So I'd propose warning the user when the fd limit is less than 10 times the worker count?
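A minimal sketch of that check, assuming the 10x rule of thumb proposed above (the function name and message wording are mine, and resource is again Unix-only):

```python
# Sketch: warn when the soft FD limit looks too small for -jN, using the
# "10 descriptors per worker" heuristic suggested above.
import resource
import warnings

def warn_if_fd_limit_too_low(num_workers, fds_per_worker=10):
    soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit < fds_per_worker * num_workers:
        warnings.warn(
            f"file descriptor limit ({soft_limit}) may be too low for "
            f"-j{num_workers}; reduce -jN or raise the limit (ulimit -n)"
        )
```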
On Linux, the default FD limit is 1024, but I never got such an issue. Well, my laptop "only" has 12 logical CPUs. Which process is impacted by the issue? The main process which spawns and manages the 514 worker processes? Does Python even need a single FD per worker process?
Sorry for such a long delay. I looked into this, and apparently it wasn't the test suite but the PGO phase where this was an issue (back when PGO was running the entire test suite). Also, it seems that back then the default limit on our build machines might have been lower. Anyway, when I lower the limit all the way down to 512 and run the test suite, I see the following error:
which looks like the main process, but I guess that e.g. subprocess tests might fail as well? But even with a limit of 512 it doesn't always happen (I guess some tests finish very fast and free their resources before the limit is reached).
SPARC/Solaris is unsupported per PEP-11. I'll close the bug. |