
WIP, TST: bump up CPU usage on shippable nodes #11915


Closed
tylerjereddy wants to merge 1 commit from the shippable_speedup branch

Conversation

tylerjereddy
Contributor

The Shippable team noted that although the pool of nodes is shared, once a node is allocated for a job it is exclusively assigned, so we can use all of its cores. This PR tries to bump up usage accordingly; I'll remove the WIP tag if it helps.

I ran "lscpu" a week or two ago in one of the jobs and the number was quite large -- I can't remember how much of that is physical vs. logical, but it's worth trying to bump up. Previous testing with small core counts suggested that parallelism helps more for testing than for building, but let's see.

@tylerjereddy
Contributor Author

Auto-selection of cores for pytest doesn't seem to help -- it only picks 2 workers. Maybe I'll try one more thing tomorrow; building is no faster.
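
For reference, a minimal sketch (assuming pytest-xdist is installed) of forcing an explicit worker count instead of relying on auto-detection; the target path and the count of 40 are just assumptions for illustration:

# Hedged sketch: force an explicit pytest-xdist worker count instead of "-n auto",
# which only picked 2 workers here. Target path and default count are assumptions.
import sys
import pytest

if __name__ == "__main__":
    n_workers = sys.argv[1] if len(sys.argv) > 1 else "40"
    sys.exit(pytest.main(["-n", n_workers, "numpy"]))  # same as: pytest -n 40 numpy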

@tylerjereddy
Contributor Author

The overall ARM CI runs are getting faster here, but test_suppress_warnings_module seems more prone to failure at higher parallelism (I've noticed this sporadically when working locally on a Mac as well).

The lscpu output below shows 96 CPUs, so I'll try boosting the test suite up to that count, but the WIP tag will have to remain for as long as the test mentioned above is flaky.

Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                96
On-line CPU(s) list:   0-95
Thread(s) per core:    1
Core(s) per socket:    48
Socket(s):             2
NUMA node(s):          2
Hypervisor vendor:     (null)
Virtualization type:   full
L1d cache:             32K
L1i cache:             78K
L2 cache:              16384K
NUMA node0 CPU(s):     0-47
NUMA node1 CPU(s):     48-95
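
As a quick sanity check on those numbers (a worked sketch using only the lscpu output above), the 96 CPUs are all physical cores rather than SMT siblings:

# Derived from the lscpu output above: 2 sockets x 48 cores x 1 thread per core.
sockets = 2
cores_per_socket = 48
threads_per_core = 1
print(sockets * cores_per_socket * threads_per_core)  # 96 -> no SMT, all physical cores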

@charris
Member

charris commented Sep 9, 2018

IIRC, suppress_warnings, and the warnings module in general, are not thread safe. @seberg Thoughts?

@seberg
Member

seberg commented Sep 9, 2018

Yes indeed, warnings are generally not thread safe; that is also true for catch_warnings (unless it changed very recently) and thus for pytest, etc.
In principle it should be possible to make suppress_warnings thread safe using thread-local storage (since it handles most of the machinery itself), but frankly I am not too familiar with thread-local storage, so I am not sure. Right now there is definitely no thread safety.
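
For illustration only, a minimal sketch of the thread-local-storage mechanism referred to here (this is not NumPy's suppress_warnings, and whether it would be enough is exactly the open question, since warnings.filters itself is process-global):

import threading

state = threading.local()

def worker(name):
    # each thread gets its own "recorded" attribute, invisible to the other threads
    state.recorded = [name]
    print(threading.current_thread().name, state.recorded)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()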

@seberg
Member

seberg commented Sep 9, 2018

Wouldn't it be possible to parallelize with processes rather than threads, and only parallelize over whole files or so? (I might be thinking about this completely wrong; I didn't try to understand it in depth.)
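
As a hedged sketch of that idea: pytest-xdist (if that is what is being used) already runs its workers as separate processes, and its --dist loadfile mode hands out whole test files per worker; the target path below is an assumption:

# Sketch: distribute whole test modules across separate worker processes.
# Equivalent command line: pytest -n 4 --dist loadfile numpy/core/tests
import sys
import pytest

sys.exit(pytest.main(["-n", "4", "--dist", "loadfile", "numpy/core/tests"]))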

@seberg
Member

seberg commented Sep 9, 2018

@tylerjereddy, hmm, that is the only/main error-prone test? Maybe the module filter is particularly bad, not sure. It is plausible we do not use it at all. Filtering by module is particularly hacky (the only really bad part IMO), so maybe there is something particularly wrong there, though I am not sure why or what.

How does the threading run the tests anyway?

@tylerjereddy
Contributor Author

tylerjereddy commented Sep 9, 2018

Boosting up to the full 96 cores also causes one more failure (TestRandomDist.test_weibull_0), but actually hurts performance vs. 40 cores. @seberg I'm not sure about the parallel implementation details, but I thought it was all process- rather than thread-based.

Perhaps the conclusion is that, for now, we don't have much to gain from using more cores on these ARM nodes, and plenty more susceptibility to test failures. I could see investing time in those failures if we were getting a huge speed boost, but it's less clear that's justified here.

@seberg
Member

seberg commented Sep 9, 2018

Well, it might be that the test is just randomly failing for some other reason; it is a pretty funny test, creating a fake module and so on, I think...

It doesn't sound like it is related to threads manipulating the warning filters anyway, and since you think it is process oriented, I am sure it is -- it just sounds so much more sensible ;). So frankly, if this is the only test stopping you, we can look into fixing the test up. The only thing is, I am not quite sure where to start debugging (and don't have much time), so maybe it is not worth the trouble :).

@tylerjereddy
Contributor Author

One other question might be: do we expect building NumPy to scale well with increased parallelism?

I always had the informal impression that it helped when working on x86_64, but we don't seem to squeeze much of anything out even going from 1 to 40 cores on ARM here. Being able to compile modules independently, instead of following a fairly strict serial order, may be at odds with the way the project's organization and dependencies evolved over time.

@rgommers
Member

rgommers commented Sep 9, 2018

Yes indeed, warnings are generally not thread safe, that is also true for catch_warnings (unless it changed very recently) and thus pytest, etc.

Is that true? I recently tested parallel test runs for scipy with pytest-xdist quite extensively, and never ran into a single failure despite the many warning suppressions.

I haven't checked how pytest-xdist is implemented under the hood.
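
One plausible explanation (a hedged note, not a verified claim about pytest-xdist internals) is that xdist workers are separate processes, so each one has its own warnings.filters. A throwaway test along these lines could confirm it; the test name is made up for illustration:

# Drop into any test file and run with: pytest -n 4 -s
import os
import warnings

def test_show_worker_pid():
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        print("worker pid:", os.getpid())  # differs per xdist worker process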

@charris
Member

charris commented Sep 9, 2018

do we expect building NumPy to scale well with increased parallelism

One thing I just noticed yesterday is that the config.h file is generated in numpy/core/setup.py, but is needed outside of that package -- numpy/fft, for instance. So there is some potential for problems with parallel builds. The base numpy/setup.py runs that build early on, while still single threaded. That said, I am not clear on how the parallel builds actually work.

@seberg
Member

seberg commented Sep 9, 2018

@rgommers the warnings filters all work by mutating the warnings.filters list, so I don't see how it can possibly be thread safe. My guess is that the chance of harmful collisions is just very low? I don't know pytest's internals, though; maybe they added some serious machinery to their warnings filtering.

EDIT: More importantly, catch_warnings and suppress_warnings replace warnings.showwarning.
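
A minimal illustration of that global state (standard library only, not NumPy's suppress_warnings): both the filter list and showwarning live at module level, so concurrent catch_warnings blocks on different threads can stomp on each other:

import warnings

print(type(warnings.filters), len(warnings.filters))  # one process-wide list
with warnings.catch_warnings():
    warnings.simplefilter("ignore")                   # mutates warnings.filters in place
    warnings.showwarning = lambda *a, **k: None       # replaces a module-global hook
# on exit, catch_warnings restores both globals -- racy if another
# thread changed them in the meantime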

@rgommers
Member

One thing I just noticed yesterday, is that the config.h file is generated in numpy/core/setup.py, but is needed outside of that package -- numpy/fft for instance. So there is some potential for problems with parallel builds. The base numpy/setup.py runs that build early in when single threaded. That said, I am not clear on how the parallel builds actually work.

Should be fine. From the PR description that introduced parallel builds (gh-5161): "The parallelization is limited to within the files of an extension, so only numpy multiarraymodule really profits..."
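
For context, a hedged sketch of how that parallel build is usually invoked (the exact flags are assumptions about the numpy.distutils-based build of the time, added in gh-5161):

# "-j 4" parallelizes compilation of the C files *within* each extension,
# which is why mostly the multiarray module benefits.
import subprocess
import sys

subprocess.check_call([sys.executable, "setup.py", "build_ext", "--inplace", "-j", "4"])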

@tylerjereddy
Contributor Author

So, after rebasing to include the warnings parallel fix, only TestRandomDist.test_weibull_0 is failing when 96 processes are used for testing.

Resolving that would only gain us about a minute anyway, so it's probably not a big priority compared to the much longer queue times.

@charris
Member

charris commented Oct 4, 2018

@tylerjereddy Do you want to keep this open?

@tylerjereddy
Contributor Author

I guess I'll close it -- Stefan seems to be redirecting my energy away from CI work anyway.

@tylerjereddy tylerjereddy deleted the shippable_speedup branch October 4, 2018 20:11