
WIP, TST: bump up CPU usage on shippable nodes #11915


Closed
tylerjereddy wants to merge 1 commit from the shippable_speedup branch

Conversation

tylerjereddy
Contributor

The Shippable team noted that although the pool of nodes is shared, once a node is allocated for a job it is exclusively assigned, so we can use all of its cores. This PR tries to bump up usage accordingly; I'll remove the WIP tag if it helps.

I ran "lscpu" a week or two ago in one of the jobs and the number was quite large -- I can't remember how much of that is physical vs. logical, but it's worth trying to bump up. Previous testing with small core counts suggested that parallelism helps more for testing than for building, but let's see.

@tylerjereddy
Contributor Author

Auto-selection of cores for pytest doesn't seem to help -- it only picks 2 workers. Maybe I'll try one more thing tomorrow; building is no faster.
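
For reference, a minimal sketch (assuming pytest-xdist is installed) of forcing an explicit worker count instead of relying on auto-detection; the target path and the count of 40 are just assumptions for illustration:

# Hedged sketch: force an explicit pytest-xdist worker count instead of "-n auto",
# which only picked 2 workers here. Target path and default count are assumptions.
import sys
import pytest

if __name__ == "__main__":
    n_workers = sys.argv[1] if len(sys.argv) > 1 else "40"
    sys.exit(pytest.main(["-n", n_workers, "numpy"]))  # same as: pytest -n 40 numpy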

@tylerjereddy
Contributor Author

The overall ARM CI runs are getting faster here, but test_suppress_warnings_module seems more prone to failure at higher parallelism (I've noticed this sporadically when working locally on a Mac as well).

The lscpu output below shows 96 CPUs, so I'll try boosting the test suite up to that count, but the WIP tag will have to remain for as long as the test mentioned above is flaky.

Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                96
On-line CPU(s) list:   0-95
Thread(s) per core:    1
Core(s) per socket:    48
Socket(s):             2
NUMA node(s):          2
Hypervisor vendor:     (null)
Virtualization type:   full
L1d cache:             32K
L1i cache:             78K
L2 cache:              16384K
NUMA node0 CPU(s):     0-47
NUMA node1 CPU(s):     48-95
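
As a quick sanity check on those numbers (a worked sketch using only the lscpu output above), the 96 CPUs are all physical cores rather than SMT siblings:

# Derived from the lscpu output above: 2 sockets x 48 cores x 1 thread per core.
sockets = 2
cores_per_socket = 48
threads_per_core = 1
print(sockets * cores_per_socket * threads_per_core)  # 96 -> no SMT, all physical cores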

@charris
Member

charris commented Sep 9, 2018

IIRC, suppress_warnings, and the warnings module in general, are not thread safe. @seberg Thoughts?

@seberg
Member

seberg commented Sep 9, 2018

Yes indeed, warnings are generally not thread safe; that is also true for catch_warnings (unless it changed very recently) and thus for pytest, etc.
In principle it should be possible to make suppress_warnings thread safe using thread-local storage (since it handles most of the machinery itself), but frankly I am not too familiar with thread-local storage, so I am not sure. Right now there is definitely no thread safety.
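
For illustration only, a minimal sketch of the thread-local-storage mechanism referred to here (this is not NumPy's suppress_warnings, and whether it would be enough is exactly the open question, since warnings.filters itself is process-global):

import threading

state = threading.local()

def worker(name):
    # each thread gets its own "recorded" attribute, invisible to the other threads
    state.recorded = [name]
    print(threading.current_thread().name, state.recorded)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()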

@seberg
Member

seberg commented Sep 9, 2018

Wouldn't it be possible to parallelize with processes rather than threads, and only parallelize over whole files or so? (I might be thinking about this completely wrong; I didn't try to understand it in depth.)
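
As a hedged sketch of that idea: pytest-xdist (if that is what is being used) already runs its workers as separate processes, and its --dist loadfile mode hands out whole test files per worker; the target path below is an assumption:

# Sketch: distribute whole test modules across separate worker processes.
# Equivalent command line: pytest -n 4 --dist loadfile numpy/core/tests
import sys
import pytest

sys.exit(pytest.main(["-n", "4", "--dist", "loadfile", "numpy/core/tests"]))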

@seberg
Member

seberg commented Sep 9, 2018

@tylerjereddy, hmm, that is the only/main error-prone test? Maybe the module filter is particularly bad, not sure. It is plausible we do not use it at all. Filtering by module is particularly hacky (the only really bad part IMO), so maybe there is something particularly wrong there, though I am not sure why or what.

How does the threading run the tests anyway?

@tylerjereddy
Contributor Author

tylerjereddy commented Sep 9, 2018

Boosting up to the full 96 cores also causes one more failure (TestRandomDist.test_weibull_0), but actually hurts performance vs. 40 cores. @seberg I'm not sure about the parallel implementation details, but I thought it was all process- rather than thread-based.

Perhaps the conclusion is that, for now, we don't have much to gain from using more cores on these ARM nodes, and plenty more susceptibility to test failures. I could see investing time in those failures if we were getting a huge speed boost, but it's less clear that's justified here.

@seberg
Member

seberg commented Sep 9, 2018

Well, it might be that the test is just randomly failing for some other reason; it is a pretty funny test, creating a fake module and so on, I think...

It doesn't sound like it is related to threads manipulating the warning filters anyway, and since you think it is process oriented, I am sure it is -- it just sounds so much more sensible ;). So frankly, if this is the only test stopping you, we can look into fixing the test up. The only thing is, I am not quite sure where to start debugging (and don't have much time), so maybe it is not worth the trouble :).

@tylerjereddy
Contributor Author

One other question might be: do we expect building NumPy to scale well with increased parallelism?

I always had the informal impression that it helped when working on x86_64, but we don't seem to squeeze much of anything out even going from 1 to 40 cores on ARM here. Being able to compile modules independently, instead of following a fairly strict serial order, may be at odds with the way the project's organization and dependencies evolved over time.

@rgommers
Member

rgommers commented Sep 9, 2018

Yes indeed, warnings are generally not thread safe, that is also true for catch_warnings (unless it changed very recently) and thus pytest, etc.

Is that true? I recently tested parallel test runs for scipy with pytest-xdist quite extensively, and never ran into a single failure despite the many warning suppressions.

I haven't checked how pytest-xdist is implemented under the hood.
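
One plausible explanation (a hedged note, not a verified claim about pytest-xdist internals) is that xdist workers are separate processes, so each one has its own warnings.filters. A throwaway test along these lines could confirm it; the test name is made up for illustration:

# Drop into any test file and run with: pytest -n 4 -s
import os
import warnings

def test_show_worker_pid():
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        print("worker pid:", os.getpid())  # differs per xdist worker process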

@charris
Member

charris commented Sep 9, 2018

do we expect building NumPy to scale well with increased parallelism

One thing I just noticed yesterday is that the config.h file is generated in numpy/core/setup.py, but is needed outside of that package -- numpy/fft, for instance. So there is some potential for problems with parallel builds. The base numpy/setup.py runs that build early on, while still single threaded. That said, I am not clear on how the parallel builds actually work.

@seberg
Member

seberg commented Sep 9, 2018

@rgommers the warnings filters all work by mutating the warnings.filters list, so I don't see how it can possibly be thread safe. My guess is that the chance of harmful collisions is just very low? I don't know pytest's internals, though; maybe they added some serious machinery to their warnings filtering.

EDIT: More importantly, catch_warnings and suppress_warnings replace warnings.showwarning.
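
A minimal illustration of that global state (standard library only, not NumPy's suppress_warnings): both the filter list and showwarning live at module level, so concurrent catch_warnings blocks on different threads can stomp on each other:

import warnings

print(type(warnings.filters), len(warnings.filters))  # one process-wide list
with warnings.catch_warnings():
    warnings.simplefilter("ignore")                   # mutates warnings.filters in place
    warnings.showwarning = lambda *a, **k: None       # replaces a module-global hook
# on exit, catch_warnings restores both globals -- racy if another
# thread changed them in the meantime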

@rgommers
Member

One thing I just noticed yesterday, is that the config.h file is generated in numpy/core/setup.py, but is needed outside of that package -- numpy/fft for instance. So there is some potential for problems with parallel builds. The base numpy/setup.py runs that build early in when single threaded. That said, I am not clear on how the parallel builds actually work.

Should be fine. From the PR description that introduced parallel builds (gh-5161): "The parallelization is limited to within the files of an extension, so only numpy multiarraymodule really profits..."
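
For context, a hedged sketch of how that parallel build is usually invoked (the exact flags are assumptions about the numpy.distutils-based build of the time, added in gh-5161):

# "-j 4" parallelizes compilation of the C files *within* each extension,
# which is why mostly the multiarray module benefits.
import subprocess
import sys

subprocess.check_call([sys.executable, "setup.py", "build_ext", "--inplace", "-j", "4"])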

@tylerjereddy
Contributor Author

So, after rebasing to include the warnings parallel fix, only TestRandomDist.test_weibull_0 is failing when 96 processes are used for testing.

Resolving that would only gain us about a minute anyway, so it's probably not a big priority compared to the much longer queue times.

@charris
Member

charris commented Oct 4, 2018

@tylerjereddy Do you want to keep this open?

@tylerjereddy
Contributor Author

I guess I'll close it -- Stefan seems to be redirecting my energy away from CI work anyway.

@tylerjereddy tylerjereddy deleted the shippable_speedup branch October 4, 2018 20:11