Skip to content

TST: add macos azure testing to CI #12051

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 29, 2018
Merged

Conversation

tylerjereddy
Copy link
Contributor

@tylerjereddy tylerjereddy commented Sep 28, 2018

Related to #12036.

I figured I'd start this with Mac OS for Azure because that platform is absent from our current github-integrated CI, which perhaps reduces the pressure to replicate a current matrix entry & allows restoration of a common platform to the test matrix with apparently short queues at the moment.

I tried to add detailed notes to the yml file. I think the discussion surrounding test & support for accelerate on mac is tricky, perhaps in no small part to the opacity of the resolution of bugs there and so on.

If we want to expand the matrix or try an x86 arch target or whatever else I suppose that may be something to discuss. My impression is that we may want to focus the 10 free parallel jobs on Azure mostly to relieve current Windows bottlenecks.

All tests are passing on my fork with this PR branch. You can see some helpful stats on the test data page, including the drop in test time from the previous run when a single core was used for pytest.

As usual, getting to "all green" on the tests with a new platform was a bit painful -- my compromise was to add 1 more skipif decorator on a single f2py test. f2py is always a "treat" to work with, but homebrew gfortran mostly seems to play ok with it.

If core devs / steering committe would like, I can look into trying to activate the github hook for azure so we can see it on this PR for NumPy repo proper -- otherwise, one of the "owners" may have to do that.

cc Microsoft folks who commented elsewhere: @ericsciple @chrisrpatterson

from . import util
from numpy.testing import assert_equal

import pytest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally try to keep this import up in front of the numpy/local imports.

@pytest.mark.skipif(platform.system() == 'Darwin',
reason="Prone to error when run with "
"numpy/f2py/tests on mac os, "
"but not when run in isolation")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that running python runtests.py -t "numpy/f2py/tests/test_semicolon_split.py" does not cause issues, but the decorated test will error with python runtests.py -t "numpy/f2py/tests", or the entire test suite. So, context-dependent behavior in the test container.

@charris
Copy link
Member

charris commented Sep 29, 2018

Can you try hitting the settings button and then webhooks?

@tylerjereddy
Copy link
Contributor Author

Don't have access to settings. I created a numpy project on azure & triggered a manual build of master: https://dev.azure.com/numpy/numpy/_build/results?buildId=1&view=logs

Of course, master has no .yml so that doesn't do much. I always find the initial seed event for CI set up confusing. For my fork I temporarily switched the default branch, pushed a .yml into that test branch, and then opened a PR against the branch with a yml in it.

I think it may be possible to manually trigger the PR if the feature branch has been pushed to NumPy proper, but I think we try to keep our branch space clean so not doing that without consent.

Other option is the github app, which I think an owner can do--not sure if that'll help trigger builds off PRs when master doesn't have the yml, maybe. I should figure out how to add core devs to the config page, etc.

@charris
Copy link
Member

charris commented Sep 29, 2018

Don't worry about the webhook, I can add it if you post it here, I just wondered if you had sufficient permissions for that. The thing about the test succeeding when run individually, but not as part of the package, is disturbing and is something we should understand, it indicates something going wrong. I don't understand the Fortran code there.

@charris
Copy link
Member

charris commented Sep 29, 2018

I'm going to merge this just so we can all experiment with it.

@charris charris merged commit a000144 into numpy:master Sep 29, 2018
@charris
Copy link
Member

charris commented Sep 29, 2018

Thanks Tyler.

@charris
Copy link
Member

charris commented Sep 29, 2018

Could you add me to this project? My user name is "charlesrharris"

@tylerjereddy
Copy link
Contributor Author

The user name didn't seem to work -- I tried to invite using email "charlesr.harris@gmail.com"

It shows you as a member at: https://dev.azure.com/numpy/numpy

Not sure if that works -- I think you'll have to be the one initiate the webhook on master.

@charris
Copy link
Member

charris commented Sep 29, 2018

Interesting, but I was just able to trigger a build, so I must have some permissions, but the tests did not run, maybe just need to find the right button.

It shows you as a member at

Was I before? We need someone to sign up to see how that works. Maybe @eric-wieser .

Not sure if that works

All I need is the webhook address, which should be floating around somewhere. I can paste it in on github. At least that worked for appveyor. Should probably be located in the project somewhere, I can also look there.

@charris
Copy link
Member

charris commented Sep 29, 2018

The Microsoft documentation still rates as awful. I note that with both your and mine manual builds the repo is checked out, but no tests are run.

@charris
Copy link
Member

charris commented Sep 29, 2018

Looks like webhooks need to be created, and I only see them for some slangy microsoft thingie, a "runbook".

@tylerjereddy
Copy link
Contributor Author

Yeah, the docs are pretty bad

@charris
Copy link
Member

charris commented Sep 29, 2018

Maybe you need to add a trigger: section to the yml file, as the actual build and tests are not running.

@tylerjereddy
Copy link
Contributor Author

This got me to thinking that it seems unlikely that my fork and NumPy proper would require a different yml, given that both pushes to and PRs to my fork work just fine with the yml merged to NumPy proper. Indeed, by having the owner of my fork (me) set up Azure, I guess the permissions were correct for injecting the webhooks automatically to the github repo.

Here are the URLs used for my azure webhooks in my settings:
https://dev.azure.com/tylerjereddy/_apis/public/hooks/externalEvents (push)
https://dev.azure.com/tylerjereddy/_apis/public/hooks/externalEvents (pull_request)

One had just the push event selected, and the other has pull requests checked off.

So, by analogy, perhaps you can try:
https://dev.azure.com/numpy/_apis/public/hooks/externalEvents (push)
https://dev.azure.com/numpy/_apis/public/hooks/externalEvents (pull_request)

I see that my repo also has some authentication (secret field?) set up for Azure -- not sure if that will work automagically if you fill in those URLs and select push / pull request, etc.

@charris
Copy link
Member

charris commented Sep 30, 2018

Ping returns 404 error

@chrispat
Copy link

chrispat commented Oct 1, 2018

You won't be able to just paste in the web hook URLs as there is a shared secret generated by the setup process. The preferred workflow is to enable install the Github App but you can also do Oauth or PAT if necessary.

@tylerjereddy
Copy link
Contributor Author

@charris Do you want to try the Github app approach or just have an owner create a pipelines account? Someone with the appropriate permissions would need to enable the Github App if we go that route. Seems a little tricky!

@charris
Copy link
Member

charris commented Oct 1, 2018

I'd be happy to try the App route, I think travisCI works that way. We can try the owner approach if all else fails -- appveyor runs on my account -- but it would be preferable to have an organization account if that is possible.

@charris
Copy link
Member

charris commented Oct 1, 2018

@chrisrpatterson So the next question is how to install the Github app?

@charris
Copy link
Member

charris commented Oct 1, 2018

So I'm in the process of installing the app, which microsoft account to use?

@charris
Copy link
Member

charris commented Oct 1, 2018

I don't have the permissions on azure to attach the numpy account. How did you set up numpy/numpy?

@charris
Copy link
Member

charris commented Oct 1, 2018

I'm thinking I'll need to attach my own MS account. Note this was all much easier with travis as it is integrated with github.

@charris
Copy link
Member

charris commented Oct 1, 2018

So now I am logged in on github as numpy, let's try that again.

@tylerjereddy
Copy link
Contributor Author

Yeah, this is messy, let me know what I can do--I just clicked "new organization" to create numpy on dev.azure.com & then "create project" so that we have numpy/numpy

@chrispat
Copy link

chrispat commented Oct 1, 2018

@charris yes integrating with github identities is ongoing work and that will make this much easier.

I would think you guys would want to have an Azure DevOps org for numpy and then you can create projects under that as necessary. @tylerjereddy you would need to add @charris MSA as a project administrator in numpy/numpy in order for him to install the app and configure the pipeline.

Here are some docs that might help https://docs.microsoft.com/en-us/azure/devops/organizations/security/set-project-collection-level-permissions?view=vsts&tabs=new-nav

@charris
Copy link
Member

charris commented Oct 1, 2018

So adding the app is easy, but to change azure accounts, uninstall/reinstall.

@charris
Copy link
Member

charris commented Oct 1, 2018

Travis does the merge commits also, sends me email on failure. There seem to be a significant number of bogus failures these days...

@charris
Copy link
Member

charris commented Oct 1, 2018

@rgommers @pv One of you might want to set up an azure scipy account to make sure you have the name.

@tylerjereddy
Copy link
Contributor Author

Windows Azure build log link with a small number of failures / errors reported in case people are curious how that is going. Starting with Python 3.6 there.

@charris
Copy link
Member

charris commented Oct 1, 2018

I think it is skipping the fortran tests in any case, might not be worth loading the fortran compiler. Looks like more tempfile problems. Might try NamedTemporaryFile for some of those.

@tylerjereddy
Copy link
Contributor Author

I don't think I load the fcompiler in Windows build? Some apparent precision issues maybe.

@charris
Copy link
Member

charris commented Oct 1, 2018

You're right, apparently I was looking at the Mac process.

@charris
Copy link
Member

charris commented Oct 2, 2018

Hmm, I don't think the f2py tests are thread safe in general. Does the mac testing run multiple threads?

@charris
Copy link
Member

charris commented Oct 2, 2018

lets leave out xdist for the moment. Probably others as well, I don't think we can rely on thread safety, and some of the tests were written many years ago.

@tylerjereddy
Copy link
Contributor Author

Yeah, I set it to auto but we can just turn it off; shippable is using 2 processes to run tests -- we could revert that too, although seems to be fairly stable lately after the warnings patch.

@charris
Copy link
Member

charris commented Oct 2, 2018

How do you set the number of threads? I note that following are installed for appveyor.

cython
nose
pytest-timeout
pytest-xdist
pytest-env
pytest-faulthandler

But I don't know that it runs multiple threads.

@tylerjereddy
Copy link
Contributor Author

Our Appveyor config uses a single process for tests; add more processes by specifying -n 2 for two processes and so on, and -n auto for automatic detection (isn't always perfect). That flag comes after the -- fed to runtests.py so that the args are fed directly to pytest.

Still, if it was well-known that our test suite wasn't thread safe, it wouldn't make sense to install that dependency.

@charris
Copy link
Member

charris commented Oct 2, 2018

I haven't studied the problem, but there is this in numpy/f2py/tests/util.py

def get_temp_module_name():
    # Assume single-threaded, and the module dir usable only by this thread

And a global directory stored in the module. I'm guessing some other tests might have problems. It is tricky to debug intermittent failures due to threading, so it would probably be good to start with the simple bugs before escalating.

EDIT: That function looks like a good place for a threading context manager

@tylerjereddy
Copy link
Contributor Author

Yeah, I noticed that global in the distutils work! I suspect we agree there's likely a better way

@charris
Copy link
Member

charris commented Oct 2, 2018

I think the windows test that is failing is a bad test. Maybe use os.path.samefile? The test (and others) is also not safe against failure, which I think is the cause of the teardown error.

@tylerjereddy
Copy link
Contributor Author

Yeah, I have a local branch with a fix I'm testing -- os.path.realpath does not resolve symlinks on windows so that test was never safe and Azure is correct to fail it.

@tylerjereddy
Copy link
Contributor Author

The remaining 3 Windows Azure failures are all in TestComplexFunctions. Since not everything fails there, my current suspicion is either that our fall back code for when C99 complex.h isn't supported has some issues, or the Microsoft-specific implementation of complex support is behaving slightly differently.

We clearly try to handle this MSVC-specific complex number stuff in numpy/core/src/npymath/npy_math_private.h. Since loss of precision is one of the failures, I do wonder if it is just an alternative implementation type issue.

More recent versions of MSVC apparently do support the majority of C99, but our appveyor builds still try to use MinGW, though perhaps not for that specific reason (there's also gfortran reasons I suppose).

Maybe we're just going to have to add MinGW to the path and / or install if needed?

@tylerjereddy
Copy link
Contributor Author

I note that the Microsoft repo shows that for the Widows image, MinGW bin/ should be on the PATH.

@charris
Copy link
Member

charris commented Oct 2, 2018

My understanding was that mingw was not actually used for the appveyor build, although I couldn't tell you exactly why it is there. I think xoviat (disappeared by Github) added it, although that may not have an accurate recollection. What are the failures?

@charris
Copy link
Member

charris commented Oct 2, 2018

Note that once we drop 2.7, we will be using VS 2015+, so the StackOverflow comment may be out of date. OTOH, having some problems with the cuts for complex functions would be normal, unfortunately.

@tylerjereddy
Copy link
Contributor Author

We certainly seem to have access to gfortran for the 3.x Windows builds on appveyor, though not sure that is intentional.

The 3 fails from my notes:

  • TestComplexFunctions.test_branch_cuts
    • numpy\core\tests\test_umath.py:2716: AssertionError
    • f = <ufunc 'arcsin'>,
  • TestComplexFunctions.test_branch_cuts_complex64
    • numpy\core\tests\test_umath.py:2716: AssertionError
    • f = <ufunc 'arcsin'>,
  • TestComplexFunctions.test_loss_of_precision
    • numpy\core\tests\test_umath.py:2602: AssertionError

I'm currently trying a debug run where I force CC=gcc, which should hopefully redirect to the mingw toolchain instead of the msvc stuff, for diagnostic purposes.

@tylerjereddy
Copy link
Contributor Author

Forcing CC: gcc in the env directive didn't seem to help avoid usage of MSVC for NumPy compilation; that debugging route may be tricky.

@mattip
Copy link
Member

mattip commented Oct 3, 2018

CPython has extensive test cases for complex math, which are used in tests for cpython complex. They pass on python2 with MSVC 2008. It seems the branch cut test is different, rather than checking returned values against a table it checks continuity. I wonder where the difference lies?

@tylerjereddy
Copy link
Contributor Author

Looks like @mattip will try debugging the residual Windows issues. I can add him to our NumPy Azure account once he has a suitable Microsoft account, I think. Matti can use the numpy_azure_win branch on my NumPy fork as a starting point, if desired. Setting up his own fork for Azure triggers may reduce noise a bit.

There was some discussion that we might want to proceed with temporarily xfailing the offending Windows tests so I can proceed with building up the matrix in the yml file, but I think they'd get xfailed on Appveyor too then, so that may not be desirable.

@stefanv suggested we may want to link against openblas for the Azure MacOS build since that's apparently what is done for wheels.

@charris
Copy link
Member

charris commented Oct 4, 2018

Linking against OpenBLAS is certainly something to look at, but I think the first thing to do is get up a complete matrix and track down the remaining problems. If we need to blacklist some functions, knowing the MSVC compiler version will be necessary, so that should be printed out somewhere. Do you know what library is being used?

@mattip
Copy link
Member

mattip commented Oct 4, 2018

knowing the MSVC compiler version will be necessary, so that should be printed out somewhere

@tylerjereddy added the --show-build-log option to runtests.py on his branch

Since this PR is closed, maybe we should continue the discussion on new issue/pr

@charris charris mentioned this pull request Oct 4, 2018
@charris
Copy link
Member

charris commented Oct 4, 2018

Since this PR is closed, maybe we should continue the discussion on new issue/pr

See #12078.

@tylerjereddy
Copy link
Contributor Author

Would also be useful if Azure & Shippable obeyed [ci skip] in commit message--Shippable is documented to (but maybe we have extra triggers overriding or something), and Azure seems to require ***NO_CI***

@tylerjereddy tylerjereddy deleted the numpy_azure branch January 30, 2019 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants