gh-137242: Add Android CI job #137186
Conversation
The logs are showing a number of "Scudo OOM" warnings similar to those discussed in #121595. However, none of them have been fatal yet. In API level 35 the memory allocator no longer prints such an alarming message when this happens. Instead it will just be a one-line message like this (llvm/llvm-project#68444):
Luckily, in API level 35 they've also added the ability to print the allocator statistics at any time, with code like this (llvm/llvm-project@5759e3c):

```python
import ctypes
ctypes.CDLL(None).mallopt(-205, 0)
```

Based on those statistics, it doesn't look like Python has a memory leak, as the "inuse" column after the test is roughly the same as it was before. So it's probably just temporary high memory use during the test, causing allocations to be directed to the next highest size class.

On the buildbot about a year ago, these warnings were often associated with crashes. As described in Android/README.md, we worked around this by running the emulator once to create it, then editing its configuration files to increase the emulator RAM from 2 GB to 4 GB. After that, the warnings continued, but the crashes stopped.

But that isn't such a good option on GitHub Actions, because the runner is not persistent, so the extra emulator restart would add about a minute to every CI run. In theory you should be able to avoid this by creating the emulator from a custom hardware profile, but in my experiments, the emulator always ended up with 2 GB of RAM regardless of what it said in the profile.

The tests that trigger this are:
These tests all involve parsing an extremely long Python statement. Although they're all skipped on WASM because of limited stack space, reducing the recursion limit does not prevent the warning. What does prevent it is reducing the size of the Python statement, e.g. from 500,000 repetitions to 200,000. Since I haven't seen these tests crash on GitHub Actions yet, I'm not going to change them just now. If they do start crashing, we can change them to reduce the statement size on Android alone, or even on all platforms if it doesn't significantly reduce the strength of the test.
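As an aside, here's a minimal sketch of how those API level 35 statistics could be requested before and after a suspect test, building on the mallopt snippet above (the helper name and API-level guard are mine, not part of this PR):

```python
import ctypes
import os

def log_allocator_stats():
    # On Android API level 35+, mallopt(-205, 0) asks Scudo to log its
    # allocator statistics; the opcode is the one used in the snippet
    # above (llvm/llvm-project@5759e3c). No-op on other platforms.
    if hasattr(os, "getandroidapilevel") and os.getandroidapilevel() >= 35:
        ctypes.CDLL(None).mallopt(-205, 0)

log_allocator_stats()   # before the suspect test
# ... run the suspect test ...
log_allocator_stats()   # after: compare the "inuse" column in logcat
```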
One fairly minor comment inline; my other comment is an extension of that one.
Although the action is broken up into steps, the context of those steps is lost when it's embedded in the workflow. Aside from the general ergonomics of being able to follow what stage the workflow is up to, there's very little visibility on the fact that some steps aren't being invoked - unless you pay close attention, it's not clear that the arm64 run isn't running the tests. I know it's not, and why - but there's no external visibility of that fact (like a clearly skipped build step).
If the constraint against using a workflow is that we would need to reference a specific commit - isn't that commit hash one of the inputs to the release workflow? Does uses not accept variable expansions in workflow references?
```yaml
# This is coded as an action rather than a workflow so the release-tools
# repository can load it from a dynamically-chosen commit. Cross-repository
# workflow calls must have a Git reference hard-coded in the calling workflow,
# but actions can be run dynamically from the runner's filesystem.
```
This makes sense; but it also raises an eyebrow because this is something that no other platform has needed. I presume this is because no other platform that is generating binary artefacts is doing so with the tooling in release-tools?
@webknjaz, do you have any thoughts here?
I'll take a look, thanks for tagging.
@mhsmith so I've tried to understand the context, and this justification didn't make sense to me.
It is perfectly possible to call reusable workflows from other repositories (in fact, this is what I'm building my reusable-tox.yml ecosystem on).
My understanding is that this is meant to be used in https://github.com/python/release-tools/blob/698deaf2ebff433a6ab9d4b5ded97a40fce109a1/.github/workflows/source-and-docs-release.yml, right?
In any case, I've been moving away from composite actions in favor of reusable workflows. This is because composite actions (or any actions for that matter) are represented as a single step in job runs. And it's rather difficult to inspect what actions are doing. So from the troubleshooting perspective, I'd strongly advise against composite actions.
It is important to make every step visible and transparent. And if you follow the practice I established with reusable workflows as modules in here, this is definitely possible.
I started with in-repo "modules" two years ago because I was solving the problem of making it possible to collect all the job statuses in a single check (for branch protection). This wasn't because it's somehow impossible to host them externally. This was just not something necessary for that purpose.
@encukou I've actually been meaning to ask if there are any workflows that are being duplicated in the python org. If yes, it'd make sense to host them in a shared repository. This could be a .github repo or even that release-tools one (although I don't know if it makes semantic sense). This is a separate question, though.
That said, if you've faced any confusion or need help adapting this to be a reusable workflow, just let me know where you're stuck. I can help you navigate this or just rewrite it for you.
> This makes sense; but it also raises an eyebrow because this is something that no other platform has needed. I presume this is because no other platform that is generating binary artefacts is doing so with the tooling in release-tools?
Right now, release-tools only creates source zips and docs artifacts. See https://github.com/python/release-tools/actions/runs/16450411678 for 3.14 RC2.
The Windows artifacts are built in Azure Pipelines, here's RC2. And Ned builds the macOS artifacts. (release-tools later takes these Windows and macOS artifacts and signs and uploads them.)
We're hoping to build the macOS artifacts using CI in the near future, so what we decide here may help inform how to do that too :)
> My understanding is that this is meant to be used in python/release-tools@698deaf/.github/workflows/source-and-docs-release.yml, right?
One thing I'd like to see is CI in this repo to check Android builds okay, so we don't get caught out on release day because we only build in release-tools.
This happened in 3.14 RC1 with the plaintext docs, which had broken back in April or so; RC1 was the first prerelease to build docs, so we hadn't caught it here.
One deciding factor for whether we have stuff over here (via composite actions or something else) might be how much difference there'll be between different versions (3.14, 3.15, etc). If a lot, we might not want it all in release-tools and would benefit from versioning things in branches over here.
On the other hand, the docs build is also versioned in branches here: release-tools CI calls a make dist command in this repo.
> On the other hand, the docs build is also versioned in branches here: release-tools CI calls a make dist command in this repo.
That's more or less the way I've been trying to suggest.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers, that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again".
@freakboy3742 it does not. But I don't think that it's necessary or that composite actions would be any different in this regard.
You may not be able to use variable expansions directly in an action reference, but you can use a variable in a checkout step's ref and then run the action from the checked-out filesystem.
I'm not sure about that. The workflows to build and test Python are closely coupled to the scripts, makefiles, etc, which they call, all of which are in the CPython repo.
I agree this wasn't ideal. I've just pushed an alternative approach which uses ::group:: log commands.
Sure - I understand how this works when used as an action; I was driving more towards the "can we use this as a workflow" approach. But if a variable isn't valid as an actions reference, then it's a moot point.
I agree with @mhsmith here - I don't think decoupling the build scripts from the code repo would be a net win. It's already complicated enough having to update 2 repositories (the development guide and the code repo) whenever the build script is updated; adding a third repository that requires an update isn't going to make things any clearer.
This is definitely better - still not as good as "true" actions steps, but it's a lot more legible.
The use of ::group:: is a significant improvement. A true workflow would still be my preferred outcome - but I also don't have any better options for how to reference/include it.
I've flagged a couple of other minor issues.
```yaml
run: |
  # Build and test (Android)
  echo "::group::Configure build Python"
  ./Android/android.py configure-build
  echo "::endgroup::"

  echo "::group::Compile build Python"
  ./Android/android.py make-build
  echo "::endgroup::"

  echo "::group::Configure host Python"
  ./Android/android.py configure-host $triplet
  echo "::endgroup::"

  echo "::group::Compile host Python"
  ./Android/android.py make-host $triplet
  echo "::endgroup::"

  echo "::group::Make release package"
  ./Android/android.py package $triplet
  echo "::endgroup::"

  if [ "$RUNNER_OS" = "Linux" ] && [ "$RUNNER_ARCH" = "X64" ]; then
    # https://github.blog/changelog/2024-04-02-github-actions-hardware-accelerated-android-virtualization-now-available/
    echo "::group::Enable KVM for Android emulator"
    echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666", OPTIONS+="static_node=kvm"' \
      | sudo tee /etc/udev/rules.d/99-kvm4all.rules
    sudo udevadm control --reload-rules
    sudo udevadm trigger --name-match=kvm
    echo "::endgroup::"

    echo "::group::Unpack release artifact"
    mkdir $RUNNER_TEMP/android
    tar -C $RUNNER_TEMP/android -xf cross-build/$triplet/dist/*
    echo "::endgroup::"

    echo "::group::Tests"
    # Arguments are similar to --fast-ci, but in single-process mode.
    $RUNNER_TEMP/android/android.py test --managed maxVersion -v -- \
      --single-process --fail-env-changed --rerun --slowest --verbose3 \
      -u "all,-cpu" --timeout=600
    echo "::endgroup::"
  else
    echo "Skipping test: GitHub Actions currently only supports the" \
      "Android emulator on Linux x86_64."
  fi
```
Regardless of the action vs reusable workflow discussion, it's best to move this entire script into a dedicated file (.sh or even .py). Calling it could then be wrapped as a GNU make target, unifying things around the Makefile as a central place for a number of processes. It is one already, after all, so why deviate from that?
So... we have that script already - it's the Android/android.py that is being invoked here.
There are multiple invocations of the script because there are multiple "substeps" inside that script - and this script is calling each of those substeps individually so that the log has clear visibility on which step fails (if and when it fails). It's the need to visualise those individual workflow steps (vs the ::group:: solution currently in the PR) that has been the bulk of discussion on this PR so far.
There's also one step - the KVM configuration - that is GitHub Actions specific.
I guess one option would be to encode this sequence as a make github-build-test-android target (or similar) that executes all these steps, including the GitHub-Actions-specific logging group statements and KVM controls... but it feels a little weird to encode GitHub Actions specific calls into the Makefile.
That said - given that we can't easily have a reusable workflow here, and a Makefile target with ::group:: output would effectively give us the same end-user experience, and would also remove the need for an action (as the release workflow would also be make github-build-test-android)... maybe that is the best approach?
Yeah, kind of. This is something that I'm exploring in my reusable-tox.yml thing. There, I encourage moving the customizations into the tox land. Though, my workflow also has a mechanism for GHA-specific hooks that do indeed use in-tree actions for customizing things.
Still, I've grown to like moving more things into the unified workflow tool that a given project has (tox/nox/make/whatever). This makes it possible to reuse things outside a single platform.
It is possible to detect running under GHA (https://github.com/ansible/awx-plugins/blob/c4180cd/toxfile.py#L16), and this allows having grouping set up in the Python script that would also work locally (https://github.com/deadsnakes/action/blob/5344dcb/bin/install-python#L22-L33).
So I wouldn't try to put echo "::group::..." into the Makefile directly. Instead, it's possible to have make build-n-test-cycle call ./Android/android.py build-n-test-cycle "${triplet}", which in turn would implement nice logging (and conditional KVM setup) when under GHA, for example.
That would be an entry point to be called under GHA, but it could also be exercised locally or called under some other CI if need be.
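To illustrate the pattern being suggested (a rough sketch only - the helper name and fallback formatting are mine, not code from this PR), android.py could key the grouping off the documented GITHUB_ACTIONS environment variable:

```python
import os
from contextlib import contextmanager

@contextmanager
def group(title):
    """Collapse log output under GitHub Actions; plain headings elsewhere."""
    if os.environ.get("GITHUB_ACTIONS") == "true":
        # GitHub Actions folds everything between these two workflow
        # commands into an expandable section of the job log.
        print(f"::group::{title}", flush=True)
        try:
            yield
        finally:
            print("::endgroup::", flush=True)
    else:
        print(f"===== {title} =====", flush=True)
        yield

# For example, inside android.py:
with group("Configure build Python"):
    ...  # run the configure-build substep
```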
Of course, ./Android/android.py build-n-test-cycle "${triplet}" could be implemented as a shell script, but this might be an unnecessary layer of indirection in this case. Plus, coding in Python is more pleasant, predictable, lintable, and well-understood.
Thanks for your ideas, everyone. I've now added an
I have made the requested changes; please review again. I've also made the corresponding changes to python/release-tools#265.
Thanks for making the requested changes! @freakboy3742: please review the changes made to this pull request.
This looks pretty good to me - the CI configuration is about as simple as it could get; the output in CI is very navigable; and end users testing under GitHub will get the added benefit of not needing explicit CI configuration.
I'm happy to approve as is; holding off merging until @webknjaz has had a chance to take a look.
I would still like this job to be wrapped as a reusable workflow:

- for in-repo use — I'd like to eventually get to a point where build.yml would only declare relationships between the jobs and matrices, keeping any steps in reusable-*.yml workflows; this is to unify the structure better + have cleaner separation of concerns/abstraction layers
- this would allow having a visually collapsible group of jobs in the UI, consistent with how other groups look
- with 3-4 inputs in it, it would allow relying on the same steps in python/release-tools (if an optional actions/upload-artifact is added)
- the benefits of having the step wrapped in android.py would remain

However, I think it's okay to merge the PR as is, and I could send a follow-up with such a change. I only ask that the matrix definition be fixed first.
Let's get this in and I'll look into wrapping it as a reusable workflow later.
Thank you!
See the linked issue (gh-137242) for details.