CI Move Pyodide CI from Azure to GitHub Actions #29791

agriyakhetarpal · 2024-09-05T18:22:26Z

Reference Issues/PRs

Discussed in pypa/cibuildwheel#1878, and on top of #31058.

What does this implement/fix? Explain your changes.

This PR reinstates the creation of the Pyodide virtual environment through the pyodide venv <dir> command as the conventional means of running the scikit-learn test suite with pytest, replacing the Node.js-based wrapper script build_tools/azure/pytest-pyodide.js. Previously, unresolved symbol errors were coming from symbol visibility issues after OpenBLAS was linked (pyodide/pyodide#3331) in the in-tree recipe for SciPy; these seem to have more or less subsided with our updates to it in recent months (see pyodide/pyodide#4719, pyodide/pyodide#5012, pyodide/pyodide#5031, and so on). These issues did not affect the Node.js-based runner, which explains why it was used.
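As a rough illustration of the pyodide venv flow described above, the following Python sketch only assembles the commands and does not execute anything. The directory name .venv-pyodide matches the logs further down in this thread; the wheel path is a placeholder, and the bin/ layout inside the virtual environment is an assumption that holds on POSIX systems.

```python
from pathlib import Path


def pyodide_venv_commands(venv_dir, wheel):
    """Return the commands for testing a wheel in a Pyodide virtual environment."""
    venv = Path(venv_dir)
    return [
        # Create the virtual environment with the Pyodide CLI.
        ["pyodide", "venv", str(venv)],
        # Install the cross-compiled wheel with the environment's own pip.
        [str(venv / "bin" / "pip"), "install", str(wheel)],
        # Test the installed package rather than the source tree.
        [str(venv / "bin" / "python"), "-m", "pytest", "--pyargs", "sklearn"],
    ]


if __name__ == "__main__":
    # "dist/placeholder.whl" is a hypothetical path, not a real artifact name.
    for command in pyodide_venv_commands(".venv-pyodide", "dist/placeholder.whl"):
        print(" ".join(command))
```

Each list could be passed to subprocess.run once the Pyodide CLI and a built wheel are available.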

As a simplification, cibuildwheel performs the necessary changes in emscripten.yml through the CIBW_PLATFORM: pyodide environment variable that was released as a part of cibuildwheel 2.19. Helper scripts that were used to install Pyodide and the Emscripten toolchain are no longer needed.

Also, this PR drops the [pyodide] commit marker as a by-product of switching from Azure Pipelines to GitHub Actions, which means that Pyodide/WASM as a platform is now tested on every commit.

Any other comments?

This was discussed in Pyodide builds: better support the use of Node.js-based script runners in CIBW_TEST_COMMAND pypa/cibuildwheel#1878 and @lesteve suggested upstreaming the change here.
Some of the symbol visibility issues in Pyodide were resolved in Fix dynamic library loading in cli runner pyodide/pyodide#4871
Please refer to https://github.com/lesteve/scipy-tests-pyodide for a look at the SciPy test suite within Pyodide (and its status). This is slightly dated, however, and the SciPy tests now run as a part of Pyodide with the use of the [pyodide] commit marker after Run scipy tests as part of the Github Action CI pyodide/pyodide#4935 was merged recently, similar to scikit-learn's CI activities.

github-actions · 2024-09-05T18:23:47Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: a7bb1e6. Link to the linter CI: here

agriyakhetarpal · 2024-09-05T18:24:29Z

Note

UPDATE, 07/09/2024: this works now, this comment is not relevant.

This currently doesn't work and raises an obscure error (workflow logs) from within the Pyodide xbuildenv: ModuleNotFoundError: No module named 'js.process' for some reason with Pyodide xbuildenv versions 0.26.1/0.26.2/0.27.0a2 (tested in a PR on my fork). I am currently investigating this, and it should be resolvable, hopefully. cc: @hoodmane and @ryanking13

However, at the same time, creating a Pyodide virtual environment in a freshly spun-up Pyodide Docker container (and after removing a few conflicting, irrelevant conftest.py files temporarily):

.venv-pyodide/lib/python3.12/site-packages/sklearn/utils/tests/test_estimator_html_repr.py::test_get_visual_block_voting PASSED
.venv-pyodide/lib/python3.12/site-packages/sklearn/utils/tests/test_estimator_html_repr.py::test_get_visual_block_column_transformer PASSED
.venv-pyodide/lib/python3.12/site-packages/sklearn/utils/tests/test_estimator_html_repr.py::test_estimator_html_repr_pipeline PASSED
.venv-pyodide/lib/python3.12/site-packages/sklearn/utils/tests/test_estimator_html_repr.py::test_stacking_classifier[None] PASSED
.venv-pyodide/lib/python3.12/site-packages/sklearn/utils/tests/test_estimator_html_repr.py::test_stacking_classifier[final_estimator1] PASSED
.venv-pyodide/lib/python3.12/site-packages/sklearn/utils/tests/test_estimator_html_repr.py::test_stacking_regressor[None] PASSED
.venv-pyodide/lib/python3.12/site-packages/sklearn/utils/tests/test_estimator_html_repr.py::test_stacking_regressor[final_estimator1] PASSED
.venv-pyodide/lib/python3.12/site-packages/sklearn/utils/tests/test_estimator_html_repr.py::test_birch_duck_typing_meta PASSED
.venv-pyodide/lib/python3.12/site-packages/sklearn/utils/tests/test_estimator_html_repr.py::test_ovo_classifier_duck_typing_meta PASSED
.venv-pyodide/lib/python3.12/site-packages/sklearn/utils/tests/test_estimator_html_repr.py::test_duck_typing_nested_estimator PASSED
# ...
# [...] (truncated output)
# ...
('logisticregression',LogisticRegression())]),param_grid={'logisticregression__C':[0.1,1.0]})-check_supervised_y_2d] - DataConversionWarning not caught
XPASS .venv-pyodide/lib/python3.12/site-packages/sklearn/tests/test_common.py::test_search_cv[HalvingGridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),min_resources='smallest',param_grid={'logisticregression__C':[0.1,1.0]},random_state=0)-check_supervised_y_2d] - DataConversionWarning not caught
XPASS .venv-pyodide/lib/python3.12/site-packages/sklearn/tests/test_common.py::test_search_cv[RandomizedSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_distributions={'logisticregression__C':[0.1,1.0]},random_state=0)-check_supervised_y_2d] - DataConversionWarning not caught
XPASS .venv-pyodide/lib/python3.12/site-packages/sklearn/tests/test_common.py::test_search_cv[HalvingRandomSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_distributions={'logisticregression__C':[0.1,1.0]},random_state=0)-check_supervised_y_2d] - DataConversionWarning not caught
XPASS .venv-pyodide/lib/python3.12/site-packages/sklearn/utils/tests/test_testing.py::test_create_memmap_backed_data - memmap not fully supported
FAILED .venv-pyodide/lib/python3.12/site-packages/sklearn/tests/test_build.py::test_openmp_parallelism_enabled - AssertionError: 
==================================== 1 failed, 28699 passed, 4846 skipped, 761 xfailed, 52 xpassed, 3541 warnings in 787.87s (0:13:07) =====================================
(.venv-pyodide) agriyakhetarpal@c1f89ae6ff12:/src$

and all tests pass (the failing one can be ignored by enabling the SKLEARN_SKIP_OPENMP_TEST environment variable, of course). This suggests to me that there's a missing file in the xbuildenv that exists in the Pyodide repository/Docker container for it. The only related piece of code I found was in pyodide/pyodide#3189.

However, I don't see why import sklearn as a Pythonic statement would require this? I have recently implemented similar changes for statsmodels and scikit-image, and for other projects, too – and their test suites run without a hitch.

agriyakhetarpal · 2024-09-05T20:55:53Z

I figured out the error 😅

2024-09-05T18:42:07.5543835Z # possible namespace for /home/runner/work/scikit-learn/scikit-learn/doc/js
2024-09-05T18:42:07.5545980Z import 'js' # <_frozen_importlib_external.NamespaceLoader object at 0xfa1be8>
2024-09-05T18:42:07.5625899Z PythonError: # zipimport: zlib available

For reasons I can't explain, it was trying to import the doc/js/ directory (which holds JavaScript files) as if it were a Python package named js. I had switched the working directory to doc/ to avoid running tests from the sklearn/ directory (where things wouldn't be compiled). I've now switched to a different directory, i.e., maint_tools/, and import sklearn works.

I think we should fix this in Pyodide upstream – import js in Python shouldn't try to import something in js/ (but I don't know if we can avoid this cleanly).
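The log excerpt above ("possible namespace for .../doc/js") suggests that the js/ directory was picked up as a namespace package. The following sketch reproduces that lookup behaviour in plain CPython, under the assumption that the same path-based finder logic is what applied here; the temporary directories merely stand in for doc/ and maint_tools/.

```python
import importlib.machinery
import tempfile
from pathlib import Path


def describe_js_lookup(working_dir):
    """Report how Python's path finder resolves the name "js" in working_dir."""
    spec = importlib.machinery.PathFinder.find_spec("js", [str(working_dir)])
    if spec is None:
        return "not found"
    if spec.origin is None and spec.submodule_search_locations is not None:
        # A directory without __init__.py is treated as a namespace package.
        return "namespace package"
    return "regular module or package"


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        docs = Path(tmp) / "doc"  # stands in for doc/
        tools = Path(tmp) / "maint_tools"  # stands in for maint_tools/
        (docs / "js").mkdir(parents=True)
        (docs / "js" / "scripts.js").write_text("// JavaScript, not Python\n")
        tools.mkdir()
        return describe_js_lookup(docs), describe_js_lookup(tools)


if __name__ == "__main__":
    print(demo())
```

A directory named js/ is enough for the path finder to claim the name, even though it contains no Python files, which is consistent with the failure disappearing after the move to maint_tools/.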

The build now fails with a conftest.py-related error, which shouldn't be too difficult to fix; I'll take a look at it either later in the day or tomorrow.

agriyakhetarpal · 2024-09-06T10:21:10Z

So, we will have tempita from Cython when cross-compiling, of course, but having it inside the sklearn folder is a bit troublesome: Cython is only used at build-time with Meson, and not during runtime – the from Cython import Tempita as tempita statement in sklearn/_build_utils/tempita.py breaks the sklearn import in Pyodide (I'm unsure how this has been working in the Node.js-based wrapper script). I did install Cython temporarily with micropip because it has a pure Python wheel to fix the import, but the better way would be to move the tempita CLI elsewhere so that it is accessed somewhere outside the sklearn/ folder (such that it does not break the sklearn import, but can still be retrieved by Meson and be used as a generator).
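A quick way to check whether a build-time dependency such as Cython is importable in a given environment is sketched below. This is only a diagnostic under the assumption that the module names passed in are top-level names; it is not the fix applied in this PR, which moved the tempita script out of the sklearn/ folder.

```python
import importlib.util


def missing_at_runtime(module_names):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # An empty list means every listed module is importable here.
    print(missing_at_runtime(["Cython"]))
```

Running this inside the Pyodide virtual environment versus the build environment would show whether an import-time dependency is only satisfied at build time.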

agriyakhetarpal · 2024-09-06T11:17:34Z

Just four errors are left to fix:

../.venv-pyodide/lib/python3.12/site-packages/sklearn/datasets/tests/test_openml.py::test_fetch_openml_verify_checksum[True-liac-arff] ERROR
../.venv-pyodide/lib/python3.12/site-packages/sklearn/datasets/tests/test_openml.py::test_fetch_openml_verify_checksum[False-liac-arff] ERROR
../.venv-pyodide/lib/python3.12/site-packages/sklearn/datasets/tests/test_openml.py::test_fetch_openml_verify_checksum[True-pandas] ERROR
../.venv-pyodide/lib/python3.12/site-packages/sklearn/datasets/tests/test_openml.py::test_fetch_openml_verify_checksum[False-pandas] ERROR

which are most likely coming from the difference between the Node.js runner and the Pyodide virtual environment, because the latter uses the conftest.py file in the root directory, while the former did not. Once done, I can start migrating the workflow to Azure Pipelines and clean up the changes.

agriyakhetarpal · 2024-09-06T21:24:27Z

I have triggered both the Pyodide venv and the Azure tests, and test_fetch_openml_verify_checksum is skipped for the latter. We now collect a slightly bigger number of tests ("collected 36856 items / 2 skipped"), whereas the Azure job collects/collected fewer items ("collected 36288 items / 1 skipped"). I pushed 382c8ba to remove an unused argument that was being perceived as a fixture – if that doesn't work, we could potentially skip the test to keep the same behaviour.
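For reference, the difference between the two collection counts quoted above can be computed with a small sketch; the helper name is hypothetical and the regular expression assumes the "collected N items" wording shown in the quoted output.

```python
import re


def collected_items(summary):
    """Extract the item count from a pytest 'collected N items' line."""
    match = re.search(r"collected (\d+) items", summary)
    if match is None:
        raise ValueError(f"no collection count in {summary!r}")
    return int(match.group(1))


venv_count = collected_items("collected 36856 items / 2 skipped")
azure_count = collected_items("collected 36288 items / 1 skipped")
extra_items = venv_count - azure_count  # 568 more items in the venv run
```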

agriyakhetarpal

These changes are now ready for an initial review; thanks! I won't perform any further force-pushes here.

.github/workflows/emscripten.yml

meson.build

agriyakhetarpal · 2024-09-06T22:07:48Z

sklearn/datasets/tests/test_openml.py

@@ -1459,7 +1459,7 @@ def _mock_urlopen_raise(request, *args, **kwargs):
        (False, "pandas"),
    ],
 )
-def test_fetch_openml_verify_checksum(monkeypatch, as_frame, cache, tmpdir, parser):
+def test_fetch_openml_verify_checksum(monkeypatch, as_frame, tmpdir, parser):


Here, cache was a spurious argument that was unused throughout the file. It broke the Pyodide tests with an error upon collection because pytest treats any such test-function argument as a fixture request.
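The following sketch is a simplified model of why an unused argument matters, not pytest's actual resolution logic. It inspects a stand-in function carrying the old signature from the diff above and lists the arguments that are neither parametrized nor covered by an available fixture. The set of available fixtures is an assumption made purely for illustration.

```python
import inspect


def arguments_needing_fixtures(function, parametrized_names):
    """List the arguments that would have to be resolved as fixtures."""
    parameters = inspect.signature(function).parameters
    return [name for name in parameters if name not in parametrized_names]


def unresolved_fixtures(function, parametrized_names, available_fixtures):
    """List the fixture arguments that nothing in the environment provides."""
    needed = arguments_needing_fixtures(function, parametrized_names)
    return [name for name in needed if name not in available_fixtures]


def old_signature(monkeypatch, as_frame, cache, tmpdir, parser):
    """Stand-in with the pre-fix argument list; the body is irrelevant here."""


# as_frame and parser are supplied by the parametrize decorator.
# The available fixture names below are assumed, not taken from the CI logs.
unresolved = unresolved_fixtures(
    old_signature,
    parametrized_names={"as_frame", "parser"},
    available_fixtures={"monkeypatch", "tmpdir"},
)
```

Under these assumptions only cache is left unresolved, which matches the argument removed in the diff.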

agriyakhetarpal · 2025-03-25T15:51:35Z

I added the "upload to anaconda.org" logic, @agriyakhetarpal feel free to have a closer look. I tested it on my fork (build log) and it seems to work OK (except the actual upload because I didn't set up the secrets).

It looks good to me, thank you, Loïc!

@ogrisel do you have an opinion on whether we should pin actions by commit hash in the new Pyodide workflow, in particular pypa/cibuildwheel and scientific-python/upload-nightly-action?

I am not too sure what is the worst that could happen if a compromised Pyodide wheel was uploaded.

I guess if scientific-python/upload-nightly-action was compromised they could steal our anaconda.org secrets and upload vanilla Python (i.e. not Pyodide only) wheels to anaconda.org and attack users/projects that use our development wheel.

While I'm not on the scikit-learn board, the general advice I will have from my NumFOCUS Security Committee member hat is that all actions should be pinned, even if it makes updating them troublesome. Dependabot or Renovate usually help with it, and current limitations will be overcome when github/roadmap#592 lands. zizmor should also handle it, and will bicker about it if you use the --pedantic option.

I would agree with your assessment – this is another point of attack. If there's something that came out of the PyTorch nightly build dependency compromise, it is that there are definitely a bunch of end users of the nightly wheels, outside of CI.

lesteve · 2025-03-25T16:20:20Z

While I'm not on the scikit-learn board, the general advice I will have from my NumFOCUS Security Committee member hat is that all actions should be pinned, even if it makes updating them troublesome

I remember this was discussed in the context of SPEC 8 for example #29203 (comment) and the picture was not so clear-cut at the time.

SPEC 8 is saying "Pin GitHub Actions release workflows to their full release commit SHAs" which is why we are pinning the pypa/gh-action-pypi-publish by commit hash.

Now the question is whether the publishing of the Pyodide dev wheel to anaconda.org can be considered a "release workflow", maybe it is to be honest and we should at least pin pypa/cibuildwheel and scientific-python/upload-nightly-action.

ogrisel · 2025-03-25T17:27:23Z

@ogrisel do you have an opinion on whether we should pin actions by commit hash in the new Pyodide workflow, in particular pypa/cibuildwheel and scientific-python/upload-nightly-action?

I think we said in the past that we should pin by commit hash for package generating workflows only.

EDIT: "release critical" -> "package generating" workflows.

ogrisel

LGTM, but I would like a final pyodide build (except the upload step) working before giving the final approval to this PR.

Edit: indeed it's not possible to test the upload step as part of this PR.

.github/workflows/emscripten.yml

Co-Authored-By: Olivier Grisel <olivier.grisel@ensta.org>

agriyakhetarpal · 2025-03-25T19:10:35Z

Thanks, @ogrisel for the review – I've addressed your comments in 4831d6d

Edit: I won't be able to get a working Anaconda.org upload of the wheels, though; I'm contributing through a fork. It should kick in once the change is pushed to main.

Edit two: the workflow is passing here: https://github.com/scikit-learn/scikit-learn/actions/runs/14068121558

lesteve · 2025-03-26T07:54:25Z

.github/workflows/emscripten.yml

+      build: ${{ steps.check_build_trigger.outputs.build }}
+    steps:
+      - name: Checkout scikit-learn
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2


Sorry to go back and forth on this but I personally would rather not pin official GitHub actions i.e. actions/..., for example see some reasons in #29203 (comment).

In other words, I would only pin third-party actions in package generating workflows.

Ok to only pin third-party actions, we probably need to write down our policy somewhere then.

Sorry for the back and forth.
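The policy discussed above (pin third-party actions by commit hash, leave official actions/... on version tags) could be checked with a sketch along these lines. The regular expressions, the "actions/" prefix heuristic for official actions, and the tag and hash values in the sample are all assumptions made for illustration; this is not an existing scikit-learn tool.

```python
import re

# Matches "uses: owner/repo@ref" lines, with or without a leading list dash.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s@#]+)@([^\s#]+)", re.MULTILINE)
# A full commit SHA is 40 lowercase hexadecimal characters.
SHA_RE = re.compile(r"[0-9a-f]{40}")


def unpinned_third_party_actions(workflow_text):
    """Return third-party actions that are not pinned to a full commit SHA."""
    offenders = []
    for action, ref in USES_RE.findall(workflow_text):
        is_official = action.startswith("actions/")
        if not is_official and not SHA_RE.fullmatch(ref):
            offenders.append(f"{action}@{ref}")
    return offenders


# Hypothetical workflow fragment; the tag and hash are placeholders.
SAMPLE = """
steps:
  - uses: actions/checkout@v4
  - uses: pypa/cibuildwheel@v2
  - uses: scientific-python/upload-nightly-action@0123456789abcdef0123456789abcdef01234567 # vX.Y.Z
"""

offenders = unpinned_third_party_actions(SAMPLE)
```

In this sample only the cibuildwheel entry is reported, since the official checkout action is exempt and the upload action already carries a full hash.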

ogrisel

Besides #29791 (review), LGTM.

lesteve · 2025-03-26T08:21:20Z

I have pushed the following changes:

use major version pinning for Github actions
revert cuda-ci cibuildwheel hash pin. I will open a separate PR about this. To be honest I am not too sure whether we need to pin it by commit hash. The created wheel is only used internally in the CI. You could as a user download the GitHub artifact and pip install the downloaded file but this is a bit far-fetched ...

lesteve · 2025-03-26T08:55:07Z

Let's merge this one, thanks @agriyakhetarpal!

lesteve · 2025-03-26T08:58:20Z

I have triggered manually a Pyodide wheel build, let's see what happens: https://github.com/scikit-learn/scikit-learn/actions/runs/14079229690

agriyakhetarpal · 2025-03-26T09:29:24Z

It failed because the token is empty? I'm not sure why that happened.

lesteve · 2025-03-26T10:58:46Z

Yeah me neither, I need to look into it closer ...

lesteve · 2025-03-26T11:05:10Z

Maybe #31078 would fix it by using the right Github environment to be able to access the secret.

Co-authored-by: Loïc Estève <loic.esteve@ymail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

agriyakhetarpal added 10 commits September 4, 2024 22:23

Test Pyodide 0.27.0a2

22aa88f

Try to fix js.process ModuleNotFoundError

37884ac

Install Pyodide JS library for testing

d9d43fa

Skip tests that require a network (with pooch)

776e2c9

Fix Pyodide NPM installation

85d644d

Don't explicitly set xbuildenv version

527c151

Install pyodide-cli

5ba18f0

Import sklearn, try testing without --pyargs

d04b38e

Reinstall xbuildenv manually, install npm-pyodide in doc/

d6708a6

Add TODO note about Azure

2760c38

github-actions bot added the Build / CI label Sep 5, 2024

agriyakhetarpal added 2 commits September 6, 2024 00:07

Set PYTHONVERBOSE and PYTHONDEBUG for debugging

16267d7

Switch to maint_tools/ instead of doc/

961a9d4

agriyakhetarpal marked this pull request as draft September 5, 2024 20:44

agriyakhetarpal added 3 commits September 6, 2024 14:26

Clean up changes, use --pyargs

ee3228e

Fix a typo: paralell ➡️ parallel

383bf7e

Try to run Cython from its pure Python wheel

7efd95d

agriyakhetarpal force-pushed the updates-for-emscripten-ci branch from 3b30e89 to 7efd95d Compare September 6, 2024 09:30

Trigger [pyodide] wheel build, add Cython comment

cfae5ff

agriyakhetarpal added 3 commits September 6, 2024 20:58

Move tempita.py to root dir (for now)

3ae370f

Bump verbosity for [pyodide] test suite

f16a501

Remove spurious missing cache fixture

382c8ba

Trigger [pyodide] tests too

049ccca

agriyakhetarpal commented Sep 6, 2024

Rename [pyodide] wheel artifact

2fd9b50

lesteve and others added 3 commits March 25, 2025 17:49

[pyodide] Tweak name

6802778

Pin pypa/cibuildwheel and SPNW upload action

09ded37

Pin pypa/cibuildwheel for CUDA CI

74b7344

ogrisel reviewed Mar 25, 2025

Pin hashes for all actions in [pyodide] CI job

4831d6d

Co-Authored-By: Olivier Grisel <olivier.grisel@ensta.org>

agriyakhetarpal requested review from lesteve and ogrisel March 25, 2025 19:10

lesteve reviewed Mar 26, 2025

ogrisel approved these changes Mar 26, 2025

lesteve added 3 commits March 26, 2025 09:08

Use major version pinning for official Github actions

8f81fa7

[azure parallel] [pyodide]

1db9ebd

revert CUDA CI pin [azure parallel] [pyodide]

a7bb1e6

lesteve merged commit 06f9656 into scikit-learn:main Mar 26, 2025
36 checks passed

agriyakhetarpal deleted the updates-for-emscripten-ci branch March 26, 2025 09:24

lesteve mentioned this pull request Mar 26, 2025

CI Use right environment in Pyodide wheel upload #31078

Merged

This was referenced Mar 26, 2025

DOC Use nightly WASM wheels for JupyterLite in the dev documentation #31085

Merged

Add JupyterLite-powered interactive galleries to the scikit-image documentation scikit-image/scikit-image#7644

Open

lucyleeow pushed a commit to lucyleeow/scikit-learn that referenced this pull request Apr 2, 2025

CI Move Pyodide CI from Azure to GitHub Actions (scikit-learn#29791)

76bead0

Co-authored-by: Loïc Estève <loic.esteve@ymail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

lesteve mentioned this pull request Apr 4, 2025

CI Fix pyodide wheel testing #31145

Merged
