Avoid bundling tests in wheels #31391

Closed
stefan6419846 opened this issue May 19, 2025 · 5 comments
Labels
Needs Decision Requires decision

Comments

@stefan6419846
Contributor

Describe the bug

The wheels currently include tests and test data. These are usually of no additional value outside of the source distribution and thus only bloat the wheels and complicate reviews. For this reason, I recommend excluding them from future wheels.

This matches the official recommendation for Python packaging as well (see https://packaging.python.org/en/latest/discussions/package-formats/#what-is-a-wheel):

Wheels are meant to contain exactly what is to be installed, and nothing more. In particular, wheels should never include tests and documentation, while sdists commonly do.
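For illustration only, here is a minimal sketch of how a plain setuptools project typically keeps test packages out of its wheels; scikit-learn's own build setup differs, so this only shows the general idea, and the package names are placeholders:

```python
# setup.py sketch for a generic setuptools project (not scikit-learn's actual build):
# exclude test packages so they do not end up in the built wheel.
from setuptools import find_packages, setup

setup(
    name="mypackage",  # placeholder name
    packages=find_packages(exclude=["tests", "tests.*", "*.tests", "*.tests.*"]),
)
```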

Steps/Code to Reproduce

Download the current wheels and look for sklearn/datasets/tests
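For example, with a small sketch that lists the bundled test files (a wheel is just a zip archive; the filename below is only an example):

```python
# List entries under tests/ directories inside a downloaded wheel.
from zipfile import ZipFile

wheel_path = "scikit_learn-1.6.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
with ZipFile(wheel_path) as wheel:
    test_entries = [name for name in wheel.namelist() if "/tests/" in name]

print(f"{len(test_entries)} bundled test files, e.g.:")
print("\n".join(test_entries[:5]))
```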

Expected Results

The directory is absent.

Actual Results

The directory exists.

Versions

1.6.1
@stefan6419846 stefan6419846 added Bug Needs Triage Issue requires triage labels May 19, 2025
@lesteve lesteve added Needs Decision Requires decision and removed Bug Needs Triage Issue requires triage labels May 19, 2025
@lesteve
Member

lesteve commented May 19, 2025

Could you add a bit more detail on why this is causing you an issue? Do you have an environment where the additional size really matters? When you say it "complicates reviews", I guess you mean some kind of compliance review (basing this guess on a few issues you recently opened)?

It looks like you care about this and have opened issues in many projects, so I would be curious to know more.

As a maintainer, I have found it convenient in the past to be able to run the tests against an installed wheel: for example, upload to test.pypi.org, install from test.pypi.org, and double-check everything is fine by running pytest --pyargs <my-package>. I am not sure how we would run the tests against an installed wheel if we were to remove the tests from it.
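Concretely, something along these lines (a rough sketch, assuming the package is already installed in the current environment; it is the programmatic equivalent of `pytest --pyargs sklearn`):

```python
# Run the test suite of the *installed* package rather than a source checkout.
# This only works if the tests are shipped inside the wheel.
import sys

import pytest

sys.exit(pytest.main(["--pyargs", "sklearn"]))
```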

I quickly double-checked and it seems like numpy and scipy also still ship tests in their wheels, although there has been some work on making it easier to produce wheels without tests: numpy/numpy#25737.

I am marking this as "Needs Decision" for now.

@stefan6419846
Contributor Author

stefan6419846 commented May 19, 2025

Could you add a bit more detail on why this is causing you an issue? Do you have an environment where the additional size really matters? When you say it "complicates reviews", I guess you mean some kind of compliance review (basing this guess on a few issues you recently opened)?

I do indeed care about this, yes. In this specific case, I am working on license compliance reviews for possibly integrating this library into our (commercial) products. This involves a semi-automated process of basically reviewing every single file.

For packages with plain Python code and good test coverage, tests tend to make up more than 50% of the shipped source code. From time to time, test data in particular turns out to have dubious origins or other issues that complicate clean distribution. Not bundling the tests inside the wheels helps cut down both the amount of review work and the storage required, and it aligns with the PyPA recommendations on wheel builds. As the final user, I usually do not want to run the library tests myself, as that is part of the upstream process (although there are no warranties, of course); I usually have my own integration and unit tests ensuring that updates do not break my code.

Please note that the aforementioned statistics do not hold for scikit-learn directly. Some rough numbers show 459 test files (6 MB) versus 517 regular files (37 MB), although this should be taken with some caution, for example because unpacking archives creates additional files. scipy and numpy are further candidates where I have not yet opened a corresponding issue; their (unpacked) numbers in the current sample are 1181 test files out of 1948 total files and 514 test files out of 1022 files, respectively.
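As an illustration, numbers like these can be gathered with a rough count along the following lines (the directory name is a placeholder for wherever the wheel was unpacked):

```python
# Rough split of an unpacked wheel into test files and other files, with sizes.
from pathlib import Path

root = Path("unpacked_wheel")  # placeholder: directory the wheel was extracted into
files = [p for p in root.rglob("*") if p.is_file()]
test_files = [p for p in files if "tests" in p.parts]
other_files = [p for p in files if "tests" not in p.parts]


def total_mb(paths):
    return sum(p.stat().st_size for p in paths) / 1e6


print(f"{len(test_files)} test files ({total_mb(test_files):.1f} MB)")
print(f"{len(other_files)} other files ({total_mb(other_files):.1f} MB)")
```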

@betatim
Member

betatim commented May 20, 2025

We (e.g. the cuml project) rely on the bundled tests and find it useful that all you have to do is install the version of scikit-learn you want to test against. We could of course check out the matching version from the git repository, etc., but that introduces several points of failure where you could end up with tests from a different version than the one you have installed.

@adrinjalali
Member

Same argument as #31390 (comment): no action should be taken here IMO pending that discussion.

@adrinjalali
Member

I'd say that since the tests are actually used in our CI and in third-party developers' CI, we have no plans to remove them at the moment.
