Avoid bundling tests in wheels #31391
Comments
Could you add a bit more detail on why this causes you any issue? Do you have an environment where the additional size really matters? When you say it "complicates reviews", I guess you mean some kind of compliance review (basing this guess on a few issues you recently opened)? It looks like you care about this and have opened issues in many projects, so I would be curious to know more.

As a maintainer, I have found it convenient in the past to be able to run the tests against an installed wheel: for example, upload to test.pypi.org, install from test.pypi.org, and double-check everything was fine by running the bundled test suite.

I quickly double-checked, and it seems like numpy and scipy also still have tests in their wheels, although there has been some work towards making it easier to produce wheels without tests: numpy/numpy#25737. I am marking this as "Needs Decision" for now.
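For illustration, a minimal sketch of how such a check against an installed wheel can look; this is not the exact command referenced above, and it assumes pytest is installed in the same environment as the wheel:

```python
# Sketch: run the test suite bundled inside an installed scikit-learn wheel.
# Assumes pytest and scikit-learn are both installed in this environment.
import pytest

# --pyargs makes pytest resolve "sklearn" as an importable package, so the
# tests collected are exactly the ones shipped in the installed wheel.
exit_code = pytest.main(["--pyargs", "sklearn", "-q"])
raise SystemExit(exit_code)
```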
I do indeed care about this, yes. In this specific case, I am working on license compliance reviews for possibly integrating this into our (commercial) products. This involves a semi-automated process of reviewing basically every single file. For packages with plain Python code and good test coverage, tests tend to make up more than 50% of the actual source code. From time to time, test data in particular turns out to have dubious origins or other issues which complicate clean distribution. Not bundling the tests inside the wheels helps cut down the amount of work and the storage required for this. At the same time, this aligns with the PyPA recommendations on wheel builds.

Usually, I as the final user do not want to run the library tests, as this is part of the upstream process (although there are no warranties, of course); I usually have my own integration and unit tests ensuring that updates do not break my code.

Please note that the aforementioned statistics do not hold true for scikit-learn directly. Some rough statistics show 459 test files (6 MB) and 517 regular files (37 MB), although you should take this with some caution due to, for example, additional files being created by unpacking archives. scipy and numpy are further candidates where I have not yet opened a corresponding issue; their (unpacked) numbers are 1181 test files out of 1948 total files and 514 test files out of 1022 files in the current sample.
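(For reference, a rough sketch of how such counts can be produced; the directory name below is a hypothetical placeholder for an unpacked wheel or sdist, and treating every `tests` path component as test content is an assumption.)

```python
# Rough counting sketch over an unpacked distribution tree.
# "unpacked_dir" is a hypothetical placeholder path.
from pathlib import Path

root = Path("unpacked_dir")
all_files = [p for p in root.rglob("*") if p.is_file()]
test_files = [p for p in all_files if "tests" in p.parts]

def size_mb(files):
    return sum(p.stat().st_size for p in files) / 1e6

print(f"{len(test_files)} test files ({size_mb(test_files):.1f} MB) "
      f"out of {len(all_files)} total files ({size_mb(all_files):.1f} MB)")
```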
We (e.g. the cuml project) rely on the bundled tests and find it useful that all you have to do is install the version of scikit-learn you want to test against. We could of course check out the right version from the git repository, etc., but that introduces several points of failure where you could end up with tests from a different version than the one you have installed.
Same argument as #31390 (comment); IMO no action should be taken here pending that discussion.
I'd say that since the tests are actually used in our CI and in third-party developers' CI, we don't have plans to remove them at the moment.
Describe the bug
The wheels currently include tests and test data. These usually have no additional value outside of the source distribution and thus just bloat the wheels and complicate reviews. For this reason, I recommend excluding them from future wheels.
This also matches the official recommendation for Python packaging (see https://packaging.python.org/en/latest/discussions/package-formats/#what-is-a-wheel).
Steps/Code to Reproduce
Download the current wheels and look for sklearn/datasets/tests.
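A minimal sketch of one way to check this without installing anything (a wheel is a plain zip archive; the wheel filename below is a hypothetical example):

```python
# Sketch: list test files bundled in a downloaded wheel.
import zipfile

# Hypothetical filename; substitute the wheel you downloaded.
wheel = "scikit_learn-1.7.0-cp312-cp312-manylinux_2_17_x86_64.whl"

with zipfile.ZipFile(wheel) as whl:
    test_files = [n for n in whl.namelist() if "/tests/" in n]

print(f"{len(test_files)} files under tests/ directories")
print(*sorted(test_files)[:5], sep="\n")
```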
Expected Results
The directory is absent.
Actual Results
The directory exists.
Versions