Correctly document linked libraries #27559
Comments
Hmm, this is interesting. This basically makes it impossible for people to develop private code using scikit-learn as long as cc @scikit-learn/core-devs |
IANAL, but: GCC has the runtime exception which should reduce the general risk (see copyright header as well): https://www.gnu.org/licenses/gcc-exception-3.1.html Nevertheless, if this is clearly documented on the scikit-learn side, this should at least resolve basic confusion. |
@stefan6419846 Could you provide the way you found them?
Actually, it means that you need to build scikit-learn from source using an OpenMP that is not GPL, because we don't bundle it within the package but only in the wheel. I assume that the only workaround is to always use LLVM compilers with llvm-openmp, as we already do for the macOS wheels. The licence is Apache-2 in this case. |
I used https://github.com/stefan6419846/license_tools, a custom wrapper around https://github.com/nexB/scancode-toolkit/ |
@stefan6419846 Thanks. I assume that we should be running such tools and have a proper LICENCE file integrated into the wheels. |
In theory you shouldn't need to run these tools regularly, but perform an initial complete review of the current code base for all external stuff to document it appropriately (and in which cases it is shipped in the official distributions) - this can be assisted by corresponding scanning tools. Future checks usually can be subject to a general pull request review process, backed by corresponding contribution docs (when and how to include new external code, including indirect dependencies, how to ensure license compatibility ...) |
I don't believe that this is true. Still, it would be good from our side to document things better. |
License compliance does not really allow for generalization and IANAL, but yes, it depends on how you use/distribute your applications and what the law department considers appropriate. In general, the GPL being a strict copyleft license can be an issue and some "weaker" license might be desired (like the permissive Apache-2.0), but internal use without distribution or SaaS-based usage tends to be fine at least. |
As already pointed out, the GCC RUNTIME LIBRARY EXCEPTION states
libgomp has this exception and is only included in our binaries (wheels) when compiling via gcc, isn't it? IANAL, I don't see a problem here. And I also don't know if it is a good idea to add anything to the docs. License scanning, on the other hand, is usually a good idea 😏 |
This depends on the general perspective you want to take. Yes, in general FOSS and especially the liability/warranty clauses of most licenses do not require anyone to provide such information. They can rather serve as some basic indication of the current licensing situation and provide some short hints regarding possible issues, while indicating that someone might be aware of the possible implications. Given the liability clauses above, I will always have to check for correctness of the statements as well to avoid hidden risks (studies have shown that there are quite some projects which do not correctly document "hidden" licenses). During such a process, I stumbled upon the current documentation limitations and decided to file this issue to further evaluate what a suitable solution could look like. As some examples, this is how scipy or opencv-python currently handle this: https://github.com/scipy/scipy/blob/main/LICENSES_bundled.txt https://github.com/opencv/opencv-python/blob/4.x/LICENSE-3RD-PARTY.txt |
Scipy really bundles/vendors several whole libraries, i.e., they are included in the scipy source code. The only thing we vendor is liblinear and libsvm, and then a few smaller code snippets like in If you think a |
I think something like this is required. Assuming I've identified the correct licenses for liblinear and pprint and their code is included in the binary wheels, their licenses, BSD 3-Clause and PSF, require that their copyright notices and licenses are supplied with the binaries that contain them. As far as I can tell the scikit-learn wheels do not currently do this. Regarding libgomp, although the Runtime Exception clause applies to the scikit-learn code, I believe libgomp itself is distributed under the terms of the GPL v3, i.e., the source code from which it was built should be provided or should be linked to in some way. See the second paragraph of the section entitled "I use a proprietary compiler toolchain without any parts of GCC to compile my program, and link it with libstdc++" in the gcc-exception-3.1-faq. (libstdc++ is also released under the GCC Runtime Library Exception.) Numpy and scipy have had a similar issue with libgfortran in the past, which is bundled in their binary wheels and is also released under the same license as libgomp. When the numpy wheels are built, an OS-specific text file containing the licenses for all the bundled dependencies (including libgfortran) is now appended to the LICENSE.txt file included in the wheel. The entry for libgfortran in the final LICENSE.txt file contains a link to the libgfortran source code, although not, I think, the exact version from which it was built. |
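The append step described for the numpy wheels could look roughly like the sketch below. This is not numpy's actual build script: the function name `append_bundled_licenses`, the separator, and the file names used in the usage note are assumptions made for illustration only.

```python
from pathlib import Path


def append_bundled_licenses(license_file, bundled_file):
    """Append the text of bundled_file to license_file and return the result.

    Hypothetical helper: both paths and the separator are illustrative,
    not taken from any project's real build tooling.
    """
    license_path = Path(license_file)
    base = license_path.read_text(encoding="utf-8")
    bundled = Path(bundled_file).read_text(encoding="utf-8")

    # Keep a visible separator so readers can tell where the project's own
    # license ends and the third-party notices begin.
    separator = "\n\n----\n\n"
    combined = base.rstrip("\n") + separator + bundled.rstrip("\n") + "\n"

    license_path.write_text(combined, encoding="utf-8")
    return combined
```

A wheel build job would call something like `append_bundled_licenses("LICENSE.txt", "LICENSE_linux.txt")` before packaging, where `LICENSE_linux.txt` is a hypothetical per-OS file listing libgomp together with a link to its source.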
@thomasjpfan Could you contribute something similar to numpy/numpy#20102 concerning the licenses? |
Yea, I'll contribute something like numpy/numpy#20102 for scikit-learn. |
Anything to contribute here? |
Describe the issue linked to the documentation
When downloading the current wheel for scikit-learn==1.3.1, the metadata tell me that the package is subject to the terms of BSD-3-Clause. Unfortunately, this only applies to the package itself. Skimming through the distributed files, there are at least two additional cases:
libgomp-a34b3233.so.1.0.0, subject to GPL-3.0-or-later WITH Runtime exception: https://github.com/gcc-mirror/gcc/blob/master/libgomp/libgomp.h
Suggest a potential alternative/fix
It would be great if a full list of external modules shipped within scikit-learn wheels and their copyright information would be provided to detect possible license conflicts early.
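As a starting point for such a list, the files bundled into a wheel can be enumerated directly, since a wheel is a zip archive. The sketch below is not part of scikit-learn or of license_tools. It assumes that repair tools place bundled shared libraries in a directory whose name ends in `.libs` (or `.dylibs` on macOS), and `list_bundled_libs` is a hypothetical helper name.

```python
import zipfile
from pathlib import PurePosixPath


def list_bundled_libs(wheel_path):
    """Return archive members that sit inside a bundled-library directory.

    Assumption: bundled shared libraries live under a directory ending in
    ".libs" or ".dylibs"; wheels laid out differently will need other rules.
    """
    bundled = []
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            parents = PurePosixPath(name).parts[:-1]
            if any(part.endswith((".libs", ".dylibs")) for part in parents):
                bundled.append(name)
    return sorted(bundled)
```

This only reports which files are shipped; identifying the license of each file still requires a scanner or a manual review.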