RFC Consistency for meta infos in files (license, encoding, authors, ..) #20813

Closed
lorentzenchr opened this issue Aug 23, 2021 · 37 comments
Labels
Needs Decision Requires decision

Comments

@lorentzenchr
Member

Some files contain pieces of meta information that others do not. Examples are:

  • # -*- coding: utf-8 -*- at the beginning of a file
  • License information, e.g. # License: BSD 3 clause
  • List of (some main) authors for that file, # Authors:

For the following reasons, I'd like to remove the encoding info, and have a consistent policy for the other 2 points:

  • As a (newish) contributor, it's hard to figure out what is expected.
  • Having this information consistent across all files might help a tiny bit to build trust in the production-readiness of scikit-learn.

I haven't figured out a convincing policy for authors, maybe leave it as is.

Note that changes of those lines of code shouldn't produce merge conflicts, cross referencing #11336.

@rth
Member

rth commented Aug 23, 2021

I was also somewhat skeptical about the need for # -*- coding: utf-8 -*- until I learned that some editors rely on it to determine encoding (https://stackoverflow.com/a/14083123/1791279). I imagine because most of the text files are ascii only, it could very well happen that some editors would open it as latin1 or something else, particularly on Windows.

As to the license header, it looks like there is no clear consensus, but at the same time, if parts of scikit-learn get vendored somewhere, it becomes very difficult to tell what the license was without this header line. So I would be +1 to consistently add it everywhere, though it looks like neither pandas nor numpy do this.

@thomasjpfan
Member

I like having a license header everywhere to be consistent. Since everything is licensed under https://github.com/scikit-learn/scikit-learn/blob/main/COPYING do we add the following everywhere?

# Authors: The scikit-learn developers.
# License: BSD 3 clause

@lorentzenchr
Member Author

+1 for placing the license everywhere.

A little argument in favor of # Authors: The scikit-learn developers would help my indecisiveness. Would we then remove all the other authors?

@thomasjpfan
Member

thomasjpfan commented Aug 24, 2021

Would we then remove all the other authors?

I was thinking of leaving the current authors in place and adding this new # Authors: The scikit-learn developers to files that do not have a license.

Now that I am looking at other Python code bases (CPython, numpy, scipy, etc.), they do not have a license everywhere. I think I can go either way with this as long as we are consistent. Would it be okay to remove the license on top of files and have it all under the "root" license? It is the same license and the authors are still "The scikit-learn developers".

The generic "____ developers" term is used in numpy's license, scipy's license.

@DimitriPapadopoulos
Contributor

I had already removed all (?) occurrences of # -*- coding: utf-8 -*- in #21260. I don't know why some of them had been left out, or have been added.

As for editors that would rely on it to switch to UTF-8 when editing Python files, I would argue they are broken. I've never seen such an editor myself.
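
For context on the interpreter side: in Python 3 the default source encoding is UTF-8 (PEP 3120), so the declaration only matters to Python itself when a file uses a different encoding. The minimal sketch below uses the standard library's `tokenize.detect_encoding` to show this. `source_encoding` is a hypothetical helper name, and the snippet says nothing about how any particular editor guesses encodings.

```python
import io
import tokenize


def source_encoding(source_bytes: bytes) -> str:
    """Return the encoding Python's tokenizer would use for this source."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# No declaration at all: the tokenizer falls back to the default.
print(source_encoding(b"x = 1\n"))
# Explicit UTF-8 declaration, as found in some scikit-learn files.
print(source_encoding(b"# -*- coding: utf-8 -*-\nx = 1\n"))
# A declaration that actually changes something.
print(source_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n"))
```

The first two calls report the same encoding, which is why removing the line does not change how Python reads a UTF-8 file.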

@betatim
Member

betatim commented May 31, 2023

To add a datapoint from JupyterHub: we use "The JupyterHub authors" in a lot of places, there is no explicit mention of individual authors in the files themselves and I don't think the use of the encoding line is widespread/used anywhere.

The COPYING.md explains both the license and (at the very bottom) the shared copyright model used https://github.com/jupyterhub/jupyterhub/blob/main/COPYING.md (it also gives recommendations for what individual authors should do if they want to somehow make a note of their copyright).

In general I like having the statement on the shared copyright model because it makes it clear that you don't have to transfer your copyright, sign a contributor license agreement, etc to contribute. It is also useful to point to if ever someone would want to change the license (as we'd have to contact every single author to ask them to relicense their contributions).

@jjerphan
Member

jjerphan commented Oct 5, 2023

I would mention the BSD license and scikit-learn in the source headers, for the reason given by Roman in #20813 (comment), and use "scikit-learn developers" for the "Authors" mention.

I find that having lists of the original authors is also useful, though public git histories make those lists irrelevant.

jjerphan added a commit to jjerphan/scikit-learn that referenced this issue Oct 29, 2023
See: scikit-learn#20813

Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
@lorentzenchr
Member Author

Neither numpy, scipy nor pandas have authors in source files nor a repeated license header. All are licensed under BSD-3-clause which is stated in the LICENSE.txt file. I would really like to get rid of those names, we have git for the historians.

@jeremiedbb
Member

On the one hand, there's no obligation to have the license and copyright in every single file. Having it only in the root folder is acceptable.

On the other hand, having it in each file can make it easier for third party projects that vendor parts of scikit-learn. They just have to take them. Note that I don't think the current formatting would be valid: I think it would require a Copyright field.

This raises the question: do we want to comply with some standard regarding licensing? There is, for instance, the SPDX project hosted by the Linux Foundation. SPDX is not just about licensing, but there's a Licensing part. The goal would be to have human- and machine-readable headers, and make scikit-learn files trivial to vendor. It could also help us move forward with #27559.

Then there are tools to convert files of a project in order to be compliant and tools to certify that files are compliant. I tested reuse which is both a converter and a linter, compatible with SPDX. I ran

reuse annotate --exclude-year --skip-existing --license BSD-3-Clause --copyright="The scikit-learn developers" path/to/file

It produces this kind of header

# SPDX-FileCopyrightText: The scikit-learn developers
#
# SPDX-License-Identifier: BSD-3-Clause

The SPDX-FileCopyrightText: prefix can have other styles, like just Copyright; it's controlled by a flag. But I don't think that SPDX-License-Identifier can have other styles.

Then reuse can be used as a linter, even as a precommit hook.
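
To illustrate what such a lint step checks, here is a minimal sketch in Python. It is not how reuse is implemented: `spdx_license` and the `max_lines` cutoff are assumptions made for the example, and only `#`-style comment lines are handled.

```python
import re

# Matches e.g. "# SPDX-License-Identifier: BSD-3-Clause" on a comment line.
_SPDX_LINE = re.compile(r"^#\s*SPDX-License-Identifier:\s*(.+?)\s*$")


def spdx_license(source: str, max_lines: int = 10):
    """Return the SPDX identifier declared near the top of `source`, or None."""
    for line in source.splitlines()[:max_lines]:
        match = _SPDX_LINE.match(line)
        if match:
            return match.group(1)
    return None


header = (
    "# SPDX-FileCopyrightText: The scikit-learn developers\n"
    "#\n"
    "# SPDX-License-Identifier: BSD-3-Clause\n"
)
print(spdx_license(header))
print(spdx_license("# License: BSD 3 clause\n"))
```

The second call returns `None` because the legacy `# License: BSD 3 clause` line is not in a machine-readable SPDX form.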

Side note: this doesn't spare us from first removing all existing headers, because reuse can't replace them automatically; sometimes they're not headers, or not in a parsable format.

What's your opinion on this ?

@lorentzenchr
Member Author

What's your opinion on this ?

My first take away is that we still have (kind of a) consensus to (first) remove authors and license in the files.

Then to your proposal, I would not over-engineer this. Let's wait until a 3rd party project asks for it. Do we know of any project that vendors (parts of) scikit-learn?

@jeremiedbb
Member

imbalanced-learn vendors some files of scikit-learn. They just copy-paste them. They don't include sklearn's copyright anywhere. Some of those files don't have any header. Ping @glemaitre for confirmation.

@jeremiedbb
Member

My first take away is that we still have (kind of a) consensus to (first) remove authors and license in the files.

I don't see a consensus from this discussion to remove them. I see 1 or 2 in favor of removing, 1 or 2 in favor of keeping them but uniformized and 1 or 2 unclear 😄

I'd tend to remove them as well, unless following the standard is important for some reason I'm not aware of, hence my question.

But I agree that whatever the decision, we can start by removing them.

@thomasjpfan
Member

But I agree that whatever the decision, we can start by removing them.

I'm not a legal expert, but can we do this? Our root license is under "The scikit-learn developers.":

https://github.com/scikit-learn/scikit-learn/blob/57a72222da091cb7fb4e35c989f9d93e96813fd3/COPYING#L3C25-L3C53

But the individual files have the authors listed:

# Authors: Gilles Louppe <g.louppe@gmail.com>
#          Peter Prettenhofer <peter.prettenhofer@gmail.com>
#          Brian Holt <bdholt1@gmail.com>
#          Noel Dawe <noel@dawe.me>
#          Satrajit Gosh <satrajit.ghosh@gmail.com>
#          Lars Buitinck
#          Arnaud Joly <arnaud.v.joly@gmail.com>
#          Joel Nothman <joel.nothman@gmail.com>
#          Fares Hedayati <fares.hedayati@gmail.com>
#          Jacob Schreiber <jmschreiber91@gmail.com>
#          Nelson Liu <nelson@nelsonliu.me>

@jeremiedbb
Member

From what I read here and there: since we're releasing under a BSD license, we don't necessarily have to worry too much about copyright ownership. This license means the copyright owner is effectively giving away their code, so it's unlikely that it can cause us any trouble. However, I'm not a legal expert either; it would be interesting to have concrete legal advice.

@stefan6419846

IANAL, but coming from the license compliance side, the usual recommendation I have met is to always include all copyright lines found in the original source code along with their corresponding license statements - usually regardless of the actual license. Applying this to the packaging side: I would not remove them unless being completely sure about this. (There have been recent rumors about this in the Redis community as well when they removed BSD license headers due to re-licensing, see PR 13157 in https://github.com/redis/redis for example.)

@lorentzenchr
Member Author

We do not want to change the license at all.
One copyright or license file is enough.
I want to remove the "authors" and other meta info in individual files. There is nothing in the BSD-3 that would forbid this.

@betatim
Member

betatim commented Apr 12, 2024

I'm in favour of removing the list of individual authors from files (and replacing it with "The scikit-learn authors"). Only caveat: if someone can point to some specific legal opinion that says "you definitely can not do this" (not just "you might not be able to do this").

Basically, the ideal end goal for me would be two lines in each file: one stating authors, one stating the license (e.g. what reuse seems to do, if we can remove the blank line :D)

The main motivator for me is that the list of individual authors is probably out of date and hence not super useful for anything other than some form of "giving credit". The credit function was probably useful 5 years ago, but today I wonder if we still need it.

I'm not sure I understand what you mean @thomasjpfan regarding the fact that we list individual authors (#20813 (comment)). If we ever wanted to change the license we'd have to contact all the individual authors to get them to agree. For that we'd have to use git's history though, because I doubt that people were diligent in updating the manual list (e.g. when making small tweaks).

@jeremiedbb
Member

jeremiedbb commented Apr 12, 2024

I'm not a legal expert, but can we do this?

I hope so. Are we allowed to modify or remove code contributed by some author without his explicit consent? I can't imagine otherwise; it's done every single day. The author line being one line of code, I don't see how it would be different. It's not that line that will help a contributor prove which part of the code was written by him. It's other tools like git or GitHub history.

And let's keep in mind that it's not the author line that gives him rights on his code contribution. He holds this right no matter what we write or not write.

@jeremiedbb
Member

Can we get legal advice via Open Collective ? Maybe overkill for that matter but we could settle once and for all :)

@thomasjpfan
Member

I'm not sure I understand what you mean @thomasjpfan regarding the fact that we list individual authors

From my understanding of 3-Clause BSD, the copyright holder must be preserved. Concretely, if we remove the list of authors in _tree.pyx, then their names must be preserved in the root license.

If the listed authors are part of the "The scikit-learn developers.", then it could be okay. But I am not sure.

@lorentzenchr
Member Author

https://www.linuxfoundation.org/blog/blog/copyright-notices-in-open-source-software-projects is insightful.

@lorentzenchr
Member Author

A minimal consensus seems to be @thomasjpfan's proposal in #20813 (comment) and put

# Authors: The scikit-learn developers
# License: BSD 3 clause

in every file.

@betatim
Member

betatim commented Apr 19, 2024

If the listed authors are part of the "The scikit-learn developers.", then it could be okay. But I am not sure.

My assumption is that "The scikit-learn developers" includes all people who have edited the source code (at some point in time).

@jeremiedbb
Member

In the article linked in #20813 (comment), the recommendation is to not remove the existing authors:

Don’t change someone else’s copyright notice without their permission

You should not change or remove someone else’s copyright notice unless they
have expressly (in writing) permitted you to do so. 

@lorentzenchr
Member Author

I am strongly of the opinion that we should remove all names. I also don’t see a larger risk in doing so. We do not remove the copyright holder, just a line of source code.
The alternative is that I (or someone else) insist on my name in every file that I ever touched, and I don’t like that.

@lorentzenchr
Member Author

lorentzenchr commented Apr 22, 2024

I read a bit more on US copyright law, despite not being a lawyer and not knowing which country’s law is best cited here. I found a definition of what a copyright notice is:

The copyright notice generally consists of the symbol or word “copyright (or copr.),” the name of the copyright owner, and the year of first publication, e.g., ©2008 John Doe. While use of a copyright notice was once required as a condition of copyright protection, it is now optional.

So I would conclude that we only have one single copyright notice in the COPYING file. Additionally, we have a lot of non-systematic mentions of (some of the) authors of some files scattered over files which are not copyright notices. An author keeps being an author even without such author statements.

Finally, what do we fear? I am trying to be fair to the community and credit each and every contribution. And I consider those rather arbitrary and scattered author statements unfair. From Rich Bowen

At the Apache Software Foundation, we had this debate a decade ago, and decided that author tags in source code were anti-community, and thus discouraged.

@adrinjalali
Member

I feel that the authors on top of the files are extremely inaccurate. They are far from including everyone who has actually contributed to the file, and at times they list people who originally had something to do with the file, but in the meantime we've moved on in the implementation and their code is no longer there. So it's a very fuzzy, inaccurate thing to have them up there.

Maybe the best approximation is to look at the git blame, but that also only shows the last person touching the line, and that's not really mirroring the authorship.

So I'm more in favor of removing names altogether. Right now, hypothetically, even if we want to "ask for permission from authors of a file", we have pretty much no easy way to do so, and the names on top of the file are NOT the people whose permission we need. So it's pretty useless to have them up there in the first place.

@jjerphan
Member

I am also against author tags in headers.

I think those were useful at a time or in context where authors were only reachable by mail or where IP had to be protected.

To me, those are often rather out-of-date pieces of information that we probably do not want to maintain.

There are many public authoritative proofs of authorship, clean git history with verified commits (like scikit-learn's) being one of them.

If one is curious about an implementation and wants to contact the author, or if one were to sue a project for copyright infringement, one probably would use them.

In any case, this message is also an authorization for removing my name tag in headers.

@adrinjalali
Member

For the record, I tried to extract all authors and deduplicate them, and here's what I found:

Adam Kleczewski
Adrin Jalali <adrin.jalali@gmail.com>
Albert Thomas <albert.thomas@telecom-paristech.fr>
Alexander Fabisch  -- <afabisch@informatik.uni-bremen.de>
Alexandre Gramfort <alexandre.gramfort@telecom-paristech.fr>
Alexandre T. Passos
Amit Aides <amitibo@tx.technion.ac.il>
Andreas Müller
Andreas Bjerre-Nielsen
Andrew Knyazev <Andrew.Knyazev@ucdenver.edu>
Anthony Di Franco (projected gradient, Python and NumPy port)
Aric Hagberg <hagberg@lanl.gov>
Arnaud Fouchet <foucheta@gmail.com>
Arnaud Joly <a.joly@ulg.ac.be>
Arthur Mensch <arthur.mensch@m4x.org>
Arturo Amor <david-arturo.amor-quiroz@inria.fr>
Arya McCarthy <arya@jhu.edu>
Ashim Bhattarai <ashimb9@gmail.com>
Balazs Kegl <balazs.kegl@gmail.com>
Bernardo Stein <bernardovstein@gmail.com>
Bertrand Thirion, Alexandre Gramfort, Denis A. Engemann
Brian Cheung
Brian Holt <bdholt1@gmail.com>
Chih-Jen Linn (original projected gradient NMF implementation)
Chirag Nagpal
Chris Rivera <chris.richard.rivera@gmail.com>
Christian Lorentzen <lorentzen.ch@gmail.com>
Christian Osendorfer <osendorf@gmail.com>
Christopher Moody <chrisemoody@gmail.com>
Christos Aridas
Chyi-Kwei Yau <chyikwei.yau@gmail.com>
Clay Woolam <clay@woolam.org>
Clemens Brunner
Conrad Lee <conradlee@gmail.com>
Dan Blanchard <dblanchard@ets.org>
Daniel Lopez-Sanchez (TensorSketch) <lope@usal.es>
Danny Sullivan <dbsullivan23@gmail.com>
David Dale <dale.david@mail.ru>
Denis A. Engemann <denis-alexander.engemann@inria.fr>
Diego Molla <dmolla-aliod@gmail.com>
Edouard Duchesnay <edouard.duchesnay@cea.fr>
Emanuele Olivetti
Emmanuelle Gouillart <emmanuelle.gouillart@nsup.org>
Eric Chang <ericchang2017@u.northwestern.edu>
Eric Martin <eric@ericmart.in>
Eustache Diemert <eustache@diemert.fr>
Fabian Pedregosa <fpedregosa@acm.org>
Fares Hedayati <fares.hedayati@gmail.com>
@FedericoV <https://github.com/FedericoV/>
Florian Wilhelm <florian.wilhelm@gmail.com>
Fred L. Drake, Jr. <fdrake@acm.org> (built-in CPython pprint module)
Gabriel Synnaeve
Gael Varoquaux
Gilles Louppe <g.louppe@gmail.com>,
Giorgio Patrini <giorgio.patrini@anu.edu.au>
Giuseppe Vettigli <vettigli@gmail.com>
Gordon Walsh <gordon.p.walsh@gmail.com>
Gregory Stupp <stuppie@gmail.com>
Guillaume Lemaitre <guillaume.lemaitre@inria.fr>
Hamzeh Alsalhi <ha258@cornell.edu>
Hanmin Qin <qinhanmin2005@sina.com>
Henry Lin <hlin117@gmail.com>
Hugo Bowne-Anderson <hugobowne@gmail.com>
Issam H. Laradji <issam.laradji@gmail.com>
Jacob Schreiber
Jake Vanderplas <vanderplas@astro.washington.edu>
James Ashton Nichols <james.ashton.nichols@gmail.com>
James Bergstra <james.bergstra@umontreal.ca>
Jan Hendrik Metzen <jhm@informatik.uni-bremen.de>
Jan Schlueter <scikit-learn@jan-schlueter.de>
Jaques Grobler <jaques.grobler@inria.fr>
Jatin Shah <jatindshah@gmail.com>
Jiyuan Qian <jq401@nyu.edu>
Joan Massich <mailsik@gmail.com>
Jochen Wersdorfer <jochen@wersdoerfer.de>
Jochen Wersdörfer <jochen@wersdoerfer.de>
Joel Nothman <joel.nothman@gmail.com>
Johannes Schönberger
John Chiotellis <ioannis.chiotellis@in.tum.de>
John Healy <jchealy@gmail.com>
Joly Arnaud <arnaud.v.joly@gmail.com>
Jona Sassenhagen
Joris Van den Bossche <jorisvandenbossche@gmail.com>
Justin Vincent
Karan Desai <karandesai281196@gmail.com>
Katrina Ni <https://github.com/nilichen>
Kemal Eren <kemal@kemaleren.com>
Kian Ho <hui.kian.ho@gmail.com>
Kornel Kielczewski -- <kornel.k@plusnet.pl>
Kushan <kushansharma1@gmail.com>
Kyle Kastner <kastnerkyle@gmail.com>
Lars Buitinck
L. Buitinck
Leandro Hermida <hermidal@cs.umd.edu>
Leland McInnes <leland.mcinnes@gmail.com>
Li Li <aiki.nogard@gmail.com>
Lucy Liu
Maheshakya Wijewardena <maheshakya.10@cse.mrt.ac.lk>
Malte Londschien
Manoj Kumar mks542@nyu.edu
Maria Telenczuk    <https://github.com/maikia>
Martin Billinger
Martino Sorbaro <martino.sorbaro@ed.ac.uk>
Maryan Morel <maryan.morel@polytechnique.edu>
Mathew Kallada
Mathieu Blondel
Matteo Visconti di Oleggio Castello 2014
Matthew D. Hoffman (original onlineldavb implementation)
Matthieu Perrot
Matt Terry <matt.terry@gmail.com>
Meekail Zain <zainmeekail@gmail.com>
Michael Becker <mike@beckerfuffle.com>
Michael Eickenberg <michael.eickenberg@nsup.org>
Michael Williamson
Michal Karbownik <michakarbownik@gmail.com>
Michal Krawczyk <mkrwczyk.1@gmail.com>
Mohamed Ali Jamaoui <m.ali.jamaoui@gmail.com>
Narine Kokhlikyan <narine@slice.com>
Nelle Varoquaux <nelle.varoquaux@gmail.com>
Nelson Liu <nelson@nelsonliu.me>
Nick Travers <nickt@squareup.com>
Nicolas Goix <nicolas.goix@telecom-paristech.fr>
Nicolas Hug (scikit-learn specific changes)
Nicolas Tresegnie <nicolas.tresegnie@gmail.com>
Nikolay Mayorov <n59_ru@hotmail.com>
Noel Dawe <noel@dawe.me>
Oliver Rausch <rauscho@ethz.ch>
Olivier Grisel <olivier.grisel@ensta.org>,
Paolo Losi
Patrice Becker  <beckerp@ethz.ch>
Pedro Morales <part.morales@gmail.com>
Peter Prettenhofer <peter.prettenhofer@gmail.com>
Pharuj Rajborirug <pharuj.ra@kmitl.ac.th>
Philippe Gervais <philippe.gervais@inria.fr>
Phil Roth <mr.phil.roth@gmail.com>
Pierre Lafaye de Micheaux
Pietro Berkes,
Raghav RV <rvraghav93@gmail.com>
Ramil Nugmanov <stsouko@live.ru>
Reuben Fletcher-Costin <reuben.fletchercostin@gmail.com>
Robert Layton <robertlayton@gmail.com>
Robert McGibbon
Rob Zinkov <rob at zinkov dot com>
Rodion Martynov <marrodion@gmail.com>
Roman Sinayev <roman.sinayev@gmail.com>
Roman Yurchak <rth.yurchak@gmail.com>
Ron Weiss <ronweiss@gmail.com>, Gael Varoquaux
Satrajit Gosh <satrajit.ghosh@gmail.com>
Saurabh Jha <saurabh.jhaa@gmail.com>
Scott White
Sebastian Raschka <se.raschka@gmail.com>,
Sergey Feldman <sergeyfeldman@gmail.com>
Sergul Aydore 2017
Shane Grigsby <refuge@rocktalus.com>
Simon Wu <s8wu@uwaterloo.ca>
Stefan van der Walt
Steve Astels <sastels@gmail.com>
Sylvain Marie <sylvain.marie@schneider-electric.com>
Thierry Guillemot <thierry.guillemot.work@gmail.com>
Thomas J Fan <thomasjpfan@gmail.com>
Thomas Rueckstiess <ruecksti@in.tum.de>
Thomas Unterthiner
Tim Head <betatim@gmail.com>
Tom Dupre la Tour <tom.dupre-la-tour@m4x.org>
Trevor Stephens <trev.stephens@gmail.com>
Tyler Lanigan <tylerlanigan@gmail.com>
Utkarsh Upadhyay <mail@musicallyut.in>
Uwe F Mayer <uwe_f_mayer@yahoo.com>
Vincent Dubourg <vincent.dubourg@gmail.com>
Vincent Michel <vincent.michel@inria.fr>
Virgile Fritsch <virgile.fritsch@inria.fr>
Vlad Niculae
V. Michel
Wei LI <kuantkid@gmail.com>
Wei Xue <xuewei4d@gmail.com>
Wenhao Zhang <wenhaoz@ucla.edu>
William de Vazelhes <wdevazelhes@gmail.com>
William Mill (bill@billmill.org)
Yann N. Dauphin <dauphiya@iro.umontreal.ca>
Yehuda Finkelstein <yehudaf@tx.technion.ac.il>
Yoshihiro Uchida <nimbus1after2a1sun7shower@gmail.com>

It's clearly far from the actual authors we have on the files. You can try the git log for instance like this:

git ls-files sklearn/linear_model | xargs -n 1 git --no-pager log --pretty=format:"%an" -- | sort | uniq

which gives:

Abdulelah S. Al Mesfer
Adam Li
Adam Midvidy
adienes
Adrian Trujillo Duron
Adrin Jalali
Aidan Fitzgerald
aishgrt1
aivision2020
akshayah3
Albert Thomas
Albert Villanova del Moral
Alex
Alexandre Boucaud
Alexandre Gramfort
Alexandre Sevin
Alex Henrie
Alihan Zihna
Aline Ribeiro de Almeida
Allen Akinkunle
András Simon
Andrea Esuli
Andreas Mueller
Andrew Lamb
Andriy
andy
Angela Ambroz
Aniruddha Dave
anupam
Arnaud Rachez
Artem Golubin
Arthur Imbert
Arthur Mensch
Arturo Amor
ArturoAmor
Ashwin Mathur
Atsushi Nukariya
Aurélien Bellet
avm19
baam
Badr MOUFAD
balu
Baran Buluttekin
barankarakus
Barmaley.exe
Bartosz Michałowski
Bartosz Telenczuk
Behzad Tabibian
Benjamin Pedigo
Bharat Raghunathan
brentfagan
Brian Rice
brigi
Brooke Osborn
Bruno
bthirion
bwignall
c56pony
carlo
Carlos H Brandt
Chiara Marmo
Christian Kastner
Christian Lorentzen
Christian LorentzenAdrin Jalali
Christian LorentzenChristian Lorentzen
Christian LorentzenJérémie du Boisberranger
Christian Ritter
Christian Veenhuis
CJ Carey
Clément Doumouro
combscCode
Danny Sullivan
Danny SullivanJérémie du Boisberranger
david-cortes
David Dale
David DaleStefanie Senger
David Staub
Deeksha Madan
dengemann
DerWeh
Dimitri Papadopoulos Orfanos
dsullivan7
Eddie Bergman
Eden Brekke
Ekaterina Borovikova
EliaSchiavon
Elvis DOHMATOB
EricEllwanger
Fabian Pedregosa
Fabian PedregosaAdrin Jalali
Fabian PedregosaChristian Lorentzen
Fabian PedregosaGuillaume Lemaitre
Fabian PedregosaJérémie du Boisberranger
Fabian PedregosaLucy Liu
Fabian PedregosaRalf Gommers
Fabian PedregosaTialo
Fabian PedregosaYao Xiao
fcostin
Felix Glushchenkov
fhaselbeck
Florian Wilhelm
flyingimmidev
Frans Larsson
Frederick Robinson
Gabriel Stefanini Vicente
Gabriel S Vicente
Gael varoquaux
Gael Varoquaux
GaelVaroquaux
gedeck
genvalen
Gilles Louppe
Gim Seng
giorgiop
Glòria Macià Muñoz
Gregory R. Lee
gregorystrubel
Guillaume Lemaitre
Haesun Park
Hanmin Qin
Harsh Mahajan
Harutaka Kawamura
He Chen
Hirofumi Suzuki
imaculate
Immanuel Bayer
Immanuel BayerAdrin Jalali
Ishank Gulati
J-A16
jakirkham
James Alan Preiss
James Bourbeau
Jaques Grobler
Jenny Vo
jeremiedbb
Jérémie du Boisberranger
jeromedockes
Jesse Lima
Jiawei Zhang
Jiten Sidhpura
JJmistry
jmontoyam
joaak
Joan Massich
Joel Nothman
Joey Ortiz
Johannes Schönberger
Johannes SchönbergerJérémie du Boisberranger
Johann Faouzi
John Hopfensperger
John Pangas
Jörg Döpfert
jotasi
JSchuerz
Juan Carlos Alfaro Jiménez
Juan Manuel Caicedo Carvajal
Juan Martin Loyola
judithabk6
Julien Jerphanion
Justin Vincent
Karan Desai
Kian Eliasi
Kilian Kluge
Konstantin Podshumok
ksemb
Kuai Yu
Kushan Sharma
Kyle Kastner
Lars Buitinck
lbfin
lesteve
lingyi1110
Loïc Estève
Loïc EstèveFabian Pedregosa
Luccas Quadros
Lucy Liu
luk-f-a
Mabel Villalba
maikia
Malte Londschien
Malte S. Kurz
mandjevant
Manimaran
Manoj-Kumar-S
Marco Edward Gorelli
Maren Westermann
Maria Telenczuk
Maria TelenczukAdrin Jalali
Marie Douriez
MarieS-WiMLDS
Marijn van Vliet
martin-hahn
martin-kokos
Martin Larralde
Mateusz Sokół
Mathieu Blondel
Mathieu BlondelGuillaume Lemaitre
Mathieu BlondelIsaacTrost
Mathis Batoul
mathurinm
Maxwell
MechCoder
MechCoderGuillaume Lemaitre
Meekail Zain
Mehgarg
mehmetcanakbay
mhg
Michael
Michael Eickenberg
Michael Flaks
Michael Higgins
michalkrawczyk
millawell
mjbommar
mo
Mohit Sharma
mthorrell
Naoya Kanai
Nass
Natasha Borders
Nelle Varoquaux
Nelson Liu
Nicolas Hug
Nicolas Pinto
Nicolas Trésegnie
Nihal Thukarama Rao
Nishu Choudhary
@nkish
nuffe
Nurseit Kamchyev
Nwanna-Joseph
Oleksandr Pavlyk
Olivier Grisel
Olivier Hervieu
Omar Salman
Omar SalmanAdrin Jalali
PAB
Pablo Ibieta-Jimenez
Patric Lacouth
Paulo Haddad
Paulo S. Costa
Petar Mlinarić
Peter Prettenhofer
Peter St. John
PierreAttard
Pierre Glaser
Pinky
poorna-kumar
puhuk
pwalchessen
qdeffense
Raghav R V
Raghav RV
Rahil Parikh
Ralf Gommers
Ramana Subramanyam
Ram Rachum
Reshama Shaikh
Richard Taylor
Rithvik Rao
Rob
Robert Layton
Rob Zinkov
Rocco Meli
Rohan Ramanath
Roman Feldbauer
Roman Yurchak
Roman YurchakAdrin Jalali
Roman YurchakJérémie du Boisberranger
Roman YurchakRoman YurchakAdrin Jalali
Ross Barnowski
Rüdiger Busche
Rushabh Vasani
sadak
Sam Ezebunandu
SangamSwadik
Sarat Addepalli
Saurabh Jain
seales
Sean Atukorala
Sean Benhur J
Sebastian Flores
Sebastin Santy
Sergey Feldman
Sergey Karayev
Sergul Aydore
Shail Shah
Shao Yang Hong
shivamgargsya
Shuangchi He
siftikha
Sina Tootoonian
Sonny Hu
Søren Fuglede Jørgensen
sperret6
spikebh
Srimukh Sripada
Stanislav (Stanley) Modrak
Stefanie Senger
Stefano Lattarini
Stephen Hoover
Subhodeep Moitra
Sultan Orazbayev
Sunitha Selvan
surgan12
Sven Stehle
swu
Taehoon Lee
Takeshi Oura
Thierry Guillemot
Thomas Fan
Thomas J Fan
Thomas J. Fan
Thomas J FanAdrin Jalali
Thomas J FanChristian Lorentzen
Thomas J FanGuillaume Lemaitre
Thomas J FanJérémie du Boisberranger
Thomas J FanMichael Higgins
Thomas J FanOmar Salman
Thomas Moreau
Tialo
Tim Head
TimotheeMathieu
Tim Staley
Tirth Patel
t-kusanagi2
tnwei
Tola A
Tom DLT
TomDLT
Tom Dupré la Tour
Toshihiro NAKAE
trevorstephens
unknown
Valentin Stolbunov
vene
Venkatachalam N
(Venkat) Raghav, Rajagopalan
Vighnesh Birodkar
Vinayak Mehta
Vishal
Vlad Niculae
Vlasovets
vstolbunov
Wenhua Yang
Weyb
William de Vazelhes
Xiao Yuan
xsat
Yao Xiao
Yar Khine Phyo
Yaroslav Halchenko
Yasmeen Alsaedy
Yen
YenChenLin
ymazari
Yosuke KOBAYASHI
Zito Relova
ZJ Poh
マーティン
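
Entries in the list above such as "Christian LorentzenAdrin Jalali" appear to be an artifact of the command rather than real author names: `--pretty=format:` puts a newline between entries but not after the last one, so the output of consecutive `git log` calls runs together. A possible variant, sketched here in Python, uses `--format=%an` (which terminates every entry with a newline) and passes the directory to `git log` directly. The helper names are hypothetical, and the result may differ slightly from the per-file loop above.

```python
import subprocess


def unique_authors(log_output: str) -> list[str]:
    """Return the sorted, de-duplicated author names from `git log` output."""
    names = {line.strip() for line in log_output.splitlines() if line.strip()}
    return sorted(names)


def authors_for_path(path: str, repo: str = ".") -> list[str]:
    """List everyone with a commit touching `path` (hypothetical helper)."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--format=%an", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return unique_authors(result.stdout)
```

Inside a scikit-learn checkout, `authors_for_path("sklearn/linear_model")` would return the names for that subpackage. Note that the same person can still appear under several spellings (for example "TomDLT" and "Tom Dupré la Tour"), which this sketch does not merge.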

@thomasjpfan
Member

thomasjpfan commented Apr 24, 2024

If it is legal for us to remove authors like this, then I am in favor of removing and replacing it with: (As suggested in #20813 (comment))

# Authors: The scikit-learn developers
# License: BSD 3 clause

I want this consistency as well, but never pushed for it because of the potential negative sentiment and gray legal area of removing authors from the source.

I'd be more comfortable if we prepared a public statement explaining that it is legal to remove the authors and replace them with "The scikit-learn developers".

@stefan6419846

stefan6419846 commented Apr 24, 2024

License: BSD 3 clause

I would recommend using BSD-3-Clause as the SPDX identifier in this case, to match the corresponding ISO standard appendix of well-known licenses, or even

SPDX-License-Identifier: BSD-3-Clause

Reference:

@adrinjalali
Member

adrinjalali commented Apr 29, 2024

I'm writing a blog post for this then.

EDIT: scikit-learn/blog#180

@vene
Member

vene commented May 1, 2024

For what it's worth, if there is a legal question, I support the change and give permission when it comes to my own name in any files.

@betatim
Member

betatim commented May 7, 2024

In the article linked in #20813 (comment), the recommendation is to not remove the existing authors:

I read that recommendation in relation to taking code from another project and adding it to yours. In that case you shouldn't remove/modify the existing copyright notice. Which I think is quite different from what we are trying to do here.

@lorentzenchr
Member Author

In #28799 we went with

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause
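
A bulk change to this scheme could be scripted along the following lines. This is a minimal sketch, not the script used in the scikit-learn PRs: `replace_legacy_header` is a hypothetical helper that only handles files whose very first line starts a `# Authors:` or `# License:` comment block, and it drops that whole leading comment block. Anything else is returned unchanged for manual review.

```python
import re

NEW_HEADER = (
    "# Authors: The scikit-learn developers\n"
    "# SPDX-License-Identifier: BSD-3-Clause\n"
)

# First line of a legacy header, e.g. "# Authors: ..." or "# License: ...".
_LEGACY_START = re.compile(r"^#\s*(Authors?|License)\s*:", re.IGNORECASE)


def replace_legacy_header(source: str) -> str:
    """Swap a leading legacy comment header for NEW_HEADER."""
    if source.startswith(NEW_HEADER):
        return source  # already in the new format
    lines = source.splitlines(keepends=True)
    if not lines or not _LEGACY_START.match(lines[0]):
        return source  # unusual layout: leave it for manual review
    end = 0
    # Skip the whole contiguous block of comment lines at the top.
    while end < len(lines) and lines[end].startswith("#"):
        end += 1
    return NEW_HEADER + "".join(lines[end:])


legacy = (
    "# Authors: Gilles Louppe <g.louppe@gmail.com>\n"
    "#          Lars Buitinck\n"
    "# License: BSD 3 clause\n"
    "\n"
    "import numpy as np\n"
)
print(replace_legacy_header(legacy))
```

Because the function returns already-converted sources untouched, running it twice over the same file is harmless, which matters if the change is rolled out across several PRs.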

@betatim
Member

betatim commented May 13, 2024

Should we adopt this scheme everywhere then? Worth making a PR that updates lots of files in one go or do it as we touch files? Somehow one big PR feels like the thing to do

@lorentzenchr
Member Author

This issue got to a decision and was closed by several PRs, thanks to all contributors, in particular thanks to @adrinjalali.


10 participants