-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
DOC Add guidelines for automated contributions #29287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ArturoAmorQ
doc/developers/contributing.rst
Outdated
contribution, including code generated by language models (LLMs), must conform | ||
to the project's existing style guide and coding conventions, in particular: | ||
|
||
- Keep all lines under 88 characters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here. it's linter's job
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I had in mind is to avoid contributions that would just result in a red CI. In the case of #29268 (review), I would have liked to refer the OP to an official statement in the contributor's guide regarding raw copy-paste from an LLM.
That being said, it's true that the bot already comments on such CI fails, but I have the feeling that raw-LLM contributions are going to become more and more of a problem. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, at least for now we haven't had many. And when they get them, we simply block them in most cases anyway, so we don't care about the linter issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And there is no reason to believe that LLMs wouldn't be better at following format standard than the average human.
The real problem with LLM code contributions, is not necessarily their form or ability to make tests pass but the fact that no human contributor fully understands the goal of the contribution, its inner value for end-users and more generally if this a change of the code base that is useful for our community or not.
- Avoid modifying unrelated lines. | ||
- Do not duplicate imports. | ||
- Provide comments that add value and avoid stating the obvious. | ||
- **Most importantly**: Do not contribute code that you don't understand. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the only point I'd like to keep from this section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor improvements but otherwise LGTM.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ArturoAmorQ. Here are some suggestions for being less wordy.
I would also encourage to add the policies about automated contributions in a less prominent place. It certainly should not be the first thing we give interested contributors to read after the "Our community, our values" block. Actually, it only needs to be somewhere, so the five people that are interested in that per year can find it through search (and we can refer to it if we need to).
The second part about coding conventions has some thematic similarities with the Coding guidelines which I think can be pulled off from "Developing scikit-learn estimators", because it is too general for this section anyways and could be unified with the new passage here.
Co-authored-by: Stefanie Senger <91849487+StefanieSenger@users.noreply.github.com>
I see your point that it's quite a mood change w.r.t. the previous paragraph, but I still want it to be visible enough as in the future it may not be only 5 contributions per year. Maybe right after "Ways to contribute"? |
Yes, I like that idea. :) |
Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Stefanie Senger <91849487+StefanieSenger@users.noreply.github.com>
Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Stefanie Senger <91849487+StefanieSenger@users.noreply.github.com>
Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Stefanie Senger <91849487+StefanieSenger@users.noreply.github.com>
What does this implement/fix? Explain your changes.
As presented in the dev meeting notes, it was decided to add a statement to the contribution guide about automated issues/PRs.
Any other comments?
I was not present during the dev meeting so my wording may not be completely inline with what was discussed there. Feel free to improve it.