Skip to content

MNT Add author/license note where missing and add the linter #29477

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Jul 22, 2024

Conversation

adrinjalali
Copy link
Member

@adrinjalali adrinjalali commented Jul 12, 2024

This adds a linter and adds the notice where missing.

@adrinjalali adrinjalali changed the title Add author/license note where missing and add the linter MNT Add author/license note where missing and add the linter Jul 12, 2024
Copy link

github-actions bot commented Jul 12, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 660d1a3. Link to the linter CI: here

@adrinjalali adrinjalali marked this pull request as ready for review July 16, 2024 09:29
@adrinjalali
Copy link
Member Author

@betatim @lorentzenchr , wanna have a look? This should close #20813 maybe?

@@ -5,6 +5,9 @@
the gradients and hessians of the training data.
"""

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

# Author: Nicolas Hug
Copy link
Member

@betatim betatim Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we keep lines like this? I don't remember if we discussed/decided something, my knee jerk reaction while looking at this PR is "we should remove it because it is confusing to have authors and author".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that we'll remove them all at some point. It can be done in follow-up PRs though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another PR was supposed to remove these, this was missed.

Copy link
Member

@betatim betatim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing all these administrative fix ups.

One question about having "author" and "authors", but otherwise looks good to me!

Comment on lines +156 to +162
# E721 is in preview (july 2024) and gives many false positives.
# Use `is` and `is not` for type comparisons, or `isinstance()` for
# isinstance checks
"E721",
# F841 is in preview (july 2024), and we don't care much about it.
# Local variable ... is assigned to but never used
"F841",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of ignoring them, should we just fix them first ?
Looks like you've already started it in #29501. When it's merged, we can remove this filter, right ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes and now, preview enables some stuff in existing enabled rules that we might not want to fix (astral-sh/ruff#12343)

# Use `is` and `is not` for type comparisons, or `isinstance()` for
# isinstance checks
"E721",
# F841 is in preview (july 2024), and we don't care much about it.
Copy link
Member

@jeremiedbb jeremiedbb Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and we don't care much about it.

well, it shouldn't be too hard to fix and I think it hurts readability to have useless variables :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually not sure. I'm happy to fix them in a separate PR, but I'm not sure if it helps readability.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the filters for now, and see what we do about them afterwards.

The thing is that these preview rules will become stable at some point and I don't think we should add filters for new rules to which we don't comply. I think that we should follow ruff/flake8/black rules as they come, unless it represents a major change that everyone dislike.

@@ -7,7 +7,7 @@ repos:
- id: trailing-whitespace
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.2.1
rev: v0.5.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we bump the min version in _min_dependencies and update the lock files then ?

Otherwise I can imagine that a user has an older version and then the CI complains for errors that he doesn't see locally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR also updates _min_dependencies, should I try to recreate the lock files? I don't mind, but that happens anyway the monday after we merge?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I missed that. For the lock files, it should be fine because it's just the linter, but usually it's better to regenerate in the PR updating the dependencies so that we can see and fix any issue right away.

pyproject.toml Outdated
]

[tool.ruff.lint.flake8-copyright]
notice-rgx = "\\#\\ Authors:\\ The\\ scikit\\-learn\\ developers\\\n\\#\\ SPDX\\-License\\-Identifier:\\ BSD\\-3\\-Clause"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add an extra blank line after because currently there are occurences of

# authors...
# copyright
import xxx

and

# authors...
# copyright...

import xxx

I prefer the latter

@adrinjalali
Copy link
Member Author

I think this is ready for a merge?

@betatim
Copy link
Member

betatim commented Jul 17, 2024

I update the lockfiles and pushed them to this PR. Let's see what happens.

I was expecting to see ruff in the list of things changed in the lockfiles, but I can't spot it?

@betatim
Copy link
Member

betatim commented Jul 17, 2024

The answer to why ruff doesn't appear in the lockfiles is that ruff gets installed in the workflow.

@jeremiedbb
Copy link
Member

The answer to why ruff doesn't appear in the lockfiles is that ruff gets installed in the workflow.

haha I haven't thought about that :) then definitely no need to regenerate them here.

Comment on lines 16 to 18
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause
# %%
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hum the extra blank line doesn't seem to work

@betatim
Copy link
Member

betatim commented Jul 18, 2024

I undid the lockfile change.

@adrinjalali
Copy link
Member Author

Did more cleanups, we haven't even touched the cython files, maybe for another PR?

@jeremiedbb
Copy link
Member

we haven't even touched the cython files, maybe for another PR?

sure :)

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the extra blank line thing is not working, see
#29477 (comment). I saw many occurrences. Otherwise LGTM.

@adrinjalali
Copy link
Member Author

adrinjalali commented Jul 22, 2024

@jeremiedbb I don't know how to fix the newline issue in linter, but I added all missing newlines with

$ git grep -n '# SPDX-License-Identifier: BSD-3-Clause' | awk -F: '
{
    filename=$1;
    linenum=$2;
    cmd="sed -n " linenum+1 "p " filename;
    cmd | getline nextline;
    close(cmd);
    if (nextline !~ /^$/) {
        insertCmd="sed -i \"" linenum "a\\\\\n\" " filename;
        system(insertCmd);
    }
}'

(code generated partly by chatgpt)

@jeremiedbb
Copy link
Member

I don't know how to fix the newline issue in linter

Alright, not a big issue though, thanks for trying :)
Let's merge then

@jeremiedbb jeremiedbb merged commit 3e127a3 into scikit-learn:main Jul 22, 2024
30 checks passed
@adrinjalali adrinjalali deleted the ruff/copyright branch July 23, 2024 07:20
@Charlie-XIAO
Copy link
Contributor

The linter seems to raise false positives CPY001 if the file uses CRLF. Since the recommended setup of Git is core.autocrlf true (so CRLF on Windows) I imagine most developers using Windows (though maybe none except me 🤣) to experience the issue... Not sure if it's a bug in ruff or there is some kind of setting that can be activated?

@adrinjalali
Copy link
Member Author

Would it then make sense to change that git config maybe? Could also change the regular expression to accept CRLF as well.

@Charlie-XIAO
Copy link
Contributor

If disabling core.autocrlf on Windows there will tons of diffs because of different EOL IIRC 😢 so a fix might be better. Perhaps changing \n to \r\n??

@jeremiedbb
Copy link
Member

would modifying the regex with \r?\n work ?

MarcBresson pushed a commit to MarcBresson/scikit-learn that referenced this pull request Sep 2, 2024
…learn#29477)

Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants