Skip to content

Linting Package Replacement: Pycodestyle to Ruff #24994

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Luca-Blight opened this issue Oct 24, 2023 · 20 comments · Fixed by #27300
Closed

Linting Package Replacement: Pycodestyle to Ruff #24994

Luca-Blight opened this issue Oct 24, 2023 · 20 comments · Fixed by #27300

Comments

@Luca-Blight
Copy link

Proposed new feature or change:

There is a newer package that has been implemented by other major packages such as FastAPI and Llama index which has overwhelming performance gains. I would like to make the change to Numpy if that's okay?

More info:
https://docs.astral.sh/ruff/

@rgommers
Copy link
Member

Hi @Luca-Blight, thanks for proposing to work on this. If it's a simple swap of tools, then yes please go for it - ruff is certainly an upgrade. If it requires significant code changes throughout the code base (e.g. because ruff doesn't run on the same diff as right now), then it'd be better to avoid this until after the 2.0 release, given that we have a lot more in flight than during an average release cycle.

@Luca-Blight
Copy link
Author

Luca-Blight commented Oct 25, 2023

@rgommers got it I will keep that in mind. Thanks for your response.

@lysnikolaou
Copy link
Member

lysnikolaou commented Feb 23, 2024

I did a little experiment and ran pycodestyle without the diff option and the results are not too good.

7       E101 indentation contains mixed spaces and tabs
13      E111 indentation is not a multiple of 4
9       E114 indentation is not a multiple of 4 (comment)
9       E116 unexpected indentation (comment)
11      E117 over-indented (comment)
58      E124 closing bracket does not match visual indentation
3       E129 visually indented line with same indent as next logical line
5       E131 continuation line unaligned for hanging indent
334     E201 whitespace after '['
57      E202 whitespace before ']'
301     E203 whitespace before ':'
30      E211 whitespace before '('
200     E221 multiple spaces before operator
11      E222 multiple spaces after operator
187     E225 missing whitespace around operator
2       E227 missing whitespace around bitwise or shift operator
8       E228 missing whitespace around modulo operator
1163    E231 missing whitespace after ','
90      E261 at least two spaces before inline comment
13      E262 inline comment should start with '# '
10      E271 multiple spaces after keyword
13      E272 multiple spaces before keyword
98      E301 expected 1 blank line, found 0
83      E303 too many blank lines (2)
2       E304 blank lines found after function decorator
137     E305 expected 2 blank lines after class or function definition, found 1
52      E306 expected 1 blank line before a nested definition, found 0
6       E401 multiple imports on one line
2308    E501 line too long (90 > 79 characters)
53      E502 the backslash is redundant between brackets
73      E701 multiple statements on one line (colon)
6       E702 multiple statements on one line (semicolon)
1       E703 statement ends with a semicolon
21      E711 comparison to None should be 'if cond is None:'
14      E713 test for membership should be 'not in'
1       E714 test for object identity should be 'is not'
2       E722 do not use bare 'except'
1       E743 ambiguous function definition 'I'
6       W191 indentation contains tabs
3       W292 no newline at end of file
107     W293 blank line contains whitespace (I enabled this rule, it was ignored before)

It would be a good amount of effort to solve all of these, but I think we should do it. A good plan would be to first replace pycodestyle with ruff, which includes pycodestyle rules amongst others, fix all errors/warnings (ruff can probably autofix quite a few of these), and, finally, change the CI to lint the whole code base on every run, instead of just the diff. Ruff is fast, so this shouldn't be a problem.

@Luca-Blight Are you still planning to work on this?

@lucascolley
Copy link
Contributor

A good plan would be to first replace pycodestyle with ruff, which includes pycodestyle rules amongst others, fix all errors/warnings (ruff can probably autofix quite a few of these)

That's 2004 errors, with 789 "auto-fixable" (not sure how safe the F401 fixes would be). We ignore E741 in SciPy, so you can remove 82 from that. Doable? But certainly a bit of work.

> ruff check . --exclude vendored-meson --exclude numpy/typing/tests/data --exclude numpy/typing/_char_codes.py --exclude numpy/__config__.py --exclude numpy/f2py --statistics

725	F401	[*] `.._utils._inspect.formatargspec` imported but unused
348	F405	[ ] `ALLOW_THREADS` may be undefined, or defined from star imports
219	F811	[*] Redefinition of unused `Sequence` from line 1
133	F821	[ ] Undefined name `Bool`
 92	E402	[ ] Module level import not at top of file
 82	E741	[ ] Ambiguous variable name: `I`
 75	F841	[*] Local variable `A` is assigned to but never used
 73	E701	[ ] Multiple statements on one line (colon)
 54	E712	[*] Comparison to `False` should be `cond is False` or `if not cond:`
 51	E731	[*] Do not assign a `lambda` expression, use a `def`
 46	E721	[ ] Do not compare types, use `isinstance()`
 27	F403	[ ] `from ._asarray import *` used; unable to detect undefined names
 21	E711	[*] Comparison to `None` should be `cond is None`
 17	E713	[*] Test for membership should be `not in`
 14	F541	[*] f-string without any placeholders
 12	E714	[*] Test for object identity should be `is not`
  6	E702	[ ] Multiple statements on one line (semicolon)
  5	E401	[*] Multiple imports on one line
  2	E743	[ ] Ambiguous function name: `I`
  1	E703	[*] Statement ends with an unnecessary semicolon
  1	E722	[ ] Do not use bare `except`

@DimitriPapadopoulos
Copy link
Contributor

Remember to ruff check --exclude numpy/distutils as well.

@DimitriPapadopoulos
Copy link
Contributor

$ ruff check . --exclude vendored-meson --exclude numpy/typing/tests/data --exclude numpy/typing/_char_codes.py --exclude numpy/__config__.py --exclude numpy/f2py --exclude numpy/distutils --statistics
660	F401	[ ] unused-import
314	F405	[ ] undefined-local-with-import-star-usage
218	F811	[ ] redefined-while-unused
 72	F841	[*] unused-variable
 66	E402	[ ] module-import-not-at-top-of-file
 55	E741	[ ] ambiguous-variable-name
 51	E712	[*] true-false-comparison
 48	E731	[*] lambda-assignment
 33	E721	[ ] type-comparison
 26	F403	[ ] undefined-local-with-import-star
 15	F821	[ ] undefined-name
 11	E714	[*] not-is-test
  6	F822	[ ] undefined-export
  2	E743	[ ] ambiguous-function-name
$ 

@lucascolley
Copy link
Contributor

nice work @DimitriPapadopoulos 🙏

@DimitriPapadopoulos
Copy link
Contributor

DimitriPapadopoulos commented Aug 29, 2024

Pyflakes (F) rules are enabled by default:

By default, Ruff enables Flake8's F rules, along with a subset of the E rules, omitting any stylistic rules that overlap with the use of a formatter, like ruff format or Black.

Would it be OK to initially silence F rules?

Right now, the pycodestyle errors (E) count in #27300 is down to:

$ ruff check . --exclude vendored-meson --exclude numpy/typing/tests/data --exclude numpy/typing/_char_codes.py --exclude numpy/__config__.py --exclude numpy/f2py --exclude numpy/distutils --ignore F --statistics
66	E402	[ ] module-import-not-at-top-of-file
55	E741	[ ] ambiguous-variable-name
51	E712	[*] true-false-comparison
48	E731	[*] lambda-assignment
$ 

I could easily fix pycodestyle warnings (W) (see #27307):

$ ruff check . --exclude vendored-meson --exclude numpy/typing/tests/data --exclude numpy/typing/_char_codes.py --exclude numpy/__config__.py --exclude numpy/f2py --exclude numpy/distutils --extend-select W --ignore F,E --statistics
147	W291	[*] trailing-whitespace
 67	W293	[*] blank-line-with-whitespace
  3	W292	[*] missing-newline-at-end-of-file
$ 

@lucascolley
Copy link
Contributor

lucascolley commented Aug 29, 2024

Would it be OK to initially silence F rules?

This sounds like a good idea to me! (disclaimer: not a NumPy maintainer)

@rgommers
Copy link
Member

Sounds fine indeed to silence the F rules for now, and fix the rest then enable ruff usage.

If it requires significant code changes throughout the code base (e.g. because ruff doesn't run on the same diff as right now), then it'd be better to avoid this until after the 2.0 release

Well, we're past that point now:) Seems like we're in a slightly more quiet period now, so there's nothing blocking doing this I think.

@DimitriPapadopoulos
Copy link
Contributor

DimitriPapadopoulos commented Aug 29, 2024

The main issue is the capabilitiy of pycodestyle and ruffto report issues in diffs only, instead of whole files:

  • While pycodestyle --diff analyses entire files, it is able to report issues within the limits of a unified diff:
    pycodestyle/pycodestyle.py
    Lines 2488 to 2490 in c8e36a0
        parser.add_option('--diff', action='store_true',
                          help="report changes only within line number ranges in "
                               "the unified diff received on STDIN")
  • The --diff option of ruff is unrelated: it outputs a diff for each changed file:
    ruff/docs/configuration.md
    Lines 576 to 579 in b4d9d26
          --diff
              Avoid writing any fixed files back; instead, output a diff for each
              changed file to stdout, and exit 0 if there are no diffs. Implies
              `--fix-only`
    By itself, ruff is unable to report issues only within the limits of a unified diff (see Execute ruff only on staged changes astral-sh/ruff#4049).

I see two options:

  1. Apply pycodestyle rules to the whole codebase, before switching from pycodestyle to ruff. See MAINT: Start applying pycodestyle error rules (E) #27308 for a start.
  2. Use a 3rd party tool to limit ruff to the lines in a unified diff. Two options are discussed in Execute ruff only on staged changes astral-sh/ruff#4049:
    • pre-commit, which I am not fond of but is used in lots of projects.
    • darker, which I wasn't aware of.

Which option would you prefer?

@lucascolley
Copy link
Contributor

  1. Apply pycodestyle rules to the whole codebase, before switching from pycodestyle to ruff. See MAINT: Start applying pycodestyle error rules (E) #27308 for a start.

This is pretty much what we did in SciPy, although applying ruff to the whole codebase instead. It worked okay.

@DimitriPapadopoulos
Copy link
Contributor

An additional issue is that I am not certain that ruff check implements all of pycodestyle rules. Recent linters such as ruff tend to leave some pycodestyle rules to formatters (such as black or ruff format).

@lucascolley Do you have more insight on that matter? Have you checked that, for example by running pycodestyle and ruff check on the SciPy codebase and comparing the output?

By the way, I guess Numpy maintainers wouldn't want to reformat Numpy with black or ruff format, would they?

@DimitriPapadopoulos
Copy link
Contributor

DimitriPapadopoulos commented Aug 29, 2024

run "pycodestyle" on the Numpy codebase
$ time pycodestyle --exclude vendored-meson --exclude numpy/typing/tests/data --exclude numpy/typing/_char_codes.py --exclude numpy/__config__.py --exclude numpy/f2py --exclude numpy/distutils --ignore=W -qq --statistics .
1       E101 indentation contains mixed spaces and tabs
1       E111 indentation is not a multiple of 4
1       E114 indentation is not a multiple of 4 (comment)
1       E116 unexpected indentation (comment)
8       E117 over-indented
27      E121 continuation line under-indented for hanging indent
148     E122 continuation line missing indentation or outdented
225     E123 closing bracket does not match indentation of opening bracket's line
25      E124 closing bracket does not match visual indentation
4       E125 continuation line with same indent as next logical line
288     E126 continuation line over-indented for hanging indent
546     E127 continuation line over-indented for visual indent
1176    E128 continuation line under-indented for visual indent
1       E129 visually indented line with same indent as next logical line
6       E131 continuation line unaligned for hanging indent
320     E201 whitespace after '['
42      E202 whitespace before ')'
123     E203 whitespace before ':'
2       E211 whitespace before '('
136     E221 multiple spaces before operator
3       E222 multiple spaces after operator
22      E225 missing whitespace around operator
3248    E226 missing whitespace around arithmetic operator
2       E227 missing whitespace around bitwise or shift operator
7       E228 missing whitespace around modulo operator
1095    E231 missing whitespace after ','
1769    E241 multiple spaces after ':'
153     E251 unexpected spaces around keyword / parameter equals
39      E261 at least two spaces before inline comment
247     E265 block comment should start with '# '
24      E266 too many leading '#' for block comment
9       E271 multiple spaces after keyword
7       E272 multiple spaces before keyword
51      E275 missing whitespace after keyword
23      E301 expected 1 blank line, found 0
797     E302 expected 2 blank lines, found 1
69      E303 too many blank lines (2)
2       E304 blank lines found after function decorator
87      E305 expected 2 blank lines after class or function definition, found 1
42      E306 expected 1 blank line before a nested definition, found 0
84      E402 module level import not at top of file
1837    E501 line too long (117 > 79 characters)
1       E701 multiple statements on one line (colon)
10      E704 multiple statements on one line (def)
51      E712 comparison to True should be 'if cond is True:' or 'if cond:'
24      E721 do not compare types, for exact checks use `is` / `is not`, for instance checks use `isinstance()`
48      E731 do not assign a lambda expression, use a def
88      E741 ambiguous variable name 'l'
1       E743 ambiguous function definition 'I'

real	0m18,454s
user	0m18,410s
sys	0m0,030s
$ 
run "ruff check" on the Numpy codebase
$ time ruff check --exclude vendored-meson --exclude numpy/typing/tests/data --exclude numpy/typing/_char_codes.py --exclude numpy/__config__.py --exclude numpy/f2py --exclude numpy/distutils --select E --statistics .
785	E501	[ ] line-too-long
 66	E402	[ ] module-import-not-at-top-of-file
 55	E741	[ ] ambiguous-variable-name
 51	E712	[*] true-false-comparison
 48	E731	[*] lambda-assignment
 33	E721	[ ] type-comparison
 11	E714	[*] not-is-test
  2	E743	[ ] ambiguous-function-name
  1	E101	[ ] mixed-spaces-and-tabs

real	0m0,107s
user	0m0,390s
sys	0m0,080s
$ 

As you can see, ruff is much faster. However, it does not implement rules related to formatting - at least not as E rules. Nowadays, formatting is better left to "formatters" (black or ruff format) rather than "linters" (flake8 or ruff check), but again I am not certain that Numpy maintainers would be willing to change the formatting style and/or reformat the codebase.

@lucascolley
Copy link
Contributor

An additional issue is that I am not certain that ruff check implements all of pycodestyle rules. Recent linters such as ruff tend to leave some pycodestyle rules to formatters (such as black or ruff format).
@lucascolley Do you have more insight on that matter? Have you checked that, for example by running pycodestyle and ruff check on the SciPy codebase and comparing the output?

Sorry, I can't really help there. We had already transitioned to ruff, but with lots of rules disabled. What I meant by "what we did" was that we used that process to enable ruff rules that had lots of failures.

I guess Numpy maintainers wouldn't want to reformat Numpy with black or ruff format, would they?

TLDR: no, not right now. Long answer: astral-sh/ruff#7310 (comment)

@lucascolley
Copy link
Contributor

lucascolley commented Aug 29, 2024

At least in SciPy, I don't think we were concerned about Ruff being less strict than Pycodestyle in places. I think we trusted that Ruff has caught all of the important rules (or at least thought that switching was a net gain overall).

@charris
Copy link
Member

charris commented Aug 30, 2024

There has been discussion of adopting an greater linelength for NumPy. Excluding distutils and tools:

length <= 79: 1783
length <= 85: 757
length <= 88: 579

A small increase in line length makes a significant difference. Black uses 88 by default, the longest line is 89.

@lucascolley
Copy link
Contributor

In SciPy, we adopted and enforce an 88 char limit. It seems to have worked pretty well without causing significant disruption.

@DimitriPapadopoulos
Copy link
Contributor

I wanted to suggest bumping to 88 chars as well. I'll prepare a PR...

@lucascolley
Copy link
Contributor

A note to NumPy maintainers: for larger style-only PRs, you may wish to utilise a file like https://github.com/scipy/scipy/blob/main/.git-blame-ignore-revs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants