Skip to content

MNT Make ruff check line-too-long (E501) #31214

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 17, 2025

Conversation

lesteve
Copy link
Member

@lesteve lesteve commented Apr 16, 2025

It seems like checking too-long-lines was lost in #31015.

For now I added only E501 for checking too-long-lines but maybe we want to add more.

@DimitriPapadopoulos can you explain the reasoning behind your changes in pyproject.toml. I was not able to find any discussion about this in your PR.

cc @jeremiedbb.

Copy link

github-actions bot commented Apr 16, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 501adad. Link to the linter CI: here

@jeremiedbb
Copy link
Member

I don't get that "E" and "F" rules were dropped but there still are "E***" and "F***" exceptions.
I feel like we should enable back all of them

@lesteve
Copy link
Member Author

lesteve commented Apr 16, 2025

#31015 changed from select to extend-select as well, according to the ruff doc select default is E4, E7, E9, and F:

[tool.ruff.lint]
# On top of the default `select` (`E4`, E7`, `E9`, and `F`), enable flake8-bugbear (`B`) and flake8-quotes (`Q`).
extend-select = ["B", "Q"]

Not sure how often ruff change its defaults and whether we want to use extend-select (trust ruff defaults and extend them) or select (have full controls for our rules).

@DimitriPapadopoulos
Copy link
Contributor

The F rules are part of the default ruff rules, no need to explicitly select them when using extend-select:

A list of rule codes or prefixes to enable, in addition to those specified by select.

From select:

Default value: ["E4", "E7", "E9", "F"]

@DimitriPapadopoulos
Copy link
Contributor

Additionally, Conflicting lint rules should be avoided when using Ruff as a formatter. Specifically about E501:

While the line-too-long (E501) rule can be used alongside the formatter, the formatter only makes a best-effort attempt to wrap lines at the configured line-length. As such, formatted code may exceed the line length, leading to line-too-long (E501) errors.

@DimitriPapadopoulos
Copy link
Contributor

DimitriPapadopoulos commented Apr 16, 2025

I recommend using extend-select because it automatically avoids E Conflicting lint rules. The default list of rules ["E4", "E7", "E9", "F"] has been pretty stable for a very long time - by Ruff standards at least 😄

@jeremiedbb
Copy link
Member

jeremiedbb commented Apr 16, 2025

But the formatter doesn't split long strings. In some cases like the ones identified in this PR it's acceptable but if I write something like

warnings.warn(
    "a very looo ... ooong string"
)

it's pretty bad to keep as is imo. To me it's a regression compared to our previous state when we didn't have to care about that, while now I have to split them manually, and be extra careful when reviewing PRs that they don't introduce such behavior.

@DimitriPapadopoulos
Copy link
Contributor

Just don't split strings. See for example what the Linux kernel coding style has to say about user-visible strings:

Coding style is all about readability and maintainability using commonly available tools.

The limit on the length of lines is 80 columns and this is a strongly preferred limit.

Statements longer than 80 columns will be broken into sensible chunks, unless exceeding 80 columns significantly increases readability and does not hide information. Descendants are always substantially shorter than the parent and are placed substantially to the right. The same applies to function headers with a long argument list. However, never break user-visible strings such as printk messages, because that breaks the ability to grep for them.

I can revert ruff format if you dislike it that much. I'll also look into settings that might help.

@jeremiedbb
Copy link
Member

This is C code and the motivation is to be able to grep. I think that breaking strings in Pyhon like this

"this is the beginning "
"and this is the end"

means implicit concatenation and the intrepreter treats it a 1 line string.

On the other hand, it also says

Coding style is all about readability

and having a string that spans 3 screens is really not readable at all.

@jeremiedbb
Copy link
Member

I can revert ruff format if you dislike it that much. I'll also look into settings that might help.

I have no issue with ruff formatting. I'd just like to have a way to enforce the 88 chars limits, even on strings.

@DimitriPapadopoulos
Copy link
Contributor

It's the formatter's job to break lines. Unfortunately, unlike black, ruff currently doesn't process strings the way you would like. This is tracked in issue astral-sh/ruff#6936.

Until the above issue is addressed, you could indeed re-enable E501. The risk is that formatting could result in unsolvable E501 issues — although I doubt it happens in practice.

@jeremiedbb
Copy link
Member

I'm in favor of enabling E501

@lesteve
Copy link
Member Author

lesteve commented Apr 16, 2025

To try to clarify the situation, there are a few different things:

  1. linting: if this PR is merged the ruff check will complain if a line containing a long string and the line goes over 88 characters. IMO this was an oversight of MNT black → ruff format #31015 and this PR fixes it.
  2. autoformatting: our pre-commit black hook has not formatted automatically lines with long strings since slightly more than a year and MNT upgrade black version #28802 was merged to upgrade the black version. black --preview does not format long strings automatically since 24.1.0. You need to explicitly do black --preview --enable-unstable-feature string_processing 1. Maybe some people had custom workflows to do it in their editor, but if that's the case I think this was overriding our pyproject.toml.
  1. it would be nice to find more info about possible edge cases where ruff format autoformats in a way that make ruff check unhappy because the line is too long i.e. MNT Make ruff check line-too-long (E501) #31214 (comment). I don't think I needed to correct any such issues in this PR, where the full scikit-learn code was autoformatted, so my guess is that it will happen very rarely. If that ever happens we can always use # noqa: E5012.

Footnotes

  1. It's somewhat hidden in the black 24.1 changelog. Confusingly you find old issues that claim that black --preview formats long strings when it doesn't anymore.

  2. If that starts happening way too much, which I doubt, there is a way to have a different line length setting for the formatter and the linter, see ruff doc

@lesteve lesteve changed the title MNT Make ruff check too-long-lines MNT Make ruff check line-too-long (E501) Apr 17, 2025
pyproject.toml Outdated
@@ -137,7 +137,7 @@ preview = true
# This enables us to use the explicit preview rules that we want only
explicit-preview-rules = true
# all rules can be found here: https://docs.astral.sh/ruff/rules/
extend-select = ["W", "I", "CPY001", "RUF"]
extend-select = ["E", "W", "I", "CPY001", "RUF"]
Copy link
Contributor

@DimitriPapadopoulos DimitriPapadopoulos Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's best to explicitly enable E501, not all E rules:

  • E1 rules are about indentation. They are redundant when using a formatter.
  • E2 rules are about missing or extra spaces. They are redundant when using a formatter.
  • E3 rules are about missing or extra lines. They are redundant when using a formatter.
  • E4 rules are enabled by default.
  • E501 can be explicitly enabled as discussed.
  • E502 is about extra backslahes. It is redundant when using a formatter.
  • E7 rules are enabled by default.
  • E9 rules are enabled by default.

I recommend letting ruff do the right thing by default (["E4", "E7", "E9"]) and enabling only E501 as an additional rule.

Copy link
Member Author

@lesteve lesteve Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I wasn't sure about it but let's keep it more specific indeed for now.

One question: should we removed "E111", "E114", "E117" from our ignore list since there are not enabled in ruff by default? Or should we keep it just in case one day we add E again (or something more generic than E501) to avoid having to spend time in the ruff doc to figure this kind of tiny details all over again.

Ruff default that extend-select extends is ["E4", "E7", "E9", "F"].

I am slightly leaning towards keeeping the full ignore list in sync with the ruff doc even if it means ignoring things that are not selected.

Copy link
Contributor

@DimitriPapadopoulos DimitriPapadopoulos Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have copied over all of the Conflicting lint rules, whether their respective ruleset is currently selected or not - just in case a ruleset is selected in the future. That's perhaps overkill, feel free to remove from the ignore list the rules from unselected rulesets.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@jeremiedbb jeremiedbb merged commit 5fee5ad into scikit-learn:main Apr 17, 2025
36 checks passed
lucyleeow pushed a commit to EmilyXinyi/scikit-learn that referenced this pull request Apr 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants