[RFC] Building a CLI for typescript-eslint #6020

bradzacher · 2021-11-18T21:23:46Z

bradzacher
Nov 18, 2021
Maintainer

Something I want to explore is building a CLI for typescript-eslint.

A big issue we run into with our tooling is that we are "at the mercy" of ESLint's CLI. ESLint controls the process - how files are selected, when files are selected, in what order they are linted.

This is a problem because we can't do efficient memory management, nor can we properly prioritise work. We don't have any control or visibility into what files are being linted so we can't "dispose" of a program when it's done: we don't know if or when all the files that belong to the project will be linted (for example it's common that a user might lint a subset of a project). This forces us to keep every single project in memory for the duration of the lint run (which leads to OOMs on any non-trivial codebase).

Additionally due to the fact that ESLint has one API for all use cases we don't know which "state" ESLint is in. ESLint just exposes the ESLint API which is used in the "single-run" CLI usecase as well as the persistent IDE usecase.

This means that we're stuck building for the worst-case scenario, for example:

1. using inefficient APIs when we could use better ones
2. keeping programs in memory forever just in case we need them again (OOM when many (>10) project configs passed to the parser #1192)

For (1) we've attempted a workaround by detecting the CLI usecase (#3512) - which definitely helps! But it has a few unfortunate shortcomings.
For (2) we could probably build on top of the (1) solution to free up programs automatically. But that kind of feels like hacks on hacks on hacks - we'd be building a super complicated, brittle system that pushes us further into a world where we can't get new contributors because of the complexity.

It's worth noting that even for the CLI usecase (which we tend to think of as "one and done") - ESLint can lint a file up to 11 times when it is calculating fixes for a file (1 initial pass and 10 fix passes). Which does cause some unique problems with optimisations we might try to apply.
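To make the "1 initial pass and 10 fix passes" figure concrete, here is a minimal sketch of such a loop. The `LintPass` type and `lintWithFixes` function are hypothetical names for illustration only; this is a simplification and not ESLint's actual implementation.

```typescript
// Hypothetical shape of a single lint pass: it returns the (possibly fixed)
// text and whether any fix was applied. This is NOT ESLint's real API.
type LintPass = (text: string) => { output: string; fixed: boolean };

// The limit quoted in the post above.
const MAX_FIX_PASSES = 10;

function lintWithFixes(
  text: string,
  lintOnce: LintPass,
): { output: string; passes: number } {
  let current = text;
  let passes = 0;
  // One initial pass, then up to MAX_FIX_PASSES further passes for as long
  // as the previous pass changed the text.
  while (true) {
    const result = lintOnce(current);
    passes += 1;
    if (!result.fixed || passes > MAX_FIX_PASSES) {
      return { output: result.output, passes };
    }
    current = result.output;
  }
}
```

In this sketch a pass that never stops producing fixes is cut off after 11 calls, so any per-file cache or optimisation has to cope with the same file being linted repeatedly with different text.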

So I want to explore a new direction - a new CLI which allows us to "work around" these issues from the top down instead of the middle.

My proposal would be a simple CLI along the lines of this:

$ tseslint --help
Usage: tseslint [options] file.js [file.js] [dir]

Basic configuration:
  -c, --config path::String       Use this configuration, overriding .eslintrc.* config options if present

Fixing problems:
  --fix                           Automatically fix problems
  --fix-type Array                Specify the types of fixes to apply (problem, suggestion, layout)

Ignoring files:
  --ignore-path path::String      Specify path of ignore file
  --ignore-pattern [String]       Pattern of files to ignore (in addition to those in .eslintignore)

Handling warnings:
  --quiet                         Report errors only - default: false
  --max-warnings Int              Number of warnings to trigger nonzero exit code - default: -1

Output:
  -o, --output-file path::String  Specify file to write report to
  -f, --format String             Use a specific output format - default: stylish
  --color, --no-color             Force enabling/disabling of color

Inline configuration comments:
  --no-inline-config              Prevent comments from changing config or rules
  --report-unused-disable-directives  Adds reported errors for unused eslint-disable directives

Miscellaneous:
  --env-info                      Output execution environment information - default: false
  --no-error-on-unmatched-pattern  Prevent errors when pattern is unmatched
  --debug                         Output debugging information
  -h, --help                      Show help
  -v, --version                   Output the version number

A keen eye would notice this proposed CLI is just a subset of the ESLint CLI. This is completely intentional.

At a high level - this would be the proposed workflow for the CLI:

1. Use ESLint's FileEnumerator to expand the passed paths/globs to the list of actual files to lint.
2. For each file, resolve the config and sort the files into two buckets: files with type information and files without type information.
3. For files without type information - we can just run ESLint over them as normal.
4. For files with type information - we want to group them again. This time we want to group them by which TSConfig they belong to.
5. Once we have the tsconfig buckets - we can then execute each project in turn:
   1. manually create the TS Program instance for the tsconfig.
   2. lint all files in the bucket.
   3. if there are fixes and n < 10 then apply fixes, set n to n + 1, and go back to (5.2).
   4. dispose of the TS Program instance (solving Purge programs once they are done #1718)
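The bucketing and dispose steps of this workflow could be sketched roughly as follows. Everything here (`FileInfo`, `Program`, `Host`, `bucketFiles`, `run`) is a hypothetical stand-in and not the ESLint or TypeScript API; the fix loop is left out to keep the sketch short.

```typescript
// Hypothetical types for illustration only.
interface FileInfo {
  path: string;
  // Path of the tsconfig the file belongs to, or null when the file's
  // resolved config does not ask for type information.
  tsconfig: string | null;
}

interface Program {
  dispose(): void;
}

interface Host {
  createProgram(tsconfig: string): Program;
  lint(file: string, program: Program | null): void;
}

// Split files into "no type information" and one bucket per tsconfig.
function bucketFiles(files: FileInfo[]): {
  untyped: string[];
  typed: Map<string, string[]>;
} {
  const untyped: string[] = [];
  const typed = new Map<string, string[]>();
  for (const file of files) {
    if (file.tsconfig === null) {
      untyped.push(file.path);
      continue;
    }
    const bucket = typed.get(file.tsconfig) ?? [];
    bucket.push(file.path);
    typed.set(file.tsconfig, bucket);
  }
  return { untyped, typed };
}

function run(files: FileInfo[], host: Host): void {
  const { untyped, typed } = bucketFiles(files);

  // Files without type information are linted as normal.
  for (const file of untyped) {
    host.lint(file, null);
  }

  // Each tsconfig bucket gets its own program, which is released before
  // the next one is created, so only one program is alive at a time.
  typed.forEach((bucket, tsconfig) => {
    const program = host.createProgram(tsconfig);
    try {
      for (const file of bucket) {
        host.lint(file, program);
      }
    } finally {
      program.dispose();
    }
  });
}
```

Because the runner owns the ordering, it knows the exact point at which a project's last file has been linted, which is the piece of information the parser cannot get when ESLint drives the process.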

By controlling the order and grouping of files - we can know exactly when we can free up the memory - allowing us to prevent OOMs due to having all programs in memory at once.

This also unlocks other possible optimisations for us like parallelisation!
We know that each project is "isolated" from one another - so we can run each project (eg step 5) in a separate thread. We could also run the non-type-information files (step 3) in their own separate thread. With enough threads, this should mean that instead of having O(nm) performance, where n is the number of projects and m is the worst-case lint time for a project, we would have O(m) performance.
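As a rough illustration of that estimate, the sketch below compares wall-clock time for a sequential run against a run spread over a fixed number of workers. It is an idealised model (no thread startup cost, no shared resources), and the function names and timings are made up for the example.

```typescript
// Sequential run: every project is linted one after another.
function sequentialTime(projectTimes: number[]): number {
  return projectTimes.reduce((total, t) => total + t, 0);
}

// Parallel run: each project goes to whichever worker frees up first.
// Idealised model; real threads add startup and memory overhead.
function parallelTime(projectTimes: number[], workers: number): number {
  const load: number[] = [];
  for (let i = 0; i < Math.max(1, workers); i++) {
    load.push(0);
  }
  for (const t of projectTimes) {
    let least = 0;
    for (let i = 1; i < load.length; i++) {
      if (load[i] < load[least]) {
        least = i;
      }
    }
    load[least] += t;
  }
  return Math.max(...load);
}

// Made-up per-project lint times in seconds.
const times = [30, 20, 10];
console.log(sequentialTime(times)); // 60
console.log(parallelTime(times, 3)); // 30
```

With one worker per project the total is bounded by the slowest project; with fewer workers than projects the result falls somewhere between the two extremes.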

If we wanted to - this would also let us rip out the "CLI detection" from the underlying CLI and thus simplify it again to reduce complexities. We could also choose to keep this so we can maintain some level of improved performance for people who continue to use the standard eslint binary.

cc @typescript-eslint/core-team

JamesHenry · 2021-11-18T21:55:42Z

JamesHenry
Nov 18, 2021
Maintainer

Great write up @bradzacher!

I agree that we have matured to the point where this is the main area where we can add real remaining value. As long as a clear design goal is that users can still use the existing ESLint tooling as they always have done (probably initially through a lot of duplication in the implementation, and then probably some kind of gnarly compat logic), I am definitely all for this!

JoshuaKGoldberg · 2021-11-21T16:50:17Z

JoshuaKGoldberg
Nov 21, 2021
Maintainer

Are there folks from the ESLint side who should weigh in here? I'm not 100% familiar with all the issues as described; although the contents of the RFC make sense, it's a pretty big jump to ask users to switch to an entirely new CLI.

An alternate approach would be to work with the ESLint CLI maintainers and propose some kind of plugin system for the CLI.

JamesHenry · 2021-11-21T16:54:14Z

JamesHenry
Nov 21, 2021
Maintainer

@JoshuaKGoldberg that was the reason for my point about not breaking any existing support. The CLI would be entirely opt-in and be most relevant for folks having issues with the current generic approach. I would imagine most users would never even know/care that this additional tool exists.

I don’t want to speak for @bradzacher but given his thumbs-up on my comment I imagine his own thinking is not a million miles away from that.

JamesHenry · 2021-11-21T16:55:47Z

JamesHenry
Nov 21, 2021
Maintainer

Another reason we would not want any breakage in support is the supplementary tooling, such as IDE plugins, that we would not look to replace. The highly optimised and specific CLI would be purely additive to the current ecosystem.

bradzacher · 2021-11-21T20:00:51Z

bradzacher
Nov 21, 2021
Maintainer Author

Yeah definitely we would have no choice but to maintain compat with the existing ESLint ecosystem because the IDE ecosystem relies on the standard method of operation.

Essentially our advice for larger projects right now is to use lerna/nx/etc to run the ESLint process isolated on each of the projects when doing an entire repo lint.

This OFC does require extra effort to ensure the configs are fully separated so they don't influence one another.

With a custom CLI we could essentially automate it for people and give them a best-practice by default.

The IDE usecase shouldn't matter for most of the OOMs because you usually work on a subset of files and thus a subset of projects.

We can probably chat to the ESLint folks! This would definitely require a detailed RFC on that side before it could be worked into the CLI. We are like 60% of the ESLint userbase so we have some weight behind us.
The only issue is that an RFC would take a non-trivial amount of work to draft and collab on plus integration into the ESLint codebase - which would limit the impact to only codebases on ESLint 8+.

There are pros and cons to both approaches.

G-Rath · 2021-12-26T00:38:19Z

G-Rath
Dec 26, 2021

I imagine having a custom CLI would also provide a possible path to supporting .eslintrc.ts, which'd be cool to be able to explore further.

While it'd not be directly compatible with the ecosystem, I think most tools and IDEs should be fine, as they should primarily care about how to "talk" to eslint - aka the CLI.

Am excited to see where this goes 🚀

bradzacher · 2021-12-29T00:10:45Z

bradzacher
Dec 29, 2021
Maintainer Author

Initial proof of concept for the CLI: #4359

On our repo it looks to be consistently ~15-20s faster than a pure ESLint run.

cyrfer · 2022-10-05T17:18:38Z

cyrfer
Oct 5, 2022

> Essentially right now our advice for larger projects right now is to use lerna/nx/etc to run the ESLint process isolated on each of the projects when doing an entire repo lint.

@bradzacher Is this documented anywhere? I want a reference to make sure my configs are right.

bradzacher · 2022-10-13T07:24:37Z

bradzacher
Oct 13, 2022
Maintainer Author

@cyrfer
https://typescript-eslint.io/docs/linting/typed-linting/monorepos

we don't explicitly talk about running things in isolated threads, but it's the logical next step if you're running into OOMs after separating your tsconfigs.

The root cause of OOMs is that we have to create TS data structures without cleaning them up.

More files = more memory.
- This should be obvious - each .ts file has a corresponding ts.SourceFile and TS generates a set of type objects for all the types in the file.
More interdependencies between your packages = more memory.
- This is less obvious. If package A depends on package B, then A needs the types from B. This means that TS will read in all of the .d.ts files for package B. If package C also needs package B, then that's another copy of these .d.ts files (because TS doesn't have mechanisms to share memory between programs).
- This combines exponentially with "more files" because more files means more .d.ts files which are duplicated n times (once for each dependent package).
- This could potentially be solved by project references but TS doesn't have good infra to do this for us.
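The duplication described above can be counted with a toy model. This sketch assumes one TS program per package and only looks at direct dependencies; `DependencyGraph` and `countDeclarationLoads` are hypothetical names, and real memory use depends on far more than this count.

```typescript
// Hypothetical model: package name -> names of packages it depends on.
type DependencyGraph = Record<string, string[]>;

// Counts how many separate programs load each package's .d.ts files,
// assuming one program per package and no sharing between programs.
function countDeclarationLoads(graph: DependencyGraph): Record<string, number> {
  const loads: Record<string, number> = {};
  for (const pkg of Object.keys(graph)) {
    for (const dep of graph[pkg]) {
      loads[dep] = (loads[dep] ?? 0) + 1;
    }
  }
  return loads;
}

// A and C both depend on B, so B's declarations are held in memory twice.
const loads = countDeclarationLoads({ A: ["B"], C: ["B"], B: [] });
console.log(loads); // { B: 2 }
```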

So how can you solve the problem?
One way is to use one tsconfig to work around the interdependency problem. If you have one tsconfig which includes every single file in your workspace then there's no need to load .d.ts files or duplicate them! But this means your entire project:
- needs to be able to be represented in one config (which doesn't work if different packages have different libs for example)
- needs to fit into memory (eg ts.SourceFiles + types all need to fit into <2GB of RAM)

Another way is to just ignore the problem entirely! By using a separate nodejs process per package, you no longer need to worry about any of the above problems because you get 2gb of ram per package to play with! Which is the same limitation placed on a tsc run (or your webpack run, or whatever).
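A minimal sketch of the process-per-package approach, using Node's `child_process.spawnSync`. The helper names and package paths are hypothetical, and the `npx eslint .` invocation is an assumption; substitute however you already invoke ESLint in each package.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical description of one isolated lint run.
interface LintCommand {
  command: string;
  args: string[];
  cwd: string;
}

// One ESLint invocation per package directory. The arguments are an
// assumption; adjust them to match your own setup.
function buildLintCommands(packageDirs: string[]): LintCommand[] {
  return packageDirs.map((cwd) => ({
    command: "npx",
    args: ["eslint", "."],
    cwd,
  }));
}

// Runs each command in its own Node.js process, one after another, so
// every package gets a fresh heap. Returns the worst exit code seen.
function lintPackagesInIsolation(packageDirs: string[]): number {
  let worstExitCode = 0;
  for (const { command, args, cwd } of buildLintCommands(packageDirs)) {
    const result = spawnSync(command, args, { cwd, stdio: "inherit" });
    // status is null when the process was killed by a signal.
    worstExitCode = Math.max(worstExitCode, result.status ?? 1);
  }
  return worstExitCode;
}
```

Calling `lintPackagesInIsolation(["packages/a", "packages/b"])` would lint the two (made-up) package directories in separate processes. This is essentially what lerna/nx-style task runners do for you.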

Which is why I proposed this CLI - we can do this for you automatically and save everyone trying to figure out how to make it work.
It also gives us a nice entrypoint to do things like create a bug reporting tool, a templating tool, or a troubleshooting tool, etc.

bradzacher · 2022-10-13T07:25:50Z

bradzacher
Oct 13, 2022
Maintainer Author

also cc @JoshuaKGoldberg based on our discussion the other day.
This is my "disconnected" solution for the whole "eslint doesn't give us much information" problem. Instead of changing ESLint, we inject ourselves in front of ESLint so that we can get that information and act on it properly, solving many of our performance problems!
