Skip to content

Commit 608d5e2

Browse files
doc: add old blogpost
1 parent df97b54 commit 608d5e2

File tree

1 file changed

+154
-0
lines changed

1 file changed

+154
-0
lines changed

website/blog/optimize-ast-grep.md

Lines changed: 154 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,154 @@
1+
---
2+
author:
3+
- name: Herrington Darkholme
4+
date: 2023-01-23
5+
head:
6+
- - meta
7+
- property: og:type
8+
content: website
9+
- - meta
10+
- property: og:title
11+
content: Optimize ast-grep to get 10X faster
12+
- - meta
13+
- property: og:url
14+
content: https://ast-grep.github.io/blog/optimize-ast-grep.html
15+
- - meta
16+
- property: og:description
17+
content: How to optimize the Rust CLI tool ast-grep to become 10 times faster.
18+
---
19+
20+
# Optimize ast-grep to get 10X faster
21+
22+
In this post I will discuss how to optimize the Rust CLI tool [ast-grep](https://ast-grep.github.io/) to become 10 times faster.
23+
24+
Rust itself usually runs fast enough, but it is not a silver bullet to all performance issues.
25+
26+
In this case, I did not pay enough attention to runtime details or opted for naive implementation for a quick prototype. And these inadvertent mistakes and deliberate slacking off became ast-grep's bottleneck.
27+
28+
# Context
29+
30+
[ast-grep](https://ast-grep.github.io/) is my hobby project to help you search and rewrite code using [abstract syntax tree](https://www.wikiwand.com/en/Abstract_syntax_tree).
31+
32+
Conceptually, ast-grep takes a piece of pattern code (think it like a regular expression but for AST), matches the pattern against your codebase and gives a list of matched AST nodes back to you. See the [playground](https://ast-grep.github.io/playground) for a live demo.
33+
34+
I designed ast-grpe's architecture with performance in mind. Here are a few performance related highlights:
35+
36+
* it is written in Rust, a native language compiled to machine code.
37+
* it uses the venerable C library [tree-sitter](https://tree-sitter.github.io/) to parse code, which is the same library powering [GitHub's codesearch](https://github.com/features/code-search).
38+
* its command line interface is built upon [ignore](https://docs.rs/ignore/latest/ignore/), the same crates used by the blazing fast [ripgrep](https://github.com/BurntSushi/ripgrep).
39+
40+
Okay, enough self-promotion _BS_. If it is designed to be fast, how comes this blog? Let's dive into the performance bottleneck I found in my bad code.
41+
42+
43+
> Spoiler. It's my bad to write slow Rust.
44+
45+
# Profiling
46+
47+
The first thing to optimize a program is to profile it. I am lazy this time and just uses the [flamegraph](https://github.com/flamegraph-rs/flamegraph) tool.
48+
49+
Installing it is simple.
50+
51+
```bash
52+
cargo install flamegraph
53+
```
54+
55+
Then run it against ast-grep! No other setup is needed, compared to other profiling tools!
56+
57+
This time I'm using an ast-grep port of [es-lint](https://github.com/ast-grep/eslint) against [TypeScript](https://github.com/microsoft/TypeScript/)'s `src` folder.
58+
59+
This is the profiling command I used.
60+
61+
```bash
62+
sudo flamegraph -- ast-grep scan -c eslint/sgconfig.yml TypeScript/src --json > /dev/null
63+
```
64+
65+
The flamegraph looks like this.
66+
67+
<img width="1155" alt="Before Optimzation" src="https://user-images.githubusercontent.com/2883231/215253646-21b5f1dd-a810-4ddb-9bbe-4938b4cc15f9.png">
68+
69+
Optimizing the program is a matter of finding the hotspots in the flamegraph and fix them.
70+
71+
For a more intuitive feeling about performance, I used the old command `time` to measure the wall time to run the command. The result is not good.
72+
73+
```bash
74+
time ast-grep scan -c eslint/sgconfig.yml TypeScript/src
75+
17.63s user, 0.46s system, 167% cpu, 10.823 total
76+
```
77+
78+
The time before `user` is the actual CPU time spent on my program. The time before `total` represents the wall time. The ratio between them is the CPU utilization. In this case, it is 167%. It means my program is not fully utilizing the CPU.
79+
80+
It only runs six rules against the codebase and it costs about 10 whole seconds!
81+
82+
83+
In contrast, running one ast-grep pattern agasint the TypeScript source only costs 0.5 second and the CPU utilization is decent.
84+
85+
```bash
86+
time ast-grep run -p '$A && $A()' TypeScript/src --json > /dev/null
87+
88+
1.96s user, 0.11s system, 329% cpu, 0.628 total
89+
```
90+
91+
# Expensive Regex Cloning
92+
93+
The first thing I noticed is that the `regex::Regex` type is cloned a lot. I do know it is expensive to compile a regex, but I did not expect cloning one will be the bottleneck.
94+
Much to my limited understanding, `drop`ping Regex is also expensive!
95+
96+
Fortunately the fix is simple: I can use a reference to the regex instead of cloning it.
97+
98+
This optimzation alone shaves about 50% of execution time.
99+
100+
```bash
101+
time ast-grep scan -c eslint/sgconfig.yml TypeScript/src --json > /dev/null
102+
13.89s user, 0.74s system, 274% cpu 5.320 total
103+
```
104+
105+
The new flamegraph looks like this.
106+
107+
<img width="1509" alt="Avoid Regex Cloning" src="https://user-images.githubusercontent.com/2883231/215318711-634a8b99-3e02-4187-9073-ea5be25d098f.png">
108+
109+
110+
# Matching Rule can be Avoided
111+
112+
The second thing I noticed is that the `match_node` function is called a lot. It is the function that matches a pattern against an AST node.
113+
ast-grep can match an AST node by rules, and those rules can be composed together into more complex rules.
114+
For example, the rule `any: [rule1, rule2]` is a composite rule that consists of two sub-rules and the composite rule matches a node when either one of the sub-rules matches the node.
115+
This can be expensive since multiple rules must be tried for every node to see if they actually make a match.
116+
117+
I have already forsee it so every rule in ast-grep has an optimzation called `potential_kinds`. AST node in tree-sitter has its own type encoded in a unsigned number called `kind`.
118+
If a rule can only match nodes with specific kinds, then we can avoid calling `match_node` for nodes if its kind is not in the `potential_kinds` set.
119+
I used a BitSet to encode the set of potential kinds. Naturally the `potential_kinds` of composite rules can be constructed by merging the `potential_kinds` of its sub-rules, according to their logic nature.
120+
For example, `any`'s potential_kinds is the union of its sub-rules' potential_kinds, and `all`'s potential_kinds is the intersection of its sub-rules' potential_kinds.
121+
122+
Using this optimization, I can avoid calling `match_node` for nodes that can never match a rule. This optimization shaves another 40% of execution time!
123+
124+
```bash
125+
ast-grep scan -c eslint/sgconfig.yml TypeScript/src --json > /dev/null
126+
11.57s user, 0.48s system, 330% cpu, 3.644 total
127+
```
128+
129+
The new flamegraph.
130+
131+
<img width="1503" alt="potential_kinds trick" src="https://user-images.githubusercontent.com/2883231/215318794-7e3cf452-5016-4541-9c9d-1e266c1ee324.png">
132+
133+
# Duplicate Tree Traversal
134+
135+
Finally, the function call `ts_tree_cursor_child_iterator_next` caught my eyes. It meant that a lot of time was spent on traversing the AST tree.
136+
137+
Well, I dumbly iterating through all the six rules and matching the whole AST tree for each rule. This is a lot of duplicated work!
138+
139+
So I used a data structure to combine these rules, according to their `potential_kinds`. When I'm traversing the AST tree, I will first retrieve the rules with potential_kinds containing the kind of the current node. Then I will only run these rules against the node. And nodes without any `potential_kinds` hit will be naturally skipped during the traversal.
140+
141+
This is a huge optimization! The ending result is less than 1 second! And the CPU utilization is pretty good.
142+
143+
```bash
144+
ast-grep scan -c eslint/sgconfig.yml TypeScript/src --json > /dev/null
145+
2.82s user, 0.12s system, 301% cpu, 0.975 total
146+
```
147+
148+
# Conclusion
149+
150+
The final flamegraph looks like this. I'm too lazy to optimize more. I'm happy with the sub-second result for now.
151+
152+
<img width="1513" alt="Merging rules" src="https://user-images.githubusercontent.com/2883231/215318867-b670b5fe-c678-4c31-985f-36e4f620baeb.png">
153+
154+
Optimizing ast-grep is a fun journey. I learned a lot about Rust and performance tuning. I hope you enjoyed this post as well.

0 commit comments

Comments
 (0)