|
| 1 | +--- |
| 2 | +author: |
| 3 | + - name: Herrington Darkholme |
| 4 | +date: 2023-01-23 |
| 5 | +head: |
| 6 | + - - meta |
| 7 | + - property: og:type |
| 8 | + content: website |
| 9 | + - - meta |
| 10 | + - property: og:title |
| 11 | + content: Optimize ast-grep to get 10X faster |
| 12 | + - - meta |
| 13 | + - property: og:url |
| 14 | + content: https://ast-grep.github.io/blog/optimize-ast-grep.html |
| 15 | + - - meta |
| 16 | + - property: og:description |
| 17 | + content: How to optimize the Rust CLI tool ast-grep to become 10 times faster. |
| 18 | +--- |
| 19 | + |
| 20 | +# Optimize ast-grep to get 10X faster |
| 21 | + |
| 22 | +In this post I will discuss how to optimize the Rust CLI tool [ast-grep](https://ast-grep.github.io/) to become 10 times faster. |
| 23 | + |
| 24 | +Rust itself usually runs fast enough, but it is not a silver bullet to all performance issues. |
| 25 | + |
| 26 | +In this case, I did not pay enough attention to runtime details or opted for naive implementation for a quick prototype. And these inadvertent mistakes and deliberate slacking off became ast-grep's bottleneck. |
| 27 | + |
| 28 | +# Context |
| 29 | + |
| 30 | +[ast-grep](https://ast-grep.github.io/) is my hobby project to help you search and rewrite code using [abstract syntax tree](https://www.wikiwand.com/en/Abstract_syntax_tree). |
| 31 | + |
| 32 | +Conceptually, ast-grep takes a piece of pattern code (think it like a regular expression but for AST), matches the pattern against your codebase and gives a list of matched AST nodes back to you. See the [playground](https://ast-grep.github.io/playground) for a live demo. |
| 33 | + |
| 34 | +I designed ast-grpe's architecture with performance in mind. Here are a few performance related highlights: |
| 35 | + |
| 36 | +* it is written in Rust, a native language compiled to machine code. |
| 37 | +* it uses the venerable C library [tree-sitter](https://tree-sitter.github.io/) to parse code, which is the same library powering [GitHub's codesearch](https://github.com/features/code-search). |
| 38 | +* its command line interface is built upon [ignore](https://docs.rs/ignore/latest/ignore/), the same crates used by the blazing fast [ripgrep](https://github.com/BurntSushi/ripgrep). |
| 39 | + |
| 40 | +Okay, enough self-promotion _BS_. If it is designed to be fast, how comes this blog? Let's dive into the performance bottleneck I found in my bad code. |
| 41 | + |
| 42 | + |
| 43 | +> Spoiler. It's my bad to write slow Rust. |
| 44 | +
|
| 45 | +# Profiling |
| 46 | + |
| 47 | +The first thing to optimize a program is to profile it. I am lazy this time and just uses the [flamegraph](https://github.com/flamegraph-rs/flamegraph) tool. |
| 48 | + |
| 49 | +Installing it is simple. |
| 50 | + |
| 51 | +```bash |
| 52 | +cargo install flamegraph |
| 53 | +``` |
| 54 | + |
| 55 | +Then run it against ast-grep! No other setup is needed, compared to other profiling tools! |
| 56 | + |
| 57 | +This time I'm using an ast-grep port of [es-lint](https://github.com/ast-grep/eslint) against [TypeScript](https://github.com/microsoft/TypeScript/)'s `src` folder. |
| 58 | + |
| 59 | +This is the profiling command I used. |
| 60 | + |
| 61 | +```bash |
| 62 | +sudo flamegraph -- ast-grep scan -c eslint/sgconfig.yml TypeScript/src --json > /dev/null |
| 63 | +``` |
| 64 | + |
| 65 | +The flamegraph looks like this. |
| 66 | + |
| 67 | +<img width="1155" alt="Before Optimzation" src="https://user-images.githubusercontent.com/2883231/215253646-21b5f1dd-a810-4ddb-9bbe-4938b4cc15f9.png"> |
| 68 | + |
| 69 | +Optimizing the program is a matter of finding the hotspots in the flamegraph and fix them. |
| 70 | + |
| 71 | +For a more intuitive feeling about performance, I used the old command `time` to measure the wall time to run the command. The result is not good. |
| 72 | + |
| 73 | +```bash |
| 74 | +time ast-grep scan -c eslint/sgconfig.yml TypeScript/src |
| 75 | +17.63s user, 0.46s system, 167% cpu, 10.823 total |
| 76 | +``` |
| 77 | + |
| 78 | +The time before `user` is the actual CPU time spent on my program. The time before `total` represents the wall time. The ratio between them is the CPU utilization. In this case, it is 167%. It means my program is not fully utilizing the CPU. |
| 79 | + |
| 80 | +It only runs six rules against the codebase and it costs about 10 whole seconds! |
| 81 | + |
| 82 | + |
| 83 | +In contrast, running one ast-grep pattern agasint the TypeScript source only costs 0.5 second and the CPU utilization is decent. |
| 84 | + |
| 85 | +```bash |
| 86 | +time ast-grep run -p '$A && $A()' TypeScript/src --json > /dev/null |
| 87 | + |
| 88 | +1.96s user, 0.11s system, 329% cpu, 0.628 total |
| 89 | +``` |
| 90 | + |
| 91 | +# Expensive Regex Cloning |
| 92 | + |
| 93 | +The first thing I noticed is that the `regex::Regex` type is cloned a lot. I do know it is expensive to compile a regex, but I did not expect cloning one will be the bottleneck. |
| 94 | +Much to my limited understanding, `drop`ping Regex is also expensive! |
| 95 | + |
| 96 | +Fortunately the fix is simple: I can use a reference to the regex instead of cloning it. |
| 97 | + |
| 98 | +This optimzation alone shaves about 50% of execution time. |
| 99 | + |
| 100 | +```bash |
| 101 | +time ast-grep scan -c eslint/sgconfig.yml TypeScript/src --json > /dev/null |
| 102 | +13.89s user, 0.74s system, 274% cpu 5.320 total |
| 103 | +``` |
| 104 | + |
| 105 | +The new flamegraph looks like this. |
| 106 | + |
| 107 | +<img width="1509" alt="Avoid Regex Cloning" src="https://user-images.githubusercontent.com/2883231/215318711-634a8b99-3e02-4187-9073-ea5be25d098f.png"> |
| 108 | + |
| 109 | + |
| 110 | +# Matching Rule can be Avoided |
| 111 | + |
| 112 | +The second thing I noticed is that the `match_node` function is called a lot. It is the function that matches a pattern against an AST node. |
| 113 | +ast-grep can match an AST node by rules, and those rules can be composed together into more complex rules. |
| 114 | +For example, the rule `any: [rule1, rule2]` is a composite rule that consists of two sub-rules and the composite rule matches a node when either one of the sub-rules matches the node. |
| 115 | +This can be expensive since multiple rules must be tried for every node to see if they actually make a match. |
| 116 | + |
| 117 | +I have already forsee it so every rule in ast-grep has an optimzation called `potential_kinds`. AST node in tree-sitter has its own type encoded in a unsigned number called `kind`. |
| 118 | +If a rule can only match nodes with specific kinds, then we can avoid calling `match_node` for nodes if its kind is not in the `potential_kinds` set. |
| 119 | +I used a BitSet to encode the set of potential kinds. Naturally the `potential_kinds` of composite rules can be constructed by merging the `potential_kinds` of its sub-rules, according to their logic nature. |
| 120 | +For example, `any`'s potential_kinds is the union of its sub-rules' potential_kinds, and `all`'s potential_kinds is the intersection of its sub-rules' potential_kinds. |
| 121 | + |
| 122 | +Using this optimization, I can avoid calling `match_node` for nodes that can never match a rule. This optimization shaves another 40% of execution time! |
| 123 | + |
| 124 | +```bash |
| 125 | +ast-grep scan -c eslint/sgconfig.yml TypeScript/src --json > /dev/null |
| 126 | +11.57s user, 0.48s system, 330% cpu, 3.644 total |
| 127 | +``` |
| 128 | + |
| 129 | +The new flamegraph. |
| 130 | + |
| 131 | +<img width="1503" alt="potential_kinds trick" src="https://user-images.githubusercontent.com/2883231/215318794-7e3cf452-5016-4541-9c9d-1e266c1ee324.png"> |
| 132 | + |
| 133 | +# Duplicate Tree Traversal |
| 134 | + |
| 135 | +Finally, the function call `ts_tree_cursor_child_iterator_next` caught my eyes. It meant that a lot of time was spent on traversing the AST tree. |
| 136 | + |
| 137 | +Well, I dumbly iterating through all the six rules and matching the whole AST tree for each rule. This is a lot of duplicated work! |
| 138 | + |
| 139 | +So I used a data structure to combine these rules, according to their `potential_kinds`. When I'm traversing the AST tree, I will first retrieve the rules with potential_kinds containing the kind of the current node. Then I will only run these rules against the node. And nodes without any `potential_kinds` hit will be naturally skipped during the traversal. |
| 140 | + |
| 141 | +This is a huge optimization! The ending result is less than 1 second! And the CPU utilization is pretty good. |
| 142 | + |
| 143 | +```bash |
| 144 | +ast-grep scan -c eslint/sgconfig.yml TypeScript/src --json > /dev/null |
| 145 | +2.82s user, 0.12s system, 301% cpu, 0.975 total |
| 146 | +``` |
| 147 | + |
| 148 | +# Conclusion |
| 149 | + |
| 150 | +The final flamegraph looks like this. I'm too lazy to optimize more. I'm happy with the sub-second result for now. |
| 151 | + |
| 152 | +<img width="1513" alt="Merging rules" src="https://user-images.githubusercontent.com/2883231/215318867-b670b5fe-c678-4c31-985f-36e4f620baeb.png"> |
| 153 | + |
| 154 | +Optimizing ast-grep is a fun journey. I learned a lot about Rust and performance tuning. I hope you enjoyed this post as well. |
0 commit comments