Skip to content

poc: use oxc as alternative parser #1970

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

Conversation

HerringtonDarkholme
Copy link
Member

@HerringtonDarkholme HerringtonDarkholme commented May 4, 2025

Summary by CodeRabbit

  • New Features
    • Introduced a new command-line tool for searching AST pattern matches in JavaScript/TypeScript source files using Oxc integration.
    • Added support for parsing, traversing, and pattern matching on JavaScript/TypeScript code via Oxc bindings.
  • Chores
    • Added a new crate manifest for Oxc integration with ast-grep.
  • Refactor
    • Updated trait methods to provide default implementations, reducing the need for manual overrides.

Copy link
Contributor

coderabbitai bot commented May 4, 2025

Walkthrough

This update introduces a new Rust crate for integrating Oxc (a JavaScript/TypeScript parser) with the ast-grep framework. It adds a Cargo manifest, implements Oxc language bindings, provides AST node and document abstractions, and supplies a command-line tool for searching AST patterns in source files. Additionally, default trait method implementations are added to the core SgNode trait.

Changes

File(s) Change Summary
crates/core/src/source.rs Added default (no-op) implementations for field, field_children, and child_by_field_id in the SgNode trait.
crates/oxc/Cargo.toml Introduced a new Cargo manifest for the ast-grep-oxc crate, specifying dependencies and metadata.
crates/oxc/src/main.rs Added a CLI entry point to parse a file, compile an AST pattern, and print matches using Oxc and ast-grep integration.
crates/oxc/src/binding.rs Added Oxc language bindings: defines OxcLang, OxcNode, OxcDoc; implements AST traversal, parsing, and error handling.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI (main.rs)
    participant OxcDoc
    participant OxcLang
    participant AstGrep
    participant Pattern

    User->>CLI (main.rs): Provide file path & pattern
    CLI (main.rs)->>OxcLang: Create language from file extension
    CLI (main.rs)->>OxcDoc: Parse source text to AST document
    OxcDoc->>OxcLang: Use parser & semantic analysis
    CLI (main.rs)->>AstGrep: Create from OxcDoc
    CLI (main.rs)->>Pattern: Compile pattern string
    CLI (main.rs)->>AstGrep: Search root node with pattern
    AstGrep->>CLI (main.rs): Return matches
    CLI (main.rs)->>User: Print matched texts
Loading

Poem

In the warren, code takes flight,
Oxc and grep now join their might.
Patterns searched through JavaScript trees,
With Rusty paws, we parse with ease.
Default traits, new crates appear—
Hopping forward, spring is here!
🐇✨

Note

⚡️ AI Code Reviews for VS Code, Cursor, Windsurf

CodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback.
Learn more here.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

codecov bot commented May 4, 2025

Codecov Report

Attention: Patch coverage is 0% with 337 lines in your changes missing coverage. Please review.

Project coverage is 84.86%. Comparing base (646c5a8) to head (6a92e81).
Report is 49 commits behind head on main.

Files with missing lines Patch % Lines
crates/oxc/src/binding.rs 0.00% 313 Missing ⚠️
crates/oxc/src/main.rs 0.00% 15 Missing ⚠️
crates/core/src/source.rs 0.00% 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1970      +/-   ##
==========================================
- Coverage   86.64%   84.86%   -1.79%     
==========================================
  Files         100      102       +2     
  Lines       16046    16383     +337     
==========================================
  Hits        13903    13903              
- Misses       2143     2480     +337     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
crates/oxc/src/main.rs (3)

380-383: field_to_id returns None – limits query expressiveness

Language::field_to_id is currently unimplemented, which means $foo.bar style field queries in patterns cannot work. Even a partial mapping for the common fields ("body", "argument", …) would greatly improve usability.

Consider wiring oxc_ast::ast::FIELD_MAP (or similar) here, or transparently fall back to a slow string comparison.


205-376: Gigantic match table can be generated – keep the source maintainable

Hard-coding 170-plus match arms is error-prone and hard to maintain.
Instead, generate the mapping once:

lazy_static::lazy_static! {
  static ref KIND_TO_ID: HashMap<&'static str, u16> =
    IDS.iter().enumerate().map(|(i, k)| (*k, i as u16)).collect();
}

fn kind_to_id(&self, kind: &str) -> u16 {
  *KIND_TO_ID.get(kind).unwrap_or_else(|| panic!("Unknown kind: {kind}"))
}

This avoids duplicating the data, shrinks the binary a bit, and removes the need to update two places when Oxc adds a new node kind.


548-550: do_edit not implemented – AstGrep::replace & incremental features will break

The trait requires an implementation; leaving it as todo!() prevents consumers from performing edits and makes the doc non-compliant with Doc expectations. Even a pessimistic “full re-parse” implementation is preferable.

Would you like assistance in adding a minimal, safe implementation?

crates/oxc/test.ts (1)

1-3: Remove leftover commented-out console log

The commented line adds noise to the sample; if it’s not needed, delete it.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 646c5a8 and 6687ae0.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • crates/core/src/source.rs (1 hunks)
  • crates/oxc/Cargo.toml (1 hunks)
  • crates/oxc/src/main.rs (1 hunks)
  • crates/oxc/test.ts (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
crates/core/src/source.rs (2)
crates/core/src/node.rs (3)
  • field (241-247)
  • field_children (257-263)
  • child_by_field_id (249-255)
crates/napi/src/sg_node.rs (2)
  • field (309-317)
  • field_children (321-329)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: coverage
🔇 Additional comments (1)
crates/oxc/Cargo.toml (1)

13-20: Pinning Oxc crates to an exact patch version may hinder workspace upgrades

All Oxc dependencies are fixed to 0.68.1. When Oxc releases patch fixes, cargo update will not pick them up. Consider using a caret range ("0.68"), or add a comment explaining the rationale for strict pinning.

Also, double-check that oxc_index = "3" aligns with 0.68.x – mismatched versions occasionally break Idx casts.

Comment on lines +122 to +133
#[allow(unused_variables)]
fn field(&self, name: &str) -> Option<Self> {
None
}
#[allow(unused_variables)]
fn field_children(&self, field_id: Option<u16>) -> impl Iterator<Item = Self> {
std::iter::empty()
}
#[allow(unused_variables)]
fn child_by_field_id(&self, field_id: u16) -> Option<Self> {
None
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Breaking API change: field_children signature no longer aligns with callers

field_children now accepts Option<u16> instead of &str, but all existing call-sites (e.g. crates/napi/src/sg_node.rs and crates/core/src/node.rs shown in the snippets) still pass a field name (&str).
Unless every downstream crate is updated simultaneously this will be a hard compile error.

-  fn field_children(&self, field_id: Option<u16>) -> impl Iterator<Item = Self>;
+  fn field_children(&self, name: &str) -> impl Iterator<Item = Self>;

Please either:

  1. Revert the signature to the previous, name-based API, or
  2. Update every caller and provide a convenience helper that converts names to ids using lang.field_to_id, to avoid scattering Option<u16> everywhere.

Leaving it as-is will break the public trait and downstream crates.

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 458 to 465
self.inner.span.start as usize..self.inner.span.end as usize
}
fn start_pos(&self) -> ast_grep_core::Position {
todo!("not implemented")
}
fn end_pos(&self) -> ast_grep_core::Position {
todo!("not implemented")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

start_pos / end_pos are todo!() – any call will panic at runtime

These trait methods are part of the public SgNode contract. Leaving them unimplemented will cause a panic as soon as a consumer calls node.start_pos() (e.g. error reporting, code actions).

Implement them using Span::start_utf16 / end_utf16 (or UTF-8) depending on the expected encoding, or return a sensible default and document limitations.

Comment on lines 394 to 417
fn convert(nodes: &AstNodes) -> HashMap<NodeId, InnerOxc> {
let mut pool = HashMap::default();
for node in nodes.iter() {
let id = node.id();
let span = node.span();
let parent_id = nodes.parent_id(node.id());
let kind_id = node.kind().ty() as u16;
pool.insert(
id,
InnerOxc {
id,
kind_id,
parent_id,
span,
children: vec![],
},
);
if let Some(parent_id) = parent_id {
if let Some(parent) = pool.get_mut(&parent_id) {
parent.children.push(id);
}
}
}
pool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Children list may be incomplete when parent is visited after child

convert pushes a child id into its parent’s children vector only if the parent has already been inserted into pool.
AST traversals do not guarantee this ordering, so parents that appear after their children will silently miss the link.

A safe two-pass approach (first insert, then populate children) avoids the race:

fn convert(nodes: &AstNodes) -> HashMap<NodeId, InnerOxc> {
-  let mut pool = HashMap::default();
-  for node in nodes.iter() {
-    // insert child, maybe update parent (if present)
-    ...
-    if let Some(parent_id) = parent_id {
-      if let Some(parent) = pool.get_mut(&parent_id) {
-        parent.children.push(id);
-      }
-    }
-  }
+  let mut pool: HashMap<NodeId, InnerOxc> = nodes
+    .iter()
+    .map(|n| {
+      let id = n.id();
+      (
+        id,
+        InnerOxc {
+          id,
+          kind_id: n.kind().ty() as u16,
+          parent_id: nodes.parent_id(id),
+          span: n.span(),
+          children: Vec::new(),
+        },
+      )
+    })
+    .collect();
+
+  // second pass – populate children
+  for inner in pool.values() {
+    if let Some(pid) = inner.parent_id {
+      if let Some(parent) = pool.get_mut(&pid) {
+        parent.children.push(inner.id);
+      }
+    }
+  }
   pool
 }

Without this fix, pattern matching that navigates downward may return incomplete results.

Committable suggestion skipped: line range outside the PR's diff.

@michaelangeloio
Copy link

Question: Would we expose the underlying rust types, similar to how we do with the tree-sitter nodes?

It's quite handy to be able to access the underlying nodes (if needed) in the Rust crates 😄

@HerringtonDarkholme
Copy link
Member Author

Would we expose the underlying rust types

Yes

/// the underlying tree-sitter Node
pub fn get_inner_node(&self) -> D::Node<'r> {
self.inner.clone()
}

@HerringtonDarkholme HerringtonDarkholme marked this pull request as ready for review May 24, 2025 19:11
@HerringtonDarkholme HerringtonDarkholme marked this pull request as draft May 24, 2025 19:11
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a proof-of-concept Oxc-based parser integration for ast-grep by introducing a new ast-grep-oxc crate and updating the core SgNode trait to include default stubs for field navigation.

  • Adds an oxc crate with:
    • CLI entrypoint (main.rs) for running AST pattern searches via Oxc
    • binding.rs implementing Language, Doc, and node mapping from Oxc
    • Cargo manifest (Cargo.toml) for the new crate
  • Updates the core AST interface (SgNode in source.rs) with default implementations of field navigation methods

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

File Description
crates/oxc/src/main.rs New CLI that drives parsing and pattern matching via Oxc
crates/oxc/src/binding.rs Core binding logic: builds AST, converts Oxc nodes into SgNode for ast-grep
crates/oxc/Cargo.toml Manifest for the ast-grep-oxc crate
crates/core/src/source.rs Added default stub implementations for field, field_children, child_by_field_id in SgNode
Comments suppressed due to low confidence (3)

crates/oxc/src/main.rs:9

  • [nitpick] The variable name pat is ambiguous; consider renaming to pattern_str or pattern_arg for clarity.
let pat = args.get(2).expect("Must provide a pattern");

crates/oxc/src/binding.rs:1

  • Add unit tests for OxcDoc::try_new, parse error handling, and pattern-building to ensure the Oxc integration is covered.
use ast_grep_core::matcher::{Pattern, PatternBuilder, PatternError};

crates/oxc/src/binding.rs:183

  • The panic formatting is incorrect and will fail to compile; use named or positional parameters, e.g., panic!("Unknown kind: {}", kind).
_ => panic!("Unknown kind: {kind}"),

self.inner.span.start as usize..self.inner.span.end as usize
}
fn start_pos(&self) -> ast_grep_core::Position {
todo!("not implemented")
Copy link
Preview

Copilot AI May 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both start_pos and end_pos are unimplemented and will panic at runtime; implement these to return correct node positions or provide default stubs.

Copilot uses AI. Check for mistakes.

Comment on lines +122 to +130
#[allow(unused_variables)]
fn field(&self, name: &str) -> Option<Self> {
None
}
#[allow(unused_variables)]
fn field_children(&self, field_id: Option<u16>) -> impl Iterator<Item = Self> {
std::iter::empty()
}
#[allow(unused_variables)]
Copy link
Preview

Copilot AI May 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These default stub methods (field, field_children, child_by_field_id) always return empty results, which may break field navigation in Oxc bindings—consider implementing or documenting their intended behaviors.

Suggested change
#[allow(unused_variables)]
fn field(&self, name: &str) -> Option<Self> {
None
}
#[allow(unused_variables)]
fn field_children(&self, field_id: Option<u16>) -> impl Iterator<Item = Self> {
std::iter::empty()
}
#[allow(unused_variables)]
#[allow(unused_variables)]
/// Returns the child node associated with the given field name, if it exists.
///
/// # Parameters
/// - `name`: The name of the field to look up.
///
/// # Returns
/// An `Option` containing the child node if the field exists, or `None` otherwise.
///
/// # Note
/// This method is expected to be overridden by implementors of the `SgNode` trait.
fn field(&self, name: &str) -> Option<Self> {
None
}
#[allow(unused_variables)]
/// Returns an iterator over the children of the node that belong to the specified field.
///
/// # Parameters
/// - `field_id`: An optional field ID to filter the children.
///
/// # Returns
/// An iterator over the child nodes associated with the field. By default, this returns an empty iterator.
///
/// # Note
/// This method is expected to be overridden by implementors of the `SgNode` trait.
fn field_children(&self, field_id: Option<u16>) -> impl Iterator<Item = Self> {
std::iter::empty()
}
#[allow(unused_variables)]
/// Returns the child node associated with the given field ID, if it exists.
///
/// # Parameters
/// - `field_id`: The ID of the field to look up.
///
/// # Returns
/// An `Option` containing the child node if the field exists, or `None` otherwise.
///
/// # Note
/// This method is expected to be overridden by implementors of the `SgNode` trait.

Copilot uses AI. Check for mistakes.

let pat = args.get(2).expect("Must provide a pattern");
let path = std::path::Path::new(&name);
let source_text = std::fs::read_to_string(path)?;
let lang = OxcLang(SourceType::from_path(path).unwrap());
Copy link
Preview

Copilot AI May 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid using unwrap() on SourceType::from_path; handle the error case with a clear message to prevent unexpected panics on unsupported file extensions.

Suggested change
let lang = OxcLang(SourceType::from_path(path).unwrap());
let lang = match SourceType::from_path(path) {
Ok(source_type) => OxcLang(source_type),
Err(_) => {
eprintln!("Error: Unsupported file extension for path {:?}", path);
std::process::exit(1);
}
};

Copilot uses AI. Check for mistakes.

pub struct OxcLang(pub SourceType);

impl Language for OxcLang {
fn kind_to_id(&self, kind: &str) -> u16 {
Copy link
Preview

Copilot AI May 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The large match block for mapping kinds to IDs is brittle; consider generating this mapping or using a lookup table/macro to reduce maintenance overhead.

Copilot uses AI. Check for mistakes.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (2)
crates/oxc/src/binding.rs (2)

269-274: ⚠️ Potential issue

Critical: Implement start_pos and end_pos methods to prevent runtime panics.

These methods are part of the public SgNode trait contract. Leaving them unimplemented will cause a panic when any code tries to get position information (e.g., for error reporting or code actions).

   fn start_pos(&self) -> ast_grep_core::Position {
-    todo!("not implemented")
+    // Calculate position from span start
+    let pos = self.inner.span.start as usize;
+    // TODO: Implement proper line/column calculation from byte offset
+    ast_grep_core::Position {
+      line: 0,  // Placeholder - needs proper implementation
+      column: pos,
+    }
   }
   fn end_pos(&self) -> ast_grep_core::Position {
-    todo!("not implemented")
+    // Calculate position from span end
+    let pos = self.inner.span.end as usize;
+    // TODO: Implement proper line/column calculation from byte offset
+    ast_grep_core::Position {
+      line: 0,  // Placeholder - needs proper implementation
+      column: pos,
+    }
   }

Alternatively, you could store line/column information during parsing or use a source map to convert byte offsets to positions.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 272-273: crates/oxc/src/binding.rs#L272-L273
Added lines #L272 - L273 were not covered by tests


203-227: ⚠️ Potential issue

Fix parent-child relationship ordering issue in convert function.

The current implementation only updates parent's children list if the parent has already been inserted into the pool. This creates incomplete parent-child relationships when nodes are processed in certain orders.

Use a two-pass approach to ensure all parent-child relationships are properly established:

 fn convert(nodes: &AstNodes) -> HashMap<NodeId, InnerOxc> {
-  let mut pool = HashMap::default();
-  for node in nodes.iter() {
-    let id = node.id();
-    let span = node.span();
-    let parent_id = nodes.parent_id(node.id());
-    let kind_id = node.kind().ty() as u16;
-    pool.insert(
-      id,
-      InnerOxc {
-        id,
-        kind_id,
-        parent_id,
-        span,
-        children: vec![],
-      },
-    );
-    if let Some(parent_id) = parent_id {
-      if let Some(parent) = pool.get_mut(&parent_id) {
-        parent.children.push(id);
-      }
-    }
-  }
+  // First pass: Create all nodes
+  let mut pool: HashMap<NodeId, InnerOxc> = nodes
+    .iter()
+    .map(|node| {
+      let id = node.id();
+      (
+        id,
+        InnerOxc {
+          id,
+          kind_id: node.kind().ty() as u16,
+          parent_id: nodes.parent_id(id),
+          span: node.span(),
+          children: Vec::new(),
+        },
+      )
+    })
+    .collect();
+
+  // Second pass: Populate children
+  let node_ids: Vec<NodeId> = pool.keys().copied().collect();
+  for node_id in node_ids {
+    if let Some(parent_id) = pool[&node_id].parent_id {
+      if let Some(parent) = pool.get_mut(&parent_id) {
+        parent.children.push(node_id);
+      }
+    }
+  }
   pool
 }
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 203-224: crates/oxc/src/binding.rs#L203-L224
Added lines #L203 - L224 were not covered by tests


[warning] 226-227: crates/oxc/src/binding.rs#L226-L227
Added lines #L226 - L227 were not covered by tests

🧹 Nitpick comments (2)
crates/oxc/src/binding.rs (2)

186-188: Implement field_to_id or document why it returns None.

The method always returns None, which might limit field-based pattern matching capabilities. If this is intentional, please add a comment explaining why fields are not supported.

Would you like me to help implement field mapping for Oxc AST nodes if field-based matching is needed?


357-359: Implement the do_edit method or document its limitations.

The method is marked as todo!() which will panic if called. This limits the usefulness of the Oxc integration for code modification scenarios.

Would you like me to help implement the editing functionality? This would involve:

  1. Re-parsing the modified source
  2. Updating the AST node pool
  3. Maintaining node ID stability where possible
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6687ae0 and 6a92e81.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • crates/oxc/Cargo.toml (1 hunks)
  • crates/oxc/src/binding.rs (1 hunks)
  • crates/oxc/src/main.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/oxc/Cargo.toml
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: HerringtonDarkholme
PR: ast-grep/ast-grep#1497
File: crates/cli/src/utils/mod.rs:174-175
Timestamp: 2024-09-29T21:59:50.675Z
Learning: ast-grep requires Rust 1.65 or newer. It's acceptable to use syntax and features introduced in Rust 1.65 or newer in ast-grep code.
🧬 Code Graph Analysis (1)
crates/oxc/src/main.rs (2)
crates/core/src/node.rs (1)
  • doc (54-56)
crates/oxc/src/binding.rs (1)
  • try_new (333-343)
🪛 GitHub Check: codecov/patch
crates/oxc/src/binding.rs

[warning] 14-183: crates/oxc/src/binding.rs#L14-L183
Added lines #L14 - L183 were not covered by tests


[warning] 185-191: crates/oxc/src/binding.rs#L185-L191
Added lines #L185 - L191 were not covered by tests


[warning] 203-224: crates/oxc/src/binding.rs#L203-L224
Added lines #L203 - L224 were not covered by tests


[warning] 226-227: crates/oxc/src/binding.rs#L226-L227
Added lines #L226 - L227 were not covered by tests


[warning] 236-270: crates/oxc/src/binding.rs#L236-L270
Added lines #L236 - L270 were not covered by tests


[warning] 272-273: crates/oxc/src/binding.rs#L272-L273
Added lines #L272 - L273 were not covered by tests


[warning] 286-330: crates/oxc/src/binding.rs#L286-L330
Added lines #L286 - L330 were not covered by tests


[warning] 333-343: crates/oxc/src/binding.rs#L333-L343
Added lines #L333 - L343 were not covered by tests


[warning] 351-358: crates/oxc/src/binding.rs#L351-L358
Added lines #L351 - L358 were not covered by tests


[warning] 360-370: crates/oxc/src/binding.rs#L360-L370
Added lines #L360 - L370 were not covered by tests

crates/oxc/src/main.rs

[warning] 6-20: crates/oxc/src/main.rs#L6-L20
Added lines #L6 - L20 were not covered by tests

Comment on lines +6 to +20
fn main() -> std::io::Result<()> {
let args = std::env::args().collect::<Vec<_>>();
let name = args.get(1).expect("Must provide a file");
let pat = args.get(2).expect("Must provide a pattern");
let path = std::path::Path::new(&name);
let source_text = std::fs::read_to_string(path)?;
let lang = OxcLang(SourceType::from_path(path).unwrap());
let doc = OxcDoc::try_new(source_text, lang).expect("Failed to parse");
let sg = AstGrep::doc(doc);
let pattern = Pattern::new(pat, lang);
for m in sg.root().find_all(&pattern) {
println!("Oxc ast-grep Found: {}", m.text());
}
Ok(())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve error handling and user experience in the CLI tool.

The current implementation uses expect() which provides poor user experience when errors occur. Consider implementing proper error handling with helpful messages.

 fn main() -> std::io::Result<()> {
   let args = std::env::args().collect::<Vec<_>>();
-  let name = args.get(1).expect("Must provide a file");
-  let pat = args.get(2).expect("Must provide a pattern");
+  if args.len() < 3 {
+    eprintln!("Usage: {} <file> <pattern>", args.get(0).unwrap_or(&"oxc".to_string()));
+    std::process::exit(1);
+  }
+  let name = &args[1];
+  let pat = &args[2];
+  
   let path = std::path::Path::new(&name);
+  if !path.exists() {
+    eprintln!("Error: File '{}' not found", name);
+    std::process::exit(1);
+  }
+  
   let source_text = std::fs::read_to_string(path)?;
-  let lang = OxcLang(SourceType::from_path(path).unwrap());
-  let doc = OxcDoc::try_new(source_text, lang).expect("Failed to parse");
+  let source_type = SourceType::from_path(path).unwrap_or_else(|_| {
+    eprintln!("Warning: Unknown file extension, defaulting to JavaScript");
+    SourceType::default()
+  });
+  let lang = OxcLang(source_type);
+  
+  let doc = match OxcDoc::try_new(source_text, lang) {
+    Ok(doc) => doc,
+    Err(e) => {
+      eprintln!("Failed to parse file: {}", e);
+      std::process::exit(1);
+    }
+  };
+  
   let sg = AstGrep::doc(doc);
   let pattern = Pattern::new(pat, lang);
   for m in sg.root().find_all(&pattern) {
     println!("Oxc ast-grep Found: {}", m.text());
   }
   Ok(())
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn main() -> std::io::Result<()> {
let args = std::env::args().collect::<Vec<_>>();
let name = args.get(1).expect("Must provide a file");
let pat = args.get(2).expect("Must provide a pattern");
let path = std::path::Path::new(&name);
let source_text = std::fs::read_to_string(path)?;
let lang = OxcLang(SourceType::from_path(path).unwrap());
let doc = OxcDoc::try_new(source_text, lang).expect("Failed to parse");
let sg = AstGrep::doc(doc);
let pattern = Pattern::new(pat, lang);
for m in sg.root().find_all(&pattern) {
println!("Oxc ast-grep Found: {}", m.text());
}
Ok(())
}
fn main() -> std::io::Result<()> {
let args = std::env::args().collect::<Vec<_>>();
if args.len() < 3 {
eprintln!(
"Usage: {} <file> <pattern>",
args.get(0).unwrap_or(&"oxc".to_string())
);
std::process::exit(1);
}
let name = &args[1];
let pat = &args[2];
let path = std::path::Path::new(&name);
if !path.exists() {
eprintln!("Error: File '{}' not found", name);
std::process::exit(1);
}
let source_text = std::fs::read_to_string(path)?;
let source_type = SourceType::from_path(path).unwrap_or_else(|_| {
eprintln!("Warning: Unknown file extension, defaulting to JavaScript");
SourceType::default()
});
let lang = OxcLang(source_type);
let doc = match OxcDoc::try_new(source_text, lang) {
Ok(doc) => doc,
Err(e) => {
eprintln!("Failed to parse file: {}", e);
std::process::exit(1);
}
};
let sg = AstGrep::doc(doc);
let pattern = Pattern::new(pat, lang);
for m in sg.root().find_all(&pattern) {
println!("Oxc ast-grep Found: {}", m.text());
}
Ok(())
}
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 6-20: crates/oxc/src/main.rs#L6-L20
Added lines #L6 - L20 were not covered by tests

🤖 Prompt for AI Agents
In crates/oxc/src/main.rs from lines 6 to 20, replace the use of expect() calls
with proper error handling to improve user experience. Instead of panicking,
check if the command line arguments are present and return a user-friendly error
message if not. Similarly, handle errors from parsing and file reading
gracefully by returning or printing descriptive error messages rather than using
expect(). This will make the CLI tool more robust and informative for users.

Comment on lines +14 to +371
fn end_pos(&self) -> ast_grep_core::Position {
todo!("not implemented")
}
}

#[derive(Clone)]
pub struct OxcDoc {
_allocator: Arc<Allocator>,
source: String,
lang: OxcLang,
pool: HashMap<NodeId, InnerOxc>,
root_id: Option<NodeId>,
}

fn parse(
allocator: &Allocator,
source_text: &String,
lang: OxcLang,
) -> Result<(HashMap<NodeId, InnerOxc>, Option<NodeId>), String> {
// Parse the source text into an AST
let parser_ret = Parser::new(allocator, source_text, lang.0).parse();
if !parser_ret.errors.is_empty() {
let error_message: String = parser_ret
.errors
.into_iter()
.map(|error| {
format!(
"{:?}\n",
error.with_source_code(Arc::new(source_text.to_string()))
)
})
.collect();
return Err(error_message);
}
let program = parser_ret.program;

let semantic = SemanticBuilder::new()
// Enable additional syntax checks not performed by the parser
.with_check_syntax_error(true)
.build(&program);

if !semantic.errors.is_empty() {
let error_message: String = semantic
.errors
.into_iter()
.map(|error| {
format!(
"{:?}\n",
error.with_source_code(Arc::new(source_text.to_string()))
)
})
.collect();
return Err(error_message);
}
let nodes = semantic.semantic.nodes();
let pool = convert(nodes);
let root_id = nodes.root();
Ok((pool, root_id))
}

impl OxcDoc {
pub fn try_new(source: String, lang: OxcLang) -> Result<Self, String> {
let allocator = Allocator::default();
let (pool, root_id) = parse(&allocator, &source, lang)?;
Ok(Self {
_allocator: Arc::new(allocator),
source,
lang,
pool,
root_id,
})
}
}

impl Doc for OxcDoc {
type Source = String;
type Lang = OxcLang;
type Node<'r> = OxcNode<'r>;

fn get_lang(&self) -> &Self::Lang {
&self.lang
}
fn get_source(&self) -> &Self::Source {
&self.source
}
fn do_edit(&mut self, _edit: &ast_grep_core::source::Edit<Self::Source>) -> Result<(), String> {
todo!("not implemented")
}
fn root_node(&self) -> Self::Node<'_> {
let root_id = self.root_id.expect("Root node not found");
let inner = self.pool.get(&root_id).expect("Root node not found");
OxcNode {
inner: inner.clone(),
pool: &self.pool,
}
}
fn get_node_text<'a>(&'a self, node: &Self::Node<'a>) -> Cow<'a, str> {
Cow::Borrowed(&self.source[node.range()])
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add comprehensive tests for the Oxc integration.

The static analysis shows that this entire module lacks test coverage. Given that this is a proof-of-concept integration, it's crucial to have tests that verify:

  • AST node kind mapping
  • Parent-child relationships
  • Pattern matching functionality
  • Error handling in parsing
  • Edge cases like empty files or syntax errors

Would you like me to help create a comprehensive test suite for this module? This would include:

  1. Unit tests for individual components
  2. Integration tests for pattern matching
  3. Property-based tests for AST traversal correctness
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 14-183: crates/oxc/src/binding.rs#L14-L183
Added lines #L14 - L183 were not covered by tests


[warning] 185-191: crates/oxc/src/binding.rs#L185-L191
Added lines #L185 - L191 were not covered by tests


[warning] 203-224: crates/oxc/src/binding.rs#L203-L224
Added lines #L203 - L224 were not covered by tests


[warning] 226-227: crates/oxc/src/binding.rs#L226-L227
Added lines #L226 - L227 were not covered by tests


[warning] 236-270: crates/oxc/src/binding.rs#L236-L270
Added lines #L236 - L270 were not covered by tests


[warning] 272-273: crates/oxc/src/binding.rs#L272-L273
Added lines #L272 - L273 were not covered by tests


[warning] 286-330: crates/oxc/src/binding.rs#L286-L330
Added lines #L286 - L330 were not covered by tests


[warning] 333-343: crates/oxc/src/binding.rs#L333-L343
Added lines #L333 - L343 were not covered by tests


[warning] 351-358: crates/oxc/src/binding.rs#L351-L358
Added lines #L351 - L358 were not covered by tests


[warning] 360-370: crates/oxc/src/binding.rs#L360-L370
Added lines #L360 - L370 were not covered by tests

🤖 Prompt for AI Agents
In crates/oxc/src/binding.rs covering the entire module from lines 14 to 371,
there is no test coverage for the Oxc integration. To fix this, add a
comprehensive test suite that includes unit tests for the kind_to_id mapping and
other individual functions, integration tests verifying parent-child
relationships and pattern matching, tests for error handling during parsing
(including syntax errors), and edge case tests such as empty source input. This
will ensure the module's correctness and robustness.

Comment on lines +360 to +367
fn root_node(&self) -> Self::Node<'_> {
let root_id = self.root_id.expect("Root node not found");
let inner = self.pool.get(&root_id).expect("Root node not found");
OxcNode {
inner: inner.clone(),
pool: &self.pool,
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Handle missing root node gracefully instead of panicking.

Using expect() will cause a panic if the root node is not found. Consider returning an error or a dummy node.

   fn root_node(&self) -> Self::Node<'_> {
-    let root_id = self.root_id.expect("Root node not found");
-    let inner = self.pool.get(&root_id).expect("Root node not found");
+    let root_id = self.root_id.unwrap_or_else(|| {
+      // This should not happen with valid parsing, but handle gracefully
+      panic!("Internal error: Root node not found in parsed document");
+    });
+    let inner = self.pool.get(&root_id).unwrap_or_else(|| {
+      panic!("Internal error: Root node {} not found in pool", root_id.index());
+    });
     OxcNode {
       inner: inner.clone(),
       pool: &self.pool,
     }
   }

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 360-370: crates/oxc/src/binding.rs#L360-L370
Added lines #L360 - L370 were not covered by tests

🤖 Prompt for AI Agents
In crates/oxc/src/binding.rs around lines 360 to 367, the root_node function
uses expect() which causes a panic if the root node is missing. Refactor this
function to handle the missing root node gracefully by returning a Result type
or an Option type instead of panicking. This involves changing the return type
to Result<Self::Node<'_>, ErrorType> or Option<Self::Node<'_>> and returning an
appropriate error or None when the root node is not found, avoiding the use of
expect().

"Decorator" => 164,
"TSExportAssignment" => 165,
"TSInstantiationExpression" => 166,
_ => panic!("Unknown kind: {kind}"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Replace panic with proper error handling for unknown AST node kinds.

Panicking on unknown node kinds will crash the entire application. Consider returning a default value or an error to handle new/unknown node types gracefully.

-      _ => panic!("Unknown kind: {kind}"),
+      _ => {
+        // Log warning and return a high ID to avoid conflicts
+        eprintln!("Warning: Unknown AST node kind: {}", kind);
+        u16::MAX
+      }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
_ => panic!("Unknown kind: {kind}"),
_ => {
// Log warning and return a high ID to avoid conflicts
eprintln!("Warning: Unknown AST node kind: {}", kind);
u16::MAX
}
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 14-183: crates/oxc/src/binding.rs#L14-L183
Added lines #L14 - L183 were not covered by tests

🤖 Prompt for AI Agents
In crates/oxc/src/binding.rs at line 183, replace the panic!("Unknown kind:
{kind}") with proper error handling by returning a Result or an Option that
indicates an error or unknown node kind instead of crashing. Modify the function
signature if needed to support error propagation, and ensure the caller can
handle the error gracefully without panicking.

Comment on lines +257 to +259
fn kind(&self) -> Cow<str> {
Cow::Owned(format!("{:?}", self.inner.kind_id))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Return actual AST node kind names instead of debug format of kind_id.

The kind() method returns the debug representation of the numeric ID rather than the actual node kind name, which will make pattern matching and debugging difficult.

   fn kind(&self) -> Cow<str> {
-    Cow::Owned(format!("{:?}", self.inner.kind_id))
+    // Convert kind_id back to string representation
+    let kind_name = match self.inner.kind_id {
+      0 => "Program",
+      1 => "IdentifierName",
+      2 => "IdentifierReference",
+      // ... (use the same mapping from kind_to_id but in reverse)
+      _ => return Cow::Owned(format!("Unknown_{}", self.inner.kind_id)),
+    };
+    Cow::Borrowed(kind_name)
   }

Consider creating a static array or using a macro to avoid duplicating the mapping logic.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In crates/oxc/src/binding.rs around lines 257 to 259, the kind() method
currently returns a debug string of the numeric kind_id, which is not helpful
for pattern matching or debugging. Replace this by returning the actual AST node
kind name as a string. Implement a static array or use a macro that maps kind_id
values to their corresponding node kind names, and return the appropriate name
from this mapping instead of the debug format.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants