Skip to content

Conversation

youknowone
Copy link
Member

@youknowone youknowone commented Sep 15, 2025

Summary by CodeRabbit

  • New Features
    • Added CPython 3.11+ line table support so bytecode can be mapped back to source lines and columns.
    • Code objects now store and persist line and exception tables and expose them as byte data.
    • New APIs to iterate source ranges and per-instruction positions for richer tracing and debugging.
    • Code object replacement and serialization now preserve these tables for accurate reconstruction and improved debugging.

Copy link
Contributor

coderabbitai bot commented Sep 15, 2025

Walkthrough

Adds CPython 3.11+ linetable and exception table support across the compiler, core, marshal, and VM: codegen collects per-instruction locations and encodes a linetable, core bytecode types gain linetable/exceptiontable fields and marshal persists them, and the VM decodes and exposes these tables via new PyCode APIs.

Changes

Cohort / File(s) Summary
Core bytecode types & enums
compiler/core/src/bytecode.rs
Adds public enum PyCodeLocationInfoKind with helpers; extends CodeObject<C> with pub linetable: Box<[u8]> and pub exceptiontable: Box<[u8]>; updates map_bag and map_clone_bag to propagate/clone the new fields.
Compiler IR codegen
compiler/codegen/src/ir.rs
Collects per-instruction SourceLocation entries; implements generate_linetable with helpers write_varint/write_signed_varint, supports Short/OneLine/NoColumns/Long encodings and grouping up to 8 instructions; wires linetable: Box<[u8]> into CodeObject and initializes exceptiontable as an empty boxed slice (TODO: generate real table).
Marshal serialization/deserialization
compiler/core/src/marshal.rs
serialize_code now writes linetable and exceptiontable as length-prefixed byte slices; deserialize_code reads these slices and initializes CodeObject with boxed tables.
VM builtins — PyCode exposure & decoding
vm/src/builtins/code.rs
Adds LineTableReader, PyCodeAddressRange, imports PyCodeLocationInfoKind; extends ReplaceArgs and replace() to accept/propagate linetable/exceptiontable and related metadata; adds co_linetable, co_exceptiontable, co_lines, co_positions; implements decoding/iteration that yields merged (start_offset,end_offset,lineno) ranges and per-instruction position tuples.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Frontend as Compiler Frontend
  participant IR as IR Builder
  participant Codegen as Codegen (ir.rs)
  participant Core as CodeObject (core/bytecode.rs)
  participant Marshal as Marshal
  participant VM as VM PyCode (builtins/code.rs)

  Frontend->>IR: Emit instructions with source locations
  IR->>Codegen: finalize_code(locations, instrs)
  Codegen->>Codegen: generate_linetable(locations) -> Box<[u8]>
  Codegen->>Core: Construct CodeObject{..., linetable, exceptiontable}
  Core->>Marshal: serialize_code includes linetable & exceptiontable
  Marshal->>Core: deserialize_code reads linetable & exceptiontable
  Core->>VM: Load CodeObject into PyCode (contains linetable)
  VM->>VM: LineTableReader parses linetable, yields positions/ranges
  VM-->>Caller: co_linetable()/co_exceptiontable() and co_lines()/co_positions()
Loading
sequenceDiagram
  autonumber
  participant PyCode as PyCode.co_lines()
  participant Reader as LineTableReader
  participant Kind as PyCodeLocationInfoKind

  PyCode->>Reader: init with linetable bytes
  loop per encoded entry
    Reader->>Reader: read varint / signed varint
    Reader->>Kind: decode entry type (Short/OneLine/NoColumns/Long/None)
    Kind-->>Reader: provide delta semantics (line/col/no-line)
    Reader-->>PyCode: emit (start_off, end_off, lineno) or position tuple
  end
  PyCode-->>Caller: iterator of merged ranges or positions
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Refactor compile_function #6001 — modifies CodeInfo and finalize_code in compiler/codegen/src/ir.rs, likely overlapping changes to code finalization and linetable handling.

Poem

I nibbled bytes and lines with care,
Packed deltas small in tidy lair.
From IR hops to VM’s view,
Short, Long, OneLine—rules so new.
Exception bed waits for its seed— I hop to stitch the feed. 🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 48.39% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "code object linetable" succinctly reflects the primary change in the PR—adding linetable support and related fields/APIs to code objects—so it is directly related to the main changeset and clear to a reviewer. It is short, specific, and avoids noisy or vague wording.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (7)
compiler/core/src/bytecode.rs (2)

36-58: Enum naming nit: consider NoLine over None.

None reads like Option::None at use sites; NoLine matches CPython docs and avoids ambiguity.


218-221: New fields: document invariants.

Add a brief doc that linetable/exceptiontable are CPython 3.11+ raw blobs (co_linetable/co_exceptiontable) and may be empty when unavailable.

compiler/codegen/src/ir.rs (1)

225-227: Exception table placeholder.

Leaving exceptiontable empty is fine short‑term; add a TODO pointing to the planned encoder to avoid shipping a silent stub.

vm/src/builtins/code.rs (4)

47-96: Advance logic: robust to malformed entries.

Skips payload bytes until next header (bit7 set). Consider logging at debug level when encountering invalid bytes to aid diagnostics.


160-176: replace(): new args are reasonable; validate inputs.

Nice to add co_nlocals consistency check. Consider validating co_firstlineno=0 maps to None (already handled) and rejecting negative via API.


498-569: co_lines(): eager list is fine; iterator later.

Merges adjacent ranges with same line and maps NO_LINE to None. Consider yielding a lazy iterator in a follow‑up for large functions.


860-921: Duplicated linetable parsing logic; consider sharing.

LineTableReader duplicates varint/signed‑varint logic from codegen. Extract a small shared helper/module to avoid drift.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca95366 and a22899f.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_code.py is excluded by !Lib/**
📒 Files selected for processing (4)
  • compiler/codegen/src/ir.rs (5 hunks)
  • compiler/core/src/bytecode.rs (4 hunks)
  • compiler/core/src/marshal.rs (1 hunks)
  • vm/src/builtins/code.rs (7 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Format Rust code with the default rustfmt style (run cargo fmt)
Run clippy and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • compiler/core/src/marshal.rs
  • compiler/core/src/bytecode.rs
  • vm/src/builtins/code.rs
  • compiler/codegen/src/ir.rs
{vm,stdlib}/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use RustPython macros (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • vm/src/builtins/code.rs
🧬 Code graph analysis (3)
compiler/core/src/marshal.rs (1)
vm/src/builtins/code.rs (3)
  • new (32-40)
  • new (338-340)
  • new (867-869)
vm/src/builtins/code.rs (1)
compiler/core/src/bytecode.rs (1)
  • from_code (61-81)
compiler/codegen/src/ir.rs (1)
vm/src/builtins/code.rs (3)
  • new (32-40)
  • new (338-340)
  • new (867-869)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Run snippets and cpython tests (windows-2025)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run tests under miri
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run rust tests (windows-2025)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
🔇 Additional comments (9)
compiler/core/src/bytecode.rs (2)

60-103: Mapping looks correct; add unit tests for encodings.

Please add small tests asserting from_code/helpers match CPython’s 0–15 meanings (short groups, one‑line deltas, no‑columns, long, no‑line).


1269-1280: Propagation LGTM.

map_bag/map_clone_bag correctly carry the new tables.

Also applies to: 1301-1312

compiler/codegen/src/ir.rs (3)

203-205: Good: linetable generation wired into CodeObject.

The generator is called after finalizing locations; placement is correct.


398-505: Encoding aligns with CPython 3.11+ format; add edge‑case tests.

Looks correct (short groups, one‑line deltas, long form, varints). Please add tests for:

  • Large columns (≥128) use long form.
  • Negative/large line deltas.
  • Runs spanning ExtendedArg (length in code units).

506-529: Varint/ZigZag helpers LGTM.

Matches 6‑bit continuation and ZigZag sign handling.

vm/src/builtins/code.rs (4)

22-41: Reader/state structure looks sound.

PyCodeAddressRange and initialization mirror CPython’s iteration; sentinel handling for NO_LINE is clear.


97-136: Delta decoding matches encoding.

Covers LONG/NO_COLUMNS/ONE_LINE/SHORT; NO_LINE handled via caller. Good.


486-497: Accessors LGTM.

Expose raw tables as bytes; copies are acceptable given size.


570-689: co_positions(): per‑instruction tuples match CPython semantics.

Decoding for LONG/NO_COLUMNS/ONE_LINE/SHORT looks correct. Add tests covering all kinds, including NO_LINE spans.

Comment on lines 272 to 284
linetable: Box::new([]), // TODO: Read from marshal format
exceptiontable: Box::new([]), // TODO: Read from marshal format
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Marshal drops linetable/exceptiontable (lossy round‑trip).

Deserialization initializes both tables to empty, and serialization doesn’t write them, so any compiled code marshaled/unmarshaled loses debug data.

  • Either: keep format v4 lossy but clearly document it and gate uses that rely on line info.
  • Or: extend the format to include linetable and exceptiontable (likely bump format; ensure backward compatibility when reading older blobs).

I can draft a backward‑compatible read/write plan if you confirm where FORMAT_VERSION is emitted/checked.


🏁 Script executed:

#!/bin/bash
# Find where marshal format version is written/read and who consumes it.
rg -n "FORMAT_VERSION|serialize_code\\(|deserialize_code\\(|FrozenCodeObject|frozen" -C3

Length of output: 61799


Marshal drops linetable/exceptiontable — preserve debug data or document lossiness (critical).

File: compiler/core/src/marshal.rs
Lines: 272-274

        linetable: Box::new([]),      // TODO: Read from marshal format
        exceptiontable: Box::new([]), // TODO: Read from marshal format
    })

Deserialization initializes both tables to empty and serialization doesn’t write them — any serialized/deserialized CodeObject (including frozen modules via compiler/core/src/frozen.rs) loses line/exception debug info.

  • Evidence: FORMAT_VERSION is declared at compiler/core/src/marshal.rs:7; marshal::serialize_code/deserialize_code are used by compiler/core/src/frozen.rs (encode/decode) and frozen generation paths.
  • Action (pick one):
    • Keep v4 lossy but document it and gate code that relies on line/exception tables.
    • Extend the marshal format to include linetable & exceptiontable, bump FORMAT_VERSION, implement a backward‑compatible reader that treats missing tables as empty, and update serialize_code (and frozen encode paths) to write the new fields.
  • Files to change: compiler/core/src/marshal.rs (serialize/deserialize), compiler/core/src/frozen.rs (encode/decode), update FORMAT_VERSION at compiler/core/src/marshal.rs:7 and vm/src/stdlib/marshal.rs if exposing a new version.

I can draft a backward‑compatible read/write plan now that FORMAT_VERSION location is confirmed.

🤖 Prompt for AI Agents
In compiler/core/src/marshal.rs around lines 272-274, deserialization currently
sets linetable and exceptiontable to empty which strips debug info; implement
the non-lossy fix: extend the marshal format to include linetable and
exceptiontable, bump FORMAT_VERSION in compiler/core/src/marshal.rs (and mirror
the new version in vm/src/stdlib/marshal.rs), update serialize_code to write the
new fields, update deserialize_code to read them while remaining
backward-compatible (treat missing fields from older versions as empty), and
update compiler/core/src/frozen.rs encode/decode to emit/consume the new fields
so frozen modules preserve line/exception tables.

Comment on lines +759 to +767
let instructions = match args.co_code {
OptionalArg::Present(_code_bytes) => {
// Convert bytes back to instructions
// For now, keep the original instructions
// TODO: Properly parse bytecode from bytes
self.code.instructions.clone()
}
OptionalArg::Missing => self.code.instructions.clone(),
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

co_code parsing TODO leaves API partially inert.

When users pass co_code, it’s ignored. Either parse now or reject with a clear error to avoid surprising behavior.

Apply one of:

-            OptionalArg::Present(_code_bytes) => {
-                // Convert bytes back to instructions
-                // For now, keep the original instructions
-                // TODO: Properly parse bytecode from bytes
-                self.code.instructions.clone()
-            }
+            OptionalArg::Present(_code_bytes) => {
+                return Err(vm.new_not_implemented_error("co_code replacement is not yet supported".to_string()));
+            }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let instructions = match args.co_code {
OptionalArg::Present(_code_bytes) => {
// Convert bytes back to instructions
// For now, keep the original instructions
// TODO: Properly parse bytecode from bytes
self.code.instructions.clone()
}
OptionalArg::Missing => self.code.instructions.clone(),
};
let instructions = match args.co_code {
OptionalArg::Present(_code_bytes) => {
return Err(vm.new_not_implemented_error("co_code replacement is not yet supported".to_string()));
}
OptionalArg::Missing => self.code.instructions.clone(),
};
🤖 Prompt for AI Agents
In vm/src/builtins/code.rs around lines 759-767, the code currently ignores
OptionalArg::Present(_code_bytes) for co_code; either implement parsing or
explicitly reject input — update the match so that OptionalArg::Present(_)
returns an immediate error (with a clear message like "co_code is not supported
yet; parsing bytecode is not implemented") instead of silently using
self.code.instructions.clone(), and ensure the function returns the appropriate
error type per surrounding code paths and add/update a unit test asserting that
passing co_code yields the expected error.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
compiler/core/src/marshal.rs (1)

7-7: Marshal format changed without version bump — bump to v5 and make deserialization tolerant.

You added linetable/exceptiontable to CodeObject encoding but left FORMAT_VERSION = 4 and the deserializer reads the new tails unconditionally; old v4 blobs will EOF here and older readers can’t distinguish v5. Bump the format and make tail reads backward‑compatible.

Apply:

- pub const FORMAT_VERSION: u32 = 4;
+ // v5: includes linetable & exceptiontable in CodeObject encoding
+ pub const FORMAT_VERSION: u32 = 5;

And make tail reads tolerant (treat missing tails as empty):

-    // Read linetable and exceptiontable
-    let linetable_len = rdr.read_u32()?;
-    let linetable = rdr.read_slice(linetable_len)?.to_vec().into_boxed_slice();
-
-    let exceptiontable_len = rdr.read_u32()?;
-    let exceptiontable = rdr.read_slice(exceptiontable_len)?.to_vec().into_boxed_slice();
+    // Read linetable and exceptiontable (backward‑compatible: treat missing tails as empty)
+    let (linetable, exceptiontable) = match rdr.read_u32() {
+        Ok(lt_len) => {
+            let lt = rdr.read_slice(lt_len)?.to_vec().into_boxed_slice();
+            let et_len = rdr.read_u32()?; // must exist if linetable_len existed
+            let et = rdr.read_slice(et_len)?.to_vec().into_boxed_slice();
+            (lt, et)
+        }
+        Err(MarshalError::Eof) => (Box::new([]), Box::new([])),
+        Err(e) => return Err(e),
+    };

Precise pointers:

  • Change constant in compiler/core/src/marshal.rs (line ~7).
  • Make the deserializer change where CodeObject tails are read (apply to the read sites noted previously: lines ~254–260, ~279–281, ~697–700 in compiler/core/src/marshal.rs).
  • Update any mirrors of FORMAT_VERSION: vm/src/stdlib/marshal.rs (exports version), compiler/core/src/frozen.rs and any frozen/frozen-encoding paths (derive-impl/src/compile_bytecode.rs uses FrozenCodeObject::encode), and other places that rely on / emit FORMAT_VERSION.

Follow‑ups:

  • Update all places that mirror/emit FORMAT_VERSION to 5.
  • Add unit tests that round‑trip both v4 (no tails) and v5 (with tails).
♻️ Duplicate comments (1)
vm/src/builtins/code.rs (1)

759-767: Don’t silently ignore co_code; reject until parsing is implemented.

This still returns the original instructions when a user supplies co_code, which is surprising. Emit a clear NotImplemented error instead.

Apply:

-        let instructions = match args.co_code {
-            OptionalArg::Present(_code_bytes) => {
-                // Convert bytes back to instructions
-                // For now, keep the original instructions
-                // TODO: Properly parse bytecode from bytes
-                self.code.instructions.clone()
-            }
-            OptionalArg::Missing => self.code.instructions.clone(),
-        };
+        let instructions = match args.co_code {
+            OptionalArg::Present(_code_bytes) => {
+                return Err(vm.new_not_implemented_error(
+                    "co_code replacement is not supported yet".to_string(),
+                ));
+            }
+            OptionalArg::Missing => self.code.instructions.clone(),
+        };
🧹 Nitpick comments (2)
vm/src/builtins/code.rs (2)

42-46: NO_LINE detection should use decoded kind, not raw header bits.

Comparing (byte >> 3) == 0x1f relies on the top bit being set and is indirect. Use the already‑decoded code/kind.

Apply:

-    /// Check if this is a NO_LINE marker (code 15)
-    fn is_no_line_marker(byte: u8) -> bool {
-        (byte >> 3) == 0x1f
-    }
+    /// Check if this is a NO_LINE marker (kind == None)
+    fn is_no_line_marker(code: u8) -> bool {
+        matches!(PyCodeLocationInfoKind::from_code(code), Some(PyCodeLocationInfoKind::None))
+    }
...
-        // Check for NO_LINE marker
-        if Self::is_no_line_marker(first_byte) {
+        // Check for NO_LINE marker
+        if Self::is_no_line_marker(code) {
             self.ar_line = -1;
         } else {
             self.ar_line = self.computed_line;
         }

Also applies to: 62-77


82-95: Skip loop is unnecessary and brittle; rely on kind‑specific reads.

get_line_delta already consumes the payload for the entry. Scanning until the next 0x80 risks desync if future payloads legally contain 0x80 bytes.

Apply:

-        // Skip remaining bytes for this entry
-        while !self.reader.at_end() {
-            if let Some(b) = self.reader.peek_byte() {
-                if (b & 0x80) != 0 {
-                    break;
-                }
-                self.reader.read_byte();
-            } else {
-                break;
-            }
-        }
+        // Payload fully consumed by get_line_delta()
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a22899f and 7f683ca.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_code.py is excluded by !Lib/**
📒 Files selected for processing (4)
  • compiler/codegen/src/ir.rs (5 hunks)
  • compiler/core/src/bytecode.rs (4 hunks)
  • compiler/core/src/marshal.rs (3 hunks)
  • vm/src/builtins/code.rs (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • compiler/core/src/bytecode.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Format Rust code with the default rustfmt style (run cargo fmt)
Run clippy and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • compiler/core/src/marshal.rs
  • compiler/codegen/src/ir.rs
  • vm/src/builtins/code.rs
{vm,stdlib}/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use RustPython macros (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • vm/src/builtins/code.rs
🧬 Code graph analysis (2)
compiler/codegen/src/ir.rs (1)
vm/src/builtins/code.rs (3)
  • new (32-40)
  • new (338-340)
  • new (867-869)
vm/src/builtins/code.rs (1)
compiler/core/src/bytecode.rs (1)
  • from_code (61-81)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Run snippets and cpython tests (windows-2025)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run tests under miri
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run rust tests (windows-2025)
  • GitHub Check: Ensure compilation on various targets
🔇 Additional comments (2)
compiler/codegen/src/ir.rs (1)

203-205: End‑to‑end linetable generation wired correctly.

Generation and plumbing into CodeObject look sound; exception table left empty with a clear TODO. Nice.

When you later add exception table generation, ensure the marshal tests still round‑trip by adding size > 0 fixtures.

Also applies to: 225-227

vm/src/builtins/code.rs (1)

486-497: Linetable/positions exposure looks correct.

Accessors for co_linetable/co_exceptiontable and decoding in co_lines/co_positions align with the encoder. Good.

Please add a small round‑trip test: compile a snippet with columns ≥128, assert OneLine encoding (not Long), and verify co_positions matches expected columns.

Also applies to: 498-569, 570-689

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
compiler/codegen/src/ir.rs (1)

398-502: One‑line encoding wrongly restricted to columns < 128; allow full 0–255.

CPython’s OneLine{0,1,2} forms use raw u8 columns. Limiting to < 128 forces Long, bloating tables and diverging from CPython.

-                } else if col < 128 && end_col < 128 {
+                } else if col >= 0 && col <= 255 && end_col >= 0 && end_col <= 255 {
@@
-                if col < 128 && end_col < 128 {
+                if col >= 0 && col <= 255 && end_col >= 0 && end_col <= 255 {

Also consider emitting NoColumns when column is unavailable in future changes rather than always going Long.

vm/src/builtins/code.rs (1)

759-767: Don’t silently ignore co_code; reject until parsing is implemented.

Returning the original instructions surprises users. Prefer a clear error.

-            OptionalArg::Present(_code_bytes) => {
-                // Convert bytes back to instructions
-                // For now, keep the original instructions
-                // TODO: Properly parse bytecode from bytes
-                self.code.instructions.clone()
-            }
+            OptionalArg::Present(_code_bytes) => {
+                return Err(vm.new_not_implemented_error(
+                    "co_code replacement is not supported yet".to_string()
+                ));
+            }
🧹 Nitpick comments (9)
compiler/core/src/bytecode.rs (1)

36-58: Enum shape looks good; small naming nit on the “no line” variant.

Consider renaming None to NoLine to avoid repeated Option::None disambiguation and improve readability wherever both appear together. Non‑blocking.

compiler/codegen/src/ir.rs (4)

75-76: Consider capturing end locations sooner.

Tracking end_location now will let you emit column ranges (and multi‑line spans) later without reshaping the pipeline.


203-205: Use safe first-line conversion.

first_line_number.get() as i32 can silently truncate on very large inputs. Prefer i32::try_from(...).unwrap_or(i32::MAX) or clamp.

-        let linetable = generate_linetable(&locations, first_line_number.get() as i32);
+        let first = i32::try_from(first_line_number.get()).unwrap_or(i32::MAX);
+        let linetable = generate_linetable(&locations, first);

225-227: Empty exception table placeholder — gate or follow‑up issue.

Returning an empty table is fine temporarily; please either feature‑gate exposure or open a tracked issue to avoid shipping a partially inert API.


504-527: Varint writers match the reader and CPython’s scheme; consider reuse.

Encoding looks correct (6‑bit chunks, signed mapping). If feasible, place these helpers in a shared module to avoid drift with the VM’s reader.

vm/src/builtins/code.rs (4)

42-96: Minor: simplify NO_LINE detection.

Instead of (byte >> 3) == 0x1f, use the already-extracted code and compare to PyCodeLocationInfoKind::None for clarity.

-    fn is_no_line_marker(byte: u8) -> bool {
-        (byte >> 3) == 0x1f
-    }
+    fn is_no_line_marker(code: u8) -> bool {
+        matches!(PyCodeLocationInfoKind::from_code(code), Some(PyCodeLocationInfoKind::None))
+    }
@@
-        if Self::is_no_line_marker(first_byte) {
+        if Self::is_no_line_marker(code) {

486-496: co_linetable/co_exceptiontable OK — zero‑copy not required but fine.

Returning a copy is acceptable; future optimization could expose a memoryview.


786-795: co_nlocals validation guard is good; mirror CPython error text.

Message is clear; consider matching CPython’s exact wording if compatibility is a goal.


860-921: Reader mirrors writer; keep implementations in sync.

read_varint/read_signed_varint match the encoder. Consider centralizing with the encoder to avoid divergence.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7f683ca and 56019c3.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_code.py is excluded by !Lib/**
📒 Files selected for processing (4)
  • compiler/codegen/src/ir.rs (5 hunks)
  • compiler/core/src/bytecode.rs (4 hunks)
  • compiler/core/src/marshal.rs (3 hunks)
  • vm/src/builtins/code.rs (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • compiler/core/src/marshal.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Format Rust code with the default rustfmt style (run cargo fmt)
Run clippy and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • compiler/core/src/bytecode.rs
  • compiler/codegen/src/ir.rs
  • vm/src/builtins/code.rs
{vm,stdlib}/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use RustPython macros (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • vm/src/builtins/code.rs
🧬 Code graph analysis (2)
compiler/codegen/src/ir.rs (2)
vm/src/builtins/code.rs (3)
  • new (32-40)
  • new (338-340)
  • new (867-869)
compiler/codegen/src/compile.rs (3)
  • new (310-317)
  • new (338-380)
  • line (5737-5737)
vm/src/builtins/code.rs (1)
compiler/core/src/bytecode.rs (1)
  • from_code (61-81)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Run rust tests (windows-2025)
  • GitHub Check: Check Rust code with rustfmt and clippy
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (windows-2025)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Ensure compilation on various targets
🔇 Additional comments (12)
compiler/core/src/bytecode.rs (3)

60-103: Helper API covers current decoding needs.

from_code/is_short/short_column_group/one_line_delta align with CPython 3.11+ table semantics. LGTM.


218-222: New public fields on CodeObject — confirm all constructors/serde paths set them.

The additions look right; please ensure every constructor (including frozen/marshal decode paths) initializes both, and that default/empty cases use Box::new([]) rather than vec![] to avoid extra heap later.


1278-1280: Propagation in map_bag/map_clone_bag is correct.

Linetable/exceptiontable are forwarded/cloned as expected. LGTM.

Also applies to: 1311-1312

compiler/codegen/src/ir.rs (3)

8-9: Import of PyCodeLocationInfoKind is expected.

Matches usage below. LGTM.


398-502: Short-form grouping cap is correct; keep header math consistent.

Using (entry_length - 1) for the low 3 bits and chunking by 8 matches CPython’s format. LGTM.


441-449: Add unit test for OneLine forms with columns ≥128 after widening to 0–255.
Current code still uses col < 128 && end_col < 128 (compiler/codegen/src/ir.rs:441–449, 466–471). After changing OneLine encoding to accept 0–255, add a unit test that encodes col=end_col=200 and asserts a round‑trip via co_positions().

vm/src/builtins/code.rs (6)

19-19: Import is appropriate.

Matches decoding paths below. LGTM.


22-41: Address-range iterator scaffolding looks good.

State mirrors CPython’s iteration model. LGTM.


97-136: Decoding consumption aligns with encoder.

Long/NoColumns/OneLine/Short payload consumption matches what generate_linetable writes. LGTM.


498-569: co_lines semantics mostly match CPython; offsets in code units → bytes.

Using length * 2 to advance addresses aligns with 2‑byte code units. Merge logic for consecutive same‑line ranges looks correct. LGTM.


570-689: co_positions decoding matches the table; verify None propagation.

Handling of PyCodeLocationInfoKind::None (all Nones) and one‑line forms is consistent. Good.

Please add a small test asserting that a NO_LINE entry yields (None, None, None, None) for each covered instruction.


797-808: Replacement wiring looks consistent.

Qualname, stacksize, cell/free vars, linetable/exceptiontable all propagate correctly. LGTM.

Also applies to: 820-837

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
compiler/codegen/src/ir.rs (1)

441-449: One‑line encoding must allow full 0–255 column range (not <128).

CPython’s OneLine{0,1,2} store start/end columns as raw u8. Restricting to < 128 bloats tables and diverges from CPython.

Apply:

-                } else if col < 128 && end_col < 128 {
+                } else if (0..=255).contains(&col) && (0..=255).contains(&end_col) {
...
-                if col < 128 && end_col < 128 {
+                if (0..=255).contains(&col) && (0..=255).contains(&end_col) {

Also applies to: 466-471

vm/src/builtins/code.rs (1)

759-767: Don’t silently ignore user‑provided co_code. Return a clear error until parsing is implemented.

Silently reusing existing instructions is surprising and user‑visible.

-        let instructions = match args.co_code {
-            OptionalArg::Present(_code_bytes) => {
-                // Convert bytes back to instructions
-                // For now, keep the original instructions
-                // TODO: Properly parse bytecode from bytes
-                self.code.instructions.clone()
-            }
-            OptionalArg::Missing => self.code.instructions.clone(),
-        };
+        let instructions = match args.co_code {
+            OptionalArg::Present(_code_bytes) => {
+                return Err(vm.new_not_implemented_error(
+                    "co_code replacement is not yet supported".to_string(),
+                ));
+            }
+            OptionalArg::Missing => self.code.instructions.clone(),
+        };
🧹 Nitpick comments (7)
compiler/codegen/src/ir.rs (3)

398-405: Consider emitting NoColumns/None when appropriate to match CPython semantics.

Current encoder never uses NoColumns (13) or None/NO_LINE (15). If any instruction has “no position” or unknown columns, encoding those forms improves fidelity and downstream iteration. Not blocking, but worth tracking.


225-227: Exception table placeholder.

Leaving exceptiontable empty is fine short‑term; please gate consumers accordingly and add a TODO reference/issue so this isn’t forgotten.

I can open an issue with a checklist and CPython references if helpful.


506-514: Varint helpers: mirror reader/writer invariants.

Writers look correct; add brief asserts to guarantee payload bytes never set bit7 (MSB) to prevent accidental header confusion if code changes.

 fn write_varint(buf: &mut Vec<u8>, mut val: u32) -> usize {
     let start_len = buf.len();
     while val >= 64 {
-        buf.push(0x40 | (val & 0x3f) as u8);
+        let b = 0x40 | (val & 0x3f) as u8;
+        debug_assert_eq!(b & 0x80, 0);
+        buf.push(b);
         val >>= 6;
     }
-    buf.push(val as u8);
+    let b = val as u8;
+    debug_assert_eq!(b & 0x80, 0);
+    buf.push(b);
     buf.len() - start_len
 }

Also applies to: 518-527

vm/src/builtins/code.rs (4)

796-807: Validate replaced tables.

When co_linetable/co_exceptiontable are provided, decoding should succeed and cover the instruction stream. Consider verifying that linetable iteration spans the full code length (sum of lengths equals instructions.len()*2) and erroring otherwise.


42-46: Make NO_LINE detection explicit.

(byte >> 3) == 0x1f works but is opaque. Prefer extracting code = (first_byte >> 3) & 0x0f and comparing to 15 for readability.

-    fn is_no_line_marker(byte: u8) -> bool {
-        (byte >> 3) == 0x1f
-    }
+    fn is_no_line_marker(byte: u8) -> bool {
+        ((byte >> 3) & 0x0f) == 0x0f
+    }

Also applies to: 62-64


498-569: co_lines builds a full Vec; consider lazy iterator.

Non‑blocking. A generator/iterator like CPython’s lineiterator avoids allocating for large functions.


889-917: Reader matches encoder; add minimal sanity checks.

As a guard against malformed inputs, add limits to varint loop iterations (e.g., cap at 6 chunks => 36 bits) and early‑exit on EOF to avoid O(n²) in degenerate data.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56019c3 and 1d63075.

⛔ Files ignored due to path filters (2)
  • Lib/test/test_code.py is excluded by !Lib/**
  • Lib/test/test_compile.py is excluded by !Lib/**
📒 Files selected for processing (4)
  • compiler/codegen/src/ir.rs (5 hunks)
  • compiler/core/src/bytecode.rs (4 hunks)
  • compiler/core/src/marshal.rs (3 hunks)
  • vm/src/builtins/code.rs (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • compiler/core/src/bytecode.rs
  • compiler/core/src/marshal.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Format Rust code with the default rustfmt style (run cargo fmt)
Run clippy and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • vm/src/builtins/code.rs
  • compiler/codegen/src/ir.rs
{vm,stdlib}/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use RustPython macros (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • vm/src/builtins/code.rs
🧬 Code graph analysis (2)
vm/src/builtins/code.rs (1)
compiler/core/src/bytecode.rs (1)
  • from_code (61-81)
compiler/codegen/src/ir.rs (1)
vm/src/builtins/code.rs (3)
  • new (32-40)
  • new (338-340)
  • new (867-869)

@youknowone youknowone merged commit b666c52 into RustPython:main Sep 16, 2025
12 checks passed
@youknowone youknowone deleted the code-line-table branch September 16, 2025 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant