-
Notifications
You must be signed in to change notification settings - Fork 1.3k
code object linetable #6150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
code object linetable #6150
Conversation
WalkthroughAdds CPython 3.11+ linetable and exception table support across the compiler, core, marshal, and VM: codegen collects per-instruction locations and encodes a linetable, core bytecode types gain linetable/exceptiontable fields and marshal persists them, and the VM decodes and exposes these tables via new PyCode APIs. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Frontend as Compiler Frontend
participant IR as IR Builder
participant Codegen as Codegen (ir.rs)
participant Core as CodeObject (core/bytecode.rs)
participant Marshal as Marshal
participant VM as VM PyCode (builtins/code.rs)
Frontend->>IR: Emit instructions with source locations
IR->>Codegen: finalize_code(locations, instrs)
Codegen->>Codegen: generate_linetable(locations) -> Box<[u8]>
Codegen->>Core: Construct CodeObject{..., linetable, exceptiontable}
Core->>Marshal: serialize_code includes linetable & exceptiontable
Marshal->>Core: deserialize_code reads linetable & exceptiontable
Core->>VM: Load CodeObject into PyCode (contains linetable)
VM->>VM: LineTableReader parses linetable, yields positions/ranges
VM-->>Caller: co_linetable()/co_exceptiontable() and co_lines()/co_positions()
sequenceDiagram
autonumber
participant PyCode as PyCode.co_lines()
participant Reader as LineTableReader
participant Kind as PyCodeLocationInfoKind
PyCode->>Reader: init with linetable bytes
loop per encoded entry
Reader->>Reader: read varint / signed varint
Reader->>Kind: decode entry type (Short/OneLine/NoColumns/Long/None)
Kind-->>Reader: provide delta semantics (line/col/no-line)
Reader-->>PyCode: emit (start_off, end_off, lineno) or position tuple
end
PyCode-->>Caller: iterator of merged ranges or positions
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (7)
compiler/core/src/bytecode.rs (2)
36-58
: Enum naming nit: considerNoLine
overNone
.
None
reads likeOption::None
at use sites;NoLine
matches CPython docs and avoids ambiguity.
218-221
: New fields: document invariants.Add a brief doc that
linetable
/exceptiontable
are CPython 3.11+ raw blobs (co_linetable/co_exceptiontable) and may be empty when unavailable.compiler/codegen/src/ir.rs (1)
225-227
: Exception table placeholder.Leaving
exceptiontable
empty is fine short‑term; add a TODO pointing to the planned encoder to avoid shipping a silent stub.vm/src/builtins/code.rs (4)
47-96
: Advance logic: robust to malformed entries.Skips payload bytes until next header (bit7 set). Consider logging at debug level when encountering invalid bytes to aid diagnostics.
160-176
: replace(): new args are reasonable; validate inputs.Nice to add
co_nlocals
consistency check. Consider validatingco_firstlineno=0
maps to None (already handled) and rejecting negative via API.
498-569
: co_lines(): eager list is fine; iterator later.Merges adjacent ranges with same line and maps NO_LINE to None. Consider yielding a lazy iterator in a follow‑up for large functions.
860-921
: Duplicated linetable parsing logic; consider sharing.
LineTableReader
duplicates varint/signed‑varint logic from codegen. Extract a small shared helper/module to avoid drift.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Lib/test/test_code.py
is excluded by!Lib/**
📒 Files selected for processing (4)
compiler/codegen/src/ir.rs
(5 hunks)compiler/core/src/bytecode.rs
(4 hunks)compiler/core/src/marshal.rs
(1 hunks)vm/src/builtins/code.rs
(7 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs
: Format Rust code with the default rustfmt style (runcargo fmt
)
Run clippy and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management
Files:
compiler/core/src/marshal.rs
compiler/core/src/bytecode.rs
vm/src/builtins/code.rs
compiler/codegen/src/ir.rs
{vm,stdlib}/**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use RustPython macros (
pyclass
,pymodule
,pyfunction
, etc.) when implementing Python functionality in Rust
Files:
vm/src/builtins/code.rs
🧬 Code graph analysis (3)
compiler/core/src/marshal.rs (1)
vm/src/builtins/code.rs (3)
new
(32-40)new
(338-340)new
(867-869)
vm/src/builtins/code.rs (1)
compiler/core/src/bytecode.rs (1)
from_code
(61-81)
compiler/codegen/src/ir.rs (1)
vm/src/builtins/code.rs (3)
new
(32-40)new
(338-340)new
(867-869)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Run snippets and cpython tests (windows-2025)
- GitHub Check: Check the WASM package and demo
- GitHub Check: Run tests under miri
- GitHub Check: Run snippets and cpython tests on wasm-wasi
- GitHub Check: Run rust tests (windows-2025)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Ensure compilation on various targets
🔇 Additional comments (9)
compiler/core/src/bytecode.rs (2)
60-103
: Mapping looks correct; add unit tests for encodings.Please add small tests asserting
from_code
/helpers match CPython’s 0–15 meanings (short groups, one‑line deltas, no‑columns, long, no‑line).
1269-1280
: Propagation LGTM.
map_bag
/map_clone_bag
correctly carry the new tables.Also applies to: 1301-1312
compiler/codegen/src/ir.rs (3)
203-205
: Good: linetable generation wired into CodeObject.The generator is called after finalizing
locations
; placement is correct.
398-505
: Encoding aligns with CPython 3.11+ format; add edge‑case tests.Looks correct (short groups, one‑line deltas, long form, varints). Please add tests for:
- Large columns (≥128) use long form.
- Negative/large line deltas.
- Runs spanning ExtendedArg (length in code units).
506-529
: Varint/ZigZag helpers LGTM.Matches 6‑bit continuation and ZigZag sign handling.
vm/src/builtins/code.rs (4)
22-41
: Reader/state structure looks sound.
PyCodeAddressRange
and initialization mirror CPython’s iteration; sentinel handling for NO_LINE is clear.
97-136
: Delta decoding matches encoding.Covers LONG/NO_COLUMNS/ONE_LINE/SHORT; NO_LINE handled via caller. Good.
486-497
: Accessors LGTM.Expose raw tables as bytes; copies are acceptable given size.
570-689
: co_positions(): per‑instruction tuples match CPython semantics.Decoding for LONG/NO_COLUMNS/ONE_LINE/SHORT looks correct. Add tests covering all kinds, including NO_LINE spans.
compiler/core/src/marshal.rs
Outdated
linetable: Box::new([]), // TODO: Read from marshal format | ||
exceptiontable: Box::new([]), // TODO: Read from marshal format | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Marshal drops linetable/exceptiontable (lossy round‑trip).
Deserialization initializes both tables to empty, and serialization doesn’t write them, so any compiled code marshaled/unmarshaled loses debug data.
- Either: keep format v4 lossy but clearly document it and gate uses that rely on line info.
- Or: extend the format to include
linetable
andexceptiontable
(likely bump format; ensure backward compatibility when reading older blobs).
I can draft a backward‑compatible read/write plan if you confirm where FORMAT_VERSION
is emitted/checked.
🏁 Script executed:
#!/bin/bash
# Find where marshal format version is written/read and who consumes it.
rg -n "FORMAT_VERSION|serialize_code\\(|deserialize_code\\(|FrozenCodeObject|frozen" -C3
Length of output: 61799
Marshal drops linetable/exceptiontable — preserve debug data or document lossiness (critical).
File: compiler/core/src/marshal.rs
Lines: 272-274
linetable: Box::new([]), // TODO: Read from marshal format
exceptiontable: Box::new([]), // TODO: Read from marshal format
})
Deserialization initializes both tables to empty and serialization doesn’t write them — any serialized/deserialized CodeObject (including frozen modules via compiler/core/src/frozen.rs) loses line/exception debug info.
- Evidence: FORMAT_VERSION is declared at compiler/core/src/marshal.rs:7; marshal::serialize_code/deserialize_code are used by compiler/core/src/frozen.rs (encode/decode) and frozen generation paths.
- Action (pick one):
- Keep v4 lossy but document it and gate code that relies on line/exception tables.
- Extend the marshal format to include linetable & exceptiontable, bump FORMAT_VERSION, implement a backward‑compatible reader that treats missing tables as empty, and update serialize_code (and frozen encode paths) to write the new fields.
- Files to change: compiler/core/src/marshal.rs (serialize/deserialize), compiler/core/src/frozen.rs (encode/decode), update FORMAT_VERSION at compiler/core/src/marshal.rs:7 and vm/src/stdlib/marshal.rs if exposing a new version.
I can draft a backward‑compatible read/write plan now that FORMAT_VERSION location is confirmed.
🤖 Prompt for AI Agents
In compiler/core/src/marshal.rs around lines 272-274, deserialization currently
sets linetable and exceptiontable to empty which strips debug info; implement
the non-lossy fix: extend the marshal format to include linetable and
exceptiontable, bump FORMAT_VERSION in compiler/core/src/marshal.rs (and mirror
the new version in vm/src/stdlib/marshal.rs), update serialize_code to write the
new fields, update deserialize_code to read them while remaining
backward-compatible (treat missing fields from older versions as empty), and
update compiler/core/src/frozen.rs encode/decode to emit/consume the new fields
so frozen modules preserve line/exception tables.
let instructions = match args.co_code { | ||
OptionalArg::Present(_code_bytes) => { | ||
// Convert bytes back to instructions | ||
// For now, keep the original instructions | ||
// TODO: Properly parse bytecode from bytes | ||
self.code.instructions.clone() | ||
} | ||
OptionalArg::Missing => self.code.instructions.clone(), | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
co_code parsing TODO leaves API partially inert.
When users pass co_code
, it’s ignored. Either parse now or reject with a clear error to avoid surprising behavior.
Apply one of:
- OptionalArg::Present(_code_bytes) => {
- // Convert bytes back to instructions
- // For now, keep the original instructions
- // TODO: Properly parse bytecode from bytes
- self.code.instructions.clone()
- }
+ OptionalArg::Present(_code_bytes) => {
+ return Err(vm.new_not_implemented_error("co_code replacement is not yet supported".to_string()));
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
let instructions = match args.co_code { | |
OptionalArg::Present(_code_bytes) => { | |
// Convert bytes back to instructions | |
// For now, keep the original instructions | |
// TODO: Properly parse bytecode from bytes | |
self.code.instructions.clone() | |
} | |
OptionalArg::Missing => self.code.instructions.clone(), | |
}; | |
let instructions = match args.co_code { | |
OptionalArg::Present(_code_bytes) => { | |
return Err(vm.new_not_implemented_error("co_code replacement is not yet supported".to_string())); | |
} | |
OptionalArg::Missing => self.code.instructions.clone(), | |
}; |
🤖 Prompt for AI Agents
In vm/src/builtins/code.rs around lines 759-767, the code currently ignores
OptionalArg::Present(_code_bytes) for co_code; either implement parsing or
explicitly reject input — update the match so that OptionalArg::Present(_)
returns an immediate error (with a clear message like "co_code is not supported
yet; parsing bytecode is not implemented") instead of silently using
self.code.instructions.clone(), and ensure the function returns the appropriate
error type per surrounding code paths and add/update a unit test asserting that
passing co_code yields the expected error.
a22899f
to
7f683ca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
compiler/core/src/marshal.rs (1)
7-7
: Marshal format changed without version bump — bump to v5 and make deserialization tolerant.You added linetable/exceptiontable to CodeObject encoding but left FORMAT_VERSION = 4 and the deserializer reads the new tails unconditionally; old v4 blobs will EOF here and older readers can’t distinguish v5. Bump the format and make tail reads backward‑compatible.
Apply:
- pub const FORMAT_VERSION: u32 = 4; + // v5: includes linetable & exceptiontable in CodeObject encoding + pub const FORMAT_VERSION: u32 = 5;And make tail reads tolerant (treat missing tails as empty):
- // Read linetable and exceptiontable - let linetable_len = rdr.read_u32()?; - let linetable = rdr.read_slice(linetable_len)?.to_vec().into_boxed_slice(); - - let exceptiontable_len = rdr.read_u32()?; - let exceptiontable = rdr.read_slice(exceptiontable_len)?.to_vec().into_boxed_slice(); + // Read linetable and exceptiontable (backward‑compatible: treat missing tails as empty) + let (linetable, exceptiontable) = match rdr.read_u32() { + Ok(lt_len) => { + let lt = rdr.read_slice(lt_len)?.to_vec().into_boxed_slice(); + let et_len = rdr.read_u32()?; // must exist if linetable_len existed + let et = rdr.read_slice(et_len)?.to_vec().into_boxed_slice(); + (lt, et) + } + Err(MarshalError::Eof) => (Box::new([]), Box::new([])), + Err(e) => return Err(e), + };Precise pointers:
- Change constant in compiler/core/src/marshal.rs (line ~7).
- Make the deserializer change where CodeObject tails are read (apply to the read sites noted previously: lines ~254–260, ~279–281, ~697–700 in compiler/core/src/marshal.rs).
- Update any mirrors of FORMAT_VERSION: vm/src/stdlib/marshal.rs (exports version), compiler/core/src/frozen.rs and any frozen/frozen-encoding paths (derive-impl/src/compile_bytecode.rs uses FrozenCodeObject::encode), and other places that rely on / emit FORMAT_VERSION.
Follow‑ups:
- Update all places that mirror/emit FORMAT_VERSION to 5.
- Add unit tests that round‑trip both v4 (no tails) and v5 (with tails).
♻️ Duplicate comments (1)
vm/src/builtins/code.rs (1)
759-767
: Don’t silently ignore co_code; reject until parsing is implemented.This still returns the original instructions when a user supplies co_code, which is surprising. Emit a clear NotImplemented error instead.
Apply:
- let instructions = match args.co_code { - OptionalArg::Present(_code_bytes) => { - // Convert bytes back to instructions - // For now, keep the original instructions - // TODO: Properly parse bytecode from bytes - self.code.instructions.clone() - } - OptionalArg::Missing => self.code.instructions.clone(), - }; + let instructions = match args.co_code { + OptionalArg::Present(_code_bytes) => { + return Err(vm.new_not_implemented_error( + "co_code replacement is not supported yet".to_string(), + )); + } + OptionalArg::Missing => self.code.instructions.clone(), + };
🧹 Nitpick comments (2)
vm/src/builtins/code.rs (2)
42-46
: NO_LINE detection should use decoded kind, not raw header bits.Comparing
(byte >> 3) == 0x1f
relies on the top bit being set and is indirect. Use the already‑decodedcode
/kind.Apply:
- /// Check if this is a NO_LINE marker (code 15) - fn is_no_line_marker(byte: u8) -> bool { - (byte >> 3) == 0x1f - } + /// Check if this is a NO_LINE marker (kind == None) + fn is_no_line_marker(code: u8) -> bool { + matches!(PyCodeLocationInfoKind::from_code(code), Some(PyCodeLocationInfoKind::None)) + } ... - // Check for NO_LINE marker - if Self::is_no_line_marker(first_byte) { + // Check for NO_LINE marker + if Self::is_no_line_marker(code) { self.ar_line = -1; } else { self.ar_line = self.computed_line; }Also applies to: 62-77
82-95
: Skip loop is unnecessary and brittle; rely on kind‑specific reads.
get_line_delta
already consumes the payload for the entry. Scanning until the next 0x80 risks desync if future payloads legally contain 0x80 bytes.Apply:
- // Skip remaining bytes for this entry - while !self.reader.at_end() { - if let Some(b) = self.reader.peek_byte() { - if (b & 0x80) != 0 { - break; - } - self.reader.read_byte(); - } else { - break; - } - } + // Payload fully consumed by get_line_delta()
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Lib/test/test_code.py
is excluded by!Lib/**
📒 Files selected for processing (4)
compiler/codegen/src/ir.rs
(5 hunks)compiler/core/src/bytecode.rs
(4 hunks)compiler/core/src/marshal.rs
(3 hunks)vm/src/builtins/code.rs
(7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- compiler/core/src/bytecode.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs
: Format Rust code with the default rustfmt style (runcargo fmt
)
Run clippy and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management
Files:
compiler/core/src/marshal.rs
compiler/codegen/src/ir.rs
vm/src/builtins/code.rs
{vm,stdlib}/**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use RustPython macros (
pyclass
,pymodule
,pyfunction
, etc.) when implementing Python functionality in Rust
Files:
vm/src/builtins/code.rs
🧬 Code graph analysis (2)
compiler/codegen/src/ir.rs (1)
vm/src/builtins/code.rs (3)
new
(32-40)new
(338-340)new
(867-869)
vm/src/builtins/code.rs (1)
compiler/core/src/bytecode.rs (1)
from_code
(61-81)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Run snippets and cpython tests (windows-2025)
- GitHub Check: Run snippets and cpython tests on wasm-wasi
- GitHub Check: Run tests under miri
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Check the WASM package and demo
- GitHub Check: Run rust tests (windows-2025)
- GitHub Check: Ensure compilation on various targets
🔇 Additional comments (2)
compiler/codegen/src/ir.rs (1)
203-205
: End‑to‑end linetable generation wired correctly.Generation and plumbing into CodeObject look sound; exception table left empty with a clear TODO. Nice.
When you later add exception table generation, ensure the marshal tests still round‑trip by adding size > 0 fixtures.
Also applies to: 225-227
vm/src/builtins/code.rs (1)
486-497
: Linetable/positions exposure looks correct.Accessors for co_linetable/co_exceptiontable and decoding in co_lines/co_positions align with the encoder. Good.
Please add a small round‑trip test: compile a snippet with columns ≥128, assert OneLine encoding (not Long), and verify co_positions matches expected columns.
Also applies to: 498-569, 570-689
7f683ca
to
56019c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
compiler/codegen/src/ir.rs (1)
398-502
: One‑line encoding wrongly restricted to columns < 128; allow full 0–255.CPython’s OneLine{0,1,2} forms use raw u8 columns. Limiting to
< 128
forces Long, bloating tables and diverging from CPython.- } else if col < 128 && end_col < 128 { + } else if col >= 0 && col <= 255 && end_col >= 0 && end_col <= 255 { @@ - if col < 128 && end_col < 128 { + if col >= 0 && col <= 255 && end_col >= 0 && end_col <= 255 {Also consider emitting
NoColumns
when column is unavailable in future changes rather than always going Long.vm/src/builtins/code.rs (1)
759-767
: Don’t silently ignore co_code; reject until parsing is implemented.Returning the original instructions surprises users. Prefer a clear error.
- OptionalArg::Present(_code_bytes) => { - // Convert bytes back to instructions - // For now, keep the original instructions - // TODO: Properly parse bytecode from bytes - self.code.instructions.clone() - } + OptionalArg::Present(_code_bytes) => { + return Err(vm.new_not_implemented_error( + "co_code replacement is not supported yet".to_string() + )); + }
🧹 Nitpick comments (9)
compiler/core/src/bytecode.rs (1)
36-58
: Enum shape looks good; small naming nit on the “no line” variant.Consider renaming
None
toNoLine
to avoid repeatedOption::None
disambiguation and improve readability wherever both appear together. Non‑blocking.compiler/codegen/src/ir.rs (4)
75-76
: Consider capturing end locations sooner.Tracking
end_location
now will let you emit column ranges (and multi‑line spans) later without reshaping the pipeline.
203-205
: Use safe first-line conversion.
first_line_number.get() as i32
can silently truncate on very large inputs. Preferi32::try_from(...).unwrap_or(i32::MAX)
or clamp.- let linetable = generate_linetable(&locations, first_line_number.get() as i32); + let first = i32::try_from(first_line_number.get()).unwrap_or(i32::MAX); + let linetable = generate_linetable(&locations, first);
225-227
: Empty exception table placeholder — gate or follow‑up issue.Returning an empty table is fine temporarily; please either feature‑gate exposure or open a tracked issue to avoid shipping a partially inert API.
504-527
: Varint writers match the reader and CPython’s scheme; consider reuse.Encoding looks correct (6‑bit chunks, signed mapping). If feasible, place these helpers in a shared module to avoid drift with the VM’s reader.
vm/src/builtins/code.rs (4)
42-96
: Minor: simplify NO_LINE detection.Instead of
(byte >> 3) == 0x1f
, use the already-extractedcode
and compare toPyCodeLocationInfoKind::None
for clarity.- fn is_no_line_marker(byte: u8) -> bool { - (byte >> 3) == 0x1f - } + fn is_no_line_marker(code: u8) -> bool { + matches!(PyCodeLocationInfoKind::from_code(code), Some(PyCodeLocationInfoKind::None)) + } @@ - if Self::is_no_line_marker(first_byte) { + if Self::is_no_line_marker(code) {
486-496
: co_linetable/co_exceptiontable OK — zero‑copy not required but fine.Returning a copy is acceptable; future optimization could expose a memoryview.
786-795
: co_nlocals validation guard is good; mirror CPython error text.Message is clear; consider matching CPython’s exact wording if compatibility is a goal.
860-921
: Reader mirrors writer; keep implementations in sync.
read_varint/read_signed_varint
match the encoder. Consider centralizing with the encoder to avoid divergence.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Lib/test/test_code.py
is excluded by!Lib/**
📒 Files selected for processing (4)
compiler/codegen/src/ir.rs
(5 hunks)compiler/core/src/bytecode.rs
(4 hunks)compiler/core/src/marshal.rs
(3 hunks)vm/src/builtins/code.rs
(7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- compiler/core/src/marshal.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs
: Format Rust code with the default rustfmt style (runcargo fmt
)
Run clippy and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management
Files:
compiler/core/src/bytecode.rs
compiler/codegen/src/ir.rs
vm/src/builtins/code.rs
{vm,stdlib}/**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use RustPython macros (
pyclass
,pymodule
,pyfunction
, etc.) when implementing Python functionality in Rust
Files:
vm/src/builtins/code.rs
🧬 Code graph analysis (2)
compiler/codegen/src/ir.rs (2)
vm/src/builtins/code.rs (3)
new
(32-40)new
(338-340)new
(867-869)compiler/codegen/src/compile.rs (3)
new
(310-317)new
(338-380)line
(5737-5737)
vm/src/builtins/code.rs (1)
compiler/core/src/bytecode.rs (1)
from_code
(61-81)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Run rust tests (windows-2025)
- GitHub Check: Check Rust code with rustfmt and clippy
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run snippets and cpython tests on wasm-wasi
- GitHub Check: Check the WASM package and demo
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run snippets and cpython tests (windows-2025)
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Ensure compilation on various targets
🔇 Additional comments (12)
compiler/core/src/bytecode.rs (3)
60-103
: Helper API covers current decoding needs.
from_code/is_short/short_column_group/one_line_delta
align with CPython 3.11+ table semantics. LGTM.
218-222
: New public fields on CodeObject — confirm all constructors/serde paths set them.The additions look right; please ensure every constructor (including frozen/marshal decode paths) initializes both, and that default/empty cases use
Box::new([])
rather thanvec![]
to avoid extra heap later.
1278-1280
: Propagation in map_bag/map_clone_bag is correct.Linetable/exceptiontable are forwarded/cloned as expected. LGTM.
Also applies to: 1311-1312
compiler/codegen/src/ir.rs (3)
8-9
: Import of PyCodeLocationInfoKind is expected.Matches usage below. LGTM.
398-502
: Short-form grouping cap is correct; keep header math consistent.Using
(entry_length - 1)
for the low 3 bits and chunking by 8 matches CPython’s format. LGTM.
441-449
: Add unit test for OneLine forms with columns ≥128 after widening to 0–255.
Current code still uses col < 128 && end_col < 128 (compiler/codegen/src/ir.rs:441–449, 466–471). After changing OneLine encoding to accept 0–255, add a unit test that encodes col=end_col=200 and asserts a round‑trip via co_positions().vm/src/builtins/code.rs (6)
19-19
: Import is appropriate.Matches decoding paths below. LGTM.
22-41
: Address-range iterator scaffolding looks good.State mirrors CPython’s iteration model. LGTM.
97-136
: Decoding consumption aligns with encoder.Long/NoColumns/OneLine/Short payload consumption matches what
generate_linetable
writes. LGTM.
498-569
: co_lines semantics mostly match CPython; offsets in code units → bytes.Using
length * 2
to advance addresses aligns with 2‑byte code units. Merge logic for consecutive same‑line ranges looks correct. LGTM.
570-689
: co_positions decoding matches the table; verify None propagation.Handling of
PyCodeLocationInfoKind::None
(all Nones) and one‑line forms is consistent. Good.Please add a small test asserting that a NO_LINE entry yields
(None, None, None, None)
for each covered instruction.
797-808
: Replacement wiring looks consistent.Qualname, stacksize, cell/free vars, linetable/exceptiontable all propagate correctly. LGTM.
Also applies to: 820-837
56019c3
to
1d63075
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
compiler/codegen/src/ir.rs (1)
441-449
: One‑line encoding must allow full 0–255 column range (not <128).CPython’s OneLine{0,1,2} store start/end columns as raw u8. Restricting to
< 128
bloats tables and diverges from CPython.Apply:
- } else if col < 128 && end_col < 128 { + } else if (0..=255).contains(&col) && (0..=255).contains(&end_col) { ... - if col < 128 && end_col < 128 { + if (0..=255).contains(&col) && (0..=255).contains(&end_col) {Also applies to: 466-471
vm/src/builtins/code.rs (1)
759-767
: Don’t silently ignore user‑provided co_code. Return a clear error until parsing is implemented.Silently reusing existing instructions is surprising and user‑visible.
- let instructions = match args.co_code { - OptionalArg::Present(_code_bytes) => { - // Convert bytes back to instructions - // For now, keep the original instructions - // TODO: Properly parse bytecode from bytes - self.code.instructions.clone() - } - OptionalArg::Missing => self.code.instructions.clone(), - }; + let instructions = match args.co_code { + OptionalArg::Present(_code_bytes) => { + return Err(vm.new_not_implemented_error( + "co_code replacement is not yet supported".to_string(), + )); + } + OptionalArg::Missing => self.code.instructions.clone(), + };
🧹 Nitpick comments (7)
compiler/codegen/src/ir.rs (3)
398-405
: Consider emitting NoColumns/None when appropriate to match CPython semantics.Current encoder never uses NoColumns (13) or None/NO_LINE (15). If any instruction has “no position” or unknown columns, encoding those forms improves fidelity and downstream iteration. Not blocking, but worth tracking.
225-227
: Exception table placeholder.Leaving
exceptiontable
empty is fine short‑term; please gate consumers accordingly and add a TODO reference/issue so this isn’t forgotten.I can open an issue with a checklist and CPython references if helpful.
506-514
: Varint helpers: mirror reader/writer invariants.Writers look correct; add brief asserts to guarantee payload bytes never set bit7 (MSB) to prevent accidental header confusion if code changes.
fn write_varint(buf: &mut Vec<u8>, mut val: u32) -> usize { let start_len = buf.len(); while val >= 64 { - buf.push(0x40 | (val & 0x3f) as u8); + let b = 0x40 | (val & 0x3f) as u8; + debug_assert_eq!(b & 0x80, 0); + buf.push(b); val >>= 6; } - buf.push(val as u8); + let b = val as u8; + debug_assert_eq!(b & 0x80, 0); + buf.push(b); buf.len() - start_len }Also applies to: 518-527
vm/src/builtins/code.rs (4)
796-807
: Validate replaced tables.When
co_linetable
/co_exceptiontable
are provided, decoding should succeed and cover the instruction stream. Consider verifying that linetable iteration spans the full code length (sum of lengths equalsinstructions.len()*2
) and erroring otherwise.
42-46
: Make NO_LINE detection explicit.
(byte >> 3) == 0x1f
works but is opaque. Prefer extractingcode = (first_byte >> 3) & 0x0f
and comparing to15
for readability.- fn is_no_line_marker(byte: u8) -> bool { - (byte >> 3) == 0x1f - } + fn is_no_line_marker(byte: u8) -> bool { + ((byte >> 3) & 0x0f) == 0x0f + }Also applies to: 62-64
498-569
: co_lines builds a full Vec; consider lazy iterator.Non‑blocking. A generator/iterator like CPython’s
lineiterator
avoids allocating for large functions.
889-917
: Reader matches encoder; add minimal sanity checks.As a guard against malformed inputs, add limits to varint loop iterations (e.g., cap at 6 chunks => 36 bits) and early‑exit on EOF to avoid O(n²) in degenerate data.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
Lib/test/test_code.py
is excluded by!Lib/**
Lib/test/test_compile.py
is excluded by!Lib/**
📒 Files selected for processing (4)
compiler/codegen/src/ir.rs
(5 hunks)compiler/core/src/bytecode.rs
(4 hunks)compiler/core/src/marshal.rs
(3 hunks)vm/src/builtins/code.rs
(7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- compiler/core/src/bytecode.rs
- compiler/core/src/marshal.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs
: Format Rust code with the default rustfmt style (runcargo fmt
)
Run clippy and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management
Files:
vm/src/builtins/code.rs
compiler/codegen/src/ir.rs
{vm,stdlib}/**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use RustPython macros (
pyclass
,pymodule
,pyfunction
, etc.) when implementing Python functionality in Rust
Files:
vm/src/builtins/code.rs
🧬 Code graph analysis (2)
vm/src/builtins/code.rs (1)
compiler/core/src/bytecode.rs (1)
from_code
(61-81)
compiler/codegen/src/ir.rs (1)
vm/src/builtins/code.rs (3)
new
(32-40)new
(338-340)new
(867-869)
Summary by CodeRabbit