-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Add symtable.py
from 3.13.5
#6048
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe changes update the symbol table module in the standard library. The module's registration key is renamed from "symtable" to "_symtable". The Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Poem
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we have test_symtable but no symtable.py? How did it work before?
Lib/test/test_symtable.py
Outdated
|
||
Alias = find_block(top, "Alias") | ||
GenericAlias = find_block(top, "GenericAlias") | ||
# TODO(RUSTPYTHON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# TODO(RUSTPYTHON) | |
# XXX: RUSTPYTHON |
We don't have a strict rule for this, but conventionally using this to distinguish local patch from failing tests.
Lib/test/test_symtable.py
Outdated
# U = find_block(GenericMine, "U") | ||
|
||
|
||
# TODO(RUSTPYTHON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# TODO(RUSTPYTHON) | |
# TODO: RUSTPYTHON |
The notation itself looks good.
But we use search tools for this comment. Keeping consistency will be helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old habits die hard 😅
Correct
We had implemented |
Lib/test/test_symtable.py
Outdated
|
||
# TODO: RUSTPYTHON | ||
@unittest.expectedFailure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we had test-passing symtable
before, can these tests be easily fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, I think the main issue is that "symbols" attribute used to return a list, now it's a dict. I'll investigate a bit more about the structure. For now I'll set the PR as a draft
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now more tests pass. But we still have a difference between our impl and cpython. but this is a change that needs to be made at compiler/codegen/src/symboltable.rs
, I think it's a bit out of scope for this PR. I'll try to do a followup PR to fix it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Lib/test/test_symtable.py
is excluded by!Lib/**
📒 Files selected for processing (1)
vm/src/stdlib/symtable.rs
(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)
**/*.rs
: Follow the default rustfmt code style (cargo fmt
to format)
Always run clippy to lint code (cargo clippy
) before completing tasks. Fix any warnings or lints that are introduced by your changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass
,pymodule
,pyfunction
, etc.) when implementing Python functionality in Rust
Files:
vm/src/stdlib/symtable.rs
🧠 Learnings (1)
vm/src/stdlib/symtable.rs (3)
Learnt from: moreal
PR: #5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Learnt from: moreal
PR: #5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass
, pymodule
, pyfunction
, etc.) when implementing Python functionality in Rust
🔇 Additional comments (4)
vm/src/stdlib/symtable.rs (4)
5-11
: LGTM!The import additions are appropriate for the new functionality -
PyDictRef
supports thesymbols
property and the explicitcompiler
import improves code clarity.
13-105
: Excellent alignment with CPython's symtable constants.The comprehensive set of constants mirrors CPython's internal symtable definitions, ensuring compatibility. The bit-shifting patterns and flag combinations are mathematically correct and follow established conventions.
144-184
: Excellent refactoring to Python-style properties.The conversion from explicit methods to properties using
#[pygetset]
aligns perfectly with Python conventions. The use of#[pygetset(name = "type")]
properly handles Rust keyword conflicts.
186-194
: Clean implementation of symbols property.The new
symbols
property correctly creates a dictionary mapping symbol names to their flag bits, providing a proper Python-style interface to the symbol information.
#[pygetset] | ||
fn id(&self) -> usize { | ||
self as *const Self as *const std::ffi::c_void as usize | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Consider security implications of exposing memory addresses.
The id
property exposes raw memory addresses, which could potentially be used for memory layout attacks or information disclosure in security-sensitive contexts.
Consider whether this aligns with CPython's behavior and if there are security implications for your use case:
🌐 Web query:
What does Python's symtable SymbolTable.id property return and are there security considerations?
💡 Result:
The id
property (retrievable via get_id()
) of Python's symtable.SymbolTable
returns the table’s identifier, which is an internal value associated with that specific symbol table instance[1][2][5][6]. This identifier is primarily intended for internal use, such as distinguishing between different symbol table objects generated during code analysis.
Details:
-
What does
get_id()
return?- The value returned by
SymbolTable.get_id()
is an identifier for that block's symbol table instance. It does not return the name of the function, class, or module—that is provided byget_name()
. Instead, it gives an ID unique to the table, but published documentation does not specify if this is guaranteed to be globally unique, stable between runs, or even meaningful outside the context of the current symbol table introspection session[1][2][5][6]. - Source-level guides and sample code (see [6]) show the use of
get_id()
alongside type and name to help programmers identify and differentiate symbol tables while traversing nested scopes.
- The value returned by
-
Security considerations:
- There is no evidence in official documentation that the
id
or the value returned byget_id()
contains sensitive information such as memory addresses, code contents, or pointers to user data[1][2][5][6]. - However, since this is an internal identifier of a compiler structure, it is conceivable (though not specified) that in some Python implementations or versions, this ID could reveal patterns such as traversal order, or, hypothetically, even leak implementation details if misused.
- In regular usage (introspecting Python source structures for tooling or analysis), exposure of the symbol table ID should not represent a direct security vulnerability, but best practice is to treat all such internal or undocumented identifiers as an implementation detail—do not expose them in sensitive contexts (such as web output, logs with user data, etc.) unless necessary.
- There is no evidence in official documentation that the
Summary Table of Relevant SymbolTable Methods:
Method | Purpose |
---|---|
get_id() |
Returns the symbol table's internal identifier |
get_name() |
Returns the name of the scope (top for module, function/class name otherwise) |
get_type() |
Returns the table's type ('module' , 'class' , or 'function' ) |
get_lineno() |
Returns the first line number of the scope |
If you require a value that is stable and reveals no internal information, prefer using get_name()
or a custom mapping on your analysis layer instead of exposing the value from get_id()
.
References:
[1][2][5][6]: Python's official symtable
module documentation and tutorials.
Citations:
- 1: https://documentation.help/python-3-7-3/symtable.html
- 2: https://omz-software.com/pythonista/docs-3.4/py3/library/symtable.html
- 3: https://github.com/python/cpython/blob/main/Python/symtable.c
- 4: https://web.cs.dal.ca/~dpc/2023-06-22-icpc-open/docs/python/library/symtable.html
- 5: https://python.readthedocs.io/fr/stable/library/symtable.html
- 6: https://eli.thegreenplace.net/2010/09/20/python-internals-symbol-tables-part-2
- 7: https://www.cc.kyoto-su.ac.jp/~atsushi/Programs/VisualWorks/CSV2HTML/CSV2HTML_PyDoc/symtable.html
- 8: https://introcs.cs.princeton.edu/python/44st/
- 9: https://github.com/k2-fsa/k2/blob/master/k2/python/k2/symbol_table.py
- 10: https://tenthousandmeters.com/blog/python-behind-the-scenes-2-how-the-cpython-compiler-works/
Align SymbolTable.id() with CPython’s get_id(): avoid exposing raw pointers
The current implementation in vm/src/stdlib/symtable.rs
returns the struct’s memory address as a usize
, which diverges from CPython’s symtable.SymbolTable.get_id()
(an internal, opaque identifier) and can leak pointer layouts.
Suggested changes:
- Replace the pointer cast in
fn id(&self) -> usize
with a dedicated, incrementing identifier (e.g., anAtomicUsize
counter or per‐table UID stored on creation). - Ensure IDs are unique within a run but do not expose raw addresses.
- Update documentation to note that
id()
returns an internal symbol-table ID, not a memory address.
Locations to update:
- vm/src/stdlib/symtable.rs, lines ~170–173 (the
fn id(&self)
definition)
Example diff sketch:
- #[pygetset]
- fn id(&self) -> usize {
- self as *const Self as *const std::ffi::c_void as usize
- }
+ // Use an internal counter instead of raw pointer to avoid leaking memory addresses
+ #[pygetset]
+ fn id(&self) -> usize {
+ self.inner_id // populated from a global AtomicUsize at creation
+ }
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In vm/src/stdlib/symtable.rs around lines 170 to 173, the id() method currently
returns the raw memory address cast to usize, which exposes internal pointer
details and differs from CPython's opaque get_id(). To fix this, replace the
pointer cast with a unique internal identifier generated via a static
AtomicUsize counter incremented for each new SymbolTable instance or assign a
unique ID at creation stored in the struct. Update the id() method to return
this stored unique ID instead of the pointer. Also, revise the method's
documentation to clarify that id() returns an internal symbol table identifier,
not a memory address.
#[pymethod] | ||
const fn is_nested(&self) -> bool { | ||
self.symtable.is_nested | ||
} | ||
|
||
#[pymethod] | ||
const fn is_optimized(&self) -> bool { | ||
matches!( | ||
self.symtable.typ, | ||
CompilerScope::Function | CompilerScope::AsyncFunction | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's not delete these methods but change to getset, just like get_name and get_lineno.
Unlike other more complex cases, the deletion of these 2 methods cause unnecessary regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix this. And made type
to return an int instead of a String, so now more tests pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
vm/src/stdlib/symtable.rs (1)
181-184
: Security concern with id() method exposing raw memory addresses.The
id()
method returns raw memory addresses, which could potentially be used for memory layout attacks or information disclosure in security-sensitive contexts.
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Lib/test/test_symtable.py
is excluded by!Lib/**
📒 Files selected for processing (1)
vm/src/stdlib/symtable.rs
(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)
**/*.rs
: Follow the default rustfmt code style (cargo fmt
to format)
Always run clippy to lint code (cargo clippy
) before completing tasks. Fix any warnings or lints that are introduced by your changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass
,pymodule
,pyfunction
, etc.) when implementing Python functionality in Rust
Files:
vm/src/stdlib/symtable.rs
🧠 Learnings (3)
📚 Learning: in rustpython's stat module implementation, platform-specific constants like sf_supported and sf_syn...
Learnt from: moreal
PR: RustPython/RustPython#5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Applied to files:
vm/src/stdlib/symtable.rs
📚 Learning: in rustpython's stat module implementation, platform-specific constants like sf_supported and sf_syn...
Learnt from: moreal
PR: RustPython/RustPython#5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
Applied to files:
vm/src/stdlib/symtable.rs
📚 Learning: applies to **/*.rs : use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implement...
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
Applied to files:
vm/src/stdlib/symtable.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Check the WASM package and demo
- GitHub Check: Run tests under miri
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Check Rust code with rustfmt and clippy
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Run snippets and cpython tests on wasm-wasi
- GitHub Check: Run rust tests (ubuntu-latest)
🔇 Additional comments (5)
vm/src/stdlib/symtable.rs (5)
149-152
: LGTM! Method to property conversion aligns with feedback.Converting
get_name
to aname
property using#[pygetset]
aligns well with the previous review feedback to use getset instead of deleting methods.
171-179
: LGTM! Children property correctly replaces get_children method.The
children
property properly converts sub-tables to Python objects and aligns with the goal of using properties instead of explicit methods.
186-195
: LGTM! Identifiers property conversion is appropriate.Converting
get_identifiers
to anidentifiers
property maintains the same functionality while following the property-based interface pattern.
207-210
: LGTM! Nested property appropriately replaces is_nested method.The
nested
property correctly exposes the symbol table's nested status, following the pattern established by other property conversions.
15-110
: Constants section matches CPython internal definitionsThe bit shifts and composite values (e.g.
DEF_BOUND
,SCOPE_MASK
) exactly mirror those in CPython’spycore_symtable.h
. No changes needed here.
#[pygetset(name = "type")] | ||
fn typ(&self) -> i32 { | ||
match self.symtable.typ { | ||
CompilerScope::Function => TYPE_FUNCTION, | ||
CompilerScope::Class => TYPE_CLASS, | ||
CompilerScope::Module => TYPE_MODULE, | ||
CompilerScope::TypeParams => TYPE_TYPE_PARAMETERS, | ||
_ => -1, // TODO: missing types from the C implementation | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Verify integer constant mapping for type property.
The typ()
method now returns integer constants instead of strings, which is a significant interface change. The mapping looks reasonable, but the fallback to -1
for unhandled cases may not align with CPython's behavior.
🏁 Script executed:
#!/bin/bash
# Check CPython's SymbolTable.get_type() return values
python3 -c "
import symtable
st = symtable.symtable('def f(): pass', '<string>', 'exec')
print('Module type:', repr(st.get_type()))
for child in st.get_children():
print('Function type:', repr(child.get_type()))
"
Length of output: 297
Ensure type
property matches CPython’s string-based API
CPython’s SymbolTable.get_type() returns strings like "module"
and "function"
, not integer codes. Our Rust typ()
currently returns integer constants and falls back to -1
for any unhandled cases, which:
- Breaks compatibility with existing users expecting string values
- Omits several type names (e.g. comprehensions, lambdas, generators) that CPython exposes
Please update the implementation to either:
- Return Python strings matching CPython’s names, or
- If you intentionally switch to integers, document the change clearly and replace the
-1
fallback with a dedicatedTYPE_UNKNOWN
constant—and add mappings for all types in CPython’s C implementation.
Points to address:
- Decide on integer vs. string return type and update docs accordingly
- Remove the
// TODO
and implement mappings for missing CPython types (generator, lambda, comprehension, etc.) - Eliminate the
-1
fallback or replace it with a well-defined default constant
🤖 Prompt for AI Agents
In vm/src/stdlib/symtable.rs lines 154 to 163, the typ() function currently
returns integer constants and uses -1 as a fallback, which breaks compatibility
with CPython's string-based SymbolTable.get_type() API and omits several type
names. To fix this, change the return type to Python strings matching CPython's
type names (e.g., "module", "function", "class", "generator", "lambda",
"comprehension"), remove the -1 fallback, and implement mappings for all missing
CPython types. Also update any related documentation to reflect the string
return type and remove the TODO comment.
#[pygetset] | ||
fn symbols(&self, vm: &VirtualMachine) -> PyResult<PyDictRef> { | ||
let dict = vm.ctx.new_dict(); | ||
for (name, symbol) in &self.symtable.symbols { | ||
dict.set_item(name, vm.new_pyobj(symbol.flags.bits()), vm) | ||
.unwrap(); | ||
} | ||
Ok(dict) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Significant interface change: symbols now returns dictionary instead of list.
The symbols
property now returns a dictionary mapping symbol names to their flag bits, which is a major change from the previous list-based return. This appears to align better with how symbols are typically accessed.
🏁 Script executed:
#!/bin/bash
# Verify CPython's SymbolTable.get_symbols() return type and structure
python3 -c "
import symtable
st = symtable.symtable('x = 1', '<string>', 'exec')
symbols = st.get_symbols()
print('Type:', type(symbols))
if symbols:
print('First symbol type:', type(symbols[0]))
print('First symbol methods:', [m for m in dir(symbols[0]) if not m.startswith('_')])
"
Length of output: 639
Breaking change: symbols
now returns dict instead of list
You’ve changed the symbols
getter in vm/src/stdlib/symtable.rs (lines 197–205) from a list of Symbol
objects to a dict[str, int]
. In CPython, symtable.symtable(...).get_symbols()
returns a list[Symbol]
(each with methods like is_assigned()
, get_name()
, etc.), not a mapping of names to flag bits.
This will break any existing code that:
- Iterates over symbols as
Symbol
instances - Calls methods on
Symbol
objects - Relies on the return being a list
Please address this by one of the following:
- Revert
symbols
to returnPyListRef
ofSymbol
objects (e.g.vm.new_pyobj(symbol)
) for compatibility - Introduce a new method/property (e.g.
symbol_flags
) for the dict view, and preservesymbols
as the list ofSymbol
- If this breaking change is intentional, update the documentation/changelog to call it out clearly
🤖 Prompt for AI Agents
In vm/src/stdlib/symtable.rs around lines 197 to 205, the symbols getter was
changed to return a dict mapping symbol names to flag bits, but in CPython it
returns a list of Symbol objects. To fix this, revert the symbols method to
return a PyListRef of Symbol objects by creating a list and adding each symbol
object (e.g. using vm.new_pyobj(symbol)) to it, preserving compatibility. If the
dict view is needed, add a new method or property like symbol_flags for that
purpose instead of changing symbols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Summary by CodeRabbit
New Features
Refactor
PySymbolTable
interface by replacing several explicit methods with Python-style properties.Chores