Skip to content

Update rustpython_literal::escape to support wtf8 #5630

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 28, 2025

Conversation

coolreader18
Copy link
Member

The first commit splits out rustpython_common::wtf8 into its own crate, so that things like literal and sre_engine don't have to depend on all of common just to implement something for wtf8. I also simplified the crate graph a little by removing compiler's dependency on common, since expandtab was the only function left that compiler-codegen needed, and that's simple enough to just copy over. Originally the point of common was for there to be shared code between vm and compiler, but I guess the times change :)

Workspace crate graph before:

image

Workspace crate graph after:

image

@youknowone
Copy link
Member

How about moving wtf8 to project root? When it is placed under common, it looks like only common can directly use wtf8 but nothing other

@coolreader18
Copy link
Member Author

Hmm, fair.

@coolreader18
Copy link
Member Author

At some point we should probably restructure to everything being under a crates/ directory.

/// # Safety
///
/// This string must only contain printable characters.
unsafe fn write_source(&self, formatter: &mut impl std::fmt::Write) -> std::fmt::Result;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about this change.
I expect UnicodeEscape::new_repr(self.as_wtf8()) always safely print the repr. Then the surrogate must be checked somewhere before printing. Doesn't it mean we always print &str, but not wtf8?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always print str, yes. I did this because we were already treating write_source as an unsafe function, per the preëxisting comment in AsciiEscape::write_source:

            // SAFETY: this function must be called only when source is printable ascii characters

That was technically unsound, since anybody could just call AsciiEscape::new_repr(b"\xff").write_source() and invoke library UB.

Copy link
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@youknowone youknowone merged commit 7d05f88 into RustPython:main Mar 28, 2025
11 checks passed
@coolreader18 coolreader18 deleted the wtf8-repr branch March 28, 2025 03:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants