
Refactor codecs #5623


Merged: 2 commits merged on Mar 26, 2025


coolreader18 (Member):

This moves some things around so that the situation is now:

  • {En,De}codeContext owns the input data and defines the error type.
  • {En,De}codeErrorHandler is passed the context and only encapsulates the error-handling behavior.
  • Standard error handlers are now defined in the rustpython_common::encodings::errors module.
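To make the split concrete, here is a minimal sketch of the decode side only, under simplifying assumptions. The trait names follow the description above, but the method names (input, error_decoding, handle_decode_error), their signatures, and the Strict, Replace, AsciiContext and decode_ascii items are hypothetical illustrations, not the actual rustpython_common API. The point it shows: the context owns the bytes and picks the error type, while a handler only decides what to do with a bad range.

```rust
use std::ops::Range;

/// Hypothetical decode context: owns the input and chooses the error type.
trait DecodeContext {
    type Error;
    /// The full input being decoded.
    fn input(&self) -> &[u8];
    /// Build this context's error for an undecodable byte range.
    fn error_decoding(&self, range: Range<usize>, reason: &str) -> Self::Error;
}

/// Hypothetical handler: receives the context and only encodes the policy.
trait DecodeErrorHandler<Ctx: DecodeContext> {
    /// Returns replacement text and the byte position to resume decoding at.
    fn handle_decode_error(
        &self,
        ctx: &Ctx,
        range: Range<usize>,
        reason: &str,
    ) -> Result<(String, usize), Ctx::Error>;
}

/// "strict"-like policy: turn every bad range into the context's error.
struct Strict;
impl<Ctx: DecodeContext> DecodeErrorHandler<Ctx> for Strict {
    fn handle_decode_error(&self, ctx: &Ctx, range: Range<usize>, reason: &str) -> Result<(String, usize), Ctx::Error> {
        Err(ctx.error_decoding(range, reason))
    }
}

/// "replace"-like policy: emit U+FFFD and skip past the bad range.
struct Replace;
impl<Ctx: DecodeContext> DecodeErrorHandler<Ctx> for Replace {
    fn handle_decode_error(&self, _ctx: &Ctx, range: Range<usize>, _reason: &str) -> Result<(String, usize), Ctx::Error> {
        Ok(("\u{FFFD}".to_string(), range.end))
    }
}

/// Toy context for ASCII decoding whose error type is a plain String.
struct AsciiContext {
    data: Vec<u8>,
}

impl DecodeContext for AsciiContext {
    type Error = String;
    fn input(&self) -> &[u8] {
        &self.data
    }
    fn error_decoding(&self, range: Range<usize>, reason: &str) -> String {
        format!("can't decode bytes in position {}-{}: {}", range.start, range.end - 1, reason)
    }
}

/// Decode ASCII, delegating each non-ASCII byte to the handler.
fn decode_ascii<H: DecodeErrorHandler<AsciiContext>>(ctx: &AsciiContext, handler: &H) -> Result<String, String> {
    let input = ctx.input();
    let mut out = String::new();
    let mut pos = 0;
    while pos < input.len() {
        let b = input[pos];
        if b.is_ascii() {
            out.push(b as char);
            pos += 1;
        } else {
            let (replacement, resume) =
                handler.handle_decode_error(ctx, pos..pos + 1, "ordinal not in range(128)")?;
            out.push_str(&replacement);
            pos = resume;
        }
    }
    Ok(out)
}
```

With this shape, adding a new codec means writing a new context, and the same Strict and Replace handlers work for it unchanged.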

youknowone (Member) left a comment:

I like the new way of handling errors.

}

#[derive(Copy, Clone, Default, Debug)]
pub struct StrSize {
Member:

We now have both StrSize and StrLen.
Rust always uses len for string and byte lengths.
This struct is named StrSize but contains a pair of lengths.
Any chance it could be named better?

Member (author):

I considered it similar to ruff's TextSize, or even just usize. I want to do a follow-up to consolidate the two structs.

Member (author):

It can contain a pair of lengths, but you could also treat them as indices, in the same way usize can be a length or an index.
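The length-versus-index point can be illustrated with a small sketch. It assumes the pair is a UTF-8 byte count plus a char count; the thread only says StrSize holds "a pair of lengths", so the field names bytes and chars and the of constructor are hypothetical here.

```rust
use std::ops::{Add, AddAssign};

/// Hypothetical pair of lengths for the same text:
/// UTF-8 byte count and char (code point) count.
#[derive(Copy, Clone, Default, Debug, PartialEq, Eq)]
pub struct StrSize {
    pub bytes: usize,
    pub chars: usize,
}

impl StrSize {
    /// Measure a whole string slice.
    pub fn of(s: &str) -> Self {
        StrSize { bytes: s.len(), chars: s.chars().count() }
    }
}

impl Add for StrSize {
    type Output = StrSize;
    /// Sizes add componentwise, like concatenating the measured texts.
    fn add(self, rhs: StrSize) -> StrSize {
        StrSize { bytes: self.bytes + rhs.bytes, chars: self.chars + rhs.chars }
    }
}

impl AddAssign for StrSize {
    fn add_assign(&mut self, rhs: StrSize) {
        *self = *self + rhs;
    }
}
```

For example, StrSize::of("hé") is { bytes: 3, chars: 2 }. Read as a length it measures "hé"; read as an index into "héllo" it marks where "llo" starts, since &"héllo"[3..] is "llo" and that is char position 2. This is the same dual role usize plays.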

@@ -872,8 +875,8 @@ impl Wtf8 {
}
}

pub fn clone_into(&self, buf: &mut Wtf8Buf) {
self.bytes.clone_into(&mut buf.bytes);
pub fn is_code_point_boundary(&self, index: usize) -> bool {
Member:

Suggested change:
- pub fn is_code_point_boundary(&self, index: usize) -> bool {
+ pub fn is_code_point_boundary_at(&self, index: usize) -> bool {

It's not the whole string that is a boundary, only the byte at index.

Member (author):

The equivalent method in std is named str::is_char_boundary.
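The naming argument relies on what std's str::is_char_boundary checks: whether a single index falls between characters. Below is a standalone sketch of the same check over a WTF-8 byte slice. It is a hypothetical free function, not the Wtf8 method from this PR. It works because WTF-8 encodes surrogate code points with the same lead/continuation byte layout as UTF-8, so an index is a boundary if it is the end of the buffer or the byte there is not a continuation byte (0b10xx_xxxx).

```rust
/// Hypothetical helper mirroring str::is_char_boundary for WTF-8 bytes.
pub fn is_code_point_boundary(bytes: &[u8], index: usize) -> bool {
    if index == bytes.len() {
        // The end of the buffer is always a boundary (also covers empty input).
        return true;
    }
    match bytes.get(index) {
        // Past the end: not a valid position at all.
        None => false,
        // ASCII (0x00..=0x7F) or a lead byte (0xC0..=0xFF) starts a code point;
        // 0x80..=0xBF are continuation bytes in the middle of one.
        Some(&b) => b < 0x80 || b >= 0xC0,
    }
}
```

Like the std method, it answers a question about one position rather than about the whole string.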

Co-authored-by: Jeong, YunWon <69878+youknowone@users.noreply.github.com>
coolreader18 merged commit 2ab8716 into RustPython:main on Mar 26, 2025
11 checks passed
coolreader18 deleted the refactor-codecs branch on March 26, 2025 at 19:01
2 participants