Refactor codecs #5623

coolreader18 · 2025-03-26T03:11:23Z

This rejiggles some stuff around so that the situation is now:

{En,De}codeContext owns the input data and defines the error type.
{En,De}codeErrorhandler is passed the context and just encapsulates the error handling behavior.
Standard error handlers are now defined in the rustpython_common::encodings::errors module.
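For readers who want a picture of that split, here is a minimal sketch. It is not the code from this PR: the trait shapes, method names, and the `StrContext`, `Strict`, `Replace`, and `encode_ascii` items are simplified assumptions made up for illustration. It only shows the idea that the context owns the data and builds the error, while the handler decides what to do with a failure.

```rust
use std::ops::Range;

/// Hypothetical, simplified context: owns the input and defines the error type.
trait EncodeContext {
    type Error;
    fn data(&self) -> &str;
    fn error(&self, range: Range<usize>, reason: &str) -> Self::Error;
}

/// Hypothetical handler: receives the context and only encapsulates
/// what happens when a byte range of the input cannot be encoded.
trait EncodeErrorHandler<Ctx: EncodeContext> {
    fn handle(&self, ctx: &Ctx, range: Range<usize>, reason: &str) -> Result<Vec<u8>, Ctx::Error>;
}

/// Fails by asking the context to build its own error value.
struct Strict;
impl<Ctx: EncodeContext> EncodeErrorHandler<Ctx> for Strict {
    fn handle(&self, ctx: &Ctx, range: Range<usize>, reason: &str) -> Result<Vec<u8>, Ctx::Error> {
        Err(ctx.error(range, reason))
    }
}

/// Substitutes a single `?` for each unencodable character.
struct Replace;
impl<Ctx: EncodeContext> EncodeErrorHandler<Ctx> for Replace {
    fn handle(&self, _ctx: &Ctx, _range: Range<usize>, _reason: &str) -> Result<Vec<u8>, Ctx::Error> {
        Ok(vec![b'?'])
    }
}

/// A toy context whose error type is just a message string.
struct StrContext {
    data: String,
}

impl EncodeContext for StrContext {
    type Error = String;
    fn data(&self) -> &str {
        &self.data
    }
    fn error(&self, range: Range<usize>, reason: &str) -> String {
        format!("{} at {}..{}", reason, range.start, range.end)
    }
}

/// A toy ASCII encoder that is generic over both the context and the handler.
fn encode_ascii<Ctx, E>(ctx: &Ctx, errors: &E) -> Result<Vec<u8>, Ctx::Error>
where
    Ctx: EncodeContext,
    E: EncodeErrorHandler<Ctx>,
{
    let mut out = Vec::new();
    for (i, ch) in ctx.data().char_indices() {
        if ch.is_ascii() {
            out.push(ch as u8);
        } else {
            // Byte range of the offending character in the input.
            let range = i..i + ch.len_utf8();
            out.extend(errors.handle(ctx, range, "ordinal not in range(128)")?);
        }
    }
    Ok(out)
}
```

With this shape, the same encoder can be called with `&Strict` or `&Replace` without knowing how the context represents errors, and a different context could return a different error type without touching the handlers.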

youknowone

I like the new way of handling errors.

youknowone · 2025-03-26T14:22:44Z

common/src/encodings.rs

+}
+
+#[derive(Copy, Clone, Default, Debug)]
+pub struct StrSize {


We now have both StrSize and StrLen.
Rust always uses len for string and bytes length.
This struct is named StrSize but contains a pair of lengths.
Any chance it could be given a better name?

I considered it similar to ruff's TextSize or even just usize. I think I want to do a follow-up to consolidate the 2 structs.

It can contain a pair of lengths, but you could also treat them as indices, in the same way usize can be a length or an index.
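To make the length-versus-index point concrete, here is a rough sketch. The diff above only shows the struct's name and derives, so the `bytes` and `chars` fields, the `of` constructor, the `Add` impl, and the `split_at` helper are all assumptions for illustration (the extra `PartialEq`/`Eq` derives are added only so values can be compared).

```rust
use std::ops::Add;

/// Sketch only: the field names are assumed, not taken from the PR.
#[derive(Copy, Clone, Default, Debug, PartialEq, Eq)]
pub struct StrSize {
    pub bytes: usize, // assumed: size in bytes
    pub chars: usize, // assumed: size in code points
}

impl StrSize {
    /// Used as a length: measure a whole string.
    pub fn of(s: &str) -> Self {
        StrSize {
            bytes: s.len(),
            chars: s.chars().count(),
        }
    }
}

impl Add for StrSize {
    type Output = StrSize;

    /// Adding two sizes advances both counters together.
    fn add(self, rhs: StrSize) -> StrSize {
        StrSize {
            bytes: self.bytes + rhs.bytes,
            chars: self.chars + rhs.chars,
        }
    }
}

/// Used as an index: the same value marks a position inside a string.
pub fn split_at(s: &str, pos: StrSize) -> (&str, &str) {
    s.split_at(pos.bytes)
}
```

The value returned by `StrSize::of(prefix)` can be passed straight to `split_at` as a position, which is the same dual role a plain `usize` plays.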


youknowone · 2025-03-26T14:49:53Z

common/src/wtf8/mod.rs

@@ -872,8 +875,8 @@ impl Wtf8 {
        }
    }

-    pub fn clone_into(&self, buf: &mut Wtf8Buf) {
-        self.bytes.clone_into(&mut buf.bytes);
+    pub fn is_code_point_boundary(&self, index: usize) -> bool {


Suggested change

-    pub fn is_code_point_boundary(&self, index: usize) -> bool {
+    pub fn is_code_point_boundary_at(&self, index: usize) -> bool {

It is not the whole string that is a boundary, but the byte at index.

The method in std is named str::is_char_boundary
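For reference, this is how the std method behaves on a plain `str`. The `char_boundaries` helper is a made-up name for illustration; `str::is_char_boundary` itself is the real std API, and it takes a byte index in the same way as the method discussed here.

```rust
/// Collects every byte offset in `s` that falls on a character boundary,
/// including 0 and `s.len()`.
fn char_boundaries(s: &str) -> Vec<usize> {
    (0..=s.len()).filter(|&i| s.is_char_boundary(i)).collect()
}
```

For a string such as "aé", offset 2 falls in the middle of the two-byte "é", so `is_char_boundary(2)` is false while offsets 0, 1, and 3 are boundaries.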

Co-authored-by: Jeong, YunWon <69878+youknowone@users.noreply.github.com>

Refactor codecs

f323d14

coolreader18 force-pushed the refactor-codecs branch from 7554510 to f323d14 Compare March 26, 2025 07:24

youknowone approved these changes Mar 26, 2025


Remove commented-out code

e3d96aa

Co-authored-by: Jeong, YunWon <69878+youknowone@users.noreply.github.com>

coolreader18 merged commit 2ab8716 into RustPython:main Mar 26, 2025
11 checks passed

coolreader18 deleted the refactor-codecs branch March 26, 2025 19:01
