Skip to content

Fix subclasses of string. (Fixes #130) #674

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 14, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 22 additions & 0 deletions tests/snippets/subclass_str.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
from testutils import assertRaises

x = "An interesting piece of text"
assert x is str(x)

class Stringy(str):
def __new__(cls, value=""):
return str.__new__(cls, value)

def __init__(self, value):
self.x = "substr"

y = Stringy(1)
assert type(y) is Stringy, "Type of Stringy should be stringy"
assert type(str(y)) is str, "Str of a str-subtype should be a str."

assert y + " other" == "1 other"
assert y.x == "substr"

## Base strings currently get an attribute dict, but shouldn't.
# with assertRaises(AttributeError):
# "hello".x = 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is more consistent that base strings have attribute dicts, I believe cpython is wrong here. We can deviate. You can use the platform module to check for rustpython.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like a pretty serious performance question, if we have to allocate dicts for intermediate ints and bools in computations.

Copy link
Collaborator Author

@cthulahoops cthulahoops Mar 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important enough that __slots__ exists so that user classes can take advantage of this.

Copy link
Contributor

@adrian17 adrian17 Mar 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like a pretty serious performance question, if we have to allocate dicts for intermediate ints and bools in computations.

Not just that; this also makes other common optimizations on builtin types that we would probably implement at some point (string interning, small int cache) much weirder and harder to reason about.

In CPython:

>>> a = "hello"
>>> b = "hello"
>>> a is b
True
>>> a = 123
>>> b = 123
>>> a is b
True

So if in one section of code I do "hello".x = 5, this may affect any other code (including third-party library) using the same "hello" literal - as it may or may not be interned. Same with other builtins. That's implicit global state.

29 changes: 17 additions & 12 deletions vm/src/obj/objstr.rs
Original file line number Diff line number Diff line change
Expand Up @@ -7,15 +7,15 @@ use unicode_segmentation::UnicodeSegmentation;

use crate::format::{FormatParseError, FormatPart, FormatString};
use crate::pyobject::{
IntoPyObject, OptionalArg, PyContext, PyFuncArgs, PyIterable, PyObjectRef, PyRef, PyResult,
PyValue, TypeProtocol,
IdProtocol, IntoPyObject, OptionalArg, PyContext, PyFuncArgs, PyIterable, PyObjectRef, PyRef,
PyResult, PyValue, TryFromObject, TypeProtocol,
};
use crate::vm::VirtualMachine;

use super::objint;
use super::objsequence::PySliceableSequence;
use super::objslice::PySlice;
use super::objtype;
use super::objtype::{self, PyClassRef};

#[derive(Clone, Debug)]
pub struct PyString {
Expand Down Expand Up @@ -788,16 +788,21 @@ fn perform_format(
// TODO: should with following format
// class str(object='')
// class str(object=b'', encoding='utf-8', errors='strict')
fn str_new(vm: &mut VirtualMachine, args: PyFuncArgs) -> PyResult {
if args.args.len() == 1 {
return Ok(vm.new_str("".to_string()));
}

if args.args.len() > 2 {
panic!("str expects exactly one parameter");
fn str_new(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that methods exactly match their python names, should we do the same for free-standing functions? i.e. should this just be str?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. I quite like the convention, it makes it a bit easier to know what's what. I think I'd call that new rather than str though.

Of course, that function could be a method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it can't be a method called new because that clashes with the implementation of new in PyRef.

Copy link
Contributor

@OddCoincidence OddCoincidence Mar 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea of it being a new method though (feels more idiomatic). We can probably get rid of PyRef::new and have a generic into_ref on PyValues.

(doesn't need to be in this PR though, just tossing out ideas)

cls: PyClassRef,
object: OptionalArg<PyObjectRef>,
vm: &mut VirtualMachine,
) -> PyResult<PyStringRef> {
let string = match object {
OptionalArg::Present(ref input) => vm.to_str(input)?,
OptionalArg::Missing => vm.new_str("".to_string()),
};

vm.to_str(&args.args[1])
if string.typ().is(&cls) {
TryFromObject::try_from_object(vm, string)
} else {
let payload = string.payload::<PyString>().unwrap();
PyRef::new_with_type(vm, payload.clone(), cls)
}
}

impl PySliceableSequence for String {
Expand Down