-
Notifications
You must be signed in to change notification settings - Fork 5.7k
[DISCUSSION] Dataclasses #2698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
i actually encountered similar issue in the branch i am currently working on: because media object share many common attributes, but without any real hierarchy, i created mixin classes for attributes, and incorporated them into the media classes. but then of course i cannot leave |
Hmm, I can see someone writing custom logic to add attributes to Updates. Ofc they could/maybe they should write it for context, but if they have an old project, making the objects immutable will break their flow. We could maybe make a channel post about that? And btw, if someone uses our library just for the objects and does something with them, would we break their code as well? Differently speaking, can you change the objects easily by e.g. inheriting from them, or before you innit those? If not, I would very much vote for asking the community if making TelegramObjects breaks their flow. |
True, it would be a major breaking change. IMO we should at least discuss it for v14, because if we decide we would like this (as maintainers, or after asking the community - nice idea btw!), and we don't put it in v14, it will be a long while until it happens :) |
sure. maybe put it after/before async if that makes a change for you? |
Don't think it will make a big difference … |
Another thought came up in the user group: When TG adds new attributes to a class, those are currently not accessible in PTB until we make a new release. one could handle that similarly to the PS: One would need to think about whether or not the PPS: Updated MWE with Click to expandimport inspect
import timeit
from pydantic import Extra, BaseModel
class InspectDeJson:
_init_params = None
def __init__(self, arg1, arg2, arg3, arg4, arg5, arg6, arg7, api_kwargs=None):
self.arg1 = arg1
self.arg2 = arg2
self.arg3 = arg3
self.arg4 = arg4
self.arg5 = arg5
self.arg6 = arg6
self.arg7 = arg7
self.api_kwargs = api_kwargs
@classmethod
def de_json(cls, data):
# try-except is significantly faster in case we already have a correct argument set
try:
return cls(**data)
except TypeError as exc:
if str(exc).startswith('__init__() got an unexpected keyword argument'):
if cls._init_params is None:
signature = inspect.signature(cls)
cls._init_params = signature.parameters.keys()
allowed, api_kwargs = {}, {}
for param, value in data.items():
if param in cls._init_params:
allowed[param] = value
else:
api_kwargs[param] = value
return cls(**allowed, api_kwargs=api_kwargs)
else:
raise exc
class PydanticDeJson(BaseModel):
arg1: str
arg2: str
arg3: str
arg4: str
arg5: str
arg6: str
arg7: str
class Config:
extra = Extra.allow
@classmethod
def de_json(cls, data):
return cls(**data)
class KwargsDeJson:
def __init__(self, arg1, arg2, arg3, arg4, arg5, arg6, arg7, **kwargs):
self.arg1 = arg1
self.arg2 = arg2
self.arg3 = arg3
self.arg4 = arg4
self.arg5 = arg5
self.arg6 = arg6
self.arg7 = arg7
@classmethod
def de_json(cls, data):
return cls(**data)
correct_data = {
'arg1': 'arg1',
'arg2': 'arg2',
'arg3': 'arg3',
'arg4': 'arg4',
'arg5': 'arg5',
'arg6': 'arg6',
'arg7': 'arg7',
}
overfull_data = correct_data.copy()
overfull_data.update({str(i): i for i in range(5)})
number = int(5*10e3)
print('Passing correct data')
a = timeit.timeit('InspectDeJson.de_json(correct_data)', globals=globals(), number=number)
b = timeit.timeit('KwargsDeJson.de_json(correct_data)', globals=globals(), number=number)
c = timeit.timeit('PydanticDeJson.de_json(correct_data)', globals=globals(), number=number)
print(f'InspectDeJson took {a/number} seconds on average')
print(f'KwargsDeJson took {b/number} seconds on average')
print(f'PydanticDeJson took {c/number} seconds on average')
print(f'InspectDeJson took {a*100/b} % of KwargsDeJsons runtime')
print(f'PydanticDeJson took {c*100/b} % of KwargsDeJsons runtime')
print('\n' + 20*"=" + '\n')
print('Passing too much data')
a = timeit.timeit('InspectDeJson.de_json(overfull_data)', globals=globals(), number=number)
b = timeit.timeit('KwargsDeJson.de_json(overfull_data)', globals=globals(), number=number)
c = timeit.timeit('PydanticDeJson.de_json(overfull_data)', globals=globals(), number=number)
print(f'InspectDeJson took {a/number} seconds on average')
print(f'KwargsDeJson took {b/number} seconds on average')
print(f'PydanticDeJson took {c/number} seconds on average')
print(f'InspectDeJson took {a*100/b} % of KwargsDeJsons runtime')
print(f'PydanticDeJson took {c*100/b} % of KwargsDeJsons runtime') Output on my machine: Passing correct data
InspectDeJson took 1.635704e-06 seconds on average
KwargsDeJson took 1.9409600000000003e-06 seconds on average
PydanticDeJson took 3.8132918e-05 seconds on average
InspectDeJson took 84.27293710328908 % of KwargsDeJsons runtime
PydanticDeJson took 1964.6421358503005 % of KwargsDeJsons runtime
====================
Passing too much data
InspectDeJson took 9.215419999999996e-06 seconds on average
KwargsDeJson took 3.0822979999999946e-06 seconds on average
PydanticDeJson took 3.994753999999999e-05 seconds on average
InspectDeJson took 298.97887874566356 % of KwargsDeJsons runtime
PydanticDeJson took 1296.0310781112034 % of KwargsDeJsons runtime |
@Bibo-Joshi That's a nice idea but we should also mind the pitfall: |
Some thoughts that we had on this in the dev chat:
|
Some more thoughts about immutability:
|
Also just wanted to say since py 3.11, there is zero cost exception handling, so another plus point for the inspect approach. And note that there is apparently a small performance penalty when using frozen instances. Can we benchmark this and see how much it really is for our case? |
Yesterday I tried to get started on a proof of concept. And I almost immediately ran into two major problems, which frankly I'm embarrased to not have seen so far. Both are related to optional arguments. SlotsNot only do dataclasses bring built-in support for slots only on python 3.10+, but it's actually not possible to directly use them manually on 3.9- with default arguments: @dataclass
class Test:
__slots__ = ("arg",)
arg: str = "arg" will raise an exception. Actually this is not due to dataclasses directly (removing InheritanceThe second problem appears when we want to inherit from a dataclass with optional arguments: @dataclass
class Parent:
arg: str = "arg"
@dataclass
class Child(Parent):
barg: str now So far these are my observations and I'm rather bummed about that. I haven't gotten far in terms of thinking about our options, yet … |
A few more insights: pydantic
attrs
Big pictureThis issue came to be because at least #2491 and #2607 looked like they could be resolved easily by using dataclasses and because they looked promising in terms of saving us some boilerplate code. But in the end, they apparently don't fit our requirements and using them just for the sake of it doesn't make sense. 3rd party libs also can't really help (see above) and we're no fans of extensive 3rd party use anyway. So let's evaluate what our options are to implement the things that we were considering in the context of dataclasses, without dataclasses.
|
Oh, I forgot:
|
The idea to use dataclasses has been discussed a number of times (see #2056), more recently in #2491 and there is also a connection to #2607. So here I am opening an issue to write down some thoughts.
This is rather for from being my final judgement on the matter and I'm aware that it will be a while until we properly discuss this. So more of a reminder for future-me.
Motivation
We have a lot of boilerplate code. especially the inits of classes like
Message
take up a lot of place, along with__slots__
andde/to_json
.There are ways to hide those away by using 3rd pary libraries (that inspired
dataclasses
) or the built-indataclasses
. Here is what such approaches can possibly do for us:Edits:
ext.Defaults
andCallbackContext
, especially making them immutable__new__
workaround inTelegramObject
: The reason for that was that we didn't want to have to callsuper().__init__
in all the subclasses. ifTO
is a dataclass as well, then the auto-generated inits of the subclasses would automatically do that for usA word on 3rd party libs
A number of 3rd party libraries that offer similar functionality as
dataclasses
, most notablyattrs
andpydantic
but also some others. I know that we try to keep our dependencies at a minimum and I'm fully backing that mentality. Still, let me list some pros & cons that I see of possible usage ofattrs
&pydantic
- compared to usingdataclasses
pydantic
has this built-in,attrs
can do this together with thecattrs
librarypydantic
is heavily inspired by that. This means that e.g.self.text = text
is replaced byself.text = str(text)
inpydantic
, but apparently also slows it downattrs
comes with support for slots, which dataclass only adds in py3.10. IISCpydantic
does notautomaticallypydantic
norattrs
can do both extra arguments and slotspydantice
provides plugins formypy
and PyCharm)InputMedia*
) #2607 and I'll go into detail on that later. TL;DR:pydantic
can dismiss additional kwargs that are passed to init, e.g.Chat(…, new_argument=42)
will work.attrs
can't, IISC.dataclass
can't eitherBut let's focus on the last bullet point. If we can't handle that properly, then there is no use for
attrs
ordataclass
at allDismissal of additional kwargs
Currently there is no way to make
dataclass
add**kwargs
to the auto-generated__init__
.But what if we approach this problem differently? Instead of passing the additional kwargs, we can filter them out before. Ofc this is more expensive, but
**kwargs
we no longer just swallow typos in arguments etc. Admittedly, this benefit is mall, since classes likeMessage
,Chat
etc are usually not manually instantiated.Such an approach should work with both
attrs
anddataclass
One thing to keep in mind is that we currently pass
bot
to all classes, so a lot of them always have one additional kwarg. This should be easily solvable by different means. E.g. one couldbot
to all signaturesHAS_SHORTCUTS
that tells us whether or not we need to passbot
TelegramObject.set_bot
to removebot
from all signaturesSo here's a small comparison of the two approaches:
Example output on my machine:
So in the "usual" case the overhead is neglectible IMO and the usage of inspect-dark magic is not too extensive.
In the cases where we have unhandled data, the slow down is heavily notiable - but we're still talking 4 vs 2 micro seconds and as explained above, this shouldn't happen too often.
Summary
For today, much punchlines are:
to/de_json
, but neitherpydantic
norattrs
has all the features that I would like. Neither doesdataclass
, but at least that's included batteriesPS:
PyCharm apparently still has no proper support for
dataclasses
- not that this would stop us, just mentioning it …The text was updated successfully, but these errors were encountered: