-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
gh-134873: fix various quadratic worst-time complexities in _header_value_parser.py
[WIP]
#134947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
_header_value_parser.py
[WIP]_header_value_parser.py
[WIP]
Urgh, so there is still a quadratic complexity that I need to think about, it's when we alternate if-branches. For instance in try:
token, value = get_word(value)
phrase.append(token)
except errors.HeaderParseError:
phrase.defects.append(errors.InvalidHeaderDefect(
"phrase does not start with word"))
while value and value[0] not in PHRASE_ENDS:
if value[0]=='.':
phrase.append(DOT)
phrase.defects.append(errors.ObsoleteHeaderDefect(
"period in 'phrase'"))
value = value[1:]
else:
try:
token, value = get_word(value)
except errors.HeaderParseError:
if value[0] in CFWS_LEADER:
token, value = get_cfws(value)
phrase.defects.append(errors.ObsoleteHeaderDefect(
"comment found without atom"))
else:
raise
phrase.append(token)
return phrase, value The if value[0]=='.':
phrase.append(DOT)
phrase.defects.append(errors.ObsoleteHeaderDefect(
"period in 'phrase'"))
value = value[1:] is quadratic if we're processing multiple times
Maybe switch to a stateful parser? That way, we shouldn't have high complexity and we should be fine. But this requires a complete rewrite of this module. |
Ok, this patch still fixes some cases but not all. Cases where two branches alternate would still be subject to O(n²) complexities (unless we avoid the copy in The advantage of |
@serhiy-storchaka I'm a bit stuck here. I don't really have a better idea than to rewrite the module where we would use a So, to ensure backward compatibility, I think I'll need a new module... I can't think of another solution instead of entirely rewriting the logic so that we don't have un-necessary slice so your help would be appreciated, TiA! I thought about holding the current "index" where the parser stopped but again, I don't think it'll be sufficient as I'll still need to make slices at some point to extract some values to hold (OTOH, using a deque allows me to move some characters to elsewhere without having to copy the string twice, though I'll still need a ''.join() on the part that is being stored). def get_something(value):
storage = Storage()
head = []
while cond(value):
head.append(value.popleft())
storage.value = ''.join(head)
return storage, value |
This is a work in progress (I need to go now) but I'll continue tomorrow. I want to add tests, and some other places are still not fixed because I didn't find a straightforward fix.