-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Limit Forward references in Mathtext parser #26198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Comment indicates that limiting forward references is important for performance, but many more were used than needed. This is a 75% reduction in forward reference instances. Many of these are included in the definition of remaining forward references but do not themselves need to be a forward reference (indeed only 1 thing in a given cycle actually needs to be such that it can be referred to prior to its definition) `auto_delim` is recursively self-referential, and thus _must_ be all on its own `placeable` and `accent` form a tight loop, so one of those two needed to be a forward reference, and placeable was the easiest to keep (and most likely to be used in another definition that may introduce similar constraints) `token` is one of the more generic definitions, that is used in multiple other definitions. `required_group` and `optional_group` both caused failed tests when I tried to make them no longer forward references, has something to do with how `__call__` works for `Forward` vs `And` objects in pyparsing (though did not dig too much deeper into it, beyond noticing that the names in future `required_group("name")` calls still had error messages that say `"group"`, not their newer names, and that reverting to `Forward` instances fixed it). `operatorname` needed to move to avoid using forward references for `simple`, but does not actually have a cycle. Otherwise rewrapped some of the definitions to be one line and fixed a typing error that I noticed when looking into the parsing things.
# Set names on (almost) everything -- very useful for debugging | ||
# token, placeable, and auto_delim are forward references which | ||
# are left without names to ensure useful error messages | ||
if key not in ("token", "placeable", "auto_delim"): | ||
val.setName(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change I didn't fully intend on keeping in this PR... this was part of the discussion from #26152 that is likely to be part of the fix for making the error messages work for future versions of pyparsing...
I'm not totally against leaving this in as I think it is likely to be wanted soon, but will allow reviewers to revert it if it is too soon.
# Set names on (almost) everything -- very useful for debugging | |
# token, placeable, and auto_delim are forward references which | |
# are left without names to ensure useful error messages | |
if key not in ("token", "placeable", "auto_delim"): | |
val.setName(key) | |
# Set names on everything -- very useful for debugging | |
val.setName(key) |
As for performance claims, it (in not that high of N) seems to be about a 5% speed up, judging from time the time it takes to run Running on main:
on pr branch:
So that is a 22% improvement focusing on parsing, which is honestly more than I thought it would be (granted, parsing mathtext is rarely the bottleneck, but still). |
Closing and reopening to pick up #26199 since ghactions pytest run didn't go. |
I think the changes are really helpful! |
# Set names on (almost) everything -- very useful for debugging | ||
# token, placeable, and auto_delim are forward references which | ||
# are left without names to ensure useful error messages | ||
if key not in ("token", "placeable", "auto_delim"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is kept (seems OK to me although I cannot really see the potential impact), I'd suggest defining the tuple as a set outside of the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impact should be pretty minimal for earlier versions of pyparsing. Essentially if one of these raises an error itself it gets more verbose... but I'm having trouble even making these error (themselves and not a deeper constituent piece) when I try to, so not totally convinced those error message ever happen.
On pyparsing 3.1.0, these error messages override the constituent parts, and thus are more visible, so it goes from
Expected token, got ...
To:
Expected Forward {{ simple | ... } | unknown_symbol}, got ...
Which could be better or worse depending on context, but the idea is to facilitate better errors in as yet unreleased pyparsing:
And on future versions of pyparsing, assuming we complete the line drawing we have been discussing, this will signal to return to pre 3.1.0 behavior (at least with many of the proposals).
Personally, I think moving the tuple negatively impacts readablity for rather minimal performance gains (this function is only called at parser init time, though admittedly multiple times, and the for loop has relatively few entries (~13 in the first call, ~40 in the second call, prior to this PR it is called 3 times, but I found no need for it to be called both before and after adding the Forward
definitions, which don't rely on it having been called prior)) So moving would be saving ~50 tuple definitions, total per python process (even the parser object gets cached as a class variable and reused via the public API, so even creating a new MathTextParser
does not rerun this). And even then, not convinced the variable lookup would be any faster
I think we are talking ~1 microsecond, total.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always have a hard time judging minor impact on readability and minor impact on performance and will always go for performance...
May require backporting a portion of matplotlib#26198 to the 3.7 branch if we wish to have a 3.7.3. (specifically the part where token, placeable, and auto_delim are excluded from )
May require backporting a portion of matplotlib#26198 to the 3.7 branch if we wish to have a 3.7.3. (specifically the part where token, placeable, and auto_delim are excluded from setName)
Cherry picked and fixed up, ignored changes to the pyi file (which doesn't exist on this branch).
Cherry picked and fixed up, ignored changes to the pyi file (which doesn't exist on this branch).
PR summary
Comment indicates that limiting forward references is important for
performance, but many more were used than needed.
This is a 75% reduction in forward reference instances.
Many of these are included in the definition of remaining forward
references but do not themselves need to be a forward reference (indeed
only 1 thing in a given cycle actually needs to be such that it can be
referred to prior to its definition)
auto_delim
is recursively self-referential, and thus must be all onits own
placeable
andaccent
form a tight loop, so one of those two neededto be a forward reference, and placeable was the easiest to keep (and
most likely to be used in another definition that may introduce similar
constraints)
token
is one of the more generic definitions, that is used in multipleother definitions.
required_group
andoptional_group
both caused failed tests when Itried to make them no longer forward references, has something to do
with how
__call__
works forForward
vsAnd
objects in pyparsing(though did not dig too much deeper into it, beyond noticing that the
names in future
required_group("name")
calls still had error messagesthat say
"group"
, not their newer names, and that reverting toForward
instances fixed it).operatorname
needed to move to avoid using forward references forsimple
, but does not actually have a cycle.Otherwise rewrapped some of the definitions to be one line and fixed a
typing error that I noticed when looking into the parsing things.
PR checklist
@devRD I would appreciate your review on this. If this reorg would cause conflicts for other PRs already in flight, it can absolutely wait.
I just saw the comment saying to minimize usage of Forward and then was immediately like "why is this a Forward" and kept going.
The changes themselves are relatively minimal and would be easy to redo if needed:
<<
from<<=
in the redefinition