Skip to content

Delayed import of pyparsing for fontconfig pattern matching #12915

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

timhoffm
Copy link
Member

@timhoffm timhoffm commented Dec 1, 2018

PR Summary

This PR aims at speeding up the matplotlib import by lazily importing pyparsing. See #11546.

Essentially this has two parts:

  • Move the pyparsing import into FontconfigPatternParser.__init__ so that it's only imported when needed.
  • The parser is needed to parse the rc settings. Thus it would always be instantiated and the pyparsing import would still be needed at matplotlib import time. However, at least the default rc settings do only contain very simple patterns. The PR uses a static cache of these simple patterns to prevent instantiating the full FontConfigParser.

This yields at least a 10% faster matplotlib import for standard settings. See the import timing in #11546.

I also did the timing myself, which I don't fully understand since the import without the modification of this PR has some additional import delay directly in the matplotlib base package. Anyway, the rcsetup und thus pyparsing have vanished as expected.

before:
image

after:
image

Note:
The _SIMPLE_PATTERN_CACHE may be expanded if you know other simple and common fontconfig patterns.

@QuLogic
Copy link
Member

QuLogic commented Dec 2, 2018

I also did the timing myself, which I don't fully understand since the import without the modification of this PR has some additional import delay directly in the matplotlib base package.

If you've recently modified some files (editing or switching branches, etc.), then be careful to run the test at least twice. The first import can be slowed down by updating of the bytecode files.

@timhoffm
Copy link
Member Author

timhoffm commented Dec 2, 2018

@QuLogic Thanks for the hint. I ran each code three times and just posted the result of the third run.

Anyway, the result shows that pyparsing is not imported and thus importing rcsetup is fast. And that's the goal of the PR.

@jklymak
Copy link
Member

jklymak commented Dec 2, 2018

This looks good. If pyparsing is such a heavy lift is there something lighter than can be imported?

@anntzer
Copy link
Contributor

anntzer commented Dec 2, 2018

I'd like to ultimately get rid of fontconfig patterns altogether (#10249), but incremental improvements are fine in the meantime.
I don't think this really helps with any realistic import time though, as you'll also import mathtext (through e.g. backend_agg, or any other backend) which also imports pyparsing? IOW you need to also delay the import in mathtext to get useful import-time benefits I think?

@tacaswell tacaswell added this to the v3.1 milestone Dec 2, 2018
@tacaswell
Copy link
Member

👍 in principle, agree that incremental improvements are worth doing, but also agree with @anntzer that this may be un-done by the mathtext import. It maybe better to time importing pyplot (with Agg as the backend)?

@anntzer
Copy link
Contributor

anntzer commented Dec 2, 2018

or something like matplotlib.figure + matplotlib.backend_bases, which should be the absolute minimum needed to do anything (e.g. using a third-party (hemhem :)) backend).

@timhoffm
Copy link
Member Author

timhoffm commented Dec 3, 2018

Ok, here's the import timing of

python3.7 -X importtime -c "import matplotlib.pyplot as plt; plt.use('agg')" 2> matplotlib.log

image

As one can see, there's still no pyparsing import. Of course, this PR is "just" a 5% improvement, but it works as is.

Really wondering what the big gap in matplotlib is doing.

@tacaswell
Copy link
Member

My guess at the big delay is versioneer calling git via subprocess. That only happens in the dev install, if you 'properly' install it that call is replaced by a string lookup.

@anntzer
Copy link
Contributor

anntzer commented Dec 17, 2018

I don't know why it doesn't show up in -Ximporttime output, but pyparsing is actually still being imported: this can be checked by printing out sys.modules; or one can just also check that backend_agg imports mathtext, which imports pyparsing...

@timhoffm
Copy link
Member Author

Does this mean, this PR will have no effect despite the importtime results?

@anntzer
Copy link
Contributor

anntzer commented Dec 17, 2018

What's exactly happening is a bit mysterious to me too, but I'm just reporting my observations...

@timhoffm
Copy link
Member Author

No idea how to investigate this further.

So, either we merge this and hope that it has some effect, or we simply close the PR. Both would be fine for me.

@anntzer
Copy link
Contributor

anntzer commented Dec 17, 2018

I would prefer we figure out what's happening before taking further action either way...

@timhoffm
Copy link
Member Author

Do you have any idea what to check?

@anntzer
Copy link
Contributor

anntzer commented Dec 17, 2018

nothing right now

@anntzer
Copy link
Contributor

anntzer commented Jan 16, 2019

Added the needs revision tag based on the discussion above as I'm not actually convinced this works, but again I'm not really sure what's exactly happening here.

@jklymak jklymak modified the milestones: v3.1.0, v3.2.0 Feb 9, 2019
@timhoffm timhoffm modified the milestones: v3.2.0, v3.3.0 Aug 7, 2019
@QuLogic
Copy link
Member

QuLogic commented Mar 25, 2020

I just rebased this on master and tried it out, but it does not seem to have any effect on overall import time. The rcsetup -> fontconfig_pattern part of the tree is indeed faster, but the overall matplotlib section remains the same.

Edit: I forget to do a non-editable install like @tacaswell mentioned, but while a bit quicker, it's still not really affected by this PR.

@timhoffm
Copy link
Member Author

Rebased so that it's easier to look into this further.

@timhoffm
Copy link
Member Author

timhoffm commented Mar 25, 2020

AFAICS there are the following additional ways we import pyparsing:

  • in _check_versions(). IMHO this could be delayed (or dropped? - we ensure at least 2.0.1, which was released in 2013).
  • via the following import sequence
    import matplotlib.pyplot as plt
    -> import matplotlib.colorbar
    -> import matplotlib.contour as contour
    -> import matplotlib.text as text
    -> from .textpath import TextPath
    -> from matplotlib.mathtext import MathTextParser
    -> from pyparsing import (
    

Technical note: I've patched sys.modules at the top of __init__.py to get notifed when a pyparsing import is requested. - Not sure if it's the best way to do it, but it got me the above info 😄

from collections.abc import MutableMapping
import sys
import traceback


class LogDict(MutableMapping):

    def __init__(self, d):
        self.d = d

    def __getitem__(self, name):
        if 'pyparsing' in name:
            print(f'\n\nget {name}')
            traceback.print_stack()
        return self.d[name]

    def __setitem__(self, name, val):
        self.d[name] = val

    def __delitem__(self, name):
        del self.d[name]

    def __iter__(self):
        return iter(self.d)

    def __len__(self):
        return len(self.d)


sys.modules = LogDict(sys.modules)

@QuLogic QuLogic modified the milestones: v3.3.0, v3.4.0 May 1, 2020
@jklymak jklymak marked this pull request as draft August 24, 2020 15:10
@QuLogic QuLogic modified the milestones: v3.4.0, v3.5.0 Jan 21, 2021
@timhoffm timhoffm modified the milestones: v3.5.0, unassigned Jul 8, 2021
@anntzer
Copy link
Contributor

anntzer commented Oct 16, 2021

AFAICT, recent versions of pyparsing now import much faster, and now only contribute <2% of the total import time. Unless others can repro the original problem, I propose to close this.

@timhoffm
Copy link
Member Author

timhoffm commented Dec 5, 2021

I still get a significant contribution (27%) from pyparsing 3.0.6. But my dev environment is a bit older overall. Maybe I should test with a fresh setup.

grafik

OTOH, I likely won't come back to this any time soon.

@anntzer
Copy link
Contributor

anntzer commented Dec 5, 2021

FTR this is what I get as of master:
tuna

@timhoffm
Copy link
Member Author

timhoffm commented Dec 5, 2021

Interestingly, your timings and import sequence is completely different from mine.

Did you use python3.7 -X importtime -c "import matplotlib.pyplot as plt; plt.use('agg')" 2> matplotlib.log like the old examples? I did.

What's more disturbing is that were now 0.1-0.2s slower overall. But execution timing is a difficult topic and I don't want to go that rabbit hole here for now.

@timhoffm timhoffm deleted the speedup-import branch July 19, 2024 11:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants