Reuse single kpsewhich instance for speed. #19531
On macOS, where spawning kpsewhich instances is rather slow (#4880 (comment)), this appears to speed up

```
python -c 'from pylab import *; mpl.use("pdf"); rcParams["text.usetex"] = True; plot(); savefig("/tmp/test.pdf", backend="pdf")'
```

around two-fold (~4s to ~2s). (There's also a small speedup on Linux, perhaps ~10%, but the whole thing is already reasonably fast.)

Note that this assumes the dvi cache has already been built; the costly subprocess calls here are the calls to kpsewhich that resolve the fonts whose names are listed in the dvi file.

Much of the complexity here comes from the need to force unbuffered stdin/stdout when interacting with kpsewhich (otherwise, things just hang); this is also why this is not implemented on Windows (Windows experts are welcome to look into this; the speedup there should be even more significant). (On Linux, another solution, which does not require a third-party dependency, is to call `stdbuf -oL kpsewhich ...` and pass `bufsize=0` to `Popen()`, but `ptyprocess` is pure Python, so adding a dependency seems reasonable.)

The `format` kwarg to `find_tex_file` had never been used before and cannot be handled in the single-process case, so just deprecate it.
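The core idea, keeping one resolver process alive and feeding it queries instead of spawning one process per font, can be sketched with the standard library alone. This is a hypothetical illustration, not the PR's code: `PersistentLineQuery` and `STAND_IN_CHILD` are made-up names, and the sketch assumes the child answers each input line with exactly one output line and flushes it. kpsewhich does not behave that way through a plain pipe, which is the buffering problem described above (the PR works around it with `ptyprocess`; `stdbuf -oL` is the Linux alternative). A Python stand-in child replaces kpsewhich so the sketch runs without a TeX installation.

```python
import subprocess
import sys


class PersistentLineQuery:
    """Keep one child process alive and send it one query per line.

    Hypothetical helper, not matplotlib's API. Assumes the child writes
    exactly one answer line per query line and flushes it immediately.
    """

    def __init__(self, cmd):
        self._proc = subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered on the parent's side of the pipes
        )

    @property
    def pid(self):
        return self._proc.pid

    def query(self, name):
        self._proc.stdin.write(name + "\n")
        self._proc.stdin.flush()  # make sure the child actually sees the query
        # Blocks until the child emits a whole line; hangs if the child buffers.
        return self._proc.stdout.readline().rstrip("\n")

    def close(self):
        self._proc.stdin.close()  # child sees EOF and exits its loop
        self._proc.stdout.close()
        return self._proc.wait()


# Stand-in for kpsewhich so the sketch runs anywhere: it echoes a fake path
# for every name it reads and flushes after each answer.
STAND_IN_CHILD = [
    sys.executable,
    "-c",
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    print('/fake/texmf/' + line.strip(), flush=True)\n",
]
```

Every call to `query()` reuses the same process, so the per-font cost is a pipe round trip rather than a process spawn. Pointing this at a real kpsewhich command line would additionally need one of the unbuffering workarounds mentioned above; without one, `readline()` would be expected to block.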
This is heavy machinery. I assume we cannot get away cheaply by collecting the requests first and sending them to …
I think that would require quite a bit of reworking of the innards of dviread :(
Fair enough. Just wanted to make sure we're not overlooking an easier solution.
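The batching alternative raised in the first comment, collecting all font names and resolving them in a single invocation, can be sketched as follows. `resolve_batch` and `STAND_IN_RESOLVER` are hypothetical names, and the sketch assumes the resolver prints one full path per name it finds and nothing for names it cannot find, so results are matched back by basename rather than by position. As the reply notes, wiring something like this into dviread would require reworking its internals. A Python stand-in replaces kpsewhich so the example runs without TeX.

```python
import os
import subprocess
import sys


def resolve_batch(cmd, names):
    """Resolve many file names with one process invocation.

    Hypothetical helper. Assumes ``cmd + names`` prints one full path per
    found name and nothing for missing names, so outputs are matched back
    to the requested names by basename instead of by position.
    """
    if not names:
        return {}
    # A non-zero exit status may just mean "some names were missing".
    out = subprocess.run(
        [*cmd, *names], stdout=subprocess.PIPE, text=True, check=False
    ).stdout
    found = {}
    for path in out.splitlines():
        found.setdefault(os.path.basename(path), path)
    return {name: found.get(name) for name in names}  # None if not found


# Stand-in resolver: "finds" every name except those starting with "missing".
STAND_IN_RESOLVER = [
    sys.executable,
    "-c",
    "import sys\n"
    "for n in sys.argv[1:]:\n"
    "    if not n.startswith('missing'):\n"
    "        print('/fake/texmf/fonts/' + n)\n",
]
```

For example, `resolve_batch(STAND_IN_RESOLVER, ["cmr10.tfm", "missing.tfm"])` returns `{"cmr10.tfm": "/fake/texmf/fonts/cmr10.tfm", "missing.tfm": None}`. The design trade-off is that all names must be known up front, which is exactly what a lookup-as-you-parse design does not provide.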
I guess the other solution would be to revive @jkseppan's series of PRs #10236, #10238, #10268. From my PoV these PRs (while likely actually implementing a better solution) basically died from the use of sqlite as the cache format, which is reputedly something really useful to know but which I know nothing about :-(

Edit: Yet another idea would be to call …
I spent some more time looking at this problem. I can think of two other solutions (which I guess I'm mostly writing for my own reference, but heh :-)):

Solution 1: Plain (e)TeX actually has a way to directly query glyph sizes:

…

and TeX actually appears to flush output, which makes it controllable via …

Solution 2: luatex embeds and exposes kpathsea, and can likewise be used interactively

…

(the …
Is luatex a big burden? I think most major dists have it now, don't they?
See #19551, which implements the luatex-based solution (with some encoding-related wrinkles still left, but it's mostly there).
Mostly superseded by #19558; we can reopen if there's appetite for a solution that specifically doesn't require luatex. |
Edit: See #19558 for another approach, which also works on Windows, where it gives a large speedup. I'll keep this PR separate for now to allow comparing the various approaches.