Skip to content

Implement TeX's fraction alignment for Mathtext fractions #22852

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

tfpf
Copy link
Contributor

@tfpf tfpf commented Apr 15, 2022

PR Summary

Follow-up to #20627, which somehow got closed automatically when I was messing around with merge conflicts in my fork. I have implemented the same algorithm TeX uses to align fractions.

At the moment, I have not updated the images, for reasons I will mention in a comment.

Fixes #18086. Fixes #18389. Fixes #22172.

PR Checklist

Tests and Styling

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • [N/A] New features are documented, with examples if plot related.
  • [N/A] New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • [N/A] API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).

@tfpf
Copy link
Contributor Author

tfpf commented Apr 15, 2022

A few notes.

Algorithm

To best of my knowledge, I tried to use the algorithm described in TeX: The Program by Don Knuth. It describes numerators and denominators as being shifted independently of each other, whereas Matplotlib packs them (along with a rule vertical spaces) into a box, and moves the box down. As a result, some acrobatics were necessary, so there is scope of optimisation, but I focused on getting it to work first.

fractions-with-without-rule
Fractions with and without rules: text style on the left, and display style on the right.

And yes, the spacing does take into account the thickness of the rule, as TeX does (fix #22172).
image

Hopefully, it will also fix #18389.

Font Constants

In order to use the same algorithm, I defined five new constants in the class FontConstantsBase, and used them to mean the same thing TeX does. (On an unrelated note, the comments describing these constants refer to them as 'percentages', but I think they should be called 'multipliers', since they are direct multipliers. Should the comments be changed?)

I chose values which best rendered fractions while using DejaVu Sans. Different values may be needed for the other four typefaces.

Possible Bug in Fraction Rendering

        num_clr = max(
            num_shift_up - cnum.depth - axis_height - rule / 2, min_clr)
        den_clr = max(
            axis_height - rule / 2 + den_shift_down - cden.height, min_clr)
        vlist = Vlist([cnum,                     # numerator
                       Vbox(0, num_clr - rule),  # space
                       Hrule(state, rule),       # rule
                       Vbox(0, den_clr + rule),  # space
                       cden                      # denominator
                       ])

This is the block of code which defines the clearance between the rule and the numerator/denominator. num_clr is the vertical distance between the numerator and the rule. den_clr is that between the denominator and the rule. If a Vlist is created with num_clr and den_clr, the actual rendered distances turn out to be num_clr + ruleand den_clr - rule. This looks like a bug to me, so I undid its effects by specifying num_clr - rule and den_clr + rule. The actual rendered distances then become num_clr - rule + rule = num_clr and den_clr + rule - rule = den_clr, which matches what TeX would do.

If the bug gets fixed in the future, all we'd need to do is remove - rule and + rule from the above code block.

Updating Images

The choice of font constants may work for DejaVu Sans, but probably won't work for the others. I think images should be added only after we have a consensus on what those values should be.

Copy link
Member

@timhoffm timhoffm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have it, including a reference to the TeX algorithm would be great.

@tfpf
Copy link
Contributor Author

tfpf commented Apr 15, 2022

If you have it, including a reference to the TeX algorithm would be great.

Sure. I'll add a comment with the appropriate references.

@tfpf
Copy link
Contributor Author

tfpf commented Apr 15, 2022

I have added font constants for STIX, STIX sans, DejaVu Serif and CM. The values chosen are the nearest integral multiples of 0.1 for which the numerator and denominator seen in the images in this comment do not have to be shifted beyond their default shift amounts.

If that sounds okay, I will try to update the images as well.

@anntzer
Copy link
Contributor

anntzer commented Apr 16, 2022

Sorry to hijack this PR (which looks nice, although I haven't checked the exact algorithms yet), but perhaps we can revisit #19201 (smaller svg baselines by not embedding fonts) as well? (at least the part switching the svg baselines to not embedding the fonts, for a significant gain in size, even if we keep the png and pdf baselines still around for now.) #19201 was deferred on the expectation of switching to a completely separate baseline images repo, but I'm not sure this is going to happen soon...
(I can extract the relevant parts of #19201 into a new PR, of course.)

@tfpf
Copy link
Contributor Author

tfpf commented Apr 20, 2022

That is a good idea (as I understand, fewer changes to the images will result in a smaller size increase of the repo), but I am really not sure how to do it. Perhaps someone else could push the image change commits here?

@tfpf tfpf changed the title Implement TeX's fraction alignment Implement TeX's fraction alignment for Mathtext fractions Apr 20, 2022
@tfpf
Copy link
Contributor Author

tfpf commented Apr 21, 2022

Noticed something while fixing the merge conflict. Mathtext adds a space to the right of each fraction.

    result = [Hlist([vlist, Hbox(thickness * 2.)])]

Is there a reason for this? Fractions look better without that space:
without_space

than with it.
with_space

Brackets added to make the difference clear.

@anntzer
Copy link
Contributor

anntzer commented Apr 23, 2022

Agreed it looks better without the shift; a quick git blame suggests this comes from the original implementation (c10b4bb) with no reason given :-(

@anntzer
Copy link
Contributor

anntzer commented Apr 23, 2022

I put up a revival of #19201 at #22881, let's discuss that there.

@tfpf
Copy link
Contributor Author

tfpf commented Apr 25, 2022

from the original implementation with no reason given

That is unfortunate.😐 At a glance, it appears that this space was added so that the lines of two consecutive fractions (e.g. \dfrac{1}{2}\dfrac{1}{2}) do not look like a single line. But that raises the question why the space was not added on both sides like this:

    result = [Hlist([Hbox(thickness * 1.), vlist, Hbox(thickness * 1.)])]

instead of just on the right.

Although it's just a single change, I think this deserves a separate PR?

@anntzer
Copy link
Contributor

anntzer commented Apr 25, 2022

Seems like a reasonable guess as to why this came in. Do you have any idea of what TeX does to avoid the same problem?

@tfpf
Copy link
Contributor Author

tfpf commented Apr 25, 2022

From a quick look at the book, node 748 seems to suggest that the Hlist itself does not contain any extra space. Perhaps a space is added only if there is something before or after the fraction? I have no clue.

@tfpf
Copy link
Contributor Author

tfpf commented May 2, 2022

Do you have any idea of what TeX does to avoid the same problem?

I think I've got it. Per node 764 of the book, all possible inter-noad space codes are first stored in a 64-element string, which is treated as if it were an 8-by-8 matrix. The space between two noad types a and b is computed using the element in the ath row and bth column of the matrix, along with what is called a magic_constant. (I think that's the general idea.)

Here, a and b are numbers representing the noad type. A noad is an element of an mlist (which is a linked list representing formulae). Hence, the space between two fractions is the 8 * fraction_noad + fraction_noad + magic_offsetth element of the matrix.

Is this beyond the scope of Mathtext? (Will there be any side-effects of making this change.)

@jklymak
Copy link
Member

jklymak commented May 16, 2022

Moving to draft. Feel free to move back when the relevant tests pass!

@jklymak jklymak marked this pull request as draft May 16, 2022 08:31
@anntzer
Copy link
Contributor

anntzer commented Jun 11, 2022

@tfpf I think best would be to perhaps not encode the entire 64 internoad spaces for now (unless you want to go for it) and just specifically encode an inter-fraction noad? Then perhaps squash this together with #21850 and accept the mass changes in baseline images (although I would much prefer if #22881 could go in first, too, and switch all the changed images to svg-unembedded-glyphs-only).

@tfpf
Copy link
Contributor Author

tfpf commented Jun 11, 2022

just specifically encode an inter-fraction noad

As far as I can tell, Mathtext's spacing algorithm is not the same as that of TeX. Mathtext adds spaces to Hlist objects. I think it might be simpler to just put equal spaces on both sides of the fraction.

switch all the changed images to svg-unembedded-glyphs-only

Agreed, that is the most meaningful thing to do. I will hold off on changing any images until that pull request is merged.

@tfpf
Copy link
Contributor Author

tfpf commented Jun 13, 2022

Then perhaps squash this together

@anntzer Does this mean: add the subscript and superscript changes here? This would result in one less change for some images, because there are some tests which have fractions as well as superscripts and subscripts.

However, I had to cut some corners for the latter. I'll paste the details here.

Shortcuts I took for the algorithm

  • The default values of shift_up and shift_down depend on two font constants, supdrop and subdrop, respectively. However, Mathtext uses subdrop for both (as the comment in the FontConstantsBase class mentions), so I did the same.
  • The actual value of shift_up depends on the current style (display, text, script or script-script), and may be calculated using sup1, sup2 or sup3. However, superscripts and subscripts in Mathtext do not know about the current style (Mathtext only has the sup1 constant); they are simply one size smaller than the nucleus. I've also used only sup1.

Also, if we are to use the TeX algorithm for superscripts and subscripts, the "poor man's x-height" calculation (#21850 (comment)) may have to be forced for all fonts. (If there is a better way, I am all for it, though.)

@anntzer
Copy link
Contributor

anntzer commented Jun 14, 2022

Does this mean: add the subscript and superscript changes here? This would result in one less change for some images, because there are some tests which have fractions as well as superscripts and subscripts.

Yes.

Also, if we are to use the TeX algorithm for superscripts and subscripts, the "poor man's x-height" calculation (#21850 (comment)) may have to be forced for all fonts. (If there is a better way, I am all for it, though.)

IIRC(?) there are other places where we noted that font-reported-x-heights can be useless and it may be better to use poor man's x-height everywhere instead, so I wouldn't be too annoyed by that change.


The shortcuts you mention seem fine.

@tfpf tfpf force-pushed the mathtext-vertical-align branch 2 times, most recently from 0e7bf54 to b65d493 Compare June 16, 2022 13:24
@tfpf
Copy link
Contributor Author

tfpf commented Jun 20, 2022

Of the 537 Mathtext image comparison tests failed, there were some that failed for Computer Modern, but not for the other fonts. I moved all the failing tests to svgastext_math_tests, but since each test checks all five fonts, a total of 720 new images have to added.:sweat_smile:

There are a few errors I don't quite understand.

  • r'$\sqrt{1+\sqrt{1+\sqrt{1+\sqrt{1+\sqrt{1+\sqrt{1+\sqrt{1+x}}}}}}}$' (There are neither fractions nor superscripts or superscripts. And yet, this image failed the image comparison test.)
  • r"$f'\quad f'''(x)\quad ''/\mathrm{yr}$" (Looks like apostrophes affected by the superscript positioning.)
  • test_operator_space (In this check_figures_equal test, the 's' of 'cos' is displaced slightly compared to the reference image, even though 'co' is placed identically. This is strange, because operator kerning should have remained unchanged.)

Also, I can't quite figure out how to identify the old images to be deleted.

@tfpf
Copy link
Contributor Author

tfpf commented Jun 20, 2022

Of the 537 Mathtext image comparison tests failed, there were some that failed for Computer Modern, but not for the other fonts. I moved all the failing tests to svgastext_math_tests, but since each test checks all five fonts, a total of 720 new images have to added.:sweat_smile:

This would mean that some images can simply be renamed instead of being deleted and added back … if one could sift through all of them. All-in-all, this is a probably too big a change, so I'll be happy to close it if it doesn't work out.

@anntzer
Copy link
Contributor

anntzer commented Jun 20, 2022

Sorry, I slightly messed up the "svg lightweight tests" change. Can you additionally include the two lines at https://github.com/matplotlib/matplotlib/pull/23270/files#diff-17f9692b77108a62b59c71825d8461488729ef280f56785d3ad55ce3a86044d8R210 (as that doesn't look like it's getting merged any time soon) to your PR? This way only svg will be tested, and only for dejavusans and cm; you can then remove the other baselines.

I don't think(?) there's a really smart way to figure out what images can be deleted other than by manually checking... (Probably it would be nice to have some tooling for that.)

@tfpf
Copy link
Contributor Author

tfpf commented Jun 21, 2022

Sure, I will update it.

@tfpf
Copy link
Contributor Author

tfpf commented Jun 23, 2022

Thanks a lot for the patient and detailed analysis! I think I miscalculated the font constant updates for CM. Further, I hadn't changed delta to account for the changed x-height (as indicated by the kerning before '2'). If I fix those two things, maybe it'll work.

Also, note that the kerns before the "c" are actually different in the two supposedly identical examples (is that a bug?)

Definitely looks like one. Might be worth a closer look.

@tfpf
Copy link
Contributor Author

tfpf commented Jun 24, 2022

Also, note that the kerns before the "c" are actually different in the two supposedly identical examples (is that a bug?)

The space before 'c' is \,, which is 1/6 em. Looking at the definition of TruetypeFonts._get_info, it looks like k2.44 and k2.31 are both 1/6 em, but the former uses an italicised 'm', whereas the latter uses a regular 'm'.

the kern before the "2" changed. (Why?)

This can be fixed by increasing the CM constants by the correct multiplying factor, but the problem is that the multiplying factor depends on the ratio of the old x-height and the actual x-height (as returned by get_metrics). The actual x-height, in turn, depends on the hinting type. The function TruetypeFonts._get_info loads the metrics using the hinting type. I am not clear on what that exactly is, but if the hinting type is 2, the x-height of CM at font size 12 with 100 DPI is 7.375, whereas if the hinting type is 32, the x-height is 8 at the same font size and DPI.

I will go ahead and use the latter to calculate the multiplying factor, since that fixes test_operator_space and uses the same kerning as earlier.

@tfpf tfpf force-pushed the mathtext-vertical-align branch from 702e794 to 3f6ed6c Compare June 24, 2022 12:31
@tfpf tfpf force-pushed the mathtext-vertical-align branch from 22faffc to e3b1ec4 Compare July 10, 2022 08:39
@tfpf tfpf marked this pull request as ready for review July 10, 2022 12:47
@tfpf
Copy link
Contributor Author

tfpf commented Jul 10, 2022

PR cleanliness failed.

If these changes are acceptable, I will squash the commits.

@anntzer
Copy link
Contributor

anntzer commented Jul 10, 2022

Previously, the two tests

    r'$xyz^kx_kx^py^{p-2} d_i^jb_jc_kd x^j_i E^0 E^0_u$',  # github issue #4873
    r'${xyz}^k{x}_{k}{x}^{p}{y}^{p-2} {d}_{i}^{j}{b}_{j}{c}_{k}{d} {x}^{j}_{i}{E}^{0}{E}^0_u$',

rendered the same.

With this PR, some of the exponents are shifted between the two cases. I think this is not desirable?

@tfpf tfpf marked this pull request as draft July 11, 2022 04:51
@tfpf
Copy link
Contributor Author

tfpf commented Jul 11, 2022

Probably a result of the difference between {something} and something, as discussed in #23257. I'll try to figure it out.

@tfpf
Copy link
Contributor Author

tfpf commented Jul 15, 2022

This time, I double-checked that x^2_y and x_y^2 are rendered identically, as are xyz^kx_kx^py^{p-2} d_i^jb_jc_kd x^j_i E^0 E^0_u and {xyz}^k{x}_{k}{x}^{p}{y}^{p-2} {d}_{i}^{j}{b}_{j}{c}_{k}{d} {x}^{j}_{i}{E}^{0}{E}^0_u.

Another thing is that {a}_{0}+\dfrac{1}{{a}_{1}+\dfrac{1}{{a}_{2}+\dfrac{1}{{a}_{3}+\dfrac{1}{{a}_{4}}}}} gets cut off at the top and bottom (the size of the figure isn't enough accommodate everything).

image
image

@tfpf tfpf marked this pull request as ready for review July 15, 2022 15:54
@tfpf tfpf force-pushed the mathtext-vertical-align branch from 2693862 to 02dff6a Compare July 17, 2022 08:28
As described in *TeX: the Program* by Don Knuth.

New font constants are set to the nearest integral multiples of 0.1 for which numerators and denominators containing normal text do not have to be shifted beyond their default shift amounts at font size 30 in display and text styles. To better process superscripts and subscripts, the x-height is now always calculated instead of being retrieved from the font table (which was the case for Computer Modern); the affected font constants have been changed.

Mathtext tests which failed as a result of these changes have been moved from `math_tests` to `svgastext_math_tests`. A duplicate test was also fixed in the process.
@QuLogic
Copy link
Member

QuLogic commented Aug 10, 2022

There are a lot of images changed here, but because they're renamed, it's difficult to really see what effect this change has on any of them. And the ones that are left with a diff sometimes look worse...

Plus some of the added images just don't look right at all:
image
(presumably due to missing fonts locally?)

If we want to combine this change with switch to SVG text, then we can do that after review, or just in an entirely separate commit/PR.

@tfpf tfpf marked this pull request as draft August 10, 2022 05:06
@tfpf
Copy link
Contributor Author

tfpf commented Aug 10, 2022

(presumably due to missing fonts locally?)

This is probably the reason, since Inkscape is able to correctly convert the SVG images to PNG. For instance, this is mathtext0_cm_07_svg.png.
image

If we want to combine this change with switch to SVG text, then we can do that after review, or just in an entirely separate commit/PR.

I guess moving to SVG in a separate PR will be easier to manage.

@anntzer
Copy link
Contributor

anntzer commented Aug 18, 2022

The fact that the svg cm misrenders on a webbrowser is normal: this is because it is specifically tailored to use matplotlib's cm font files, which are the "classical" ones from AMS and use a nonstandard encoding: unless you specifically point the the right font files (which our _SVGConverter does), you won't get the right glyphs. So the idea is really to just look at the dejavu svgs; if you want to confirm that the cm svgs are correct you should run the tests and visualize the files converted to png in result_images/.

@tfpf
Copy link
Contributor Author

tfpf commented Nov 23, 2022

Also, note that the kerns before the "c" are actually different in the two supposedly identical examples (is that a bug?)

The space before 'c' is ,, which is 1/6 em. Looking at the definition of TruetypeFonts._get_info, it looks like k2.44 and k2.31 are both 1/6 em, but the former uses an italicised 'm', whereas the latter uses a regular 'm'.

the kern before the "2" changed. (Why?)

This can be fixed by increasing the CM constants by the correct multiplying factor, but the problem is that the multiplying factor depends on the ratio of the old x-height and the actual x-height (as returned by get_metrics). The actual x-height, in turn, depends on the hinting type. The function TruetypeFonts._get_info loads the metrics using the hinting type. I am not clear on what that exactly is, but if the hinting type is 2, the x-height of CM at font size 12 with 100 DPI is 7.375, whereas if the hinting type is 32, the x-height is 8 at the same font size and DPI.

I am still unable to figure this out, after all this time. Here's an example which demonstrates the problem. I made the following changes:

diff --git a/lib/matplotlib/_mathtext.py b/lib/matplotlib/_mathtext.py
index 3a934c21fd..93df6c4f05 100644
--- a/lib/matplotlib/_mathtext.py
+++ b/lib/matplotlib/_mathtext.py
@@ -334,6 +334,10 @@ class TruetypeFonts(Fonts):
         font = self._get_font(fontname)
         font.set_size(fontsize, dpi)
         pclt = font.get_sfnt_table('pclt')
+        print(f'{fontname=}, {fontsize=}', end=', ')
+        if pclt is not None:
+            print('Reported:', (pclt['xHeight'] / 64.0) * (fontsize / 12.0) * (dpi / 100.0), end=', ')
+        print('Actual:', self.get_metrics(fontname, mpl.rcParams['mathtext.default'], 'x', fontsize, dpi).iceberg)
         if pclt is None:
             # Some fonts don't store the xHeight, so we do a poor man's xHeight
             metrics = self.get_metrics(

and created the file test.py.

import matplotlib.pyplot as plt
plt.rcdefaults()
plt.rc('mathtext', fontset='cm')
plt.rc('font', size=12)
plt.figure().text(.5, .5, r'$x_1^2$')
plt.show()

Now comes the strange part.

$ python3 test.py 
fontname='it', fontsize=12.0, Reported: 14.171875, Actual: 8.0
$ pytest -s lib/matplotlib/tests/test_mathtext.py::test_mathtext_rendering[png-mathtext-cm-12]
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/tfpf/Documents/projects/matplotlib, configfile: pytest.ini
plugins: xdist-2.5.0, forked-1.4.0
collected 1 item                                                               

lib/matplotlib/tests/test_mathtext.py fontname='it', fontsize=12.0, Reported: 14.171875, Actual: 7.375
fontname='it', fontsize=12.0, Reported: 14.171875, Actual: 7.375
.

============================== 1 passed in 0.77s ===============================

The x-height changes, presumably because the hinting is different in the tests. (Also, the ratio of the reported x-height height to the actual x-height of CM is not constant across font sizes.) I'll shelve this for a time I am better equipped to fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants