Fix the TexVC(PHP) Parse tree related cases
Closed, ResolvedPublic

Description

Fix the TexVC parse tree: it is not consistent with the MathML parse tree, so the TexVC grammar has to be adapted.

The parse tree produced by TexVC(PHP) is in many cases not sufficient to create valid MathML. Example cases can be found in this test file in texvctreebugs. The ultimate solution is to refactor the grammar file so that the parse tree produced by TexVC(PHP) is correct for generating MathML.

I uploaded an HTML file here which contains the MathML of the erroneous cases. Looking at the MathML for sideset is a good starting point for understanding the types of errors.

Event Timeline

Physikerwelt triaged this task as Medium priority.

We need to look into that case by case and decide whether we want to update texvcjs. As a first step, I will review the JSON file and comment on what I think is correct or incorrect.

Change 889094 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Fix Grammar for Parsetree for ...

https://gerrit.wikimedia.org/r/889094

Change 889281 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Fix Grammar for case ...

https://gerrit.wikimedia.org/r/889281

TC-Index is always related to index in "MMLGenerationTexUtilTestLocal.php"

Case by case check 1: " color-cases"

TC-Index: 90
Tex: "\color{red}{red}" (edit: this format is not directly valid; the characters that follow are also red in all interpretations)

Edit:

  • a first step is to add test cases such as "a {b \color{red} c} d" (new solutions are then to be defined)

Similar cases (with issue):

  • "\pagecolor{red}{red}"
  • "\definecolor{ultramarine}{RGB}{0,32,96}"

Similar cases (already solved issue):

  • "\cfrac{a}{b}"

MML-Mathoid(Reference):

<mrow class="MJX-TeXAtom-ORD">
  <mstyle displaystyle="true" scriptlevel="0">
    <mstyle mathcolor="red">
      <mrow class="MJX-TeXAtom-ORD">
        <mi>r</mi>
        <mi>e</mi>
        <mi>d</mi>
      </mrow>
    </mstyle>
  </mstyle>
</mrow>

Parsetree-TexVC(Currently):

  • TexArray
    • Literal(arg="\color{red}")
    • Curly - TexArray(Literal("r"),Literal("e"),Literal("d"))

Issue:

  • In the TexVC parse tree there are two unrelated elements that both carry relevant information
  • the preceding Literal(arg="\color{red}") element is not related to the subsequent Curly element (the Curly could also contain "green" or something else, which should then be highlighted in red)

Solution (draft):

  • ideally the parse tree is nested with Color(red)[Curly(the text)]

@Physikerwelt what do you think of this case and the solution draft ?

Case by case check 2: "non-squashed literals case"

TC-Index: 39 (not exactly; the test case is currently \mathfrak{a})
Tex: "\mathfrak{abcde}"
Similar cases (with issue):

  • \mathit{a} and all follow-up cases (39 onwards)
  • it seems to be all "is_lettermod" cases
  • "\alpha\,\!" from FullCoverageTest and similar cases should be considered (multiple commands in one statement)
  • "\exp_a b = a^b, \exp b = e^b, 10^m \!"

Similar cases (already solved issue):

MML-Mathoid(Reference): (tbd just a guess)

<mrow data-mjx-texclass="ORD">
   <mrow data-mjx-texclass="ORD">
     <mi mathvariant="fraktur">abcde</mi>
   </mrow>
 </mrow>

MML-TexVC(Currently):

<mrow data-mjx-texclass="ORD">
  <mrow data-mjx-texclass="ORD">
    <mi mathvariant="fraktur">a</mi>
    <mi mathvariant="fraktur">b</mi>
    <mi mathvariant="fraktur">c</mi>
    <mi mathvariant="fraktur">d</mi>
    <mi mathvariant="fraktur">e</mi>
  </mrow>
</mrow>

Parsetree-TexVC(Currently):

  • TexArray-Fun1-Curly-TexArray(Literal("a"), Literal("b"), Literal("c") ... )

Issue:

  • When parsing elements nested inside a text-annotating command (such as \mathfrak), all literals within them are processed character by character

Solution (draft):

  • the literals are squashed in the TexVC grammar so that they map to a single MML element

Remarks:

  • Currently the rendered version of \mathfrak{abcde} looks ok, but the MathML or speech output could be problematic with such a representation
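
The squashing idea can be illustrated outside the grammar. The following is a minimal Python sketch, not the TexVC(PHP) implementation: the Literal class and the squash_literals and render_fraktur helpers are hypothetical names, and it assumes that only adjacent plain-letter literals are merged while macros such as \alpha are left alone.

```python
from dataclasses import dataclass


@dataclass
class Literal:
    """Hypothetical stand-in for a TexVC Literal node."""
    arg: str


def squash_literals(children):
    """Merge runs of adjacent plain-letter Literal nodes into one Literal."""
    squashed = []
    for node in children:
        prev = squashed[-1] if squashed else None
        if (isinstance(node, Literal) and node.arg.isalpha()
                and isinstance(prev, Literal) and prev.arg.isalpha()):
            # Extend the previous literal instead of starting a new node.
            squashed[-1] = Literal(prev.arg + node.arg)
        else:
            squashed.append(node)
    return squashed


def render_fraktur(children):
    """Render a list of Literal nodes as fraktur <mi> elements after squashing."""
    return "".join(
        f'<mi mathvariant="fraktur">{lit.arg}</mi>'
        for lit in squash_literals(children)
    )


chars = [Literal(c) for c in "abcde"]
print(render_fraktur(chars))
```

With this sketch the five single-character literals collapse into one element, whereas a literal like \alpha (which starts with a backslash) interrupts the run and stays separate.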

Fix Draft:

Case by case check 3: "sideset"

TC-Index: 90
Tex: "\sideset{_1^2}{_3^4}\sum"
Similar cases (with issue):

  • succeeding elements: \sum is a "nullary_macro", so possibly all nullary macros?
    • for example, \sideset{_1^2}{_3^4}\supset is rendered by the TEMML online converter
  • preceding elements: any besides sideset?

Similar cases (already solved issue):

MML-Mathoid(Reference):

<mrow class="MJX-TeXAtom-OP">
  <msubsup>
    <mrow class="MJX-TeXAtom-OP MJX-fixedlimits">
      <mrow class="MJX-TeXAtom-ORD">
        <mpadded width="0">
          <mrow class="MJX-TeXAtom-ORD">
            <mphantom>
              <mo>&#x2211;</mo>
            </mphantom>
          </mrow>
        </mpadded>
      </mrow>
    </mrow>
    <mrow class="MJX-TeXAtom-ORD">
      <mn>1</mn>
    </mrow>
    <mrow class="MJX-TeXAtom-ORD">
      <mn>2</mn>
    </mrow>
  </msubsup>
  <mspace width="negativethinmathspace"/>
  <msubsup>
    <mrow class="MJX-TeXAtom-OP MJX-fixedlimits">
      <mo movablelimits="false">&#x2211;</mo>
    </mrow>
    <mrow class="MJX-TeXAtom-ORD">
      <mn>3</mn>
    </mrow>
    <mrow class="MJX-TeXAtom-ORD">
      <mn>4</mn>
    </mrow>
  </msubsup>
</mrow>

MML-TexVC(Currently):

 <mrow data-mjx-texclass="OP">
  <mmultiscripts data-mjx-script-align="left">
    <mrow data-mjx-texclass="ORD">
      <mn>3</mn>
    </mrow>
    <mrow data-mjx-texclass="ORD">
      <mn>4</mn>
    </mrow>
    <mprescripts/>
    <mrow data-mjx-texclass="ORD">
      <mn>1</mn>
    </mrow>
    <mrow data-mjx-texclass="ORD">
      <mn>2</mn>
    </mrow>
  </mmultiscripts>
</mrow>
<mo data-mjx-texclass="OP">&#x2211;</mo>

Parsetree-TexVC(Currently):

  • TexArray - ( Fun2nb ("with all other stuff"), Literal("Sum"))

Issue:

  • The \sum element at the end of the TeX input is, in the TexVC(PHP) parse tree, an operator unrelated to the preceding sideset element

Solution (draft):

  • One: Sum and similar elements are nested inside the preceding element when sideset (or similar elements, tbd) precedes them: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Math/+/889611/2
  • Two (no grammar change): TexArray checks whether it contains 'compound' elements; parameters are passed as configuration or the TexArray is re-arranged, and state is passed on so that the elements are parsed correctly -> take this one

Case by case check 4: "limits-case"

TC-Index: 411
Tex: "\lim\limits_{x \to 2}"
Similar cases (with issue):

  • i462: nolimits: \mathop{\rm cos}\nolimits^2 ???

Similar cases (already solved issue):

  • limits_{x \to 2} renders correctly

MML-Mathoid(Reference):

 <mrow class="MJX-TeXAtom-ORD">
  <mstyle displaystyle="true" scriptlevel="0">
    <munder>
      <mo form="prefix">lim</mo>
      <mrow class="MJX-TeXAtom-ORD">
        <mi>x</mi>
        <mo stretchy="false">&#x2192;</mo>
        <mn>2</mn>
      </mrow>
    </munder>
  </mstyle>
</mrow>

Parsetree-TexVC(Currently):

  • Texarray
    • Literal ("Lim")
    • DQ
      • base:Literal("limits")
      • down: Curly (Literal("x"),Literal("\to"),Literal("2"))

For "\sum\limits_{j=1}^k" the tree is similar, but FQ instead of DQ.

Issue:

  • lim_ not recognized correctly
  • compound [\lim | \sum | \prod ] \limits constructs are not represented as compound elements in the parse tree, which creates problems when generating MML output

Solution (draft):

Case differentiation:

  • lim_{x \to 2}: add a test case, since it renders correctly
  • \lim\limits
  • \sum\limits
  • \prod\limits
  1. Solution 1: A simple solution could be to implicitly assume a "lim" element when DQ("base") contains "limits" (does limits always imply lim?), and to forward a state from TexArray so that Literal("lim") is not processed
  2. Solution 2: Create a node element and grammar condition which specifically recognizes limits-based constructs (in practice this could also be something like "Fun3")
  3. Solution 3: TexArray-based check and state forwarding (similar to case 3, sideset) -> take this one
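
To make Solution 3 more concrete, here is a minimal Python sketch of a TexArray-level check with state forwarding. It is an illustration under assumptions, not the code from the Math extension: Literal, DQ, LIMIT_OPERATORS, render_node and render_array are hypothetical names, the subscript is kept as an already-rendered MathML string, and only the DQ (subscript-only) variant is handled.

```python
from dataclasses import dataclass


@dataclass
class Literal:
    """Hypothetical stand-in for a TexVC Literal node."""
    arg: str


@dataclass
class DQ:
    """Hypothetical stand-in for a DQ (subscript) node."""
    base: Literal
    down: str  # simplified: the subscript is kept as already-rendered MathML


# Assumed set of operators that may be followed by \limits.
LIMIT_OPERATORS = {"\\lim": "lim", "\\sum": "\u2211", "\\prod": "\u220f"}


def render_node(node):
    """Default rendering for a node that is not part of a limits construct."""
    if isinstance(node, DQ):
        return f"<msub>{render_node(node.base)}{node.down}</msub>"
    return f"<mi>{node.arg}</mi>"


def render_array(children):
    """Render siblings, treating 'operator + DQ(base=\\\\limits)' as one unit."""
    out = []
    skip_next = False  # state forwarded to the next sibling
    for i, node in enumerate(children):
        if skip_next:
            skip_next = False
            continue
        nxt = children[i + 1] if i + 1 < len(children) else None
        if (isinstance(node, Literal) and node.arg in LIMIT_OPERATORS
                and isinstance(nxt, DQ) and nxt.base.arg == "\\limits"):
            # The operator becomes the base of the under-script.
            op = LIMIT_OPERATORS[node.arg]
            out.append(f"<munder><mo>{op}</mo>{nxt.down}</munder>")
            skip_next = True
        else:
            out.append(render_node(node))
    return "".join(out)


down = "<mrow><mi>x</mi><mo>&#x2192;</mo><mn>2</mn></mrow>"
print(render_array([Literal("\\lim"), DQ(Literal("\\limits"), down)]))
```

The grammar and the parse tree stay untouched in this approach; only the rendering pass looks one sibling ahead and marks the following DQ node as already consumed.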

Additional Info:

Re: Case by case check 1: " color-cases"

Relatedness

There are two issues:

  1. How to define a color?
  2. How to use a color?

1 is hard 2 is medium. Case 1 is about 2, correct?

Moreover, I don't see how cfrac is related. Can you elaborate.

Solution

This needs more elaboration. Do you want to move the \color command to another class in texutil?

Currently color is defined as

	"\\color":
		"color_function": true,
		"color_required": true,
		"mhchem_macro_2pc": true

Which brings me to the question. How does the parsetree look for \ce{\color{red}{x}} in chem mode? Maybe adding "fun_ar2": true to the json file would do the job?

Re: Case 2

Currently MathJax generates

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block" alttext="{\mathfrak {abcde}}">
  <semantics>
    <mrow class="MJX-TeXAtom-ORD">
      <mrow class="MJX-TeXAtom-ORD">
        <mi mathvariant="fraktur">a</mi>
        <mi mathvariant="fraktur">b</mi>
        <mi mathvariant="fraktur">c</mi>
        <mi mathvariant="fraktur">d</mi>
        <mi mathvariant="fraktur">e</mi>
      </mrow>
    </mrow>
    <annotation encoding="application/x-tex">{\mathfrak {abcde}}</annotation>
  </semantics>
</math>

https://mathoid-beta.wmflabs.org/info.html

I would therefore exclude that from the initial release and keep it for later.

Re Case 3

LaTeXML generates:

<math xmlns="http://www.w3.org/1998/Math/MathML" id="p1.1.m1.1" class="ltx_Math" alttext="\sideset{{}_{1}^{2}}{{}_{3}^{4}}{\sum}" display="inline"><semantics id="p1.1.m1.1a"><mmultiscripts id="p1.1.m1.1.1" xref="p1.1.m1.1.1.cmml"><mo id="p1.1.m1.1.1.2.2.2.2" xref="p1.1.m1.1.1.2.2.2.2.cmml">∑</mo><mn id="p1.1.m1.1.1.2.3" xref="p1.1.m1.1.1.2.3.cmml">3</mn><mn id="p1.1.m1.1.1.3" xref="p1.1.m1.1.1.3.cmml">4</mn><mprescripts id="p1.1.m1.1.1a" xref="p1.1.m1.1.1.cmml"/><mn id="p1.1.m1.1.1.2.2.3" xref="p1.1.m1.1.1.2.2.3.cmml">1</mn><mn id="p1.1.m1.1.1.2.2.2.3" xref="p1.1.m1.1.1.2.2.2.3.cmml">2</mn></mmultiscripts><annotation-xml encoding="MathML-Content" id="p1.1.m1.1b"><apply id="p1.1.m1.1.1.cmml" xref="p1.1.m1.1.1"><csymbol cd="ambiguous" id="p1.1.m1.1.1.1.cmml" xref="p1.1.m1.1.1">superscript</csymbol><apply id="p1.1.m1.1.1.2.cmml" xref="p1.1.m1.1.1"><csymbol cd="ambiguous" id="p1.1.m1.1.1.2.1.cmml" xref="p1.1.m1.1.1">subscript</csymbol><apply id="p1.1.m1.1.1.2.2.cmml" xref="p1.1.m1.1.1"><csymbol cd="ambiguous" id="p1.1.m1.1.1.2.2.1.cmml" xref="p1.1.m1.1.1">subscript</csymbol><apply id="p1.1.m1.1.1.2.2.2.cmml" xref="p1.1.m1.1.1"><csymbol cd="ambiguous" id="p1.1.m1.1.1.2.2.2.1.cmml" xref="p1.1.m1.1.1">superscript</csymbol><sum id="p1.1.m1.1.1.2.2.2.2.cmml" xref="p1.1.m1.1.1.2.2.2.2"/><cn type="integer" id="p1.1.m1.1.1.2.2.2.3.cmml" xref="p1.1.m1.1.1.2.2.2.3">2</cn></apply><cn type="integer" id="p1.1.m1.1.1.2.2.3.cmml" xref="p1.1.m1.1.1.2.2.3">1</cn></apply><cn type="integer" id="p1.1.m1.1.1.2.3.cmml" xref="p1.1.m1.1.1.2.3">3</cn></apply><cn type="integer" id="p1.1.m1.1.1.3.cmml" xref="p1.1.m1.1.1.3">4</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="p1.1.m1.1c">\sideset{{}_{1}^{2}}{{}_{3}^{4}}{\sum}</annotation><annotation encoding="application/x-llamapun" id="p1.1.m1.1d">SUPERSCRIPTOP SUBSCRIPTOP SUBSCRIPTOP SUPERSCRIPTOP start_ARG ∑ end_ARG 2 1 3 4</annotation></semantics></math>

I guess we should aim for mmultiscripts here as well?

The syntax for mmultiscripts is mmultiscripts base 3 4 mprescripts 1 2, cf https://developer.mozilla.org/en-US/docs/Web/MathML/Element/mmultiscripts
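
The child order described above can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration of the element order for \sideset{_1^2}{_3^4}\sum, not the generator used in the Math extension; the helper names mn and sideset_mmultiscripts are hypothetical.

```python
import xml.etree.ElementTree as ET


def mn(text):
    """Build an <mn> element holding the given text."""
    el = ET.Element("mn")
    el.text = text
    return el


def sideset_mmultiscripts(base, pre_sub, pre_sup, post_sub, post_sup):
    """Build <mmultiscripts>: base, post-scripts, <mprescripts/>, pre-scripts."""
    root = ET.Element("mmultiscripts")
    op = ET.SubElement(root, "mo")
    op.text = base
    root.append(mn(post_sub))   # subscript after the base
    root.append(mn(post_sup))   # superscript after the base
    ET.SubElement(root, "mprescripts")
    root.append(mn(pre_sub))    # subscript before the base
    root.append(mn(pre_sup))    # superscript before the base
    return root


root = sideset_mmultiscripts("\u2211", "1", "2", "3", "4")
markup = ET.tostring(root, encoding="unicode")
print(markup)
```

The important point is that the base operator is the first child of mmultiscripts, which is exactly the relation that is missing when \sum sits outside the sideset element in the parse tree.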

I am not exactly sure how to change the grammar. I would recommend investigating all commands in the fun2nb class (maybe there are not that many). Do we want a dedicated ticket for this?

Re: Case by case check 1: " color-cases"

1 is hard 2 is medium. Case 1 is about 2, correct?

yes.
I think the solution (when modifying the grammar) is also about 2, because the grammar for parsing a definecolor statement is roughly the same, e.g. "\definecolor{ultramarine}{RGB}{0,32,96}"

Moreover, I don't see how cfrac is related. Can you elaborate.

It is a hint for the correct format of parsetree (related by structure maybe).

Which brings me to the question. How does the parsetree look for \ce{\color{red}{x}} in chem mode?

Parsetree looks like

  • Mhchem(fname="\ce")
    • Curly
      • left: Fun2 -> Literal("red"), Curly(TexArray(Chemword(left: "x", right: ""))),
      • right: Literal("")

Maybe adding "fun_ar2": true to the json file would do the job?

A quick check shows it does not; the parse tree looks the same as before.

Re: Case 2

I would therefore exclude that from the initial release and keep it for later.

Agreed. Squashing seems complex in the grammar and can also be error-prone because of the varying delimiters of the squashed literals.
However, solution draft can somewhere be saved here and then re-used or grammar adapted: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Math/+/889281

Re Case 4

I think limits is quite special and I would therefore recommend a specific handling of the parse tree in a postprocessing step.

In particular

Texarray

Literal ("Lim")
DQ
    base:Literal("limits")
    down: Curly (Literal("x"),Literal("\to"),Literal("2"))
NEXT TOKEN

should become

Texarray

Literal ("Lim")
(D/U/F)Q (option limits)
    base:NEXT TOKEN
    down: Curly (Literal("x"),Literal("\to"),Literal("2"))

However, solution draft can somewhere be saved here and then re-used or grammar adapted: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Math/+/889281

Did you test if tests (enwikiformulae) fail? I would expect that there is quite a significant amount of cases...

I checked with the full-coverage test here to get an initial overview. From the look of it they seem OK after some optimizations, but there are still a lot of cases to go.

For Case 1

I suggest not changing the grammar and instead using options in a postprocessing step (as suggested for case 4).

This means

TexArray

  Literal(arg="\color{red}")
  Curly - TexArray(Literal("r"),Literal("e"),Literal("d"))

Becomes

TexArray (color red)

  Curly - TexArray(Literal("r"),Literal("e"),Literal("d"))
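
The transformation above can be sketched as a small postprocessing pass in Python. This is a hedged illustration, not the actual TexVC(PHP) code: Literal, Curly, TexArray and lift_color are hypothetical names, and the sketch assumes that a color literal applies to all siblings that follow it within the same array.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Literal:
    """Hypothetical stand-in for a TexVC Literal node."""
    arg: str


@dataclass
class Curly:
    """Hypothetical stand-in for a Curly node."""
    children: list


@dataclass
class TexArray:
    """Hypothetical TexArray with an optional color option."""
    children: list
    color: Optional[str] = None


# Matches a literal of the form \color{name} and captures the name.
COLOR = re.compile(r"\\color\{([^}]*)\}")


def lift_color(children):
    """Turn a color literal into an option on a TexArray of its followers."""
    for i, node in enumerate(children):
        if isinstance(node, Literal):
            match = COLOR.fullmatch(node.arg)
            if match:
                # Everything after the color literal is wrapped and colored.
                rest = lift_color(children[i + 1:])
                return children[:i] + [TexArray(rest, color=match.group(1))]
    return list(children)


tree = [Literal("\\color{red}"),
        Curly([Literal("r"), Literal("e"), Literal("d")])]
print(lift_color(tree))
```

A renderer could then emit an mstyle element with mathcolor set to the color option around the children, in the spirit of the Mathoid reference output for case 1.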

After investigating all the cases, I would not recommend changing the grammar at all. I think solutions within the MathML rendering phase are preferable. I suggest to close this issue and discuss the individual cases in subsequent tickets.

Stegmujo renamed this task from Fix the TexVC(PHP) Parse tree generation to Fix the TexVC(PHP) Parse tree related cases. Feb 24 2023, 4:05 PM
Stegmujo reopened this task as Open.

sideset can be realized with mmultiscripts; do not use the Mathoid reference

Change 892486 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Add more detailed testcases for Color, Pagecolor and Definecolor

https://gerrit.wikimedia.org/r/892486

Change 892420 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Fix for tex-statement "a {b \\color{red} c} d"

https://gerrit.wikimedia.org/r/892420

Change 892424 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Fix for tex-statement "\pagecolor{red} e^{i \pi}"

https://gerrit.wikimedia.org/r/892424

Change 892425 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Fix for tex-statement "\definecolor{ultramarine}{RGB}{0,32,96} a {b \color{ultramarine} c} d"

https://gerrit.wikimedia.org/r/892425

Change 893446 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Fix for state forwarding

https://gerrit.wikimedia.org/r/893446

Change 893446 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Fix for state forwarding

https://gerrit.wikimedia.org/r/893446

Change 892486 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Add more detailed testcases for Color, Pagecolor and Definecolor

https://gerrit.wikimedia.org/r/892486

Change 892420 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Fix for tex-statement "a {b \\color{red} c} d"

https://gerrit.wikimedia.org/r/892420

Change 892424 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Fix for tex-statement "\pagecolor{red} e^{i \pi}"

https://gerrit.wikimedia.org/r/892424

Change 892425 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Fix for tex-statement definecolor

https://gerrit.wikimedia.org/r/892425

Change 893766 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Fix for limits

https://gerrit.wikimedia.org/r/893766

Change 897953 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Fix for limits

https://gerrit.wikimedia.org/r/897953

Change 897953 abandoned by Stegmujo:

[mediawiki/extensions/Math@master] Fix for limits

Reason:

https://gerrit.wikimedia.org/r/897953

Change 893766 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Fix for limits

https://gerrit.wikimedia.org/r/893766

Change 900132 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Fix succeeding comma

https://gerrit.wikimedia.org/r/900132

Change 900136 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Fix lack of spaces in some cases

https://gerrit.wikimedia.org/r/900136

Change 900137 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Fix spaces defined by tilde and ...

https://gerrit.wikimedia.org/r/900137

Change 901280 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Fix P rendered as pilcrow

https://gerrit.wikimedia.org/r/901280

Change 901285 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Fix double parenthesis "{}" not rendered as sideset

https://gerrit.wikimedia.org/r/901285

Change 900132 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Fix spaces and commas differentiation

https://gerrit.wikimedia.org/r/900132

Change 900136 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Fix spaces not rendering if within mtext

https://gerrit.wikimedia.org/r/900136

Change 901763 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Fix Colors not displaying correctly

https://gerrit.wikimedia.org/r/901763

Change 902042 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Add additional count in preceding subscript conditions

https://gerrit.wikimedia.org/r/902042

Change 901280 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Fix P rendered as pilcrow

https://gerrit.wikimedia.org/r/901280

Change 900137 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Fix spaces defined by tilde and backslash

https://gerrit.wikimedia.org/r/900137

Change 902042 abandoned by Physikerwelt:

[mediawiki/extensions/Math@master] Add additional check for arraylength

Reason:

merged into previous change

https://gerrit.wikimedia.org/r/902042

Change 901763 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Fix colors

https://gerrit.wikimedia.org/r/901763

Change 901285 merged by jenkins-bot:

[mediawiki/extensions/Math@master] Fix preceding subscript

https://gerrit.wikimedia.org/r/901285