Fix the TexVC(PHP) Parse tree related cases
Fix the TexVC parse tree: it is not consistent with the MathML parse tree, so the TexVC grammar has to be adapted.

In many cases the parse tree produced by TexVC(PHP) is currently not sufficient to create valid MathML. Example cases can be found in this test file in texvctreebugs. The ultimate solution is to refactor the grammar file so that the parse tree produced by TexVC(PHP) is correct for generating MathML.

I uploaded an HTML file here that contains the MathML of the erroneous cases. Looking at the MathML for sideset is a good start for understanding the types of errors:

Physikerwelt triaged this task as Medium priority.

We need to look into this case by case and decide whether we want to update texvcjs. As a first step, I will review the JSON file and comment on what I think is correct or incorrect.

Change 889094 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Fix Grammar for Parsetree for ...

Change 889281 had a related patch set uploaded (by Stegmujo; author: Stegmujo):

[mediawiki/extensions/Math@master] Fix Grammar for case ...

TC-Index always refers to the index in "MMLGenerationTexUtilTestLocal.php".

Case by case check 1: " color-cases"

TC-Index: 90
Tex: "\color{red}{red}" (edit: this form is not directly valid; the characters that follow are also red in all interpretations)


  • a first step is to add test cases such as "a {b \color{red} c} d" here (new solutions then to be defined)

Similar cases (with issue):

  • "\pagecolor{red}{red}"
  • "\definecolor{ultramarine}{RGB}{0,32,96}"

Similar cases (already solved issue):

  • "\cfrac{a}{b}"


<mrow class="MJX-TeXAtom-ORD">
  <mstyle displaystyle="true" scriptlevel="0">
    <mstyle mathcolor="red">
      <mrow class="MJX-TeXAtom-ORD">


  • TexArray
    • Literal(arg="\color{red}")
    • Curly - TexArray(Literal("r"),Literal("e"),Literal("d"))


  • In the TexVC parse tree, the relevant information is split across two unrelated elements
  • the preceding Literal(arg="\color{red}") element is not linked to the subsequent Curly element (the Curly could also contain "green" or anything else, which should then be rendered in red)

Solution (draft):

  • ideally the parse tree is nested with Color(red)[Curly(the text)]
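
The nesting proposed in this draft can be sketched as follows. This is only an illustration under assumptions: the tuple-based nodes and the helper `nest_color` are hypothetical and do not correspond to the real TexVC(PHP) node classes or grammar.

```python
import re

# Hypothetical, simplified nodes: ("Literal", text) and ("Curly", [children]).
COLOR_RE = re.compile(r"\\color\{([^}]*)\}")

def nest_color(tex_array):
    """Wrap the Curly that follows a \\color literal into a Color node."""
    out = []
    i = 0
    while i < len(tex_array):
        node = tex_array[i]
        match = COLOR_RE.fullmatch(node[1]) if node[0] == "Literal" else None
        nxt = tex_array[i + 1] if i + 1 < len(tex_array) else None
        if match and nxt is not None and nxt[0] == "Curly":
            # Color(red)[Curly(the text)]: the color now owns its argument.
            out.append(("Color", match.group(1), nxt))
            i += 2
        else:
            out.append(node)
            i += 1
    return out

flat = [("Literal", r"\color{red}"),
        ("Curly", [("Literal", "r"), ("Literal", "e"), ("Literal", "d")])]
nested = nest_color(flat)
```

With this shape a MathML generator could emit one mstyle with mathcolor around the Curly content instead of guessing which sibling the color belongs to.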

@Physikerwelt what do you think of this case and the solution draft ?

Case by case check 2: "non-squashed literals case"

TC-Index: 39 (not an exact match; test case 39 is currently \mathfrak{a})
Tex: "\mathfrak{abcde}"
Similar cases (with issue):

  • \mathit{a} and all follow-up cases 39++
  • it seems to affect all "is_lettermod" cases
  • "\alpha\,\!" from FullCoverageTest and similar cases should be considered (multiple commands in one statement)
  • "\exp_a b = a^b, \exp b = e^b, 10^m \!"

Similar cases (already solved issue):

MML-Mathoid(Reference): (tbd just a guess)

<mrow data-mjx-texclass="ORD">
   <mrow data-mjx-texclass="ORD">
     <mi mathvariant="fraktur">abcde</mi>


<mrow data-mjx-texclass="ORD">
  <mrow data-mjx-texclass="ORD">
    <mi mathvariant="fraktur">a</mi>
    <mi mathvariant="fraktur">b</mi>
    <mi mathvariant="fraktur">c</mi>
    <mi mathvariant="fraktur">d</mi>
    <mi mathvariant="fraktur">e</mi>


  • TexArray-Fun1-Curly-TexArray(Literal("a"), Literal("b"), Literal("c"), ... )


  • When parsing commands that annotate their nested content (such as \mathfrak), all literals inside them are processed character by character

Solution (draft):

  • the literals are squashed in the TexVC grammar so that they map to a single MathML element
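
As a rough illustration of what squashing means for the output, the sketch below contrasts one `<mi>` per literal with a single squashed `<mi>`. The function names are hypothetical, and it operates on plain strings rather than on the TexVC grammar or its node classes.

```python
from xml.sax.saxutils import escape

def per_char(literals, variant):
    """Per-character form: one <mi> element for every literal."""
    return "".join(
        f'<mi mathvariant="{variant}">{escape(c)}</mi>' for c in literals
    )

def squash_literals(literals, variant):
    """Squashed form: all literals joined into a single <mi> element."""
    text = escape("".join(literals))
    return f'<mi mathvariant="{variant}">{text}</mi>'

letters = list("abcde")
separate = per_char(letters, "fraktur")
squashed = squash_literals(letters, "fraktur")
```

Whether the squashed form is actually the desired reference output is still open in this task (the reference above is marked as a guess).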


  • Currently the rendered version of \mathfrak{abcde} looks OK, but MathML or speech output could be problematic with such a representation

Fix Draft:

Case by case check 3: "sideset"

TC-Index: 90
Tex: "\sideset{_1^2}{_3^4}\sum"
Similar cases (with issue):

  • Succeeding elements: \sum is a "nullary_macro", so are all nullary macros affected?
    • \sideset{_1^2}{_3^4}\supset is rendered by the TEMML online converter, for example
  • Preceding elements: are there any besides \sideset?

Similar cases (already solved issue):


<mrow class="MJX-TeXAtom-OP">
    <mrow class="MJX-TeXAtom-OP MJX-fixedlimits">
      <mrow class="MJX-TeXAtom-ORD">
        <mpadded width="0">
          <mrow class="MJX-TeXAtom-ORD">
    <mrow class="MJX-TeXAtom-ORD">
    <mrow class="MJX-TeXAtom-ORD">
  <mspace width="negativethinmathspace"/>
    <mrow class="MJX-TeXAtom-OP MJX-fixedlimits">
      <mo movablelimits="false">&#x2211;</mo>
    <mrow class="MJX-TeXAtom-ORD">
    <mrow class="MJX-TeXAtom-ORD">


 <mrow data-mjx-texclass="OP">
  <mmultiscripts data-mjx-script-align="left">
    <mrow data-mjx-texclass="ORD">
    <mrow data-mjx-texclass="ORD">
    <mrow data-mjx-texclass="ORD">
    <mrow data-mjx-texclass="ORD">
<mo data-mjx-texclass="OP">&#x2211;</mo>


  • TexArray - ( Fun2nb ("with all other stuff"), Literal("Sum"))


  • In the TexVC(PHP) parse tree, the \sum element at the end of the TeX input is an operator unrelated to the preceding \sideset element

Solution (draft):

  • One: \sum and similar elements are nested inside the preceding element when \sideset (or similar elements, tbd) precedes them
  • Two (no grammar change): TexArray checks whether it contains "compound" elements; parameters are passed as configuration or the TexArray is rearranged, and state is passed along so that the elements are processed correctly -> take this one
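
Option Two could look roughly like the sketch below: a pass over the TexArray that pairs a compound head with the element following it. The node shapes, the `COMPOUND_HEADS` set and `pair_compounds` are assumptions made for illustration; they are not the actual TexVC(PHP) classes.

```python
# Hypothetical node shapes: ("Fun2nb", name, arg1, arg2) and ("Literal", text).
COMPOUND_HEADS = {r"\sideset"}  # assumption: only \sideset is known so far

def pair_compounds(tex_array):
    """Attach the element that follows a compound head to that head."""
    out = []
    i = 0
    while i < len(tex_array):
        node = tex_array[i]
        is_head = node[0] == "Fun2nb" and node[1] in COMPOUND_HEADS
        if is_head and i + 1 < len(tex_array):
            # The renderer now receives head and operator together.
            out.append(("Compound", node, tex_array[i + 1]))
            i += 2
        else:
            out.append(node)
            i += 1
    return out

tree = [("Fun2nb", r"\sideset", "_1^2", "_3^4"), ("Literal", r"\sum")]
paired = pair_compounds(tree)
```

The grammar stays untouched in this variant; only the traversal before MathML generation changes.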

Case by case check 4: "limits-case"

TC-Index: 411
Tex: "\lim\limits_{x \to 2}"
Similar cases (with issue):

  • i462: nolimits: \mathop{\rm cos}\nolimits^2 ???

Similar cases (already solved issue):

  • limits_{x \to 2} renders correctly


 <mrow class="MJX-TeXAtom-ORD">
  <mstyle displaystyle="true" scriptlevel="0">
      <mo form="prefix">lim</mo>
      <mrow class="MJX-TeXAtom-ORD">
        <mo stretchy="false">&#x2192;</mo>


  • Texarray
    • Literal ("Lim")
    • DQ
      • base:Literal("limits")
      • down: Curly (Literal("x"),Literal("\to"),Literal("2"))

For "\sum\limits_{j=1}^k" the tree is similar, but FQ instead of DQ.


  • lim_ is not recognized correctly
  • the compound [\lim | \sum | \prod] \limits constructs are not compound elements in the parse tree, which creates problems when generating MathML output

Solution (draft):

Case differentiation:

  • lim_{x \to 2} add testcase which renders correctly
  • \lim\limits
  • \sum\limits
  • \prod\limits
  1. Solution 1: A simple solution could be to implicitly assume a "lim" element when DQ("base") contains "limits" (does \limits always imply \lim?), and to forward state from TexArray so that Literal("lim") is not processed
  2. Solution 2: Create a node element and a grammar rule that specifically recognize limits-based constructs (in practice this could be something like "Fun3")
  3. Solution 3: TexArray-based check and state forwarding (similar to case 3, sideset) -> take this one
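
A minimal sketch of Solution 3, assuming simplified tuple nodes ("DQ", base, down) and ("FQ", base, down, up): when a DQ/FQ has \limits as its base, the preceding operator is folded in as the real base and a limits flag is recorded. `merge_limits` and the node shapes are hypothetical.

```python
LIMIT_OPERATORS = {r"\lim", r"\sum", r"\prod"}  # the cases listed above

def merge_limits(tex_array):
    """Fold 'operator, DQ/FQ(base=\\limits)' pairs into one scripted node."""
    out = []
    for node in tex_array:
        prev = out[-1] if out else None
        if (node[0] in ("DQ", "FQ")
                and node[1] == ("Literal", r"\limits")
                and prev is not None
                and prev[0] == "Literal"
                and prev[1] in LIMIT_OPERATORS):
            # Replace the lone operator by a scripted node that owns it.
            out[-1] = (node[0], prev, *node[2:], {"limits": True})
        else:
            out.append(node)
    return out

tree = [("Literal", r"\lim"),
        ("DQ", ("Literal", r"\limits"), ("Curly", ["x", r"\to", "2"]))]
merged = merge_limits(tree)
```

The same pass handles the FQ variant for "\sum\limits_{j=1}^k", since the remaining script arguments are copied unchanged.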

Additional Info:

Re: Case by case check 1: " color-cases"


There are two issues:

  1. How to define a color?
  2. How to use a color?

1 is hard 2 is medium. Case 1 is about 2, correct?

Moreover, I don't see how cfrac is related. Can you elaborate?


This needs more elaboration. Do you want to move the \color command to another class in texutil?

Currently color is defined as

		"color_function": true,
		"color_required": true,
		"mhchem_macro_2pc": true

Which brings me to the question. How does the parsetree look for \ce{\color{red}{x}} in chem mode? Maybe adding "fun_ar2": true to the json file would do the job?

Re: Case 2

Currently MathJax generates

<math xmlns="" display="block" alttext="{\mathfrak {abcde}}">
    <mrow class="MJX-TeXAtom-ORD">
      <mrow class="MJX-TeXAtom-ORD">
        <mi mathvariant="fraktur">a</mi>
        <mi mathvariant="fraktur">b</mi>
        <mi mathvariant="fraktur">c</mi>
        <mi mathvariant="fraktur">d</mi>
        <mi mathvariant="fraktur">e</mi>
    <annotation encoding="application/x-tex">{\mathfrak {abcde}}</annotation>

I would therefore exclude that from the initial release and keep it for later.

Re Case 3

LaTeXML generates:

<math xmlns="" id="p1.1.m1.1" class="ltx_Math" alttext="\sideset{{}_{1}^{2}}{{}_{3}^{4}}{\sum}" display="inline"><semantics id="p1.1.m1.1a"><mmultiscripts id="p1.1.m1.1.1" xref="p1.1.m1.1.1.cmml"><mo id="p1.1.m1." xref="p1.1.m1.">∑</mo><mn id="p1.1.m1." xref="p1.1.m1.">3</mn><mn id="p1.1.m1.1.1.3" xref="p1.1.m1.1.1.3.cmml">4</mn><mprescripts id="p1.1.m1.1.1a" xref="p1.1.m1.1.1.cmml"/><mn id="p1.1.m1." xref="p1.1.m1.">1</mn><mn id="p1.1.m1." xref="p1.1.m1.">2</mn></mmultiscripts><annotation-xml encoding="MathML-Content" id="p1.1.m1.1b"><apply id="p1.1.m1.1.1.cmml" xref="p1.1.m1.1.1"><csymbol cd="ambiguous" id="p1.1.m1.1.1.1.cmml" xref="p1.1.m1.1.1">superscript</csymbol><apply id="p1.1.m1.1.1.2.cmml" xref="p1.1.m1.1.1"><csymbol cd="ambiguous" id="p1.1.m1." xref="p1.1.m1.1.1">subscript</csymbol><apply id="p1.1.m1." xref="p1.1.m1.1.1"><csymbol cd="ambiguous" id="p1.1.m1." xref="p1.1.m1.1.1">subscript</csymbol><apply id="p1.1.m1." xref="p1.1.m1.1.1"><csymbol cd="ambiguous" id="p1.1.m1." xref="p1.1.m1.1.1">superscript</csymbol><sum id="p1.1.m1." xref="p1.1.m1."/><cn type="integer" id="p1.1.m1." xref="p1.1.m1.">2</cn></apply><cn type="integer" id="p1.1.m1." xref="p1.1.m1.">1</cn></apply><cn type="integer" id="p1.1.m1." xref="p1.1.m1.">3</cn></apply><cn type="integer" id="p1.1.m1.1.1.3.cmml" xref="p1.1.m1.1.1.3">4</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="p1.1.m1.1c">\sideset{{}_{1}^{2}}{{}_{3}^{4}}{\sum}</annotation><annotation encoding="application/x-llamapun" id="p1.1.m1.1d">SUPERSCRIPTOP SUBSCRIPTOP SUBSCRIPTOP SUPERSCRIPTOP start_ARG ∑ end_ARG 2 1 3 4</annotation></semantics></math>

I guess we should aim for mmultiscripts here as well?

The syntax for mmultiscripts is mmultiscripts base 3 4 mprescripts 1 2, cf
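
To make the child order concrete, here is a small sketch that serializes the \sideset{_1^2}{_3^4}\sum example into mmultiscripts. The helper `mmultiscripts` is hypothetical and works on pre-rendered MathML strings; it is not part of the Math extension.

```python
def mmultiscripts(base, post_sub, post_sup, pre_sub, pre_sup):
    """Serialize children in MathML order.

    Order: base, post-subscript, post-superscript, <mprescripts/>,
    pre-subscript, pre-superscript.
    """
    children = [base, post_sub, post_sup, "<mprescripts/>", pre_sub, pre_sup]
    return "<mmultiscripts>" + "".join(children) + "</mmultiscripts>"

# \sideset{_1^2}{_3^4}\sum: 1 and 2 are prescripts, 3 and 4 are postscripts.
mml = mmultiscripts("<mo>&#x2211;</mo>",
                    "<mn>3</mn>", "<mn>4</mn>",
                    "<mn>1</mn>", "<mn>2</mn>")
```

This matches the order in the LaTeXML output above: the operator, then 3 and 4, then mprescripts, then 1 and 2.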

I am not exactly sure how to change the grammar. I would recommend investigating all commands in the fun2nb class (maybe there are not that many). Do we want a dedicated ticket for this?

Re: Case by case check 1: " color-cases"

1 is hard 2 is medium. Case 1 is about 2, correct?

I think the solution (when modifying the grammar) is also about 2, because the grammar for parsing the definecolor statement is roughly the same: "\definecolor{ultramarine}{RGB}{0,32,96}"

Moreover, I don't see how cfrac is related. Can you elaborate?

It is a hint for the correct format of parsetree (related by structure maybe).

Which brings me to the question. How does the parsetree look for \ce{\color{red}{x}} in chem mode?

Parsetree looks like

  • Mhchem(fname="\ce")
    • Curly
      • left: Fun2 -> Literal("red"), Curly(TexArray(Chemword(left: "x", right: ""))),
      • right: Literal("")

Maybe adding "fun_ar2": true to the json file would do the job?

A quick check shows it does not; the parse tree looks the same as before.

Re: Case 2

I would therefore exclude that from the initial release and keep it for later.

I agree with this. Squashing seems complex in the grammar and can also be error-prone, because the delimiters vary for each squashed literal element.
However, solution draft can somewhere be saved here and then re-used or grammar adapted:

Re Case 4

I think limits is quite special and I would therefore recommend a specific handling of the parse tree in a postprocessing step.

In particular


Literal ("Lim")
    down: Curly (Literal("x"),Literal("\to"),Literal("2"))

should become


Literal ("Lim")
(D/U/F)Q (option limits)
    base:NEXT TOKEN
    down: Curly (Literal("x"),Literal("\to"),Literal("2"))

However, solution draft can somewhere be saved here and then re-used or grammar adapted:

Did you test if tests (enwikiformulae) fail? I would expect that there is quite a significant amount of cases...

However, solution draft can somewhere be saved here and then re-used or grammar adapted:

Did you test if tests (enwikiformulae) fail? I would expect that there is quite a significant amount of cases...

I checked with the full-coverage test here to get an initial overview. From the look of it they seem OK after some optimizations, but there are still a lot of cases to go.

For Case 1

I suggest not changing the grammar and using options in a postprocessing step (as suggested for case 4).

This means


  Curly - TexArray(Literal("r"),Literal("e"),Literal("d"))


TexArray (color red)

  Curly - TexArray(Literal("r"),Literal("e"),Literal("d"))

After investigating all the cases, I would not recommend changing the grammar at all. I think solutions within the MathML rendering phase are preferable. I suggest closing this issue and discussing the individual cases in subsequent tickets.

Stegmujo renamed this task from Fix the TexVC(PHP) Parse tree generation to Fix the TexVC(PHP) Parse tree related cases. Feb 24 2023, 4:05 PM
Stegmujo reopened this task as Open.

sideset can be realized with mmultiscripts, without using the Mathoid reference

