Elena Smirnova and Stephen M. Watt

Ontario Research Centre for Computer Algebra,
University of Western Ontario

MathML to TeX Conversion:
Conserving high-level semantics

** Demo description **


Contents:

  1. Main goals of the converter
  2. Principal MathML to TeX conversion scheme
  3. Three jobs for the converter
  4. Three interfaces to the converter
  5. Advanced possibilities
  6. Advantages of this converter
  7. Choice of Java program vs XSLT
  8. Status

1. Main goals of the converter


2. Principal conversion scheme


Figure 1. MathML to TeX converter scheme


2.1 MathML Object Structure

A MathML object is represented as a Document Object Model (DOM) tree, as specified by the W3C Recommendation http://www.w3.org/DOM. For example:

<math>
  <mo> &sum; </mo>
  <msubsup>
     <mi> x </mi>
     <mi> i </mi>
     <mn> 3 </mn>
  </msubsup>
</math>
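As a minimal sketch of what "represented as a DOM tree" means in practice, the standard Java DOM API (javax.xml.parsers, org.w3c.dom) can parse the example and walk it element by element. The class and method names are illustrative only. The snippet writes the summation sign as the numeric reference &#x2211; instead of &sum;, because a plain XML parser without the MathML DTD does not know the named entity.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Illustrative sketch: parse a MathML fragment into a W3C DOM tree and list its elements.
class MathMLDomDemo {

    // Parses a MathML string into a DOM document (checked exceptions are wrapped).
    static Document parse(String mathml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(mathml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Appends one line per element in document order, indented by depth;
    // leaf elements (mi, mn, mo, ...) are shown with their trimmed text.
    static void walk(Node node, int depth, List<String> out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // ignore whitespace text nodes between elements
        }
        String indent = "  ".repeat(depth);
        if (hasElementChild(node)) {
            out.add(indent + node.getLocalName());
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                walk(c, depth + 1, out);
            }
        } else {
            out.add(indent + node.getLocalName() + ": " + node.getTextContent().trim());
        }
    }

    private static boolean hasElementChild(Node node) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String mathml = "<math><mo>&#x2211;</mo>"
                + "<msubsup><mi>x</mi><mi>i</mi><mn>3</mn></msubsup></math>";
        List<String> lines = new ArrayList<>();
        walk(parse(mathml).getDocumentElement(), 0, lines);
        lines.forEach(System.out::println);
    }
}
```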
   

2.2 TeX Object Structure

A TeX object is also represented by a tree, but its structure follows a different logic from that of a MathML DOM tree: each level of the tree corresponds to a TeX group.

Example: the TeX expression $\sqrt{1-\alpha} + x^{3+a}$ is represented as follows:


Figure 2. Structure of TeX-Object
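The tree of Figure 2 can be modelled, under our own assumptions, by a hypothetical TexNode that is either a single token or a brace group, where each nested group adds one level. The Java sketch below rebuilds $\sqrt{1-\alpha} + x^{3+a}$ from such nodes and serializes it back; it is an illustration, not the converter's actual class design.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical TeX-object node: either a single token or a group of nodes.
abstract class TexNode {
    abstract String toTeX();
    abstract int depth(); // number of group levels below and including this node
}

final class TexToken extends TexNode {
    final String text;
    TexToken(String text) { this.text = text; }
    String toTeX() { return text; }
    int depth() { return 0; }
}

final class TexGroup extends TexNode {
    final List<TexNode> children;
    final boolean braced; // the top level is not wrapped in braces

    TexGroup(boolean braced, TexNode... children) {
        this.braced = braced;
        this.children = Arrays.asList(children);
    }

    String toTeX() {
        StringBuilder sb = new StringBuilder();
        for (TexNode child : children) {
            String s = child.toTeX();
            // A control word such as \alpha must be separated from a following letter.
            if (endsWithControlWord(sb) && !s.isEmpty() && Character.isLetter(s.charAt(0))) {
                sb.append(' ');
            }
            sb.append(s);
        }
        return braced ? "{" + sb + "}" : sb.toString();
    }

    int depth() {
        int max = 0;
        for (TexNode c : children) max = Math.max(max, c.depth());
        return max + 1;
    }

    // True if the text ends in a backslash followed by one or more letters.
    private static boolean endsWithControlWord(CharSequence s) {
        int i = s.length() - 1;
        while (i >= 0 && Character.isLetter(s.charAt(i))) i--;
        return i >= 0 && i < s.length() - 1 && s.charAt(i) == '\\';
    }
}

class TexObjectDemo {
    static TexNode build() {
        return new TexGroup(false,
                new TexToken("\\sqrt"),
                new TexGroup(true, new TexToken("1"), new TexToken("-"), new TexToken("\\alpha")),
                new TexToken("+"),
                new TexToken("x"),
                new TexToken("^"),
                new TexGroup(true, new TexToken("3"), new TexToken("+"), new TexToken("a")));
    }

    public static void main(String[] args) {
        TexNode expr = build();
        System.out.println(expr.toTeX()); // \sqrt{1-\alpha}+x^{3+a}
        System.out.println(expr.depth()); // 2
    }
}
```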


2.3 Mapping File

Each template in the mapping file has the following general form:

<pat:template>
  <pat:tex op="\[TeX macro]" params="[TeX expression]"/>
  <pat:mml op="mml-element" mode="math|text|spec">
   . . .
   [MathML expression]
   . . .
  </pat:mml>
</pat:template>
Mapping file elements and attributes:

Element name     Attribute(s)       Purpose
pat:tex2mmlmap   version            Map file version
pat:template     -                  -
pat:tex          op                 Matching TeX macro/symbol name
                 params (optional)  TeX macro parameters (if any)
                 prec (optional)    Template's precedence (TeX to MathML)
pat:mml          op                 Matching MathML main operation
                 mode (optional)    One of 'math', 'text', 'spec'
pat:variable     name               Identifies a variable by its name
pat:rep          -                  Declares the repetition pattern

More information about our mapping file and its specification can be found on the ORCCA site.

Examples of templates:

  1. For a fraction:
    <pat:template>
      <pat:tex op="\frac" params="\patVAR!{num}\patVAR!{den}"/>
      <pat:mml op="mfrac">
        <mfrac>
          <pat:variable name="num"/>
          <pat:variable name="den"/>
        </mfrac>
      </pat:mml>
    </pat:template>
    

  2. For a fenced expression:
    <pat:template>
      <pat:tex op="" params="\left\patVAR!{o} \patREP*{\patVAR*{b},}\patVAR{t} \right\patVAR!{c}"/>
      <pat:mml op="mfenced">
        <mfenced open="pat:variable=o" close="pat:variable=c">
          <pat:rep> <pat:variable name="b"/> </pat:rep>
          <pat:variable name="t"/>
        </mfenced>
      </pat:mml>
    </pat:template>
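A minimal sketch of how the TeX side of a template such as the \frac one above might be instantiated once the MathML variables have been converted. It assumes that \patVAR!{name} stands for exactly one argument, written out in braces; the exact meaning of the ! and * variants and of \patREP is defined by the ORCCA mapping-file specification and is not modelled here. TemplateFiller and fill are hypothetical names.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of instantiating the TeX side of a mapping-file template.
// Assumption: \patVAR!{name} stands for one argument and is emitted as {value};
// the \patVAR* / \patVAR variants and \patREP are NOT handled.
class TemplateFiller {

    // Matches a literal "\patVAR!{" followed by a variable name and "}".
    private static final Pattern VAR = Pattern.compile("\\\\patVAR!\\{(\\w+)\\}");

    static String fill(String op, String params, Map<String, String> bindings) {
        Matcher m = VAR.matcher(params);
        StringBuilder out = new StringBuilder(op);
        int last = 0;
        while (m.find()) {
            out.append(params, last, m.start()); // copy literal TeX between variables
            String value = bindings.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("unbound template variable: " + m.group(1));
            }
            out.append('{').append(value).append('}');
            last = m.end();
        }
        out.append(params, last, params.length()); // copy any trailing literal TeX
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fill("\\frac", "\\patVAR!{num}\\patVAR!{den}",
                Map.of("num", "a+b", "den", "2"))); // prints \frac{a+b}{2}
    }
}
```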
    

Suppose a user has defined two style sheets, one for XSLT and one for TeX:

combinatorics.xsl

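XSLT template for an element <mmlx:binomial>:

<!-- Matches an mmlx:binomial that is the first child of an apply and has
     exactly two arguments (the mmlx prefix must be declared in the stylesheet). -->
<xsl:template match="apply/*[1][self::mmlx:binomial][count(*)=2]">
  <mrow>
    <mo>(</mo>
    <mfrac linethickness="0">
      <!-- convert both arguments with the remaining templates -->
      <xsl:apply-templates select="*"/>
    </mfrac>
    <mo>)</mo>
  </mrow>
</xsl:template>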


combinatorics.cls

\newcommand{\binom}[2]{\left({#1 \atop #2}\right)}


Now we can add a template to the mapping file that converts <mmlx:binomial> to \binom:

<pat:template>
   <pat:tex op="\binom" params="\patVAR!{a}\patVAR!{b}"/>
   <pat:mml op="apply-mmlx:binomial">
      <apply>
         <mmlx:binomial>
            <pat:variable name="a"/>
            <pat:variable name="b"/>
         </mmlx:binomial>
      </apply>
   </pat:mml>
</pat:template>


Now suppose we want to translate

<apply>
 <mmlx:binomial>
   <apply>
     <plus/>
     <ci> a </ci>
     <ci> b </ci>
   </apply>
   <mrow>
     <mi> c </mi>
     <mo> + </mo>
     <mi> d </mi>
   </mrow>
 </mmlx:binomial>
</apply>

The standard XSLT approach yields only an explicit rendering of this notation, \left({a+b \atop c+d}\right), whereas the mapping-file technique yields the macro defined in combinatorics.cls, \binom{a+b}{c+d}. In this way the semantics defined by the user are preserved.
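To make the contrast concrete, here is a hypothetical Java sketch: the converter looks up the operator key (here "apply-mmlx:binomial", taken from the op attribute of the template above) in a table built from the mapping file. If a user macro is found it is emitted with its arguments; otherwise only the explicit layout is produced. The table, class and method names are assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical sketch: prefer a user-defined macro from the mapping file,
// otherwise fall back to an explicit TeX layout of the binomial.
class BinomialConversion {

    static final String OP = "apply-mmlx:binomial"; // op attribute of the template

    static String convert(Map<String, String> userMacros, String top, String bottom) {
        String macro = userMacros.get(OP);
        if (macro != null) {
            // Semantics preserved: the author's own macro, e.g. \binom{a+b}{c+d}
            return macro + "{" + top + "}{" + bottom + "}";
        }
        // No template: only the visual layout survives.
        return "\\left({" + top + " \\atop " + bottom + "}\\right)";
    }

    public static void main(String[] args) {
        System.out.println(convert(Map.of(OP, "\\binom"), "a+b", "c+d")); // \binom{a+b}{c+d}
        System.out.println(convert(Map.of(), "a+b", "c+d"));              // \left({a+b \atop c+d}\right)
    }
}
```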


3. Three jobs for the converter

  1. File to file
    To convert an entire
  2. Expression to expression
    The system can convert any valid MathML expression given as an input string.

  3. Object to object
    This mode lets the user manipulate individual MathML and TeX objects obtained from sources other than standard MathML or TeX files.

4. Three interfaces to the converter

The converter is available through three interfaces:
  1. Command-line mode

  2. GUI framework

  3. Servlet on the ORCCA web site

5. Advanced possibilities

5.1 Linebreaking in long formulas

The converter provides a special algorithm for line breaking in the TeX output.

Motivation: standard MathML browsers (Amaya, Mozilla, MathPlayer) break lines in mathematical formulas according to their own logic, but TeX does not break displayed formulas automatically. As a result, long formulas generated from MathML may not fit on the page of a TeX document.

The algorithm provides line breaking in long expressions.
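The converter's own line-breaking algorithm is not reproduced here. As a rough illustration of the simplest related idea, the hypothetical greedy breaker below packs the summands of a long sum onto lines of bounded width and starts each continuation line with its operator. Width is measured crudely as the number of characters of TeX source, which is only a stand-in for the typeset width a real algorithm would need.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical greedy line breaker for a flat sum of terms.
// Width is approximated by the number of characters in the TeX source.
class GreedySumBreaker {

    // terms: the summands, each prefixed with its operator except the first
    // (e.g. "x_1", "+x_2", "-c"); maxWidth: maximum characters per line.
    static List<String> breakLines(List<String> terms, int maxWidth) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String term : terms) {
            // Start a new line before this term if it would overflow,
            // unless the current line is still empty (an overlong term gets its own line).
            if (current.length() > 0 && current.length() + term.length() > maxWidth) {
                lines.add(current.toString());
                current.setLength(0);
            }
            current.append(term);
        }
        if (current.length() > 0) lines.add(current.toString());
        return lines;
    }

    // Joins the lines with \\ so they can be placed in a multi-line display.
    static String toTeX(List<String> lines) {
        return String.join(" \\\\\n", lines);
    }

    public static void main(String[] args) {
        List<String> terms = List.of("x_1", "+x_2", "+x_3", "+x_4", "+x_5");
        System.out.println(toTeX(breakLines(terms, 8)));
    }
}
```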

Examples of line breaking in special cases:

  1. Long expression under a radical
  2. Long expression in a numerator
  3. Long superscripts
  4. Long underscripts
  5. Long number:

100! = 93326215443944152681699238856266700490715968264381621468592963895
       21759999322991560894146397615651828625369792082722375825118521091
       6864000000000000000000000000
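The long-number example can be checked with java.math.BigInteger: computing 100! and cutting its 158 digits into chunks of 65 reproduces the three lines above. Fixed-width chunking is only an illustrative assumption about how such a break could be made; the class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: split the digits of a long integer into fixed-width lines.
class LongNumberBreaker {

    // n! computed exactly with arbitrary-precision arithmetic.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int k = 2; k <= n; k++) {
            result = result.multiply(BigInteger.valueOf(k));
        }
        return result;
    }

    // Cuts the digit string into pieces of at most `width` characters.
    static List<String> chunk(String digits, int width) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < digits.length(); i += width) {
            lines.add(digits.substring(i, Math.min(i + width, digits.length())));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : chunk(factorial(100).toString(), 65)) {
            System.out.println(line);
        }
    }
}
```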
  

5.2 Settings for the TeX output


5.3 Converting XML with MathML to XML with TeX

The converter can also transform an XML file containing MathML into an XML file containing TeX. The TeX is embedded in a CDATA section, which is placed under an XML element <LaTeX> with a special namespace.
Motivation: this feature may be useful for HTML-to-TeX translation.
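The sketch below shows, with the standard Java DOM and javax.xml.transform APIs, how each MathML <math> element could be replaced by a <LaTeX> element holding TeX in a CDATA section. The namespace URI urn:example:latex, the tex prefix and the stub passed as toTeX are placeholders, not the converter's actual values; the real MathML-to-TeX step would take the stub's place.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: replace each MathML <math> element by a <LaTeX> element
// whose only child is a CDATA section holding the TeX produced by toTeX.
class MathToLatexElements {

    static final String MATHML_NS = "http://www.w3.org/1998/Math/MathML";
    static final String LATEX_NS = "urn:example:latex"; // placeholder namespace URI

    static String transform(String xml, Function<Element, String> toTeX) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            // The NodeList is live, so walk it backwards while replacing nodes.
            NodeList maths = doc.getElementsByTagNameNS(MATHML_NS, "math");
            for (int i = maths.getLength() - 1; i >= 0; i--) {
                Element math = (Element) maths.item(i);
                Element latex = doc.createElementNS(LATEX_NS, "tex:LaTeX");
                latex.appendChild(doc.createCDATASection(toTeX.apply(math)));
                math.getParentNode().replaceChild(latex, math);
            }

            // Serialize the modified DOM back to text.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML conversion failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<p>Half: <math xmlns=\"" + MATHML_NS + "\">"
                + "<mfrac><mn>1</mn><mn>2</mn></mfrac></math></p>";
        // Stub converter standing in for the real MathML-to-TeX step.
        System.out.println(transform(xml, math -> "\\frac{1}{2}"));
    }
}
```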


6. Advantages of this converter


7. Choice of Java program vs XSLT

Java code designed for this converter allows

8. Status

The current version of the converter can deal with