MathML to TeX Conversion:
Conserving high-level semantics
Contents:
 Main goals of the converter
 Principal MathML to TeX conversion scheme
 Three jobs for the converter
 Three interfaces to the converter
 Advanced possibilities
 Advantages of this converter
 Choice of Java program vs XSLT
 Status
1. Main goals of the converter
 This program converts a given MathML representation of some formula to an equivalent TeX expression.
 The MathML may be part of an XML file or may be given as an input string.
 One of the main goals of this converter is to conserve high-level semantics during translation.
 Using a mapping file allows us to transform high-level MathML extensions into TeX macros.
2. Principal conversion scheme
Figure 1. MathML to TeX converter scheme
2.1 MathML Object Structure
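A MathML object is represented as a Document Object Model (DOM) tree, as specified by the W3C
Recommendation http://www.w3.org/DOM. For example, the following fragment for the sum of x_i^3
is parsed into a DOM tree whose root is the math element:
<math>
<mo> ∑ </mo>
<msubsup>
<mi> x </mi>
<mi> i </mi>
<mn> 3 </mn>
</msubsup>
</math>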
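To make the DOM form concrete, a fragment of this kind can be parsed with Python's standard xml.dom.minidom and its element tree printed with one line per node. This is only an illustration (the converter itself is written in Java), and the helper name dom_outline is hypothetical. Note that msubsup takes its children in the order base, subscript, superscript.

```python
from xml.dom import minidom

MATHML = """<math>
<mo>∑</mo>
<msubsup>
<mi>x</mi>
<mi>i</mi>
<mn>3</mn>
</msubsup>
</math>"""

def dom_outline(node, depth=0):
    """Return one line per element, indented by tree depth, with leaf text in quotes."""
    lines = []
    for child in node.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue  # skip whitespace-only text nodes between elements
        text = "".join(
            t.data.strip() for t in child.childNodes if t.nodeType == t.TEXT_NODE
        )
        lines.append("  " * depth + child.tagName + (f" '{text}'" if text else ""))
        lines.extend(dom_outline(child, depth + 1))
    return lines

doc = minidom.parseString(MATHML)
for line in dom_outline(doc):
    print(line)
```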



2.2 TeX Object Structure
A TeX object is also represented by a tree, but the logic of its structure differs from
that of a MathML DOM tree: each level of this tree corresponds to a TeX group.
Example: the TeX expression $\sqrt{1-\alpha} + x^{3+a}$
is represented as
Figure 2. Structure of a TeX object
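The idea that each tree level is a TeX group can be sketched with a small recursive data structure. The class TeXGroup below is a hypothetical Python illustration, not the converter's Java API: a group holds plain TeX strings and nested groups, and serializing a nested group wraps it in braces. For an expression such as $\sqrt{1-\alpha} + x^{3+a}$, the two braced arguments become child groups, so the tree has depth 2.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class TeXGroup:
    """One level of a TeX object tree: a brace-delimited group (hypothetical class)."""
    items: List[Union[str, "TeXGroup"]] = field(default_factory=list)

    def to_tex(self, top_level: bool = False) -> str:
        # Serialize children in order; nested groups get their own braces.
        inner = "".join(
            item if isinstance(item, str) else item.to_tex() for item in self.items
        )
        return inner if top_level else "{" + inner + "}"

    def depth(self) -> int:
        # Number of group levels, counting this one.
        child_depths = [i.depth() for i in self.items if isinstance(i, TeXGroup)]
        return 1 + max(child_depths, default=0)

formula = TeXGroup([
    "\\sqrt", TeXGroup(["1-\\alpha"]),
    " + x^", TeXGroup(["3+a"]),
])
print(formula.to_tex(top_level=True))  # \sqrt{1-\alpha} + x^{3+a}
print(formula.depth())                 # 2
```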
2.3 Mapping File
 The mapping file is one of the principal parts of the converter: it describes the correspondence between MathML and TeX patterns.
 The mapping file is written in XML and consists of templates, each pairing a MathML pattern with a TeX pattern.
 Each template has the form
<pat:template>
<pat:tex op="\[TeX macro]" params="[TeX expression]"/>
<pat:mml op="[MathML element]" mode="math|text|spec">
. . .
[MathML expression]
. . .
</pat:mml>
</pat:template>


The structure of the mapping file
 Namespaces:
 local:
xmlns:pat = "http://www.orcca.on.ca/mathml/tex2mml.xml"
 general:
xmlns = "http://www.w3.org/1998/Math/MathML"
 Root element: pat:tex2mmlmap
 Allowed children of pat:tex2mmlmap:
 Allowed children of pat:template:
 Allowed children of pat:tex: none
 Allowed children of pat:mml: MathML elements, pat:rep, pat:variable
 Allowed children of pat:img: none
 Table of attributes used with the above elements:
Element name      Attribute(s)        Purpose
pat:tex2mmlmap    version             Map file version
pat:template
pat:tex           op                  Matching TeX macro/symbol name
                  params (optional)   TeX macro parameters (if any)
                  prec (optional)     Template's precedence (TeX to MathML)
pat:mml           op                  Matching MathML main operation
                  mode (optional)     'math' | 'text' | 'spec'
pat:variable      name                Identifies a variable by its name
pat:rep                               Declares the repetition pattern

More information about our mapping file and its specification can be found on
the ORCCA site.
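As a rough illustration of how a program might consume this structure, the Python sketch below loads a mapping file and indexes its templates by the MathML operation named in pat:mml/@op. It is not the converter's Java code: the function name index_templates, the version value "1.0" and the one-template map are made up for the example, and pat:rep, modes and precedence are ignored. The authoritative specification is the one on the ORCCA site.

```python
import xml.etree.ElementTree as ET

PAT = "http://www.orcca.on.ca/mathml/tex2mml.xml"
NS = {"pat": PAT}

# Hypothetical one-template map; version "1.0" is an assumed value.
MAP_XML = r"""<pat:tex2mmlmap version="1.0"
    xmlns:pat="http://www.orcca.on.ca/mathml/tex2mml.xml"
    xmlns="http://www.w3.org/1998/Math/MathML">
  <pat:template>
    <pat:tex op="\frac" params="\patVAR!{num}\patVAR!{den}"/>
    <pat:mml op="mfrac">
      <mfrac><pat:variable name="num"/><pat:variable name="den"/></mfrac>
    </pat:mml>
  </pat:template>
</pat:tex2mmlmap>"""

def index_templates(map_xml):
    """Map each pat:mml 'op' to the (TeX op, params) pair of its template."""
    root = ET.fromstring(map_xml)
    if root.tag != f"{{{PAT}}}tex2mmlmap":
        raise ValueError("root element must be pat:tex2mmlmap")
    index = {}
    for tpl in root.findall("pat:template", NS):
        tex = tpl.find("pat:tex", NS)
        mml = tpl.find("pat:mml", NS)
        # A later template with the same MathML op overwrites an earlier one.
        index[mml.get("op")] = (tex.get("op"), tex.get("params", ""))
    return index

print(index_templates(MAP_XML))
```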

Examples of mapping templates
 For a fraction:
<pat:template>
<pat:tex op="\frac" params="\patVAR!{num}\patVAR!{den}"/>
<pat:mml op="mfrac">
<mfrac>
<pat:variable name="num"/>
<pat:variable name="den"/>
</mfrac>
</pat:mml>
</pat:template>
 For a fenced expression:
<pat:template>
<pat:tex op="" params="\left\patVAR!{o} \patREP*{\patVAR*{b},}\patVAR{t} \right\patVAR!{c}"/>
<pat:mml op="mfenced">
<mfenced open="pat:variable=o" close="pat:variable=c">
<pat:rep> <pat:variable name="b"/> </pat:rep>
<pat:variable name="t"/>
</mfenced>
</pat:mml>
</pat:template>
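The underlying idea, that conversion is driven by data rather than by hard-coded rules, can be sketched in a few lines of Python. This is not the converter's algorithm: the TEMPLATES dictionary is a hypothetical, simplified stand-in for the mapping file, each TeX pattern simply receives the converted children in document order through {0}, {1}, ..., and only a few fixed-arity presentation elements are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for mapping-file templates: {0}, {1} receive converted children.
TEMPLATES = {
    "mfrac": r"\frac{{{0}}}{{{1}}}",
    "msup": r"{0}^{{{1}}}",
    "msub": r"{0}_{{{1}}}",
}

def to_tex(elem):
    """Recursively convert a small subset of presentation MathML to TeX."""
    tag = elem.tag
    if tag in ("mi", "mn", "mo"):
        return (elem.text or "").strip()  # token elements: copy their text
    args = [to_tex(child) for child in elem]
    if tag in TEMPLATES:
        return TEMPLATES[tag].format(*args)
    if tag in ("math", "mrow"):
        return "".join(args)  # plain rows: concatenate children
    raise ValueError(f"no template for <{tag}>")

example = ET.fromstring(
    "<math><mfrac><mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow>"
    "<msup><mi>c</mi><mn>2</mn></msup></mfrac></math>"
)
print(to_tex(example))  # \frac{a+b}{c^{2}}
```

Adding support for a new element then means adding a dictionary entry rather than changing the conversion code, which is the flexibility the mapping file is meant to provide.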

A mapping file makes it possible to transform high-level MathML extensions into TeX macros.
Suppose a user has defined two style files, one for XSLT and one for TeX:
combinatorics.xsl
XSLT template for an element <mmlx:binom>:
<xsl:template match = "apply/mmlx:binom[position()=1][count(child::*)=2]">
<mfrac linethickness="0ex">
<xsl:for-each select = "*">
<xsl:copy-of select='.'/>
</xsl:for-each>
</mfrac>
</xsl:template>
combinatorics.cls
\newcommand{\binom}[2]{\left({#1 \atop #2}\right)}
Now we can add a template to the mapping file that converts <mmlx:binom> to \binom:
<pat:template>
<pat:tex op="\binom" params="\patVAR!{a}\patVAR!{b}"/>
<pat:mml op="applymmlx:binomial">
<apply>
<mmlx:binomial>
<pat:variable name="a"/>
<pat:variable name="b"/>
</mmlx:binomial>
</apply>
</pat:mml>
</pat:template>
Suppose we then want to translate
<apply>
<mmlx:binomial>
<apply>
<plus/>
<ci> a </ci>
<ci> b </ci>
</apply>
<mrow>
<mi> c </mi>
<mo> + </mo>
<mi> d </mi>
</mrow>
</mmlx:binomial>
</apply>
The standard XSLT approach gives an explicit expression for this notation:
\left({a+b \atop c+d}\right),
whereas the mapping-file technique yields the TeX macro defined in combinatorics.cls:
\binom{a+b}{c+d},
so the semantics defined by the user are preserved.
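The effect of the user-defined template can be imitated with a small Python sketch. Everything here is illustrative: the fragment in the text uses the mmlx: prefix without declaring it, so the sketch declares a hypothetical namespace URI (urn:example:mmlx); USER_TEMPLATES stands in for the mapping-file entry, and only the handful of content and presentation elements needed for this example are supported.

```python
import xml.etree.ElementTree as ET

MMLX = "urn:example:mmlx"  # hypothetical namespace URI for the mmlx: prefix

INPUT = """<apply xmlns:mmlx="urn:example:mmlx">
  <mmlx:binomial>
    <apply><plus/><ci> a </ci><ci> b </ci></apply>
    <mrow><mi> c </mi><mo> + </mo><mi> d </mi></mrow>
  </mmlx:binomial>
</apply>"""

# Stand-in for the user's mapping-file template: <mmlx:binomial> -> \binom{#1}{#2}
USER_TEMPLATES = {f"{{{MMLX}}}binomial": r"\binom{{{0}}}{{{1}}}"}

def convert(elem):
    """Convert a tiny subset of content/presentation MathML, keeping user macros."""
    tag = elem.tag
    if tag in ("ci", "cn", "mi", "mn", "mo"):
        return (elem.text or "").strip()
    if tag in USER_TEMPLATES:  # high-level user element: emit the macro, not its expansion
        return USER_TEMPLATES[tag].format(*(convert(c) for c in elem))
    if tag == "mrow":
        return "".join(convert(c) for c in elem)
    if tag == "apply":
        head, *args = list(elem)
        if head.tag == "plus":
            return "+".join(convert(a) for a in args)
        if not args:  # <apply> wrapping a single user element, as in the example
            return convert(head)
    raise ValueError(f"unsupported element {tag}")

print(convert(ET.fromstring(INPUT)))  # \binom{a+b}{c+d}
```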
3. Three jobs for the converter
 File to file
Converts an entire
 MathML file into a TeX document, or
 XML file with MathML entries into an XML document with embedded TeX.
 Expression to expression
The system can convert any valid MathML expression given as an input string.
 Object to object
This lets the user manipulate individual MathML and TeX objects obtained from sources other than standard MathML or TeX files.
4. Three interfaces to the converter
The converter is available as
 Command-line mode
 GUI framework
 Servlet on the ORCCA web site.
5. Advanced possibilities
5.1 Line breaking in long formulas
The converter provides a special algorithm for line breaking in the TeX output.
Motivation: standard MathML browsers (Amaya, Mozilla, MathPlayer) break mathematical formulas
according to their own logic, but TeX does not break displayed formulas automatically. As a result,
long formulas generated from MathML may not fit on the page of a TeX document.
The algorithm breaks long expressions such as
 plain formulas (like polynomials or simple sums),
 long text,
 long numbers,
 special cases like
  radicals,
  fractions,
  matrix entries,
  indices,
  underscripts and overscripts.
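The converter's own algorithm is not spelled out here, so the following is only a stand-in: a simple greedy packing in Python (our substitution, not necessarily the converter's method) for the plain-formula case. It measures width in characters instead of using font metrics, breaks only between the top-level terms of a sum, and puts the operator at the start of each continuation line. The function name break_sum is hypothetical.

```python
def break_sum(terms, op, max_width):
    """Greedily pack terms joined by `op` into lines of at most max_width characters.

    Breaks happen only between terms; a single term wider than max_width
    still gets a line of its own. The operator starts each continuation line.
    """
    lines, current = [], ""
    for i, term in enumerate(terms):
        piece = term if i == 0 else f" {op} {term}"
        if current and len(current) + len(piece) > max_width:
            lines.append(current)       # close the full line
            current = f"{op} {term}"    # continuation line begins with the operator
        else:
            current += piece
    if current:
        lines.append(current)
    return lines

print(break_sum(["x^5", "3x^4", "2x^3", "7x^2", "x", "1"], "+", 16))
# ['x^5 + 3x^4', '+ 2x^3 + 7x^2', '+ x + 1']
```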
Examples of line breaking in special cases:
1. Long expression under a radical
2. Long expression in a numerator
3. Long superscripts
4. Long underscripts
5. Long number
100! = 93326215443944152681699238856266700490715968264381621468592963895
21759999322991560894146397615651828625369792082722375825118521091
6864000000000000000000000000
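Breaking a long number is the simplest case: the digit string is cut into chunks of a given width. The short Python sketch below (function name break_number is hypothetical) reproduces the layout shown for 100!, which has 158 digits. The width 65 is chosen only because it matches that layout; the converter presumably derives the width from the output settings described in 5.2.

```python
import math

def break_number(digits, width):
    """Split a digit string into lines of at most `width` digits."""
    return [digits[i:i + width] for i in range(0, len(digits), width)]

lines = break_number(str(math.factorial(100)), 65)
for line in lines:
    print(line)
print([len(line) for line in lines])  # [65, 65, 28]
```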

5.2 Settings for output TeX
 The converter allows the user to choose the text area width and font size for the output TeX file.
These settings are taken into account when computing the optimal length of formula lines.
 Line breaking usually occurs at mathematical operators, but conventions differ between cultures.
The operator sign at a break can be placed
 before the line break (at the end of the first line),
 after the line break (at the start of the next line), or
 on both sides of the break (duplicated).
Examples:
1. Before the break:
   a + b + ... + k +
   m + n + ... + z
2. After the break:
   a + b + ... + k
   + m + n + ... + z
3. Duplicated:
   a + b + ... + k +
   + m + n + ... + z
The user can choose the preferred convention.
 The user may also set an option to indent continuation lines after a line break.
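The three operator-placement conventions and the indentation option can be expressed as one small rendering function. This Python sketch is illustrative only: the mode names 'before', 'after' and 'duplicate' and the function place_operator are hypothetical, not the converter's actual option names.

```python
def place_operator(left, right, op, mode, indent=0):
    """Render a formula broken between `left` and `right` at operator `op`.

    mode: 'before'    -> op ends the first line
          'after'     -> op starts the second line
          'duplicate' -> op appears on both lines
    `indent` spaces are prefixed to the continuation line.
    """
    if mode == "before":
        first, second = f"{left} {op}", right
    elif mode == "after":
        first, second = left, f"{op} {right}"
    elif mode == "duplicate":
        first, second = f"{left} {op}", f"{op} {right}"
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [first, " " * indent + second]

for line in place_operator("a + b + ... + k", "m + n + ... + z", "+", "duplicate", indent=2):
    print(line)
```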
5.3 Converting XML with MathML to XML with TeX
The converter can also transform an XML file containing MathML into an XML file with the TeX embedded in CDATA sections.
Each CDATA section is placed under an XML element <LaTeX> in a special namespace.
Motivation: this may be useful for HTML to TeX translation.
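The embedding step can be sketched with Python's xml.dom.minidom: every MathML math element is replaced by a LaTeX element whose only child is a CDATA section holding the TeX. The namespace URI urn:example:latex and the prefix ltx are hypothetical (the actual namespace is not given here), and toy_convert is a deliberately trivial stand-in for the real MathML-to-TeX conversion.

```python
from xml.dom import minidom

LTX_NS = "urn:example:latex"  # hypothetical namespace for the <LaTeX> element
MML_NS = "http://www.w3.org/1998/Math/MathML"

def embed_tex(doc, convert):
    """Replace each MathML <math> element with <ltx:LaTeX> holding the TeX as CDATA."""
    for math in list(doc.getElementsByTagNameNS(MML_NS, "math")):
        latex = doc.createElementNS(LTX_NS, "ltx:LaTeX")
        latex.setAttribute("xmlns:ltx", LTX_NS)  # minidom does not add declarations itself
        latex.appendChild(doc.createCDATASection(convert(math)))
        math.parentNode.replaceChild(latex, math)
    return doc

def toy_convert(math):
    """Stand-in converter: handles only an <mfrac> of two <mn> numbers."""
    frac = math.getElementsByTagNameNS(MML_NS, "mfrac")[0]
    num, den = [n.firstChild.data for n in frac.getElementsByTagNameNS(MML_NS, "mn")]
    return r"\frac{%s}{%s}" % (num, den)

doc = minidom.parseString(
    '<p xmlns:m="http://www.w3.org/1998/Math/MathML">Half is '
    "<m:math><m:mfrac><m:mn>1</m:mn><m:mn>2</m:mn></m:mfrac></m:math>.</p>"
)
result = embed_tex(doc, toy_convert).documentElement.toxml()
print(result)
```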
6. Advantages of this converter
 The mapping file allows high-level translation of user-defined macros, helping to preserve the semantic content of the original expression.
 Using a mapping file for conversion from MathML to TeX and back keeps the converter program flexible (i.e. not hard-coded).
 To support a new TeX macro or MathML element, the user only needs to add a new template to the mapping file.
 The line-breaking algorithm for TeX output handles mathematical expressions that do not fit on one line of the output document.
 The user-friendly GUI lets users browse the file system to pick an input MathML file, convert it to a DOM tree and explore it easily.
7. Choice of Java program vs XSLT
Writing the converter in Java makes it possible
 to share technology with the TeX to MathML converter;
 in particular, the same mapping file can be used for conversion in both directions,
 to provide a GUI,
 to manipulate separate TeX and MathML objects,
 to compute line breaking and carry out other complex calculations, which are more natural in Java than in XSLT.
8. Status
The current version of the converter can handle
 Formulas
 Matrices
 Multiline equations
 Equation arrays
 Commutative diagrams