# MathML to TeX Conversion: Conserving high-level semantics

## 1. Main goal of this converter

• This program converts a given MathML representation of some formula to an equivalent TeX expression.
• The MathML may be part of an XML file or may be given as an input string.
• One of the main goals of this converter is to conserve high-level semantics during translation.
• Using a mapping file allows us to transform high-level MathML extensions to TeX macros.

## 2. Principal conversion scheme

Figure 1. MathML to TeX converter scheme


### 2.1 MathML Object Structure

A MathML object is represented as a Document Object Model (DOM) tree, as specified by the W3C Recommendation http://www.w3.org/DOM.

  x 3 i 
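As a rough illustration of the DOM representation, the following Python sketch parses a small MathML fragment with the standard library's `xml.dom.minidom` and lists its element tree. The sample expression (x raised to the power 3 + i) and the helper name `outline` are assumptions chosen for illustration; the converter itself is a Java program and its own classes are not shown here.

```python
from xml.dom import minidom

# Hypothetical sample input: x^(3+i) in presentation MathML.
MATHML = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<msup><mi>x</mi><mrow><mn>3</mn><mo>+</mo><mi>i</mi></mrow></msup>"
    "</math>"
)


def outline(node, depth=0):
    """Return one indented line per element or non-empty text node."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        lines.append("  " * depth + repr(node.data.strip()))
    return lines


dom = minidom.parseString(MATHML)
tree_lines = outline(dom.documentElement)
print("\n".join(tree_lines))
```

Each indentation level in the printed outline corresponds to one level of the DOM tree, with token elements such as `mi` and `mn` holding their content as text nodes.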

### 2.2 TeX Object Structure

A TeX object is also represented by a tree, but the logic of its structure differs from that of a MathML DOM tree. Each level of this tree corresponds to a TeX group.

Example: the TeX expression $\sqrt {1-\alpha} + x^{3+a}$ is represented as

Figure 2. Structure of TeX-Object
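The idea of "one tree level per TeX group" can be sketched in a few lines of Python. This is only an assumption about the general shape of such a tree, not the converter's actual data structure: the hypothetical function `parse_groups` turns every brace-delimited group into a nested list and keeps the text between groups as plain strings.

```python
def parse_groups(tex):
    """Parse a TeX string into a tree: each {...} group becomes a nested list."""
    root = []
    stack = [root]
    token = ""
    for ch in tex:
        if ch in "{}":
            # Flush the text collected so far into the current group.
            if token.strip():
                stack[-1].append(token.strip())
            token = ""
            if ch == "{":
                group = []
                stack[-1].append(group)
                stack.append(group)
            else:
                if len(stack) == 1:
                    raise ValueError("unbalanced '}'")
                stack.pop()
        else:
            token += ch
    if len(stack) != 1:
        raise ValueError("unbalanced '{'")
    if token.strip():
        root.append(token.strip())
    return root


tree = parse_groups(r"\sqrt {1-\alpha} + x^{3+a}")
print(tree)
```

For the expression above, the top level holds `\sqrt`, the radicand group, the text `+ x^` and the exponent group, so the two braced groups sit one level below the root.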


### 2.3 Mapping File

• The mapping file is one of the principal parts of the converter: it describes the correspondence between MathML and TeX patterns.
• The mapping file is an XML document consisting of templates that represent pairs of MathML and TeX patterns.
• Each template pairs a TeX pattern (pat:tex) with the corresponding MathML pattern (pat:mml).
#### The structure of the mapping file

• Namespaces:
  • local: xmlns:pat = "http://www.orcca.on.ca/mathml/tex2mml.xml"
  • general: xmlns:pat = "http://www.w3.org/1998/Math/MathML"
• Root element: pat:tex2mmlmap
• Allowed children of pat:tex2mmlmap:
  • pat:template
• Allowed children of pat:template:
  • pat:tex
  • pat:mml
  • pat:img
• Allowed children of pat:tex: none
• Allowed children of pat:mml:
  • MathML elements
  • pat:rep
  • pat:variable
• Allowed children of pat:img: none
• Table of attributes used with the above elements:
| Element name | Attribute(s) | Purpose |
|---|---|---|
| pat:tex2mmlmap | version | Map file version |
| pat:template | - | - |
| pat:tex | op | Matching TeX macro/symbol name |
| | params | (optional) TeX macro parameters (if any) |
| | prec | (optional) Template's precedence (TeX to MathML) |
| pat:mml | op | Matching MathML main operation |
| | mode | (optional) 'math', 'text' or 'spec' |
| pat:variable | name | Identifies a variable by its name |
| pat:rep | - | Declares the repetition pattern |

#### Examples of mapping templates

1. for fraction:
<pat:template>
<pat:tex op="\frac" params="\patVAR!{num}\patVAR!{den}"/>
<pat:mml op="mfrac">
<mfrac>
<pat:variable name="num"/>
<pat:variable name="den"/>
</mfrac>
</pat:mml>
</pat:template>


2. for fenced expression
<pat:template>
<pat:tex op="" params="\left\patVAR!{o} \patREP*{\patVAR*{b},}\patVAR{t} \right\patVAR!{c}"/>
<pat:mml op="mfenced">
<mfenced open="pat:variable=o" close="pat:variable=c">
<pat:rep> <pat:variable name="b"/> </pat:rep>
<pat:variable name="t"/>
</mfenced>
</pat:mml>
</pat:template>
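To make the role of a template more concrete, here is a minimal Python sketch of how the fraction template might be applied in the MathML to TeX direction. It is not the converter's algorithm: the functions `match` and `to_tex` are hypothetical, `\patVAR!{name}` is assumed to mean "a braced argument bound to the variable `name`", and repetition patterns (`pat:rep`), attributes and precedence are ignored.

```python
import re
import xml.etree.ElementTree as ET

PAT = "http://www.orcca.on.ca/mathml/tex2mml.xml"  # the "local" namespace listed above

# The fraction template from the text, wrapped in a root element that declares the prefix.
MAPPING = r"""
<pat:tex2mmlmap xmlns:pat="http://www.orcca.on.ca/mathml/tex2mml.xml">
  <pat:template>
    <pat:tex op="\frac" params="\patVAR!{num}\patVAR!{den}"/>
    <pat:mml op="mfrac">
      <mfrac>
        <pat:variable name="num"/>
        <pat:variable name="den"/>
      </mfrac>
    </pat:mml>
  </pat:template>
</pat:tex2mmlmap>
"""


def match(pattern, node, bindings):
    """Structurally match a MathML node against a template pattern."""
    if pattern.tag == f"{{{PAT}}}variable":
        bindings[pattern.get("name")] = node
        return True
    if pattern.tag != node.tag or len(pattern) != len(node):
        return False
    return all(match(p, n, bindings) for p, n in zip(pattern, node))


def to_tex(node, templates):
    """Convert a MathML element to TeX, trying each template in turn."""
    for tex, pattern in templates:
        bindings = {}
        if match(pattern, node, bindings):
            # Replace every \patVAR!{name} by the braced translation of its binding.
            args = re.sub(
                r"\\patVAR!\{(\w+)\}",
                lambda m: "{" + to_tex(bindings[m.group(1)], templates) + "}",
                tex.get("params"),
            )
            return tex.get("op") + args
    # Fallback for token elements and rows: own text followed by translated children.
    return (node.text or "").strip() + "".join(to_tex(c, templates) for c in node)


root = ET.fromstring(MAPPING)
templates = [
    (t.find(f"{{{PAT}}}tex"), t.find(f"{{{PAT}}}mml")[0])
    for t in root.findall(f"{{{PAT}}}template")
]
expr = ET.fromstring(
    "<mfrac><mi>a</mi><mrow><mi>b</mi><mo>+</mo><mn>1</mn></mrow></mfrac>"
)
result = to_tex(expr, templates)
print(result)
```

Because the translation is driven entirely by the template list, supporting a new macro in this sketch means adding another `pat:template` entry rather than changing the code.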


#### Using a mapping file to transform high-level MathML extensions into TeX macros

Suppose the user has defined two style sheets, one for XSLT and one for TeX:

combinatorics.xsl

XSLT template for an element <mmlx:binomial>:

<xsl:template match = "apply/mmlx:binomial[position()=1][count(child::*)=2]">
<mfrac linethickness="0ex">
<xsl:for-each select = 'child::*'>
<xsl:copy-of select='.'/>
</xsl:for-each>
</mfrac>
</xsl:template>


combinatorics.cls

\newcommand{\binom}[2]{\left({#1 \atop #2}\right)}

Now we can add a template to the mapping file that converts <mmlx:binomial> to \binom:

<pat:template>
<pat:tex op="\binom" params="\patVAR!{a}\patVAR!{b}"/>
<pat:mml op="apply-mmlx:binomial">
<apply>
<mmlx:binomial>
<pat:variable name="a"/>
<pat:variable name="b"/>
</mmlx:binomial>
</apply>
</pat:mml>
</pat:template>


Then we would want to translate

<apply>
<mmlx:binomial>
<apply>
<plus/>
<ci> a </ci>
<ci> b </ci>
</apply>
<mrow>
<mi> c </mi>
<mo> + </mo>
<mi> d </mi>
</mrow>
</mmlx:binomial>
</apply>


The standard XSLT approach gives the explicit expansion of this notation: \left({a+b \atop c+d}\right),
whereas the mapping-file technique yields the TeX macro defined in combinatorics.cls: \binom{a+b}{c+d}. In this case the semantics defined by the user are preserved.


## 3. Three jobs for the converter

1. File to file
The converter translates an entire
• MathML file into a TeX document,
• XML file with MathML entries into an XML document with embedded TeX.

2. Expression to expression
The converter can translate any valid MathML expression given as an input string.

3. Object to object
This mode lets the user manipulate individual MathML and TeX objects obtained from sources other than standard MathML or TeX files.

## 4. Three interfaces to the converter

The converter is available as
1. a command-line tool,

2. a GUI framework,

3. a servlet at the ORCCA web site.

### 5.1 Linebreaking in long formulas

The converter provides a dedicated algorithm for line breaking in TeX output.

Motivation: standard MathML browsers (Amaya, Mozilla, MathPlayer) break lines in mathematical formulas according to their own logic, but TeX does not do so for displayed formulas. As a result, long formulas generated from MathML may not fit on the page of a TeX document.

The algorithm breaks lines in long expressions such as

• plain formulas (such as polynomials or simple summations),
• long text,
• long numbers,
• special cases such as
  • fractions,
  • matrix entries,
  • indices,
  • underscripts and overscripts.

Examples of linebreaking in special cases:

1. Long expression under a radical
2. Long expression in a numerator
3. Long superscripts
4. Long underscripts
5. Long number

100! = 93326215443944152681699238856266700490715968264381621468592963895
       21759999322991560894146397615651828625369792082722375825118521091
       6864000000000000000000000000
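The long-number case is the simplest to sketch. The Python snippet below is an illustrative assumption, not the converter's implementation: the hypothetical function `break_number` cuts a digit string into fixed-width chunks, and the width of 65 is simply read off the example above, whereas the converter derives the available width from the page settings.

```python
import math


def break_number(digits, width):
    """Split a digit string into consecutive chunks of at most `width` characters."""
    if width < 1:
        raise ValueError("width must be positive")
    return [digits[i:i + width] for i in range(0, len(digits), width)]


lines = break_number(str(math.factorial(100)), 65)
print("100! = " + "\n       ".join(lines))
```

Joining the chunks again gives back the original number, so the break changes only the layout and not the value.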

### 5.2 Settings for output TeX

• The converter lets the user choose the text area width and font size for the output TeX file. These settings are taken into account when computing the optimal line length for a formula.

• Line breaking usually occurs around mathematical operators, but conventions differ between cultures.
The operator sign can be placed in one of three ways:
1. before the line break,
2. after the line break,
3. duplicated on both sides of the break.

Examples:
1. a + b + ... + k +
   m + n + ... + z
2. a + b + ... + k
   + m + n + ... + z
3. a + b + ... + k +
   + m + n + ... + z

The user can choose whichever variant is more convenient.

• The user may also set an option to indent after a line break.
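The three operator placements can be illustrated with a short Python sketch. The function `break_sum`, its `style` names and the fixed number of terms per line are hypothetical simplifications; the converter chooses break points from the text width and font size rather than from a term count.

```python
def break_sum(terms, per_line, style):
    """Join terms with '+' and break the sum after every `per_line` terms.

    style: 'before' keeps the sign before the line break (end of the line),
           'after' moves it after the break (start of the next line),
           'duplicate' writes it in both places.
    """
    if style not in ("before", "after", "duplicate"):
        raise ValueError(f"unknown style: {style}")
    chunks = [
        " + ".join(terms[i:i + per_line]) for i in range(0, len(terms), per_line)
    ]
    lines = []
    for index, chunk in enumerate(chunks):
        first, last = index == 0, index == len(chunks) - 1
        if not first and style in ("after", "duplicate"):
            chunk = "+ " + chunk
        if not last and style in ("before", "duplicate"):
            chunk = chunk + " +"
        lines.append(chunk)
    return lines


terms = ["a", "b", "c", "d", "e", "f"]
for style in ("before", "after", "duplicate"):
    print(style, break_sum(terms, 3, style))
```

An indentation option could be added in the same spirit by prefixing every line after the first with a fixed amount of space.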

### 5.3 Converting XML with MathML to XML with TeX

The converter can also transform an XML file containing MathML into an XML file containing TeX. The TeX is embedded in a CDATA section, which is placed under a <LaTeX> element with a special namespace.
Motivation: this capability may be useful for HTML to TeX translation.
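The shape of such an output document can be sketched with Python's `xml.dom.minidom`. The namespace URI, the `tex` prefix and the function name `embed_tex` are assumptions made for this example (the text only says that the `<LaTeX>` node has a special namespace), and the TeX string is supplied by hand instead of coming from the converter.

```python
from xml.dom import minidom

# Hypothetical namespace URI and prefix for the <LaTeX> element.
TEX_NS = "http://example.org/tex"


def embed_tex(doc, math_node, tex):
    """Replace a <math> element by a <LaTeX> element holding the TeX in a CDATA section."""
    latex = doc.createElementNS(TEX_NS, "tex:LaTeX")
    latex.setAttribute("xmlns:tex", TEX_NS)
    latex.appendChild(doc.createCDATASection(tex))
    math_node.parentNode.replaceChild(latex, math_node)


doc = minidom.parseString(
    '<p>Area: <math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<msup><mi>r</mi><mn>2</mn></msup></math></p>"
)
math_node = doc.getElementsByTagName("math")[0]
embed_tex(doc, math_node, "$r^{2}$")  # in practice the TeX would come from the converter
output = doc.documentElement.toxml()
print(output)
```

Wrapping the TeX in CDATA means characters such as `<` and `&` in the formula need no XML escaping, and the surrounding document structure stays untouched.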


## 6. Advantages of this converter

• The mapping file allows high-level translation of user-defined macros, to help preserve any semantic content of the original expression.
• Using a mapping file for conversion from MathML to TeX and back keeps the converter flexible (i.e. not hard-coded).
• A user who needs a new TeX macro or MathML element can add a new template to the mapping file.
• The line-breaking algorithm for TeX output handles mathematical expressions that do not fit on one line of the output document.
• The user-friendly GUI lets users browse the file system to pick an input MathML file, convert it to a DOM tree and easily explore it.

## 7. Choice of a Java program vs XSLT

The Java code designed for this converter makes it possible
• to share technology with the TeX to MathML converter,
  • in particular, the mapping file can be used for conversion in both directions,
• to provide a GUI,
• to manipulate separate TeX and MathML objects,
• to compute line breaking; this and other complex calculations are more natural in Java than in XSLT.

## 8. Status

The current version of the converter can handle
• Formulas
• Matrices
• Multi-line equations
• Equation arrays
• Commutative diagrams