The TeX/MathML Map File Specification
The structure of the map file
- Namespaces:
  - local: xmlns:pat = "http://www.orcca.on.ca/mathml/tex2mml.xml"
  - general: xmlns = "http://www.w3.org/1998/Math/MathML"
- Root element: pat:tex2mmlmap
- Allowed children of pat:tex2mmlmap:
- Allowed children of pat:template:
- Allowed children of pat:tex:
- Allowed children of pat:mml:
  - MathML elements
  - pat:rep
  - pat:variable
- Allowed children of pat:img:
- Table of attributes used with the above elements:
Element name | Attribute(s) | Purpose
pat:tex2mmlmap | version | Map file version
pat:template | - | Contains the tuple defining the mapping
pat:tex | op | Matching TeX macro/symbol name
pat:tex | params (optional) | TeX macro parameters (if any)
pat:tex | prec (optional) | Template's precedence (TeX to MathML only)
pat:mml | op | Matching MathML main operation
pat:variable | name | Identifies a variable by its name
pat:variable | attribute (optional) | Means that the result should be added as an attribute to its parent element
pat:variable | map (optional) | Specifies how the attribute values should be mapped
pat:rep | - | Contains the pattern to be repeated
Some concrete examples and explanations can be found in the section "Examples of complete templates" below.
Templates
In essence, the map file is a collection of templates. The
purpose of each template is to specify how a particular TeX or
MathML fragment can be mapped into other formats. Therefore,
each template is a tuple (or, more precisely, it is a triple
at the moment) specifying how pieces of TeX and MathML (and
possibly an image) correspond to each other.
When mapping from TeX to MathML, a template is chosen based on the op
attribute of its pat:tex element. The value of op can be a TeX symbol,
such as + or *, any other token, such as \\, or a macro name, e.g. \cr, \root or
\matrix. More than one template may exist for a particular op. In that
case the prec attribute decides between them. Precedence is specified by an integer, and for a
particular TeX operator/macro the template with the highest precedence
is considered first. If a template does not specify its precedence
explicitly, it defaults to zero. Templates with the same precedence
are considered in the order in which they appear in the map file.
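The selection rule above can be sketched in a few lines of Python. This is only an illustration, not the Tex2Mml implementation: the Template class and the candidates function are hypothetical names, and the \gcd template with prec="5" is invented to show the ordering.

```python
from dataclasses import dataclass

@dataclass
class Template:
    op: str           # value of the op attribute on pat:tex
    params: str = ""  # value of the params attribute (may be absent)
    prec: int = 0     # a missing prec attribute counts as 0

def candidates(templates, op):
    """Return the templates for `op`, highest precedence first.

    sorted() is stable, so templates with equal precedence keep
    the order in which they appear in the map file.
    """
    matching = [t for t in templates if t.op == op]
    return sorted(matching, key=lambda t: -t.prec)

# Templates listed in map-file order.
templates = [
    Template(r"\PSEUDO", r"\patVAR+{num} \over \patVAR+{den}", prec=666),
    Template(r"\gcd", r"(\patVAR+{argA}\patREP*{,\patVAR+{argI}})"),
    Template(r"\gcd"),
    Template(r"\gcd", "hypothetical", prec=5),  # invented for illustration
]

ordered = candidates(templates, r"\gcd")
for t in ordered:
    print(t.prec, repr(t.params))
```

The template with prec 5 is tried first even though it appears last in the file; the two templates with the default precedence follow in file order.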
Bear in mind that infix operators (such as _, ^, etc.) have the op
attribute set to the special value \PSEUDO, and that templates for infix
operators normally have higher precedence than other templates.
Take this into account when choosing precedences for new templates.
Special conventions for the syntax used inside the params attribute of the pat:tex element
An expression found in the params attribute of the pat:tex element is treated as normal TeX, except for the following special "macros", which are recognized and preprocessed by the Tex2Mml application:
- \patVAR[!|*|+|empty]{varName}
Several types of variables are recognized:
- Variables matching exactly one token - \patVAR!{varName}
- Variables matching zero or more tokens - \patVAR*{varName}
- Variables matching one or more tokens - \patVAR+{varName}
- Variables matching zero or one token - \patVAR{varName}
A variable name must be at least one character long. It starts
with a letter, which may be followed by any number of letters, digits or
underscores. The name may also carry an optional tilde
prefix. The regular expression describing a variable name is:
varName := (~ | empty) letter (letter | digit | _)*
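As a quick check of the grammar above, the rule can be written as a Python regular expression. This is a sketch: restricting "letter" and "digit" to ASCII is an assumption, since the specification does not define them, and is_valid_var_name is a hypothetical helper name.

```python
import re

# varName := (~ | empty) letter (letter | digit | _)*
# Assumption: "letter" and "digit" mean ASCII letters and digits.
VAR_NAME = re.compile(r"~?[A-Za-z][A-Za-z0-9_]*")

def is_valid_var_name(name: str) -> bool:
    """True if the whole string is a well-formed variable name."""
    return VAR_NAME.fullmatch(name) is not None

for name in ["num", "~x_1", "a", "1a", "_a", "~", ""]:
    print(repr(name), is_valid_var_name(name))
```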
Besides the number of tokens it matches, a variable also has a type:
it is either scalar or non-scalar. A variable is non-scalar if it occurs within
a repetition pattern; otherwise it is scalar.
A repetition pattern in the "extended" TeX syntax can be created by using the following "macro":
- \patREP[*|+]{pattern}
Two types of patterns are recognized:
- A pattern which occurs zero or more times - \patREP*{...}
- A pattern which occurs one or more times - \patREP+{...}
When \patVAR occurs inside a pattern, it
automatically becomes non-scalar (see the examples below). A
variable that occurs inside a \patREP must also
be a descendant of a pat:rep element in the pat:mml element (the next section
describes the correspondence between the macros \patREP and \patVAR
and the elements pat:rep and pat:variable).
<pat:tex op="\PSEUDO" params="\patVAR+{num} \over \patVAR+{den}" prec="666"/>
<pat:tex op="\matrix" params="{\patREP+{\patVAR+{a}\patREP*{&amp;\patVAR+{b}}\cr}}"/>
<pat:tex op="\gcd" params="(\patVAR+{argA}\patREP*{,\patVAR+{argI}})"/>
<pat:tex op="\gcd"/>
The following can be observed in the above examples:
Nested \patREP's are allowed (to arbitrary depth).
Note that certain characters are not allowed literally in XML character data
or attribute values and must be escaped with XML entities,
e.g. & must be marked up as "&amp;".
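The two observations above can be tried out with Python's standard library. The sketch below parses a pat:tex element whose params value contains an escaped ampersand, then labels each variable as scalar or non-scalar by checking whether it sits inside a \patREP. The classify function and its tokenizer are hypothetical and deliberately simplified (escaped braces such as \{ are not handled); they are not how Tex2Mml itself preprocesses the attribute.

```python
import re
import xml.etree.ElementTree as ET

# One token per match: \patREP*{ or \patREP+{, a whole \patVAR..{name},
# a plain opening brace, or a closing brace.
# Limitation of this sketch: escaped braces (\{ and \}) are not handled.
TOKEN = re.compile(r"\\patREP[*+]\{|\\patVAR[!*+]?\{([^}]*)\}|\{|\}")

def classify(params):
    """Map each variable name in params to 'scalar' or 'non-scalar'."""
    kinds = {}
    stack = []  # kind of every brace group that is currently open
    for match in TOKEN.finditer(params):
        text = match.group(0)
        if text.startswith("\\patREP"):
            stack.append("rep")
        elif text.startswith("\\patVAR"):
            kinds[match.group(1)] = "non-scalar" if "rep" in stack else "scalar"
        elif text == "{":
            stack.append("group")
        elif stack:  # closing brace
            stack.pop()
    return kinds

# In the XML source the column separator is written as &amp;
xml_text = (
    '<pat:tex xmlns:pat="http://www.orcca.on.ca/mathml/tex2mml.xml" '
    r'op="\matrix" params="{\patREP+{\patVAR+{a}\patREP*{&amp;\patVAR+{b}}\cr}}"/>'
)
matrix_params = ET.fromstring(xml_text).get("params")
gcd_params = r"(\patVAR+{argA}\patREP*{,\patVAR+{argI}})"

print(matrix_params)            # the parser has turned &amp; back into &
print(classify(matrix_params))
print(classify(gcd_params))
```

In the \gcd pattern, argA lies outside every \patREP and is therefore scalar, while argI is non-scalar.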
Special conventions for the syntax used inside the pat:mml element
The XML markup contained within the pat:mml element is the MathML markup
corresponding to the TeX found in the op
and the params attributes of the pat:tex element. Because the document has no
DTD, in theory any valid XML can appear under pat:mml. However, two elements
have special meanings (all other elements are assumed to be valid
MathML). The first of these elements represents a variable and is
named pat:variable. It must have the name
attribute (the variable name) and be an empty element. The name of the
variable must be one of the names appearing in the params attribute of the pat:tex element. If a variable is non-scalar
(e.g. variables a, b and argI in the
examples above), it must be a descendant of the other special element,
pat:rep. Just like \patREP, pat:rep
corresponds to a repetition pattern, but this time in MathML. Every
non-scalar variable must have pat:rep as
an ancestor. Conversely, every pat:rep
must have at least one non-scalar variable as its descendant. In the case
of nested pat:rep's, variables occurring inside a nested pat:rep do not
count toward the enclosing one; therefore,
in the \matrix example above, variable a and the
outer \patREP+ form a pair, as do variable b and
the inner \patREP*, but variable b and the
outer \patREP+ do not. Scalar variables are allowed to
appear anywhere (as long as they are descendants of pat:mml).
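The rules in this section lend themselves to a mechanical check. The following sketch walks a pat:mml element and reports violations of the three constraints: every variable name must be known from params, a non-scalar variable needs a pat:rep ancestor, and every pat:rep needs a non-scalar variable of its own that is not inside a nested pat:rep. The function check_mml and the kinds dictionary (variable name to 'scalar' or 'non-scalar') are hypothetical; Tex2Mml may validate map files differently or not at all.

```python
import xml.etree.ElementTree as ET

PAT = "http://www.orcca.on.ca/mathml/tex2mml.xml"
REP = f"{{{PAT}}}rep"
VAR = f"{{{PAT}}}variable"

def check_mml(mml, kinds):
    """Return a list of rule violations found under a pat:mml element.

    `kinds` maps each variable name from params to 'scalar' or 'non-scalar'.
    """
    errors = []

    def visit(node, inside_rep):
        # Returns True if this subtree holds a non-scalar variable
        # that is not inside a nested pat:rep.
        if node.tag == VAR:
            name = node.get("name")
            if name not in kinds:
                errors.append(f"unknown variable {name!r}")
                return False
            if kinds[name] == "non-scalar":
                if not inside_rep:
                    errors.append(f"non-scalar variable {name!r} outside pat:rep")
                return True
            return False
        is_rep = node.tag == REP
        found = False
        for child in node:
            if visit(child, inside_rep or is_rep):
                found = True
        if is_rep:
            if not found:
                errors.append("pat:rep without a non-scalar variable of its own")
            return False  # does not count toward an enclosing pat:rep
        return found

    visit(mml, False)
    return errors

good = ET.fromstring(f"""
<pat:mml xmlns:pat="{PAT}" xmlns="http://www.w3.org/1998/Math/MathML" op="mtable">
  <mtable>
    <pat:rep>
      <mtr>
        <mtd><pat:variable name="a"/></mtd>
        <pat:rep><mtd><pat:variable name="b"/></mtd></pat:rep>
      </mtr>
    </pat:rep>
  </mtable>
</pat:mml>
""")
bad = ET.fromstring(f"""
<pat:mml xmlns:pat="{PAT}" xmlns="http://www.w3.org/1998/Math/MathML" op="mrow">
  <mrow><pat:variable name="b"/><pat:rep><mi>x</mi></pat:rep></mrow>
</pat:mml>
""")
kinds = {"a": "non-scalar", "b": "non-scalar"}
print(check_mml(good, kinds))
print(check_mml(bad, kinds))
```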
Examples of complete templates
The following template will match the opening parenthesis in TeX, and map it into the <mo>(</mo> element. By default, this template's precedence is 0:
<pat:template>
  <pat:tex op="("/>
  <pat:mml op="(">
    <mo> ( </mo>
  </pat:mml>
</pat:template>
The second template will transform the TeX macro \alpha into the corresponding Unicode character:
<pat:template>
  <pat:tex op="\alpha"/>
  <pat:mml op="α">
    <mo> α </mo>
  </pat:mml>
</pat:template>
The next template is an example of an infix TeX macro. Because it
should be processed before any other macro/symbol, its precedence is
set higher than most other templates in the map file. This template also features two scalar variables num and den:
<pat:template>
  <pat:tex op="\PSEUDO" params="\patVAR+{num}\over\patVAR+{den}" prec="666"/>
  <pat:mml op="mfrac">
    <mfrac>
      <pat:variable name="num"/>
      <pat:variable name="den"/>
    </mfrac>
  </pat:mml>
</pat:template>
Here is a more complicated example involving non-scalar variables firstCol and rest;
it shows how a matrix can be transformed:
<pat:template>
  <pat:tex op="\matrix"
           params="{\patREP+{\patVAR+{firstCol}\patREP*{&amp;\patVAR+{rest}}\cr}}"/>
  <pat:mml op="mtable">
    <mtable>
      <pat:rep>
        <mtr>
          <mtd> <pat:variable name="firstCol"/> </mtd>
          <pat:rep>
            <mtd> <pat:variable name="rest"/> </mtd>
          </pat:rep>
        </mtr>
      </pat:rep>
    </mtable>
  </pat:mml>
</pat:template>
The following example shows that attributes, not only tags, can be generated. If
pat:variable has the attribute attribute, its value is placed as an attribute
on its parent element (in this case mfenced). Only the fact that pat:variable
is a child of mfenced matters; its position among the other children does not.
<pat:template>
  <pat:tex op="\left" params="\patVAR!{lDelim} \patVAR*{expr} \right\patVAR!{rDelim}"/>
  <pat:mml op="">
    <mfenced separators="">
      <pat:variable name="lDelim" attribute="open"/>
      <pat:variable name="expr"/>
      <pat:variable name="rDelim" attribute="close"/>
    </mfenced>
  </pat:mml>
</pat:template>
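To make the attribute rule concrete, here is a small sketch of how a pat:variable carrying an attribute attribute could be turned into an attribute on its parent. The function lift_attribute_variables and the bindings dictionary (variable name to matched text) are hypothetical; the sketch only handles attribute variables and leaves ordinary variables such as expr untouched.

```python
import xml.etree.ElementTree as ET

PAT = "http://www.orcca.on.ca/mathml/tex2mml.xml"
VAR = f"{{{PAT}}}variable"

def lift_attribute_variables(root, bindings):
    """Replace every pat:variable that has an `attribute` by an attribute
    on its parent element. Its position among the siblings is irrelevant."""
    for parent in list(root.iter()):
        for child in list(parent):
            target = child.get("attribute") if child.tag == VAR else None
            if target is not None:
                parent.set(target, bindings[child.get("name")])
                parent.remove(child)
    return root

mfenced = ET.fromstring(
    '<mfenced xmlns="http://www.w3.org/1998/Math/MathML" '
    f'xmlns:pat="{PAT}" separators="">'
    '<pat:variable name="lDelim" attribute="open"/>'
    '<pat:variable name="expr"/>'
    '<pat:variable name="rDelim" attribute="close"/>'
    '</mfenced>'
)
# Bindings as they might result from matching \left( x+y \right)
lift_attribute_variables(mfenced, {"lDelim": "(", "rDelim": ")"})
print(mfenced.attrib)
print([child.get("name") for child in mfenced])
```

After the call, mfenced carries open and close attributes and only the expr variable remains as a child.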
Finally, here is an example showing how attribute values can also be mapped. This comes in
handy when mapping TeX environments such as array, tabular, etc. TeX
usually uses one-letter specifiers for justification and alignment (e.g. l means "left-justify"),
while MathML is often more verbose, so a correspondence between the two
has to be specified:
<pat:template>
  <pat:tex op="\begin" params="{array} {\patREP*{\patVAR!{hjust}}} \patREP*{\patVAR*{firstCol}\patREP*{&amp;\patVAR*{rest}}\\} \end{array}"/>
  <pat:mml op="">
    <mtable>
      <pat:rep>
        <pat:variable name="hjust" attribute="columnalign" map="l=left c=center r=right"/>
      </pat:rep>
      <pat:rep>
        <mtr>
          <mtd> <pat:variable name="firstCol"/> </mtd>
          <pat:rep>
            <mtd> <pat:variable name="rest"/> </mtd>
          </pat:rep>
        </mtr>
      </pat:rep>
    </mtable>
  </pat:mml>
</pat:template>
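The map attribute in the template above reads as a whitespace-separated list of source=target pairs. A minimal sketch of how such a value might be interpreted follows; parse_map and column_align are hypothetical names, and two behaviours are assumptions rather than documented facts: values without a mapping are passed through unchanged, and the values collected from the repetition are joined with spaces (which is the form MathML expects for columnalign).

```python
def parse_map(spec):
    """Turn 'l=left c=center r=right' into a lookup table."""
    return dict(pair.split("=", 1) for pair in spec.split())

def column_align(hjust_values, spec):
    """Map each TeX column specifier and join the results with spaces."""
    table = parse_map(spec)
    # Assumption: a value without a mapping is kept as it is.
    return " ".join(table.get(value, value) for value in hjust_values)

spec = "l=left c=center r=right"
print(parse_map(spec))
# Column specifiers as they would be matched from \begin{array}{lcr}
print(column_align(["l", "c", "r"], spec))  # left center right
```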