[Groff] Second version of MathML patch


From: Eric S. Raymond
Subject: [Groff] Second version of MathML patch
Date: Thu, 1 Feb 2007 15:02:16 -0500
User-agent: Mutt/1.4.2.2i

The enclosed patch adds support for MathML output to eqn.  

This is version 2 of the patch.  It fixes some minor issues with
inline equations, adds more tests, and removes the namespace attribute
from the generated MathML.  (The attribute is not needed when
embedding in an XML document with appropriate entity declarations in
the header; in fact, it interferes with namespacing.)

A patch for the eqn manual page is included that explains MathML support
and its limitations.  I had planned to modify groff/doc/groff.texinfo as
well, until I discovered that the eqn section of that document 
consists entirely of hard vacuum.

SCOPE

This patch modifies the following files:

* src/preproc/eqn/*.cpp
* src/preproc/eqn/eqn.man
* groff/NEWS

It adds one new file, the test script src/preproc/eqn/mathmltest.py.

TESTING THE PATCH

The second enclosure is a Python script mathmltest.py that uses eqn to
generate a test page demonstrating the MathML translation support;
capture the output in a file with extension .xhtml and view it with
Firefox or some other MathML-capable browser. (You must ensure that
the web server delivers .xhtml pages with Content-Type:
application/xhtml+xml, or the MathML will not render correctly.
Apache does this.)
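
For anyone without Apache handy, a small local server along the
following lines might do for viewing the test page.  This is only a
sketch using the Python standard library; the class name XHTMLHandler,
the function serve() and the port number are illustrative assumptions,
not part of the patch.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class XHTMLHandler(SimpleHTTPRequestHandler):
    """Static file handler that labels .xhtml files as XHTML (hypothetical name)."""

    # Copy the stock extension map and force the MIME type for .xhtml.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".xhtml": "application/xhtml+xml",
    }


def serve(port=8000):
    """Serve the current directory on localhost until interrupted."""
    with HTTPServer(("127.0.0.1", port), XHTMLHandler) as httpd:
        httpd.serve_forever()
```

Calling serve() from the directory that holds the captured .xhtml file
and pointing the browser at http://127.0.0.1:8000/ should then deliver
the page with the right Content-Type.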

Some rendering defects will be apparent. In particular, square-root 
signs are botched and brackets around matrices and large formulae fail 
to stretch properly.  These are problems with the MathML display 
engine in Firefox, not the translation from eqn.

IMPLEMENTATION

The strategy used is very simple and relies on the fact that the box
models of eqn and Presentation MathML differ in only trivial ways.  It
leaves the grammar and existing internal object structures unchanged.
A new global, output_format, is defined as an enumerated type with
values {troff, mathml}.  Most of the functions and methods that emit
actual output acquire a top-level conditional, dispatching on this
global, which has one arm for troff mode and one for MathML mode.  In
most cases the MathML arm is drastically simpler.
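
To make the shape of that dispatch concrete, here is a rough sketch.
The real code is C++; Python is used here only for brevity.  The
classes TextBox and OverBox, the helper set_output_format() and the
bracketed troff placeholder string are all invented for illustration
and do not reflect what eqn actually emits in troff mode.

```python
from enum import Enum


class OutputFormat(Enum):
    TROFF = "troff"
    MATHML = "mathml"


# Global selecting the output arm, analogous to the output_format global.
output_format = OutputFormat.TROFF


def set_output_format(fmt):
    """Hypothetical helper: switch the global output format."""
    global output_format
    output_format = fmt


class TextBox:
    """Hypothetical leaf box holding a single identifier."""

    def __init__(self, text):
        self.text = text

    def output(self):
        if output_format is OutputFormat.MATHML:
            return f"<mi>{self.text}</mi>"
        return self.text  # troff arm: plain text stands in for real output


class OverBox:
    """Hypothetical box for 'num over den'."""

    def __init__(self, num, den):
        self.num, self.den = num, den

    def output(self):
        if output_format is OutputFormat.MATHML:
            # MathML arm: the renderer does the layout, so this is short.
            return f"<mfrac>{self.num.output()}{self.den.output()}</mfrac>"
        # troff arm: placeholder only; real eqn emits positioning escapes.
        return f"[troff: {self.num.output()} over {self.den.output()}]"
```

With the format set to MATHML, OverBox(TextBox("a"), TextBox("b")).output()
yields a single <mfrac> element; the box tree itself is the same in
both modes, and only the output methods branch.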

(This strategy could be easily generalized to support other output
formats.  TeX is a possibility that leaps to mind.)

The only even moderately tricky changes are in the lexer.  Some of the 
predefined macros used constructs like up, down, fwd, back, and vcenter 
that have no equivalents in MathML.  I attacked this problem in these ways:

1. I eliminated three uses of 'back' to compose characters in favor
   of using equivalent groff specials \(<< \(>> \(<> that did not
   exist when these macros were written.  (This will be a quality 
   improvement for troff users.)

2. I eliminated one use of vcenter by using \(md.  (Likewise a
   quality improvement for troff users.)

3. I then split the table of pre-definitions in three: one large common
   table plus two small tables, one troff-specific and one
   MathML-specific.  Use of troff-only operations (up, down, back, fwd,
   vcenter) is now confined to the troff-specific table.  The
   MathML-specific table uses 'size big' instead and drops the explicit
   positioning operations, counting on MathML processors to do them.

POTENTIAL TROUBLE SPOTS

Here are notes for reviewers on places I'm not 100% sure I've done 
the right thing:

* In the process of preparing the troff table, I translated three 
  definitions (dot_def, dotdot_def, and utilde_def) that previously
  used explicit \v escapes to use 'up' and 'down' instead. I modeled
  the new definitions on the way vec and dyad work, but it's possible
  I got something subtle wrong.

* I'm not certain the MathML implementation of font_box::output() is
  right, because I don't quite get what the switcheroo between 
  current_roman_font and old_roman_font is supposed to accomplish.
  It does seem to generate good MathML, though.

Finally, I made one purely cosmetic change in text.cpp: I replaced
some magic numbers for spacing types, which I thought were too ugly
to live, with an enum.

REMAINING ISSUES

The entirety of eqn is translated when -TMathML is specified, 
with the following exceptions.

Limitations that cannot be fixed include non-support for special,
up/down/fwd/back, and vcenter.  

Limitations that might be fixable include non-support for
mark and lineup.  I will investigate further, but if these can
be implemented at all it's going to be in a very complicated and
nasty way.  This relatively clean and easy patch should go in first.

The way character boxes are output means that each digit of a 
multi-digit number gets its own <mn></mn> tag pair in the MathML.
While this is not technically wrong for Presentation MathML, it is
ugly and inefficient. Fixing this will require implementing a little
state machine in the text.cpp output method.
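
As a rough idea of what such a state machine might look like, here is
a sketch in Python rather than the C++ of text.cpp.  The function name
emit_tokens() and the simple rule "digit goes to mn, letter to mi,
anything else to mo" are assumptions made for illustration; the real
classification in eqn is richer, and decimal points are not handled
here.

```python
from xml.sax.saxutils import escape


def emit_tokens(chars):
    """Emit MathML tokens, merging each run of digits into one <mn>."""
    out = []
    digits = []  # pending run of digit characters (the only state kept)

    def flush():
        # Close the current digit run, if any, as a single <mn> element.
        if digits:
            out.append("<mn>" + "".join(digits) + "</mn>")
            digits.clear()

    for ch in chars:
        if ch.isdigit():
            digits.append(ch)  # stay in the "in number" state
        else:
            flush()  # leaving the "in number" state
            tag = "mi" if ch.isalpha() else "mo"
            out.append(f"<{tag}>{escape(ch)}</{tag}>")
    flush()  # a number may end the input
    return "".join(out)
```

For the input "12+x" this produces <mn>12</mn><mo>+</mo><mi>x</mi>
instead of a separate <mn> element for each digit.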
-- 
                Eric S. Raymond <http://www.catb.org/~esr/>

Attachment: MATHML.DIFF
Description: Text document

Attachment: mathmltest.py
Description: Text document

