emacs-orgmode

Re: [O] bug in odt export via mathml of equations containing '&'


From: Jambunathan K
Subject: Re: [O] bug in odt export via mathml of equations containing '&'
Date: Wed, 09 Nov 2011 02:47:36 +0530
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.91 (windows-nt)

Hello Myles

The example that you have cited runs into issues at every step along
the way: plastex, mathtoweb and odt.

I have tried my best to be useful here. 

I sincerely appreciate you exercising the LaTeX to MathML conversion
facilities included in Org. I hope we get robust LaTeX->MathML
converters *ultimately*. I see that there is plenty of scope for the
LaTeX to MathML converters to improve and mature.

This is going to be a long mail. Read on.

#+TITLE:     improvements.org
#+AUTHOR:    Jambunathan K
#+EMAIL:     address@hidden
#+DATE:      2011-11-09 Wed
#+DESCRIPTION:
#+KEYWORDS:
#+LANGUAGE:  en
#+OPTIONS:   H:3 num:t toc:t \n:nil @:t ::t |:t ^:t -:t f:t *:t <:t
#+OPTIONS:   TeX:t LaTeX:t skip:nil d:nil todo:t pri:nil tags:not-in-toc

#+EXPORT_SELECT_TAGS: export
#+EXPORT_EXCLUDE_TAGS: noexport
#+LINK_UP:   
#+LINK_HOME: 
#+XSLT:


* Improvements to LaTeX->MathML handling in ODF exporter

Firstly, I felt a need for some support infrastructure for working with
LaTeX fragments in the ODT exporter. Here are a few things that I have
added since our last interaction:

1. Dvipng images and math formulae created from LaTeX fragments will now
   carry the LaTeX fragment as metadata, i.e., in LibreOffice you can see
   the LaTeX source via Image/Equation->Right Click->Description.

2. New interactive commands: M-x org-export-as-odf and M-x
   org-export-as-odf-and-open. With these commands you can mark a LaTeX
   fragment and export it as an odf (OpenDocument formula)
   document. The MathML source will be available in the kill ring
   after the export. (See the docstrings.)

3. Embed OpenDocument formula within the exported document by providing
   a link to *.mathml or *.odf file as below.

#+CAPTION: cases with MathJaX
   [[./mathjax-cases.odf]]

   A link with neither a caption nor a label will be formatted inline,
   while one with either or both of these attributes will be formatted
   as a display formula.

> If an org file contains a latex equation with a '&' in it then when it is
> exported to odt it makes dodgy xml.  Unzipping the odt, opening the
> content.xml and doing M-x rng-first-error gives the message:
>
> `&' that is not markup must be entered as `&amp;'
>
> To reproduce, insert this:
>
> \begin{equation}
> \delta_{mn} = 
>  \begin{cases}
>   1& \text{if $n=m$}\\
>   0& \text{if $n\nem$}
>  \end{cases}
> \end{equation}
>
> (which I got from here
> http://www.mathtoweb.com/cgi-bin/mathtoweb_users_guide.pl , search for
> 'cases')
>
> into the file math-to-web-with-plastex.org in this post:
> http://permalink.gmane.org/gmane.emacs.orgmode/48815 and export as per
> instructions.
>
> There may be a similar error with equations containing '<', '>'.

#+TITLE:     diagnosis.org
#+AUTHOR:    Jambunathan K
#+EMAIL:     address@hidden
#+DATE:      2011-11-09 Wed
#+DESCRIPTION:
#+KEYWORDS:
#+LANGUAGE:  en
#+OPTIONS:   H:3 num:t toc:t \n:nil @:t ::t |:t ^:t -:t f:t *:t <:t
#+OPTIONS:   TeX:t LaTeX:t skip:nil d:nil todo:t pri:nil tags:not-in-toc

#+EXPORT_SELECT_TAGS: export
#+EXPORT_EXCLUDE_TAGS: noexport
#+LINK_UP:   
#+LINK_HOME: 
#+XSLT:

#+STARTUP: hideblocks


* Diagnosis

1) User provided LaTeX fragment
   #+begin_src latex
     \begin{equation}
     \delta_{mn} = 
      \begin{cases}
       1& \text{if $n=m$}\\
       0& \text{if $n\nem$}
      \end{cases}
     \end{equation}
   #+end_src
2) Output from Plastex

   Note that plastex output includes an extraneous SPACE in two places:
   - between `\text' and its `{...}' argument
   - before the closing `$', as in `m $'

   #+begin_src latex
     \begin{equation}  \delta _{mn} = \begin{cases}  1&  \text {if $n=m$}\\ 0&  \text {if $n\nem $} \end{cases} \end{equation}
   #+end_src

   The same output with the extraneous SPACEs removed by hand:

   #+begin_src latex
     \begin{equation}  \delta _{mn} = \begin{cases}  1&  \text{if $n=m$}\\ 0&  \text{if $n\nem$} \end{cases} \end{equation}
   #+end_src
3) Output from MathToWeb
   The extraneous SPACE is unacceptable to MathToWeb and it complains.

   #+begin_src text
     Checking Syntax:  **
    
         -- found 1 syntax error(s) --
    
       **  (em) Nesting Error:  $..$ can not be
                                nested inside \begin{equation}..\end{equation}
                                unless it is within a \text{..} environment. 
    
           line:  1            \begin{equation}  \delta _{mn} ...
                               ^
   #+end_src

   If I remove the extraneous SPACEs by hand MathToWeb crashes.

   #+begin_src text
     Checking Syntax:  ***    -- no errors --
    
     >> stand-alone math environments: [1] Converting:  1Exception in thread "Thread-3" java.lang.ArrayIndexOutOfBoundsException: 1
             at MathToWeb.convertEachLatexMatrixToAMathMLExpression(MathToWeb.java:15511)
             at MathToWeb.doMatrixConversions(MathToWeb.java:3495)
             at MathToWeb.convertLatexToMathML(MathToWeb.java:2106)
             at ConvertLatexToMathMLThread.run(ConvertLatexToMathMLThread.java:64)
   #+end_src

   Question: Is the snippet output by plastex valid LaTeX?
   Depending on the answer, there is a bug in either plastex or
   MathToWeb.

   The moral of the story is that pre-processing the LaTeX fragment
   with plastex - while it may help with circumventing ncf limitations
   of MathToWeb - may create side-effects that MathToWeb is allergic
   to.

4) How ODT handles LaTeX->MathML failures
   If the ODT exporter does not receive a <math>...</math> element, it
   assumes failure and tries to embed the LaTeX fragment verbatim in
   the exported document.

   There was a bug in embedding the LaTeX fragment as plain text in the
   ODT file, which you have reported as below. This I have fixed.

   #+begin_src text
     > If an org file contains a latex equation with a '&' in it then when it is
     > exported to odt it makes dodgy xml.  Unzipping the odt, opening the
     > content.xml and doing M-x rng-first-error gives the message:
     >
     > `&' that is not markup must be entered as `&amp;'
     > There may be a similar error with equations containing '<', '>'.
   #+end_src
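
The fix described in item 4 amounts to escaping XML markup characters
before the LaTeX fragment is written as plain text into content.xml. A
minimal sketch of that idea in Python (the exporter itself is Emacs
Lisp; the function name latex_as_xml_text is hypothetical and used
only for illustration):

```python
from xml.sax.saxutils import escape


def latex_as_xml_text(fragment: str) -> str:
    """Escape &, < and > so the fragment is safe as XML character data.

    Hypothetical helper for illustration; not the exporter's code.
    """
    return escape(fragment)


# One row of the reported "cases" fragment, which contains a bare '&'.
row = r"1& \text{if $n=m$}"
escaped_row = latex_as_xml_text(row)

# The report also suspected '<' and '>'; escape() covers both.
escaped_relations = latex_as_xml_text("a<b>c")
```

With this in place the bare `&' from the alignment column can no
longer trigger the rng-first-error message quoted above.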

* Some comments on "cases"
** Bug in MathToWeb wrt cases

   A comparison of [fn:1] and [fn:2] and a little experimentation with
   LibreOffice shows the following issues with MathToWeb's handling of
   \begin{cases}...\end{cases}, which LibreOffice is allergic to.

   1. MathJax uses:
      - <mfenced open="{" close="">...</mfenced>
      while MathToWeb uses:
      - <mo>&#x0007B;</mo> and <mphantom> &#x0007D; </mphantom>
   2. MathJax limits the scope of <mtext>...</mtext> to just the "if",
      while MathToWeb extends the scope to the entire "sub-equation".
   3. MathJax uses &#xA0; while MathToWeb uses &#x000A0; for
      non-breaking space.

   If 1, 2 and 3 are "hand-fixed" in MathToWeb output then LibreOffice
   not only opens the MathToWeb produced formula fine but also
   displays it correctly[fn:3].
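
Issue 2 can be checked mechanically. A minimal sketch in Python,
assuming well-formed input and using only the standard library; the
function name and the two shortened sample strings are illustrative
stand-ins for [fn:1] and [fn:2], not the full converter outputs:

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def nested_math_in_mtext(mathml: str) -> int:
    """Count <math> elements that appear inside an <mtext> element."""
    root = ET.fromstring(mathml)
    count = 0
    for mtext in root.iter(f"{{{MATHML_NS}}}mtext"):
        # iter() walks all descendants of this <mtext>.
        count += sum(1 for _ in mtext.iter(f"{{{MATHML_NS}}}math"))
    return count


# Shortened sample in the style of the MathToWeb output: the
# sub-equation is wrapped in a second <math> inside <mtext>.
mathtoweb_like = (
    f'<math xmlns="{MATHML_NS}"><mtd><mtext>if&#x000A0;'
    f'<math xmlns="{MATHML_NS}"><mrow><mi>n</mi><mo>=</mo><mi>m</mi></mrow></math>'
    "</mtext></mtd></math>"
)

# Shortened sample in the style of the MathJax output: <mtext> holds
# only the "if", and the sub-equation follows as a sibling <mrow>.
mathjax_like = (
    f'<math xmlns="{MATHML_NS}"><mtd><mtext>if&#xA0;</mtext>'
    "<mrow><mi>n</mi><mo>=</mo><mi>m</mi></mrow></mtd></math>"
)

mathtoweb_count = nested_math_in_mtext(mathtoweb_like)
mathjax_count = nested_math_in_mtext(mathjax_like)
```

A non-zero count flags output that needs the hand-fix described in
item 2 before LibreOffice is likely to accept it.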

** A near-equivalent of MathToWeb's cases that is LibreOffice-friendly
   
   The snippet below is a near-equivalent of the "cases" formulation
   that is also LibreOffice-friendly. See the attached
   "workable-alternative-to-cases.odf".

   #+srcname: workable-alternative-to-cases
   #+begin_src latex
     \begin{equation*}
       \delta_{mn} = 
       \left\{
       \begin{smallmatrix}
         1 & \text{if } n=m \\
         0 & \text{if } n\nem
       \end{smallmatrix}
       \right\}
     \end{equation*}
   #+end_src

   The best alternative in LaTeX would be to use the \left\{ and
   \right. (note the "dot") construct as below[fn:4]. Unfortunately
   MathToWeb fails miserably while MathJax succeeds with flying
   colors.

   #+srcname: exact-equivalent-of-cases
   #+begin_src latex
     \begin{equation*}
       \delta_{mn} = 
       \left\{
       \begin{smallmatrix}
         1 & \text{if } n=m \\
         0 & \text{if } n\nem
       \end{smallmatrix}
       \right.
     \end{equation*}
   #+end_src
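
A side note on the fragments above, separate from the converter
behaviour: TeX tokenizes \nem as one control sequence named "nem",
which standard LaTeX does not define, so a converter may either reject
it or guess at the meaning. Assuming the intended relation is "n not
equal to m", a sketch of the \right. variant with an explicit space
after \ne would be (amsmath is needed for \text, smallmatrix and
equation*):

```latex
\begin{equation*}
  \delta_{mn} =
  \left\{
  \begin{smallmatrix}
    1 & \text{if } n = m \\
    0 & \text{if } n \ne m
  \end{smallmatrix}
  \right.
\end{equation*}
```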

* Workarounds

** Use plastex with discretion and consider MathJax as a potential option?

   An example where pre-processing with plastex creates undesirable
   side-effects has been seen earlier.

   Interestingly, the original latex fragment DOES NOT rely on any
   user-defined newcommand for interpretation and can be passed on to
   MathToWeb directly.

   When the original snippet is exported with M-x
   org-export-as-odf-and-open RET, export to odf happens fine but
   LibreOffice fails to open the resulting formula[fn:1]. Also see the
   attached file "mathtoweb-cases.odf"

   If I open the resulting odf file and overwrite "content.xml" with
   the MathML produced by MathJax[fn:2][fn:5][fn:6] - see the attached
   "mathjax-cases.odf" - LibreOffice is happy.
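
The manual step above (overwriting content.xml inside the .odf
archive) can be sketched as follows. This is a minimal Python sketch
under stated assumptions: the function name replace_content_xml is
hypothetical, the demo archive is an in-memory stand-in rather than a
real exported formula, and keeping the "mimetype" entry uncompressed
follows the OpenDocument packaging convention as I understand it.

```python
import io
import zipfile


def replace_content_xml(src, dst, new_content: bytes) -> None:
    """Copy the archive src to dst, swapping in a new content.xml.

    Entries are written in their original order, so a leading
    "mimetype" entry stays first; it is stored uncompressed.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            if info.filename == "content.xml":
                data = new_content
            else:
                data = zin.read(info.filename)
            if info.filename == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            zout.writestr(info.filename, data, compress_type=method)


# Demo: a tiny in-memory stand-in for an exported formula document.
original = io.BytesIO()
with zipfile.ZipFile(original, "w") as z:
    z.writestr("mimetype", "application/vnd.oasis.opendocument.formula",
               compress_type=zipfile.ZIP_STORED)
    z.writestr("content.xml", "<math>old</math>")

patched = io.BytesIO()
replace_content_xml(original, patched, b"<math>new</math>")

with zipfile.ZipFile(patched) as z:
    names = z.namelist()
    content = z.read("content.xml")
```

For a real file, src and dst would be paths such as the attached
"mathtoweb-cases.odf" and a new output name, and new_content the
MathML taken from [fn:2].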

** Provide the MathML or OpenDocument formula directly in the Org file
   
   One can provide the "right" MathML or OpenDocument formula directly
   in the Org file. The formula could either be created with
   LibreOffice's StarMath directly or by using the output from LaTeX
   to MathML converters as a first cut [fn:7] and improving the
   results subsequently with LibreOffice.


* Footnotes

[fn:1] #+srcname: output-from-mathtoweb-for-cases
#+begin_src nxml
  <?xml version="1.0" encoding="UTF-8"?>
  <math xmlns="http://www.w3.org/1998/Math/MathML">
    <mrow>
      <mspace width="1.00em" />
      <msub>
        <mi>&#x003B4;</mi>
        <mrow>
          <mi>m</mi>
          <mi>n</mi>
        </mrow>
      </msub>
      <mo>=</mo>
      <mrow>
        <mo>&#x0007B;</mo>
        <mtable class="m-cases" columnalign="left">
          <mtr>
            <mtd>
              <mn>1</mn>
            </mtd>
            <mtd>
              <mtext>if&#x000A0;
              <math xmlns="http://www.w3.org/1998/Math/MathML">
                <mrow>
                  <mi>n</mi>
                  <mo>=</mo>
                  <mi>m</mi>
                </mrow>
              </math>
              </mtext>
            </mtd>
          </mtr>
          <mtr>
            <mtd>
              <mn>0</mn>
            </mtd>
            <mtd>
              <mtext>if&#x000A0;
              <math xmlns="http://www.w3.org/1998/Math/MathML">
                <mrow>
                  <mi>n</mi>
                  <mo>&#x02260;</mo>
                  <mi>m</mi>
                </mrow>
              </math>
              </mtext>
            </mtd>
          </mtr>
        </mtable>
        <mphantom>
          &#x0007D;
        </mphantom>
      </mrow>
    </mrow>
  </math>
#+end_src

[fn:2] #+srcname: output-from-mathjax-for-cases
#+begin_src nxml
  <math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
    <msub>
      <mi>&#x03B4;<!-- δ --></mi>
      <mrow>
        <mi>m</mi>
        <mi>n</mi>
      </mrow>
    </msub>
    <mo>=</mo>
    <mfenced open="{" close="">
      <mtable columnalign="left left" rowspacing=".1em" columnspacing="1em">
        <mtr>
          <mtd>
            <mn>1</mn>
          </mtd>
          <mtd>
            <mtext>if&#xA0;</mtext>
            <mrow>
              <mi>n</mi>
              <mo>=</mo>
              <mi>m</mi>
            </mrow>
          </mtd>
        </mtr>
        <mtr>
          <mtd>
            <mn>0</mn>
          </mtd>
          <mtd>
            <mtext>if&#xA0;</mtext>
            <mrow>
              <mi>n</mi>
              <mtext mathcolor="red">\nem</mtext>
            </mrow>
          </mtd>
        </mtr>
      </mtable>
    </mfenced>
  </math>
#+end_src

[fn:3] Would you like to forward this as a bug report to the MathToWeb
team?

[fn:4] For LaTeX, see the last example here:
http://www.maths.tcd.ie/~dwilkins/LaTeXPrimer/Matrices.html

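[fn:5] MathJax doesn't seem to handle \ne well here: it emits the
  unspaced \nem as an unrecognized command (the red <mtext> in the
  MathJax output) instead of reading it as \ne followed by m.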

[fn:6] Is there a command-line interface to MathJax? This would permit
  MathJax as a potential alternative to MathToWeb. If there is no
  command-line converter, can someone reverse-engineer the MathJax
  javascript and see what magic it does over the network or cloud?

[fn:7] In case of matrices, MathToWeb produces MathML which displays
   fine save for some characters that are displayed as question marks.

Jambunathan K.

-- 

Attachment: diagnosis.org
Description: Text Data

Attachment: improvements.org
Description: Text Data

Attachment: workable-alternative-to-cases.odf
Description: application/vnd.oasis.opendocument.formula

Attachment: mathjax-cases.odf
Description: application/vnd.oasis.opendocument.formula

Attachment: mathtoweb-cases.odf
Description: application/vnd.oasis.opendocument.formula

