[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Axiom-developer] Design of Semantic Latex

From: Tim Daly
Subject: Re: [Axiom-developer] Design of Semantic Latex
Date: Sat, 27 Aug 2016 03:12:37 -0400

(email failed...retry)

William Sit wrote:

>I am still not sure what your immediate goal is. My current understanding (correct me
>if I am still wrong) is that you want to translate the left hand side of an identity (like a
>formula for an integral) given in latex into Axiom code which then, by executing the
>code, generates the right hand side of that identity, first by outputted as an Axiom
>_expression_ and then as selatex, and one difficulty is the lack of semantics implicit
>in the input (left hand side) latex string.

Well, the selatex output would only happen if you asked for it, but yes, this is correct.

>However, there are other difficulties: often, there is no canonical form for the right hand
>side (an answer after evaluating the left hand side),

If you look at the CATS Schaum's Integration set you'll see that each Axiom
integration result was subtracted from the published result, expecting it to simplify to 0.
(This is undecidable).

Where the answers differed they usually differed by a constant
(which was shown by taking the derivative of the difference). Sometimes special
simplification routines were needed, other times some rules had to be applied to get
some trig identities. More than once there was no resolution, which either indicates
a bug in Axiom, an error in the published result, or some hidden semantic assumption.

Even more interesting is that many of the reference books list many results.

>nor does Axiom always give the provisos under which the evaluation/identity is valid

One crisis at a time... Axiom has, for example, branch cut choices and these need to
be made explicit. Ideally, these could be changed by provisos. The SuchThat domain
provides some proviso functionality but is rarely used at present. However, the published
results don't provide the provisos either making the whole issue invisible despite
being important.

>For test suites, you won't have to worry about the canonical form or provisos because
>you are only comparing the answer with the one generated by a previous version of
>Axiom, to make sure nothing is "broken". For that purpose, the semantic you need only
>needs to be consistent from one Axiom version to the next, and you may choose the
>specific parameters needed in any way where the identity makes sense.

The regression tests use exactly this philosophy. But the CATS tests are a
"Computer Algebra Test Suite", not an Axiom test suite so they are measured against
published results. On several occasions Axiom has found errors in the published results.

>The general problem (which I am not sure if you are pursuing) where one wants to add
>explicit semantic to any mathematical _expression_ given in latex is a far more challenging
>one, independent of whether the _expression_ can or should be evaluated, or semantic
>provided for the rewritten _expression_ (or "answer"). I wonder if the general problem has
>applications. Would such mark-ups help or hinder the creation of a piece of mathematical

In general I don't think published mathematics will adopt semantic markup. The context
available in the surrounding paragraphs is sufficient. Most formulas are just there
to make the text statements exact.

However, for reference works that have no surrounding paragraphs like the CRC/NIST/etc.
the loss of paragraphs makes the formulas ambiguous. The E=mc^2 formula has no
meaning if you don't know what E, m, and c are.

Reference works lack grounding. This is an effort to provide semantic grounding by
showing that the formula is backed by algorithms that can recover the "results".
Is that useful?

The CATS test suite shows that Axiom has problems which need to be solved.
It also shows that the reference works have published errors. Both efforts benefit.

Is it of interest? Apparently so. Every so often someone tries to parse latex with the
goal of automation or inter-system communication. The natural source of latex
formulas are the reference works. The parsing effort fails every time based
on the lack of semantics. This effort addresses the "root cause", making the
reference works much more useful.

On Sat, Aug 27, 2016 at 3:08 AM, Tim Daly <address@hidden> wrote:
(email failed... retry)

The referenced paper only looked at the DLMF Airy functions. The results are:

"Of the 17 sections in the DLMF sample chapter on Airy functions we can handle the mathematical

formulas completely in 6, partially in 5 (without the transformation to MathML), and 6 remain


The grammar currently contains approximately 1000 productions, of which ca. 350 are dictionaries.

There are about 550 rewrite rules. There are fewer rewrite rules than grammar rules, partly because

dictionaries can be treated uniformly by manipulating literals, and partly because it is still incomplete

with respect to the grammar.

Our project shows that parsing mathematics in the form of LATEX documents written to project-specific

rules is feasible, but due to the variations in notation the grammar needs to be engineered specifically

for the project, or even for different chapters written by different mathematicians (e.g. the chapter on

elementary transcendental functions and on Airy functions)."

Many have tried to parse DLMF but there is not sufficient information in the latex. This 

effort used many rules to try to decide if w(x+y) is a multiplication or a function application.

There are rules to decide if sin a/b means (sin a)/b or sin(a/b), either of which is trivial to

distinguish if the latex read {sin a}/b or sin{a/b}

Trivial amounts of semantic markup in the DLMF would communicate semantics without using

1000 productions which still get wrong answers.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]