Re: [Gnu-arch-users] Re: doc formats


From: Stephen J. Turnbull
Subject: Re: [Gnu-arch-users] Re: doc formats
Date: Sat, 21 Jan 2006 15:11:47 +0900
User-agent: Gnus/5.1007 (Gnus v5.10.7) XEmacs/21.5-b24 (dandelion, linux)

>>>>> "Thomas" == Thomas Lord <address@hidden> writes:

    Thomas> Before replying to Miles and Matthew: If you want a good
    Thomas> example of wiki-style markup gone bad,

The claim of the XML proponents on python-dev is that this is
inevitable.  I'm sympathetic to that claim, having seen the devices
proposed for reST to address attaching semantic information to
plain text.

    Thomas> Can anyone either point to a document that concisely
    Thomas> explains the parsing theory (or in other terms, the state
    Thomas> machine) the ReST uses?  The name ("Regular expression
    Thomas> ...")

Don't ASSume.  The "re" in reStructured Text means "again", as in
Structured Text, version 2.

There is no parsing theory, as far as I know.  There is a set of
commonly used heuristics which are endorsed and formalized: the use of
underscores to mark links (analogous to their use by gettext),
asterisks for emphasis as often used in email, underlining with
hyphens or equals signs to mark headers, ASCII-art tables, and
indentation for block quotes.  Email-style quoting is also recognized,
as is prompt> input / (no-prompt) output.  Finally there are some new
heuristics, such as the doubled colon :: which turns a following block
quote into the equivalent of texinfo @example, and the use of |id| to
create substitutions.  A pretty mixed bag; I think you'd really have
to strain to find a "theory" in there.  Certainly nothing to compare
to SGML or LISP syntactic theory.  The developers do recognize the
problems that come from using the same character to mark both the
beginning and the end of a marked span, but that's about as far as it
goes.
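For concreteness, here is a small sketch pulling several of those
heuristics together in one place (an illustrative fragment I made up,
not taken from any particular document):

```rst
A Header
========

A Subheader
-----------

Some *emphasized* and **strong** text, a link to
`Python <http://www.python.org/>`_, and a |rst| substitution.

.. |rst| replace:: reStructuredText

A literal block follows the doubled colon::

    anything indented here is rendered verbatim
```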

Then there is a comment / explicit markup / extension syntax
introduced by a leading "..".  This also admits a "field syntax" which
is essentially RFC 822 headers, with tags marked by leading and
trailing colons.  E.g.,

.. image:: dont-panic.png
   :height: 480
   :width: 640
   :background: transparent
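Since the field syntax is essentially RFC 822 headers with colons on
both sides of the tag, a few lines of Python suffice to pull the
fields out.  This is a hedged sketch of the idea only; the real
docutils parser also handles continuation lines, nesting, and body
content, which this ignores:

```python
import re

# Matches ":tag: value" field lines, as in the image directive above.
FIELD_RE = re.compile(r"^\s*:(?P<tag>[^:]+):\s*(?P<value>.*)$")

def parse_fields(lines):
    """Extract {tag: value} pairs from reST-style field lines."""
    fields = {}
    for line in lines:
        m = FIELD_RE.match(line)
        if m:
            fields[m.group("tag")] = m.group("value")
    return fields

directive_body = [
    "   :height: 480",
    "   :width: 640",
    "   :background: transparent",
]
print(parse_fields(directive_body))
```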

    Thomas> Speaking of XML: I understand that ReST is table driven in
    Thomas> some way and therefore presumably highly reconfigurable.
    Thomas> I'd like to see that more aggressively developed.

reST is divided into a single front end allowing plugins for extension
syntax based on the field syntax, and back-ends driving markup engines
such as LaTeX, HTML, and I believe Docbook.  ISTR that the product of
the front end is indeed a syntax tree, but it's not as powerful as
XML, though it can produce XML.  As a consequence, the HTML and LaTeX
produced are far more human-readable than the horrors that, say,
latex2html produces.
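If docutils is installed, the front-end/back-end split is visible
directly in its API: the same parsed source can be handed to different
writers (a minimal sketch; the writer names "html" and "latex" are the
ones docutils documents, and publish_string returns encoded bytes by
default):

```python
from docutils.core import publish_string

source = "*emphasis* and a literal block::\n\n    example text\n"

# Same front-end parse, two different back-end writers.
html = publish_string(source, writer_name="html")
latex = publish_string(source, writer_name="latex")

print(b"<em>emphasis</em>" in html)
print(rb"\emph{emphasis}" in latex)
```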

    Thomas> I'd like to see a single parser generic parser engine
    Thomas> driven by a declarative language spec so that, for
    Thomas> example, in one case it's interpreting markup as
    Thomas> short-hand for XHTML but, in other case, it's interpreting
    Thomas> a similar markup syntax as short-hand for some other,
    Thomas> quite different DTD.

Attempts to get reST to do this are precisely what the XML advocates
on python-dev are terrified by.

    Thomas> (The issue here is that there is an extreme scarcity
    Thomas> natural-looking plain-text mark-ups but many DTDs we wish
    Thomas> to generate.  "By eye" people have no trouble
    Thomas> disambiguating some forms of overloading -- the doc parser
    Thomas> should accomodate that.)

That's right out, man.  The human eye, as in the phrase "by eye",
includes the 1.5 kilos or so of neural matter located at the other end
of the optic nerve.  The brain is a shitty parser; it can't even lex
very well.  "By eye" is 1% parsing, 99% semantic filtering.  In your
case, I'd be willing to go as high as 33%-66% (with 1% left over for
Edisonian inspiration), but no way would I concede that your ability
to recognize inline constructs "by eye" is as much as 50% based on
parsing.

Furthermore, at the block level the "eye" we're talking about is a
two-dimensional pattern recognizer.  AFAIK most parsing theory, and
all tools easily available to GNU developers, are based on parsing
unidimensional streams.  (Consider the progress of proprietary OCR vs
free OCR programs as symptomatic.)

    Thomas> Awiki also has some tricks

Urk!  Once you let in a trick or two, there goes the neighborhood!

    Thomas> to beat out ambiguities and to rely less on gratuitous
    Thomas> whitespace restrictions, afaict from the ReST docs.

The reST whitespace restrictions are artificial; this is not
surprising given that reST comes from Python developers, and that one
of its initial target use cases was Python docstrings.

It's also amusing (interesting but probably not relevant to this
discussion) to note that with an Xft-enabled Emacs I find a reST
source buffer nearly as attractive as a formatted HTML buffer for
slide presentations to general audiences (such as undergraduate
classes).  HTML does a better job of placing images, and tables are
distinctly more attractive, but it's actually possible to do lectures
in an Emacs buffer the way I would do them on the blackboard, line by
line.  (Of course I use filladapt to handle the indentation, which it
does correctly for reST out of the box.)

OTOH, there is no *ML that is satisfactory for technical
presentations yet; it's gotta be TeX.



-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.



