Re: LYNX-DEV Lynx vs XML?
Christopher R. Maden
Wed, 12 Mar 1997 14:58:54 GMT
> I am just starting to find serious articles discussing the next
> generation markup language, XML, around in various public journals.
> I was wondering if anyone on this mailing list has begun thinking
> about the potential impacts this may have on lynx.
I've been spending a good portion of my waking hours thinking about
it. I've been heavily involved in the development of XML, and Lynx
has been near the top of my thoughts with it.
Unfortunately, I'm not sure how Lynx will be able to cope. The HTML
parser is buried so deep in error recovery routines and treeless
parsing that I don't think it's adaptable. I don't think I understand
the Lynx code thoroughly enough to make a suggestion with complete
confidence, but here goes. I believe that right now, Lynx turns the
text/html MIME type into an internal signal to process HTML using its
various routines for doing so. I think what should be done is to add
another internal signal for text/xml or application/xml. The parser
behind that signal should be a straight-ahead tree-based parser.
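To make the dispatch idea concrete, here is a minimal sketch in Python
rather than Lynx's own C. Every name in it (render_html, render_xml,
HANDLERS, dispatch) is hypothetical and does not come from the Lynx
source; it only illustrates routing a MIME type to a separate parser
path instead of pushing XML through the HTML routines.

```python
# Hypothetical sketch: none of these names come from the Lynx source.

def render_html(body: str) -> str:
    """Stand-in for the existing error-tolerant, treeless HTML path."""
    return "html:" + body


def render_xml(body: str) -> str:
    """Stand-in for a new strict, tree-based XML path."""
    return "xml:" + body


# Each MIME type maps to the routine that handles it.
HANDLERS = {
    "text/html": render_html,
    "text/xml": render_xml,
    "application/xml": render_xml,
}


def dispatch(content_type: str, body: str) -> str:
    """Pick a handler from the Content-Type value and run it."""
    # Drop parameters such as "; charset=..." and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    handler = HANDLERS.get(mime)
    if handler is None:
        raise ValueError("no handler for " + mime)
    return handler(body)
```

The point of the table is that the XML path never touches the HTML
error-recovery code; adding it is one new entry plus one new routine.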
No one is going to be using XML for human-readable documents until the
stylesheet specification is done around the end of this year, so we
have some time. Eventually, here's what Lynx will need to do:
a) Parse the XML declaration:
<?XML version="1.0" encoding="iso8859-1" rmd="INTERNAL"?>
b1) If rmd="none", recognize and skip the doctype declaration, if
present.
b2) If rmd="internal", parse the doctype declaration for entity
declarations.
b3) If rmd="all", parse the doctype declaration for internal entity
declarations, then for the external subset, which must be fetched
and parsed for entity declarations as well.
bx) The linking specification is still in draft. However, whatever
part of the doctype declaration is required by the XML declaration
must also be parsed for hypertext link identification. This
information will assert that certain elements map to certain XML
link constructs. That mapping will be loaded into a table.
c) Identify and fetch the stylesheet(s) associated with the document.
There is currently no mechanism specified for doing this - I'm
making a proposal in about five minutes, though. Lynx will have
to learn to parse DSSSL[*], most likely. A simplified subset,
called dsssl-o, is likely to be the stylesheet language for XML.
d) Parse and render the document, using the linking table and the
stylesheet. External entities can probably be left as links, only
downloaded when the user requests them. Can the HT structure be
rebuilt on the fly when that happens? I don't think it would be
unreasonable to always treat external entities as an integer
number of lines - i.e., only whole lines would ever be inserted
into the rendered view.
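Steps (a) and (b), and the whole-line idea in (d), can be sketched
roughly as follows. This is Python for brevity, not a proposal for the
actual implementation, and the function names are made up. The
declaration syntax is taken from the example in (a); the draft may
change. I have assumed that an unfetched external entity occupies
exactly one placeholder line in the rendered view, which is one way to
read "an integer number of lines", and I have not assumed any default
for a missing rmd.

```python
import re

# Pseudo-attributes of the form name="value" or name='value'.
_PSEUDO_ATTR = re.compile(r"""([A-Za-z][\w.-]*)\s*=\s*(["'])(.*?)\2""")


def parse_xml_decl(text):
    """Return the pseudo-attributes of a leading <?XML ...?> declaration."""
    m = re.match(r"\s*<\?XML\s+(.*?)\?>", text, re.IGNORECASE | re.DOTALL)
    if m is None:
        return {}
    return {name.lower(): value
            for name, _quote, value in _PSEUDO_ATTR.findall(m.group(1))}


def doctype_plan(rmd):
    """Map an rmd value to (parse_internal_subset, fetch_external_subset).

    Unknown values raise KeyError; no default is assumed here.
    """
    plans = {
        "none": (False, False),      # b1: skip the doctype declaration
        "internal": (True, False),   # b2: internal entity declarations only
        "all": (True, True),         # b3: internal, then external subset
    }
    return plans[rmd.lower()]


def splice_entity(lines, index, entity_text):
    """Replace the placeholder line at `index` with the entity's lines.

    Only whole lines are ever inserted, so lines before and after the
    placeholder are left untouched.
    """
    return lines[:index] + entity_text.splitlines() + lines[index + 1:]
```

For example, splice_entity(["intro", "[chapter2]", "outro"], 1, text)
swaps the one placeholder line for however many lines the fetched
entity renders to, so the rest of the view only shifts by whole lines.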
I am more than happy to help with design, and with writing any new
code, but I don't feel comfortable playing with the existing code,
since I really haven't had the time to invest in understanding what's
More information about XML is available in the XML FAQ, at
[*] DSSSL is the Document Style Semantics and Specification Language,
ISO/IEC 10179:1996. DSSSL introduces the concept of a tree of flow
objects for rendering. A lot of these, like font information, will be
irrelevant to a character-mode browser. Lynx will only need to pay
attention to larger-scale objects, like paragraphs. While rewriting
the rendering routines, we might be able to add table support.
Christopher R. Maden One Richmond Square
DynaText SIT Technical Support Providence, RI 02906 USA
Inso Corporation +1.401.421.9550 (voice)
Electronic Publishing Solutions +1.401.521.2030 (facsimile)