On XML, DocBook, data, metadata, and lexica

From: G. Branden Robinson
Subject: On XML, DocBook, data, metadata, and lexica
Date: Wed, 16 Sep 2020 05:01:15 +1000
User-agent: NeoMutt/20180716

At 2020-09-15T17:11:33+0200, Marc Chantreux wrote:
[John Gardner wrote:]
> > XML is verbose, cumbersome to read and write, and has two different
> > ways to express data structures
> > <doc title="My stupid document" enabled="true"></doc>
> > <doc>
> >     <title>My stupid document</title>
> >     <enabled>true</enabled>
> >     <enabled />
> > </doc>
> this *isn't* the same to me: attributes are for metadata and tag
> contents are for data. but the thing is: because xml is so painful
> to edit, people (including me) started to abuse metadata.

I want to amplify (possibly to the point of distortion) John's response
to this (not quoted here).

From a certain perspective this is a perfectly natural thing to do.
Once you've drunk enough computer science Kool-Aid you start to get
really comfortable with the interchangeability of instructions and data,
data and metadata, bits and...other bits.
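
As a rough illustration of that interchangeability (a sketch only, not
anything John or Marc proposed), the two XML forms quoted above can be
read into the same record if the consumer simply declines to care where
a field lives.  The fields() helper is hypothetical, and I have left out
the stray empty <enabled /> element from the quoted example so that the
two forms carry the same fields.

```python
import xml.etree.ElementTree as ET

# The two spellings from the quoted example (minus the empty <enabled />).
ATTRIBUTE_FORM = '<doc title="My stupid document" enabled="true"></doc>'
ELEMENT_FORM = """
<doc>
    <title>My stupid document</title>
    <enabled>true</enabled>
</doc>
"""


def fields(xml_text):
    """Collect a record's fields from attributes and child elements alike.

    Hypothetical helper: it deliberately ignores the data/metadata
    distinction, which is exactly the policy decision under discussion.
    """
    root = ET.fromstring(xml_text.strip())
    record = dict(root.attrib)  # "metadata" spelled as attributes
    for child in root:  # "data" spelled as child elements
        record[child.tag] = (child.text or "").strip()
    return record


attribute_record = fields(ATTRIBUTE_FORM)
element_record = fields(ELEMENT_FORM)
```

Nothing in the parser separates data from metadata here; a consumer that
wants the distinction has to enforce it itself.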

The bifurcation of computer architectures into Harvard and von
Neumann[1] designs from the dawn of digital computing, recapitulated in
recent decades with split I/D caches and the "NX bit" in x86 processors,
illustrates how this distinction is a matter of policy rather than of any
inherent property of a stream of bits.

The thing about policy is that you have to enforce it.  And thence comes
the chaos of XML name spaces.

For me, quite apart from any issues of cleanliness of design, DocBook is
just too huge lexically.  I can't keep it all in my head or be sure I'm
using the right tag for a thing.  Texinfo has this problem, too.
Recently I made a commit to our manual changing several uses of @code{}
to @command{}.  There are probably dozens of other uses of @code{} that
need to be changed to yet a third, more-specific tag.

mdoc stumps me for a similar reason; the lexicon is large and, on top of
that, mnemonically challenging since it confines itself to a
two-character name space, and even goes so far as to flaunt this
confinement as a feature by trumpeting the distinction between .Dl and
.D1 macros, predicating a semantic distinction on two of the most
confusable glyphs in the Latin character set in an environment where
contextual clues are absent except to already-expert users.

What I like about man(7) is the contrast.  The lexicon is small, and
that advantage is magnified by the fact that most people writing it do
so on an infrequent basis.

While I have notions of enhancement of man(7), inspired by Russ Cox's
recent changes to Plan 9 troff[2], I regard every user-visible macro as
costly.  A macro that we tell people to use has to deliver a lot of
value to earn a place in the lexicon.


[1] Not too long ago I found out that the latter appears to be yet
another example of Stigler's Law of Eponymy.

"In 1945, while ENIAC was still under construction, von Neumann produced
a draft report, mentioned previously, setting out the ENIAC group's
ideas for a stored-program computer, the EDVAC ('First Draft of a Report
on the EDVAC', first published in full in Stern, N. From ENIAC to
UNIVAC: An Appraisal of the Eckert-Mauchly Computers. Bedford, Mass.:
Digital Press (1981), pp. 181-246). The EDVAC was completed six years
later, but not by its originators, who left the Moore School to build
computers elsewhere.

Von Neumann was a prestigious figure and he made the concept of a
high-speed stored-program digital computer widely known through his
writings and public addresses. As a result of his high profile in the
field, it became customary, although historically inappropriate, to
refer to electronic stored-program digital computers as 'von Neumann

