groff

Re: (off topic?) Docbook? Re: manlint?


From: John Gardner
Subject: Re: (off topic?) Docbook? Re: manlint?
Date: Fri, 18 Sep 2020 01:18:06 +1000

To preserve metadata, or identify regions of semantic or structural
interest, write a preprocessor that delimits the unprocessed roff(7) syntax
with device control escapes (\X):

.TH \X'meta: begin title'TITLE\X'meta: end title'


In troff's intermediate output, this comes out as:

x X meta: begin title
t TITLE
x X meta: end title


Postprocessors can then use these markers if they have some reason to
care about semantic data.
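
As a minimal sketch of the consuming side (in Python, assuming the `meta: begin NAME` / `meta: end NAME` convention shown above, which is only this thread's illustration and nothing groff itself defines), a postprocessor can scan the intermediate output line by line and collect the words printed between matching markers. The function name `extract_regions` is hypothetical, and only `t` (print word) commands are handled; a real consumer would also have to deal with the `u`, `C` and `w` commands described in groff_out(5).

```python
import re
from typing import Dict, Iterable, List

# Matches device-control lines such as "x X meta: begin title".
# The "meta: begin/end NAME" vocabulary is an illustrative convention.
META_RE = re.compile(r"^x\s+X\s*meta:\s*(begin|end)\s+(\S+)\s*$")


def extract_regions(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Return {region name: words printed inside that region}."""
    regions: Dict[str, List[str]] = {}
    open_names: List[str] = []  # a stack, so regions may nest
    for raw in lines:
        line = raw.rstrip("\n")
        m = META_RE.match(line)
        if m:
            action, name = m.groups()
            if action == "begin":
                open_names.append(name)
                regions.setdefault(name, [])
            elif open_names and open_names[-1] == name:
                open_names.pop()  # mismatched end markers are ignored
        elif line.startswith("t") and open_names:
            # "t" prints one word; groff may omit the space after the letter.
            word = line[1:].strip()
            for name in open_names:
                regions[name].append(word)
    return regions
```

Fed the three lines above, it yields `{'title': ['TITLE']}`. Words inside nested regions are credited to every enclosing region, and two regions with the same name are merged into one list; both are simplifications.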

Even if you only care about extracting abstract info instead of rendering a
document, there's no reason a postprocessor actually has to be a typesetter:

$ infer | troff | post-infer --extract-outline --xml ./outline.xml | grotty | less

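The `post-infer` stage in that pipeline is hypothetical. One way it might work, sketched here in Python under the same marker convention, is as a tee: every line of intermediate output is passed through unchanged so the next postprocessor still sees the whole document, while an XML outline is built on the side. The names `tee_outline` and `main`, and taking the XML path as a plain argument rather than the `--extract-outline --xml` flags, are assumptions of this sketch.

```python
import re
import sys
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List

# Device-control lines such as "x X meta: begin title" (the marker
# vocabulary is this thread's illustrative convention, not groff's).
META_RE = re.compile(r"^x\s+X\s*meta:\s*(begin|end)\s+(\S+)\s*$")


def tee_outline(src: Iterable[str], write: Callable[[str], object]) -> ET.Element:
    """Pass every input line to `write` unchanged; return an <outline> tree."""
    root = ET.Element("outline")
    stack = [root]  # stack[-1] is the innermost open region
    for line in src:
        write(line)  # the downstream postprocessor sees everything
        body = line.rstrip("\n")
        m = META_RE.match(body)
        if m:
            action, name = m.groups()
            if action == "begin":
                stack.append(ET.SubElement(stack[-1], name))
            elif len(stack) > 1 and stack[-1].tag == name:
                stack.pop()
        elif body.startswith("t") and len(stack) > 1:
            # Append the printed word to the innermost open region's text.
            elem = stack[-1]
            word = body[1:].strip()
            elem.text = f"{elem.text} {word}" if elem.text else word
    return root


def main(argv: List[str]) -> None:
    """Hypothetical CLI: stdin to stdout unchanged, outline XML to argv[1]."""
    outline = tee_outline(sys.stdin, sys.stdout.write)
    ET.ElementTree(outline).write(argv[1], encoding="utf-8")
```

Calling `main(sys.argv)` from an `if __name__ == "__main__":` guard would let it sit between `troff` and the output driver. Text that follows a nested region is appended to the parent's leading text rather than placed after the child element, which keeps the sketch short at the cost of ordering fidelity.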

Of course, this would require infer to have prior knowledge of specific
macro packages, but I fail to see that being an issue. Moreover, infer can
also identify preprocessor markup, such as tables, pictures, equations, and
any other shite that's impossible to recognise in preprocessor output.
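
On the producing side, a rough sketch of what `infer`'s knowledge of one macro package might look like: a Python function that recognises only man(7)'s `.TH` and wraps its first argument (the page title) in the markers. `mark_title` and the marker text are illustrative. This version also quotes the rewritten argument, as a precaution so the spaces inside the `\X` escapes stay within a single macro argument. A real tool would need a proper roff argument parser; the regular expression here handles only a bare word or a simple double-quoted argument.

```python
import re

# Hypothetical "infer" knowledge of one man(7) macro: .TH, whose first
# argument is the page title. Marker text follows this thread's example.
BEGIN = r"\X'meta: begin title'"
END = r"\X'meta: end title'"

# First argument: a simple double-quoted string or a bare word.
TH_RE = re.compile(r'^(\.TH[ \t]+)("[^"]*"|[^ \t"]+)(.*)$')


def mark_title(line: str) -> str:
    """Wrap the .TH title argument in device-control markers."""
    body = line.rstrip("\n")
    newline = line[len(body):]  # keep the original line ending
    m = TH_RE.match(body)
    if not m:
        return line  # every other line passes through untouched
    head, arg, rest = m.groups()
    if arg.startswith('"'):
        arg = arg[1:-1]  # drop the quotes; they are re-added below
    # Quote the result so the spaces inside the escapes stay in one argument.
    return f'{head}"{BEGIN}{arg}{END}"{rest}{newline}'
```

Applied as `for line in sys.stdin: sys.stdout.write(mark_title(line))`, it could stand in for `infer` at the head of the pipeline above, for man(7) titles only.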

This is similar in spirit to what Werner Lemberg started with devtag.tmac,
which grohtml(1) already uses to identify numbered headings and section
titles. Personally, I think there's a lot more we could be doing with that
same technique.

— John

On Thu, 17 Sep 2020 at 23:24, Ingo Schwarze <schwarze@usta.de> wrote:

> Hi,
>
> John Gardner wrote on Wed, Sep 16, 2020 at 03:25:54AM +1000:
> > Somebody wrote:
>
> >> this *isn't* the same to me: attributes are for metadata
> >> and tag contents are for data
>
> > That's what I mean: there isn't always an obvious distinction
> > between data and metadata.
>
> To get back a bit closer to the topic of the list: the roff(7)
> language and its macro sets actually face exactly the same challenge.
>
> In very broad strokes, mostly, metadata is supposed to be passed
> as arguments to requests.  For example: .ad .bp .ce .cp .di .ft .hy
> .in .it .ll .ls .mk .ne .nr .pl .po .ps .rj .rm .rt .sp .ss .ta
> .ti ...  Macro sets mostly follow: .HP .MT .PD .RS .RE .TP .UR ...
>
> Data (or text, in the case of roff) is supposed to be contained in,
> well, text lines.  Just like HTML elements can nest and contain
> text, so can roff requests introduce blocks of text lines, some
> to be closed by explicit end requests, some to automatically end
> when a certain condition is met.  There are fewer examples:
> .ce (again!) .di .rj ...
> Again, macro sets mostly follow: .EX .HP .MT .RS .TP .UR ...
>
> But in even fewer cases, the paradigm is violated by allowing
> arbitrary content in arguments, for example to avoid awkwardly long
> syntax, like in .do .if .nop ...
>
> I'm not aware of roff(7) requests turning text data into metadata
> (well, arguably excepting those where the fact that roff(7)
> is not only a markup language but a Turing-complete programming
> language makes it unavoidable, e.g. .while .de ... and the like).
>
>
> The man(7) macros are very weird (which is one of the factors
> making them harder to use) insofar as they provide the feature of
> *optional* next line scope: for .B .I .SB .SM .SH .SS ...
> you can provide text data as arguments (like data as attributes
> in HTML) or on the next line (like element content in HTML).
> Besides, man(7) has at least one case where text data *must* be
> in an argument and even *precedes* metadata: .IP
>
>
> The mdoc macros are somewhat different; most of them take text data
> as arguments, so the whole concept is less pronounced there.
> Of course, the concept of blocks having content also exists,
> but there are four different kinds of blocks (with examples):
>
>  * multi-line blocks
>     - explicit (with start- and end macro)  .Bd/.Ed, .Bl/.El, ...
>     - implicit (end automatically)          .Sh .Ss .It ...
>  * in-line blocks
>     - explicit (with start- and end macro)  .Do/.Dc, .Oo/.Oc, ...
>     - implicit (end at the end of the line) .Dq,     .Op
>
>
> To summarize, it is useful to consider the distinction of metadata
> and data when designing a markup language and to mostly handle it
> in some systematic way, but there are very different ways to design
> such a system.  Also, experience teaches it is not possible to be
> strict about it, and zealously striving for rigidity in this respect
> is usually counter-productive, whereas totally disregarding the
> concept and assigning syntax in a completely random manner (as for
> example DocBook does it) isn't good either.
>
> I would call the roff(7) language itself unusually well-designed
> in this particular respect (though it has of course other quirks).
> The man(7) macros feel below average, but that is somewhat mitigated
> by the extreme smallness of the language.  The mdoc(7) macros maybe
> under-emphasize the concept.  That is mostly inconsequential though
> because mdoc(7) generally discourages adding any metadata to the
> document whatsoever (except for the macros themselves, of
> course).  In mdoc(7), you rarely need arguments: occasional -type
> -width -offset -compact, that's all, basically.
>
> Yours,
>   Ingo
>
