
Re: [Groff] Building a troff parser


From: Ingo Schwarze
Subject: Re: [Groff] Building a troff parser
Date: Tue, 3 Mar 2015 16:29:33 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

Hi Eric,

Ralph Corderoy wrote on Tue, Mar 03, 2015 at 10:50:37AM +0000:
> Eric Andrew Lewis wrote:

>> $ explain "rm -rf *"
>> rm -rf *
>> └── rm       remove files or directories
>>     ├── -r   remove directories and their contents recursively
>>     ├── -f   ignore nonexistent files, never prompt

Here is what `man -Ttree rm` (which uses the mandoc(1) mdoc(7) and man(7)
parsers mentioned by Kristaps Dzonsons) gives you on OpenBSD-current:

[...]
Sh (block) *42:2
  Sh (block-head) 42:2
      SYNOPSIS (text) 42:5
  Sh (block-body) 42:2
      Nm (block) *43:2
        Nm (block-head) 43:2
            rm (text) 43:5
        Nm (block-body) 43:2
            Op (block) *44:2
              Op (block-head) 44:2
              Op (block-body) 44:2
                  Fl (elem) 44:5
                      dfiPRr (text) 44:8
            Ar (elem) *45:2
                file (text) 45:2
                ... (text) 45:2
Sh (block) *46:2
  Sh (block-head) 46:2
      DESCRIPTION (text) 46:5
  Sh (block-body) 46:2
      The (text) *47:1
      Nm (elem) *48:2
      utility attempts to remove the non-directory type files specified on the (text) *49:1
      command line. (text) *50:1
[...]
      The options are as follows: (text) *55:1
      Bl (block) -tag -width [ [6n] ] *56:2
        Bl (block-head) 56:2
        Bl (block-body) 56:2
[...]
            It (block) *59:2
              It (block-head) 59:2
                  Fl (elem) 59:5
                      f (text) 59:8
              It (block-body) 59:2
                  Attempt to remove the files without prompting for confirmation, (text) *60:1
                  regardless of the file's permissions. (text) *61:1
[...]
            It (block) *82:2
              It (block-head) 82:2
                  Fl (elem) 82:5
                      R (text) 82:8
              It (block-body) 82:2
                  Attempt to remove the file hierarchy rooted in each file argument. (text) *83:1
                  The (text) *84:1
                  Fl (elem) *85:2
                      R (text) 85:5
                  option implies the (text) *86:1
                  Fl (elem) *87:2
                      d (text) 87:5
                  option. (text) *88:1
[...]
            It (block) *95:2
              It (block-head) 95:2
                  Fl (elem) 95:5
                      r (text) 95:8
              It (block-body) 95:2
                  Equivalent to (text) *96:1
                  Fl (elem) *97:2
                      R (text) 97:5
                  . (text) 97:7
[...]

Of course, do not parse the -Ttree textual representation; use the
underlying AST, which consists of interlinked C structs and is
documented here:  http://mdocml.bsd.lv/man/mandoc.3.html

So, you could relatively easily extract the following from the
above:

  rm -rf *
    rm  The rm utility attempts to remove the non-directory type files
        specified on the command line.
    -r  Equivalent to -R.
    -f  Attempt to remove the files without prompting for confirmation,
        regardless of the file's permissions.  If the file does not exist,
        do not display a diagnostic message or modify the exit status
        to reflect an error.
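
A minimal sketch of that extraction, written in Python for brevity.
The Node class is a hypothetical stand-in for the AST, not the
mandoc(3) C structs, and the tree is hand-built from two of the It
blocks in the -Ttree output above.  All names in it are assumptions
made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Node:
    """Hypothetical AST node; NOT the real mandoc(3) data structure."""
    name: str   # macro name such as "It" or "Fl", or the text itself
    kind: str   # "block", "block-head", "block-body", "elem" or "text"
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def flatten(node: Node) -> str:
    """Join the text below node, rendering Fl elements with a dash."""
    out = ""
    for child in node.children:
        if child.kind == "text":
            piece = child.name
        elif child.name == "Fl":
            piece = "-" + "".join(c.name for c in child.children)
        else:
            piece = flatten(child)
        if not piece:
            continue
        # Closing punctuation attaches to the preceding word.
        if out and piece not in {".", ",", ";", ":", ")"}:
            out += " "
        out += piece
    return out


def option_descriptions(root: Node) -> Dict[str, str]:
    """Map each flag in an It head to the flattened text of the It body."""
    result = {}
    for node in walk(root):
        if node.name != "It" or node.kind != "block":
            continue
        head = next(c for c in node.children if c.kind == "block-head")
        body = next(c for c in node.children if c.kind == "block-body")
        for flag in (c for c in head.children if c.name == "Fl"):
            result["-" + flag.children[0].name] = flatten(body)
    return result


def text(s: str) -> Node:
    return Node(s, "text")


def fl(letter: str) -> Node:
    return Node("Fl", "elem", [text(letter)])


def item(letter: str, *body: Node) -> Node:
    return Node("It", "block", [
        Node("It", "block-head", [fl(letter)]),
        Node("It", "block-body", list(body)),
    ])


# Hand-built excerpt of the option list shown in the tree dump.
options = Node("Bl", "block", [
    Node("Bl", "block-head"),
    Node("Bl", "block-body", [
        item("f",
             text("Attempt to remove the files without prompting for confirmation,"),
             text("regardless of the file's permissions.")),
        item("r", text("Equivalent to"), fl("R"), text(".")),
    ]),
])

for flag, description in option_descriptions(options).items():
    print(flag, description, sep="  ")
```

The real AST carries more node types and line/column positions, so
a production version would walk the C structs directly instead.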

Obviously, understanding the redirection from -r to -R is much harder
for a program than for a human, but certainly feasible.
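
One possible heuristic for that redirection, sketched in Python.  It
assumes the descriptions are already available as plain strings and
that aliases are worded exactly "Equivalent to -X."; both are
assumptions, and real manuals phrase such references in many other
ways.  On a real mdoc(7) AST the target would show up as an Fl
element inside the It body, so no regular expression would be needed.

```python
import re
from typing import Dict

# Assumed alias wording; this pattern is illustrative only.
ALIAS = re.compile(r"^Equivalent to (-\w)\.$")


def resolve(flag: str, descriptions: Dict[str, str], max_hops: int = 5) -> str:
    """Follow 'Equivalent to -X.' entries until a real description is found.

    max_hops guards against manuals whose aliases point at each other.
    """
    for _ in range(max_hops):
        desc = descriptions[flag]
        match = ALIAS.match(desc)
        if match is None or match.group(1) not in descriptions:
            return desc
        flag = match.group(1)
    return descriptions[flag]


descriptions = {
    "-R": "Attempt to remove the file hierarchy rooted in each file "
          "argument. The -R option implies the -d option.",
    "-r": "Equivalent to -R.",
}

print(resolve("-r", descriptions))
```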

>>     └── *    Remove (unlink) files matching this text pattern.

> (The `text pattern' is actually a `glob', and it's expanded by the
> shell, not rm.  Might be worth it getting that point across.)

Indeed.  That's *much* harder because that's not explained in the
rm(1) manual but in sh(1) and glob(3), and rm(1) doesn't even contain
a cross reference to these manuals.
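
To make the point about the shell concrete, here is a rough Python
model of what rm actually receives.  The expand() function is a
hypothetical simplification of sh(1) pathname expansion (no quoting,
no directory components, names passed in as a list), not a
reimplementation of glob(3).

```python
import fnmatch
from typing import List


def expand(words: List[str], names: List[str]) -> List[str]:
    """Replace each word containing glob characters by the matching names."""
    argv = []
    for word in words:
        if any(ch in word for ch in "*?["):
            matches = sorted(
                n for n in names
                # A leading dot is only matched by a pattern that starts with one.
                if (word.startswith(".") or not n.startswith("."))
                and fnmatch.fnmatchcase(n, word)
            )
            # POSIX shells pass an unmatched pattern through unchanged.
            argv.extend(matches or [word])
        else:
            argv.append(word)
    return argv


# rm never sees the "*"; it sees the names the shell substituted for it.
print(expand(["rm", "-rf", "*"], ["a.txt", ".hidden", "dir"]))
```

An explain tool would therefore have to describe "*" from shell
documentation, because nothing in the rm(1) AST mentions it.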

>> Why is doclifter the wrong tool for mdoc(7)? doclifter's documentation
>> states it supports mdoc(7).

> Ingo's saying that because there's existing code to specifically handle
> mdoc parsing AIUI.

Exactly, and in particular because doclifter converts to the DocBook
format, which you don't need and would have to reparse, while
mandoc directly gives you a real mdoc(7) AST.

Besides, I would expect the mandoc mdoc(7) parser to be more robust
and more faithful than the doclifter one.  doclifter is simply
addressing a different task: conversion to DocBook.

On the other hand, the doclifter man(7) parser may be better than the
mandoc man(7) parser, in particular if you want to do semantic
analysis, simply because man(7) is a presentational rather than
a semantic language and doclifter contains some "baby AI" to
generate rudimentary semantic enrichment, which mandoc does not.

For man(7), mandoc just gives you a raw man(7) AST, which barely
contains any semantic information.  For example, this is current
mandoc on Debian:

 $ ./mandoc -Ttree /usr/share/man/man1/rm.1.gz
[...]
SH (block) *5:2
  SH (block-head) 5:2
      SYNOPSIS (text) 5:5
  SH (block-body) 5:2
      B (elem) *6:2
          rm (text) 6:4
      [\fIOPTION\fR]... \fIFILE\fR... (text) *7:1
SH (block) *8:2
  SH (block-head) 8:2
      DESCRIPTION (text) 8:5
  SH (block-body) 8:2
      This manual page (text) *9:1
      documents the GNU version of (text) *10:1
      BR (elem) *11:2
          rm (text) 11:5
          . (text) 11:8
      B (elem) *12:2
          rm (text) 12:4
      removes each specified file.  By default, it does not remove (text) *13:1
      directories. (text) *14:1
[...]
SH (block) *29:2
  SH (block-head) 29:2
      OPTIONS (text) 29:5
  SH (block-body) 29:2
      PP (block) *30:2
        PP (block-head) 30:2
        PP (block-body) 30:2
            Remove (unlink) the FILE(s). (text) *31:1
      TP (block) *32:2
        TP (block-head) 32:2
            \fB\-f\fR, \fB\-\-force\fR (text) *33:1
        TP (block-body) 33:1
            ignore nonexistent files, never prompt (text) *34:1
[...]
      TP (block) *58:2
        TP (block-head) 58:2
            \fB\-r\fR, \fB\-R\fR, \fB\-\-recursive\fR (text) *59:1
        TP (block-body) 59:1
            remove directories and their contents recursively (text) *60:1

Obviously, that kind of syntax tree is much harder to interpret
than an mdoc(7) one.
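
To illustrate the extra work, here is a small Python sketch that
recovers the option names from a TP head such as the ones above.  It
only strips \f font escapes and \- and then splits on commas; the
function name and the splitting rule are assumptions, and other roff
escapes or differently formatted pages would defeat it.

```python
import re
from typing import List

# Font escapes in the forms \fB, \f(BI and \f[B].
FONT_ESCAPE = re.compile(r"\\f(?:\[[^\]]*\]|\(..|.)")


def flags_from_tp_head(head: str) -> List[str]:
    """Guess the option names in the text of a man(7) TP head."""
    plain = FONT_ESCAPE.sub("", head).replace("\\-", "-")
    parts = (part.strip() for part in plain.split(","))
    return [part for part in parts if part.startswith("-")]


head = r"\fB\-r\fR, \fB\-R\fR, \fB\-\-recursive\fR"
print(flags_from_tp_head(head))
```

With mdoc(7), the same information is already marked up as Fl
elements, so no such guessing is required.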

Yours,
  Ingo


