Re: [Qemu-devel] [PATCH v3 11/14] qapi: add qapi2texi script

qemu-devel
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v3 11/14] qapi: add qapi2texi script

From:	Marc-André Lureau
Subject:	Re: [Qemu-devel] [PATCH v3 11/14] qapi: add qapi2texi script
Date:	Tue, 15 Nov 2016 17:17:40 +0000
Hi

On Thu, Nov 10, 2016 at 9:32 PM Markus Armbruster <address@hidden> wrote:

> Marc-André Lureau <address@hidden> writes:
>
> > As the name suggests, the qapi2texi script converts JSON QAPI
> > description into a texi file suitable for different target
> > formats (info/man/txt/pdf/html...).
> >
> > It parses the following kind of blocks:
> >
> > Free-form:
> >
> >   ##
> >   # = Section
> >   # == Subsection
> >   #
> >   # Some text foo with *emphasis*
> >   # 1. with a list
> >   # 2. like that
> >   #
> >   # And some code:
> >   # | $ echo foo
> >   # | -> do this
> >   # | <- get that
> >   #
> >   ##
> >
> > Symbol:
> >
> >   ##
> >   # @symbol:
> >   #
> >   # Symbol body ditto ergo sum. Foo bar
> >   # baz ding.
> >   #
> >   # @arg: foo
> >   # @arg: #optional foo
> >   #
> >   # Returns: returns bla bla
> >   #          Or bla blah
> >   #
> >   # Since: version
> >   # Notes: notes, comments can have
> >   #        - itemized list
> >   #        - like this
> >   #
> >   # Example:
> >   #
> >   # -> { "execute": "quit" }
> >   # <- { "return": {} }
> >   #
> >   ##
>
> The following discussion belongs to review of qapi-code-gen.txt, but the
> example here is more abstract and nicely visible, so let me use it.
>
> The "Symbol" kind of block as shown above is for QMP commands with
> arguments defined inline.
>
> It also serves for QMP events: just omit the return value.
>
> For struct types, the terminology is a bit off ("argument" instead of
> "member"), but the basic structure less the return value still works.
> No surprise, because a command is essentially a triple (name, arguments
> type, return value type), where the arguments type is a struct type (but
> see below for non-struct arguments).
>

Yeah, I stick with args for now, but we may want to rename it to
member/fields. Other suggestions?

>
> From there, it's a small step to using it for enum types.
>

Right, I realize it may be nice to enumerate the values, even if they are
not documented. I have made some changes in this direction in the generator.


> But how to do union types is non-obvious.  In their general form, they
> have common members, one of them the tag, and variant members depending
> on the tag value.  How do we want such types be documented?
>

Fair enough, union types are quite badly documented by current version of
generator. I have made some changes, so 'base' arguments and 'data'
alternative are shown that way:

BlockdevOptions (Union)

{ 'driver': BlockdevDriver, ['node-name': str], ['discard':
BlockdevDiscardOptions], ['cache': BlockdevCacheOptions], ['read-only':
bool], ['detect-zeroes': BlockdevDetectZeroesOptions] } + 'driver' = [
'archipelago': BlockdevOptionsArchipelago, 'blkdebug':
BlockdevOptionsBlkdebug,....



> Alternate types can be documented like structs, I think.
>

mostly, I have used the [] syntax to represent union & alternatives
(instead of {}). I hope that helps.


>
> What about QMP commands with arguments of the form 'data': 'TypeName'?
> We typically replace the whole @arg: part by something like
>
>     # For the arguments, see the documentation of BlockdevSnapshotSync.
>
> Note that with 'boxed': true, TypeName could even be union or alternate.
>
> My point is: the documentation needs work.  Whether we should do the
> within this series or on top is debatable.
>

I'd rather fix it on top, that series is already pretty large and dense,
and hopefully is a good starting point.

>
> > That's roughly following the following BNF grammar:
>
> EBNF, actually.
>
>
ok

>
> > api_comment = "##\n" comment "##\n"
> > comment = freeform_comment | symbol_comment
> > freeform_comment = { "#" text "\n" }
>
> As we'll see below, your code actually requires at least one space after
> "#" unless text is empty.
>

ok


> > symbol_comment = "#" "@" name ":\n" { freeform | member | meta }
>
> Your code actually requires exactly one space between "#" and "@".  It
> happily accepts additional text between : and newline.
>
> freeform is undefined.  I guess you mean freeform_comment.
>

ok


> With that corrected, the grammar is ambiguous.  Your intent is obvious
> enough, though.  Just not to a computer.
>
> > member = "#" '@' name ':' [ text ] freeform_comment
>
> Again, your code requires exactly one space after "#".
>
> > meta = "#" ( "Returns:", "Since:", "Note:", "Notes:", "Example:",
> "Examples:" ) [ text ] freeform_comment
>
> Likewise.
>

ok


> > text = free-text markdown-like, "#optional" for members
>
> Here, the grammar switches from EBNF to prose.
>
> As is, the grammar can serve as semi-formal user documentation.  It
> can't quite serve as specification for the parser.  As we'll see below,
> the code processing doc strings is not a parser constructed from this
> grammar.
>
> Why am I so hung up on constructing parsers systematically from
> grammars?  When you do that, you can derive the accepted language from
> the *grammar*.  Generally *much* easier than from ad hoc parser code.
> Moreover, a (sufficiently simple) grammar leads to a natural internal
> representation, namely an abstract syntax tree.
>
> But let's continue.
>
> > Thanks to the following json expressions, the documentation is enhanced
> > with extra information about the type of arguments and return value
> > expected.
>
> What exactly do you want to convey with this sentence?
>

Not much, that we use the json expressions to enhance the written doc.


>
> > Signed-off-by: Marc-André Lureau <address@hidden>
> > ---
> >  scripts/qapi.py        | 175 ++++++++++++++++++++++++++-
> >  scripts/qapi2texi.py   | 316
> +++++++++++++++++++++++++++++++++++++++++++++++++
> >  docs/qapi-code-gen.txt |  44 +++++--
> >  3 files changed, 524 insertions(+), 11 deletions(-)
> >  create mode 100755 scripts/qapi2texi.py
> >
> > diff --git a/scripts/qapi.py b/scripts/qapi.py
> > index 21bc32f..ed52ee4 100644
> > --- a/scripts/qapi.py
> > +++ b/scripts/qapi.py
>
> Warning: loads of comments ahead.  Doesn't mean there are loads of
> problems!  I need to write to think clearly, and I'm sharing my thinking
> to hopefully help us converge.
>
>
ok :)


> > @@ -122,6 +122,103 @@ class QAPIExprError(Exception):
> >              "%s:%d: %s" % (self.info['file'], self.info['line'],
> self.msg)
> >
> >
> > +class QAPIDoc(object):
> > +    def __init__(self, parser):
> > +        self.parser = parser
> > +        self.symbol = None
> > +        self.body = []
> > +        # args is {'arg': 'doc', ...}
>
> Rather terse, but I think I get what you mean.
>
> > +        self.args = OrderedDict()
> > +        # meta is [(Since/Notes/Examples/Returns:, 'doc'), ...]
>
> More so.
>
> > +        self.meta = []
> > +        # the current section to populate, array of [dict, key,
> comment...]
>
> Now you've reached my limit :)
>
> "array of [...]" sounds like a list of lists.  But you probably mean
> just a list.
>
> We'll see below that dict is either self.args or self.meta, and key is a
> key in dict.
>

I have introduced a nested Section class, hopefull that helps a bit


> > +        self.section = None
> > +        self.expr_elem = None
> > +
> > +    def get_body(self):
> > +        return "\n".join(self.body)
>
> Not worth a doc string?  The existing code doesn't have any, but since
> you provide them for other public methods of this class...
>
>
Not all of them, get_body() and end_section() seem rather obvious to me,
doc string could be redundant.

> +
> > +    def has_meta(self, name):
> > +        """Returns True if the doc has a meta section 'name'"""
> > +        return next((True for i in self.meta if i[0] == name), False)
>
> Less clever, but probably more obvious to readers lacking a degree in
> Python generators:
>
>            for i in self.meta:
>                if i[0] == name:
>                    return True
>            return False
>
> I'm not against using more advanced language features, but when the
> "advanced" code needs more tokens than the "retarded" one...
>

agree (I think I had something else in mind when I started writing that
line)


>
> > +
> > +    def append(self, line):
> > +        """Adds a # comment line, to be parsed and added in a section"""
> > +        line = line[1:]
>
> As we'll see below, @line is the full comment token text including the
> initial '#', but not including the final '\n'.  First thing we do with
> it is strip the initial '#'.  Makes me wonder why we pass it in.  It's
> not wrong, of course.
>
>
I don't see much point in changing this either


> Note that comments don't need to be full lines, they just happen to be
> used that way most of the time.  Admittedly contrived counter-example:
>
>     { 'enum': 'Empty', 'data': [] } ##
>                                     # @XY:
>                                     ##
>     { 'enum': 'XY', 'data': ['x', 'y'] }
>
> This makes the parser call doc.append("# @XY:").  The argument "# @XY:"
> isn't a full line in the source.  It becomes a line in the documentation
> text QAPIDoc wraps.
>
> > +        if len(line) == 0:
> > +            self._append_section(line)
> > +            return
>
> Special case for comment '#'.  Note that '# ' is processed as a normal
> case.
>
> > +
> > +        if line[0] != ' ':
> > +            raise QAPISchemaError(self.parser, "missing space after #")
> > +
> > +        line = line[1:]
> > +        # take the first word out
> > +        name = line.split(' ', 1)[0]
>
> For a comment '# ', @name becomes '', because ''.split(' ', 1) returns
> [''].  Works.
>
> For a comment '#  Since:' (note two spaces), name becomes '', because
> .split() returns ['', 'Since:'].  Thus, the keywords are only recognized
> exactly after '# '.  Is this a good idea?
>

Good for consistency, and there might be cases where arg version/note
documentation could legitimately (for now) look like:

##
# @foo:
# @arg: bla blah
#           Since: 2.8
##

> +        if name.startswith("@") and name.endswith(":"):
> > +            line = line[len(name):]
> > +            name = name[1:-1]
> > +            if self.symbol is None:
> > +                # the first is the symbol this APIDoc object documents
>
> Suggest
>
>                    # The first @NAME: is the symbol being documented
>

ok


>
> > +                if len(self.body):
> > +                    raise QAPISchemaError(self.parser, "symbol must
> come first")
>
> You and I know what 'symbol' means here, but the user should not need to
> know.  Suggest something like "'@%s:' must come first".  I'm not
> entirely happy with that either, but I don't have better ideas right
> now.
>

ok


>
> If the parser was constructed from your grammar, we'd deal with the
> first @NAME: not coming first differently.  According to the grammar, a
> doc comment block either starts with '##\n# @' (a symbol block), or it
> doesn't (a free-form block).  If the first line isn't a @NAME: line, but
> later lines are, the parser decides it's a free-from block right there.
> Any @NAME: lines it finds later would be parsed as free-form line.  If
> we want to reject such lines in free-form blocks, the easiest way would
> be a semantic check.
>

Ok, I'll move it do check_docs()

> +                self.symbol = name
>
> Here's how we can tell what kind of block we're working on:
>
> if .symbol is not None:
>     this is a symbol block
> elif .body is not None:
>     this is a free-form block
> else
>     we don't know, yet
>


ok, I have adapted it slightly to avoid some redundancy in the "we don't
know" and "free-form" cases


> > +            else:
> > +                # else an arg
>
> Suggest
>                    # Subsequent @NAME: are arguments and such
>

This goes away with above refactoring, since we know we are parsing args


>
> > +                self._start_args_section(name)
>
> The remainder of the line after @NAME: is left in @line, and will be
> stuffed into the current section or else the body after the conditional.
> That's what the grammar specifies for argument NAME, in rule member.
> It's not what it specifies for symbol NAME, in rule symbol_comment.
>
> In particular, when nothing follows symbol NAME, we still stuff '' into
> the body.  For instance,
>
>     ##
>     # @symbol:
>     #
>     # lorem ipsum
>     ##
>
> produces self.body = ['', '', 'lorem ipsum'], which .get_body()
> transforms into '\n\nlorem ipsum'.  Extra newline.  I guess it's all the
> same to Texinfo in the end.
>

Ok, let's make sure the first @symbol: doesn't contain extra characters.


>
> > +        elif self.symbol and name in (
> > +                "Returns:", "Since:",
> > +                # those are often singular or plural
> > +                "Note:", "Notes:",
> > +                "Example:", "Examples:"):
> > +            # new "meta" section
> > +            line = line[len(name):]
> > +            self._start_meta_section(name[:-1])
> > +
> > +        self._append_section(line)
> > +
> > +    def _start_args_section(self, name):
> > +        self.end_section()
> > +        if self.args.has_key(name):
> > +            raise QAPISchemaError(self.parser, "'%s' arg duplicated" %
> name)
>
> Where's self.args[name] added?  Oh, I see, it's hidden in end_section().
> Could it be added here instead?
>

How? self.args[name] = "" which will be overwritten in end_section? ok


>
> > +        self.section = [self.args, name]
> > +
> > +    def _start_meta_section(self, name):
> > +        self.end_section()
> > +        if name in ("Returns", "Since") and self.has_meta(name):
> > +            raise QAPISchemaError(self.parser, "'%s' section
> duplicated" % name)
>
> For the implementation, it's a section, but for the user, it's a line.
> Suggest "Duplicate '%s'".
>
> Comment on _start_args_section() applies.
>

ok


> > +        self.section = [self.meta, name]
> > +
> > +    def _append_section(self, line):
> > +        """Add a comment to the current section, or the comment body"""
>
> _append_to_section() or _section_append() would be clearer.  We're not
> appending a section, we're appending *to* a section.  Or to the body,
> which isn't a section.  Hmm.
>

ok, what about _append_freeform() ?

>
> > +        if self.section:
> > +            name = self.section[1]
> > +            if not name.startswith("Example"):
> > +                # an empty line ends the section, except with Example
> > +                if len(self.section) > 2 and len(line) == 0:
> > +                    self.end_section()
> > +                    return
> > +                # Example is verbatim
> > +                line = line.strip()
>
> The comment is confusing, because it explains what we're not doing
> elsewhere.  Comments should explain what we're doing here.
>
>
ok, I'll remove it


> > +            if len(line) > 0:
> > +                self.section.append(line)
> > +        else:
> > +            self.body.append(line.strip())
> > +
> > +    def end_section(self):
> > +        if self.section is not None:
>
> I prefer the more laconic "if self.section:", especially when
> self.section can't have a false value other than None anyway.  Even more
> I prefer making the fact that this does nothing when we don't have a
> section immediately obvious:
>
>            if not self.section:
>                return
>            do the work...
>
>
ok


> > +            target = self.section[0]
> > +            name = self.section[1]
> > +            if len(self.section) < 3:
> > +                raise QAPISchemaError(self.parser, "Empty doc section")
> > +            doc = "\n".join(self.section[2:])
> > +            if isinstance(target, dict):
> > +                target[name] = doc
> > +            else:
> > +                target.append((name, doc))
> > +            self.section = None
>
> Until the next _start_FOO_section(), self.section remains None.
> Therefore, any text between here and the next section will be appended
> to self.body.  This is going to mangle input that doesn't quite match
> expectations.  Example:
>
>     ##
>     # @frobnicate:
>     #
>     # Frobnicate some frobs.
>     #
>     # @frobs: bla, bla, explain @frobs,
>     #         bla.
>     #
>     #         Beware of the tiger!
>     #
>     #         Explain @frobs some more, bla, bla.
>     #
>     # @harder: frobnicate harder.
>     ##
>     { 'command': 'frobnicate',
>       'data': { 'frobs': 'any', '*harder': 'bool' } }
>
> Results in a QAPIDoc with
>
>     .symbol = 'frobnicate'
>     .args = { 'frobs': 'frobs bla, bla, explain,\nbla',
>               'harder': 'frobnicate harder.' }
>     .body = ['', '', 'Frobnicate some frobs.', '',
>              'Beware of the tiger!', '',
>              'Explain some more, bla, bla.', '']
>
> and the following .texi:
>
>     @deftypefn Command {} frobnicate @
>     {('frobs': @var{any}, ['harder': @var{bool}])}
>
>     @table @var
>     @item frobs
>     bla, bla, explain @var{frobs},
>     bla.
>     @item harder
>     frobnicate harder.
>     @end table
>
>
>     Frobnicate some frobs.
>
>     Beware of the tiger!
>
>     Explain @var{frobs} some more, bla, bla.
>
> The only way to detect the mistake is to read the generated
> documentation attentively.  We can't rely on everybody doing that during
> development.  Even when you do, the mangled output gives you no clue on
> the mistake you made, unless you know how the doc generator works.
>
> The generator could detect the mistake at the syntactical or at the
> semantic level.
>

I have changed the logic there, since in general paragraphs are indented
with @arg sections like your example above. So an empty line will no longer
break the current section, only an empty line followed by a non-indented
paragraph after an argument documentation will. That is

@foo: arg doc

Blah

Will break the argument section on "Blah", and append it to the
documentation body.

This fixes real doc issues from previous iterations, see @block-commit for
example.

More general question: can you think of a legitimate reason for the
> generator emitting documentation in a different order than it was
> written?
>

It would be nicer if the generator can freely reorder/display the
documentation, for consistency and style (different blocks could use
different style etc).


>
> If no, we should pick an internal representation that can represent the
> input we have / envisage to have faithfully, then reject input it can't
> represent faithfully.
>
> > +
> > +
> >  class QAPISchemaParser(object):
> >
> >      def __init__(self, fp, previously_included=[], incl_info=None):
> > @@ -137,9 +234,15 @@ class QAPISchemaParser(object):
>
> Let's see how parsing works.
>
> >          self.line = 1
> >          self.line_pos = 0
> >          self.exprs = []
> > +        self.docs = []
> >          self.accept()
> >
> >          while self.tok is not None:
> > +            if self.tok == '#' and self.val.startswith('##'):
> > +                doc = self.get_doc()
> > +                self.docs.append(doc)
> > +                continue
> > +
>
> Top-level doc comments get accumulated in list self.docs.
>
> >              expr_info = {'file': fname, 'line': self.line,
> >                           'parent': self.incl_info}
> >              expr = self.get_expr(False)
> > @@ -160,6 +263,7 @@ class QAPISchemaParser(object):
> >                          raise QAPIExprError(expr_info, "Inclusion loop
> for %s"
> >                                              % include)
> >                      inf = inf['parent']
> > +
> >                  # skip multiple include of the same file
> >                  if incl_abs_fname in previously_included:
> >                      continue
> > @@ -171,12 +275,40 @@ class QAPISchemaParser(object):
> >                  exprs_include = QAPISchemaParser(fobj,
> previously_included,
> >                                                   expr_info)
> >                  self.exprs.extend(exprs_include.exprs)
> > +                self.docs.extend(exprs_include.docs)
>
> The include's list of docs are spliced in.
>
> >              else:
> >                  expr_elem = {'expr': expr,
> >                               'info': expr_info}
> > +                if len(self.docs) > 0:
> > +                    self.docs[-1].expr_elem = expr_elem
> >                  self.exprs.append(expr_elem)
>
> A non-include expression gets tied to the last doc comment.  This isn't
> quite right.  Example:
>
>     ##
>     # @Empty:
>     ##
>     { 'enum': 'Empty', 'data': [] }
>
>     { 'enum': 'XY', 'data': ['x', 'y'] }
>
> The doc comment's expr_elem gets first set to the first enum expression,
> then overwritten with the second one.
>

good catch, regression from last iteration


> Less serious, but still somewhat weird:
>
>     ##
>     # Lorem ipsum
>     ##
>
>     # @Empty:
>     { 'enum': 'Empty', 'data': [] }
>
> The first comment block is a doc comment.  The second one isn't, and
> gets ignored.  The first one's expr_elem gets set to the enum
> expression.
>
> The quick fix is to set .expr_elem only when it's still None.
>
> I'll take the quick fix


> The thorough fix is to integrate doc comments deeper into the syntax.
>
>
as you know, my initial goal was not to bring a strict doc parser (many doc
parser are not that strict and work well enough), rather to just generate
something good enough out of the json to get rid of txt duplication.
Further improvements can be made after this series hopefully.


> >
> > -    def accept(self):
> > +    def get_doc(self):
>
> Here, we know self.tok == '#', and self.val is thus the comment line
> less the newline (see accept()).
>
> > +        if self.val != '##':
> > +            raise QAPISchemaError(self, "Doc comment not starting with
> '##'")
>
> It starts with '##' alright, it just happens not to end there.  What
> about "Junk after '##'"?  Or, if that's too laconic, perhaps "Junk after
> '##' at start of documentation comment".
>

ok


> > +
> > +        doc = QAPIDoc(self)
> > +        self.accept(False)
> > +        while self.tok == '#':
> > +            if self.val.startswith('##'):
> > +                # ## ends doc
>
> "# ##" looks awkward.  Suggest
>
>                    # End of doc comment
>
>
ok


> > +                if self.val != '##':
> > +                    raise QAPISchemaError(self, "non-empty '##' line %s"
> > +                                          % self.val)
>
> Let's make the message similar to whatever message we emit for junk
> after the initial ##.
>

ok


>
> > +                self.accept()
> > +                doc.end_section()
> > +                return doc
> > +            else:
> > +                doc.append(self.val)
> > +            self.accept(False)
> > +
>
> Here, self.tok is whatever follows the doc comment.
>
> > +        if self.val != '##':
> > +            raise QAPISchemaError(self, "Doc comment not finishing with
> '##'")
>
> This check is incorrect.  For instance, input
>
>     ##
>     '## gotcha'
>
> yields
>
>     /dev/stdin:2:1: Doc comment not finishing with '##'
>
> because self.tok == "'" and self.val == "## gotcha".
>
> Either fail unconditionally here (then the final ## is required), or do
> nothing (then it's optional).  Your commit message suggests the former.
>
> ok


> > +
> > +        doc.end_section()
> > +        return doc
> > +
> > +    def accept(self, skip_comment=True):
> >          while True:
> >              self.tok = self.src[self.cursor]
> >              self.pos = self.cursor
> > @@ -184,7 +316,13 @@ class QAPISchemaParser(object):
> >              self.val = None
> >
> >              if self.tok == '#':
> > +                if self.src[self.cursor] == '#':
> > +                    # ## starts a doc comment
>
> Suggest
>
>                        # Start of doc comment
>
>
ok


> > +                    skip_comment = False
> >                  self.cursor = self.src.find('\n', self.cursor)
> > +                self.val = self.src[self.pos:self.cursor]
>
> Puts the complete line less the newline into self.val.  I wonder whether
> we should strip the initial '#' as well.
>
> > +                if not skip_comment:
> > +                    return
>
> Make that
>
>                    if not skip_comment:
>                        self.val = self.src[self.pos:self.cursor]
>                        return
>
> ok


> >              elif self.tok in "{}:,[]":
> >                  return
> >              elif self.tok == "'":
>
> Comment tokens are thrown away as before, except when the parser asks
> for them by passing skip_comment=False, or when the comment token starts
> with ##.  The parser asks while parsing a doc comment, in get_doc().
>
> This is a backchannel from the parser to the lexer.  I'd rather avoid
> such lexer hacks, but I guess we can address that on top.
>
>
Alternative I imagine is to return comments from the lexer, I don't mind if
you prefer it that way, let me know

A comment starting with ## inside an expression is now a syntax error.
> For instance, input
>
>     {
>     ##
>
> yields
>
>     /dev/stdin:2:1: Expected string or "}"
>
> Rather unfriendly error message, but we can fix that on top.
>
> > @@ -779,6 +917,41 @@ def check_exprs(exprs):
> >
> >      return exprs
> >
> > +def check_docs(docs):
> > +    for doc in docs:
> > +        expr_elem = doc.expr_elem
> > +        if not expr_elem:
> > +            continue
>
> A symbol block without an expr_elem should be an error, shouldn't it?
>
>
I don't think we want to make comments mandatory, for tests at least. I
have move the check to check_exprs() and added a 'strict_doc' argument only
set when generating doc.

> +
> > +        expr = expr_elem['expr']
> > +        for i in ('enum', 'union', 'alternate', 'struct', 'command',
> 'event'):
> > +            if i in expr:
> > +                meta = i
> > +                break
>
> We're working with expression trees here, not the QAPISchemaEntity
> objects.  Okay as long as the work is sufficiently trivial.
>
> > +
> > +        info = expr_elem['info']
> > +        name = expr[meta]
> > +        if doc.symbol != name:
> > +            raise QAPIExprError(info,
> > +                                "Documentation symbol mismatch '%s' !=
> '%s'"
> > +                                % (doc.symbol, name))
>
> Cryptic.  Suggest "Definition of '%s' follows documentation for '%s'" %
> (name, doc.symbol).
>

ok


>
> > +        if not 'command' in expr and doc.has_meta('Returns'):
> > +            raise QAPIExprError(info, "Invalid return documentation")
> > +
> > +        doc_args = set(doc.args.keys())
>
> doc_args is the set of documented argument names.  Actually member names
> when expr defines a type, but let's ignore that for now.
>
> > +        if meta == 'union':
> > +            data = expr.get('base', [])
> > +        else:
> > +            data = expr.get('data', [])
> > +        if isinstance(data, dict):
> > +            data = data.keys()
> > +        args = set([k.strip('*') for k in data])
>
> Suggest to rename k to name.
>
> ok


> To see what args is, we first need to figure out what data is.
>
> If expr defines an enum, expr['data'] is the list of of member names.
> args is the set of member names.
>
> If expr defines a struct type, expr['data'] is the OrderedDict mapping
> local member names with optional '*' prefix to types.  args is the set
> of local member names.  "Local", because additional members may be
> inherited from a base type.
>
> If expr defines an alternate type, likewise, except there should be no
> '*' prefixes and no base type.
>
> If expr defines a simple union type, expr['data'] is the OrderedDict
> mapping tag value names to types.  The tag member is implicit.
> expr['base'] should not exist.  args is therefore the empty set.
>
> If expr defines a flat union type, data is the OrderedDict mapping tag
> value names to types.  expr['base'] must exist, and is either a
> dictionary or the name of a struct type.  If it's a dictionary, args is
> the set of common members defined inline.  If it's a struct type name,
> args is the set of characters in the type name.  Oops.  Test case: union
> UserDefFlatUnion in qapi-schema-test.json.
>
> Aside: covering all cases correctly is not trivial enough for expression
> trees, I'm afraid.  Class QAPISchema has to do it.  Can we avoid
> duplicating its logic here?  I'm not sure; the QAPISchemaEntity might
> turn out to be just differently inconvenient.
>
> If expr defines a command, data is either the OrderedDict mapping
> argument names to types, or the name of the arguments type.  If it's a
> dictionary, args is the set of argument names.  If it's a type name,
> args is the set of characters in the type name.  Oops again.  Test case:
> command user_def_cmd0 in qapi-schema-test.json.
>

Do those have real use cases in the schema? If not, could we ignore or not
support them?


>
> If expr defines an event, likewise.  No test case in
> qapi-schema-test.json.  Hole in the test coverage.
>
> Conclusion: args is meant to be the set of arguments / members actually
> defined.  For unions, that does *not* include variant members.
>

I understand there is a lot of missing cases, but since they don't seem to
be reached, can we leave this for a future improvement? Someone more
familiar with QAPI details could help me here :)


> > +        if meta == 'alternate' or \
> > +           (meta == 'union' and not expr.get('discriminator')):
> > +            args.add('type')
>
> For alternate and simple union types, add 'type' to the set of arguments
> actually defined.
>
> Recall that the set contains the member names for alternates, and
> nothing for simple unions.
>
> > +        if not doc_args.issubset(args):
>
> Could use the <= operator: if not doc_args <= args.  Matter of taste.
>
> > +            raise QAPIExprError(info, "Members documentation is not a
> subset of"
> > +                                " API %r > %r" % (list(doc_args),
> list(args)))
>
> Errors out if the doc string documents arguments that aren't actually
> defined as far as we know.
>
> "As far as we know", because we second-guess what's actually defined
> based on the expression tree, and don't get all the cases right.  The
> actual actual definitions are determined by class QAPISchema, and can be
> found in its internal representation.  Hmm.
>


I guess this can be later improved by using lookup_entity()?


> Failing to document an argument is not an error.  It probably should be,
> at least in the longer run.
>

Eventually, since we have total control of the documentation. But in
practice, arguments can be quite obvious and people tend to not document
them, I wouldn't be so strict.

>
> >  #
> >  # Schema compiler frontend
> > diff --git a/scripts/qapi2texi.py b/scripts/qapi2texi.py
> > new file mode 100755
> > index 0000000..7e2440c
> > --- /dev/null
> > +++ b/scripts/qapi2texi.py
>
> Considering the length of this review so far, I'm only skimming
> generator part for now.
>
>
thanks ;)


> > @@ -0,0 +1,316 @@
> > +#!/usr/bin/env python
> > +# QAPI texi generator
> > +#
> > +# This work is licensed under the terms of the GNU LGPL, version 2+.
> > +# See the COPYING file in the top-level directory.
> > +"""This script produces the documentation of a qapi schema in texinfo
> format"""
> > +import re
> > +import sys
> > +
> > +from qapi import *
> > +
> > +COMMAND_FMT = """
> > address@hidden {type} {{{ret}}} {name} @
> > +{{{args}}}
> > +
> > +{body}
> > +
> > address@hidden deftypefn
> > +
> > +""".format
> > +
> > +ENUM_FMT = """
> > address@hidden Enum {name}
> > +
> > +{body}
> > +
> > address@hidden deftp
> > +
> > +""".format
> > +
> > +STRUCT_FMT = """
> > address@hidden {type} {name} @
> > +{{{attrs}}}
> > +
> > +{body}
> > +
> > address@hidden deftp
> > +
> > +""".format
> > +
> > +EXAMPLE_FMT = """@example
> > +{code}
> > address@hidden example
> > +""".format
> > +
> > +
> > +def subst_strong(doc):
> > +    """Replaces *foo* by @strong{foo}"""
> > +    return re.sub(r'\*([^_\n]+)\*', r'@emph{\1}', doc)
> > +
> > +
> > +def subst_emph(doc):
> > +    """Replaces _foo_ by @emph{foo}"""
> > +    return re.sub(r'\s_([^_\n]+)_\s', r' @emph{\1} ', doc)
> > +
> > +
> > +def subst_vars(doc):
> > +    """Replaces @var by @var{var}"""
> > +    return re.sub(r'@([\w-]+)', r'@var{\1}', doc)
> > +
> > +
> > +def subst_braces(doc):
> > +    """Replaces {} with @{ @}"""
> > +    return doc.replace("{", "@{").replace("}", "@}")
> > +
> > +
> > +def texi_example(doc):
> > +    """Format @example"""
> > +    doc = subst_braces(doc).strip('\n')
> > +    return EXAMPLE_FMT(code=doc)
> > +
> > +
> > +def texi_comment(doc):
> > +    """
> > +    Format a comment
> > +
> > +    Lines starting with:
> > +    - |: generates an @example
> > +    - =: generates @section
> > +    - ==: generates @subsection
> > +    - 1. or 1): generates an @enumerate @item
> > +    - o/*/-: generates an @itemize list
> > +    """
> > +    lines = []
> > +    doc = subst_braces(doc)
> > +    doc = subst_vars(doc)
> > +    doc = subst_emph(doc)
> > +    doc = subst_strong(doc)
> > +    inlist = ""
> > +    lastempty = False
> > +    for line in doc.split('\n'):
> > +        empty = line == ""
> > +
> > +        if line.startswith("| "):
> > +            line = EXAMPLE_FMT(code=line[1:])
> > +        elif line.startswith("= "):
> > +            line = "@section " + line[1:]
> > +        elif line.startswith("== "):
> > +            line = "@subsection " + line[2:]
> > +        elif re.match("^([0-9]*[.)]) ", line):
> > +            if not inlist:
> > +                lines.append("@enumerate")
> > +                inlist = "enumerate"
> > +            line = line[line.find(" ")+1:]
> > +            lines.append("@item")
> > +        elif re.match("^[o*-] ", line):
> > +            if not inlist:
> > +                lines.append("@itemize %s" % {'o': "@bullet",
> > +                                              '*': "@minus",
> > +                                              '-': ""}[line[0]])
> > +                inlist = "itemize"
> > +            lines.append("@item")
> > +            line = line[2:]
> > +        elif lastempty and inlist:
> > +            lines.append("@end %s\n" % inlist)
> > +            inlist = ""
> > +
> > +        lastempty = empty
> > +        lines.append(line)
> > +
> > +    if inlist:
> > +        lines.append("@end %s\n" % inlist)
> > +    return "\n".join(lines)
> > +
> > +
> > +def texi_args(expr):
> > +    """
> > +    Format the functions/structure/events.. arguments/members
> > +    """
> > +    data = expr["data"] if "data" in expr else {}
> > +    if isinstance(data, str):
> > +        args = data
> > +    else:
> > +        arg_list = []
> > +        for name, typ in data.iteritems():
> > +            # optional arg
> > +            if name.startswith("*"):
> > +                name = name[1:]
> > +                arg_list.append("['%s': @var{%s}]" % (name, typ))
> > +            # regular arg
> > +            else:
> > +                arg_list.append("'%s': @var{%s}" % (name, typ))
> > +        args = ", ".join(arg_list)
> > +    return args
> > +
> > +def section_order(section):
> > +    return {"Returns": 0,
> > +            "Note": 1,
> > +            "Notes": 1,
> > +            "Since": 2,
> > +            "Example": 3,
> > +            "Examples": 3}[section]
> > +
> > +def texi_body(doc, arg="@var"):
> > +    """
> > +    Format the body of a symbol documentation:
> > +    - a table of arguments
> > +    - followed by "Returns/Notes/Since/Example" sections
> > +    """
> > +    body = "@table %s\n" % arg
> > +    for arg, desc in doc.args.iteritems():
> > +        if desc.startswith("#optional"):
> > +            desc = desc[10:]
> > +            arg += "*"
> > +        elif desc.endswith("#optional"):
> > +            desc = desc[:-10]
> > +            arg += "*"
> > +        body += "@item %s\n%s\n" % (arg, texi_comment(desc))
> > +    body += "@end table\n"
> > +    body += texi_comment(doc.get_body())
> > +
> > +    meta = sorted(doc.meta, key=lambda i: section_order(i[0]))
> > +    for m in meta:
> > +        key, doc = m
> > +        func = texi_comment
> > +        if key.startswith("Example"):
> > +            func = texi_example
> > +
> > +        body += "address@hidden address@hidden quotation" % \
> > +                (key, func(doc))
> > +    return body
> > +
> > +
> > +def texi_alternate(expr, doc):
> > +    """
> > +    Format an alternate to texi
> > +    """
> > +    args = texi_args(expr)
> > +    body = texi_body(doc)
> > +    return STRUCT_FMT(type="Alternate",
> > +                      name=doc.symbol,
> > +                      attrs="[ " + args + " ]",
> > +                      body=body)
> > +
> > +
> > +def texi_union(expr, doc):
> > +    """
> > +    Format an union to texi
> > +    """
> > +    args = texi_args(expr)
> > +    body = texi_body(doc)
> > +    return STRUCT_FMT(type="Union",
> > +                      name=doc.symbol,
> > +                      attrs="[ " + args + " ]",
> > +                      body=body)
> > +
> > +
> > +def texi_enum(_, doc):
> > +    """
> > +    Format an enum to texi
> > +    """
> > +    body = texi_body(doc, "@samp")
> > +    return ENUM_FMT(name=doc.symbol,
> > +                    body=body)
> > +
> > +
> > +def texi_struct(expr, doc):
> > +    """
> > +    Format a struct to texi
> > +    """
> > +    args = texi_args(expr)
> > +    body = texi_body(doc)
> > +    return STRUCT_FMT(type="Struct",
> > +                      name=doc.symbol,
> > +                      attrs="@{ " + args + " @}",
> > +                      body=body)
> > +
> > +
> > +def texi_command(expr, doc):
> > +    """
> > +    Format a command to texi
> > +    """
> > +    args = texi_args(expr)
> > +    ret = expr["returns"] if "returns" in expr else ""
> > +    body = texi_body(doc)
> > +    return COMMAND_FMT(type="Command",
> > +                       name=doc.symbol,
> > +                       ret=ret,
> > +                       args="(" + args + ")",
> > +                       body=body)
> > +
> > +
> > +def texi_event(expr, doc):
> > +    """
> > +    Format an event to texi
> > +    """
> > +    args = texi_args(expr)
> > +    body = texi_body(doc)
> > +    return COMMAND_FMT(type="Event",
> > +                       name=doc.symbol,
> > +                       ret="",
> > +                       args="(" + args + ")",
> > +                       body=body)
> > +
> > +
> > +def texi(docs):
> > +    """
> > +    Convert QAPI schema expressions to texi documentation
> > +    """
> > +    res = []
> > +    for doc in docs:
> > +        try:
> > +            expr_elem = doc.expr_elem
> > +            if expr_elem is None:
> > +                res.append(texi_body(doc))
> > +                continue
> > +
> > +            expr = expr_elem['expr']
> > +            (kind, _) = expr.items()[0]
> > +
> > +            fmt = {"command": texi_command,
> > +                   "struct": texi_struct,
> > +                   "enum": texi_enum,
> > +                   "union": texi_union,
> > +                   "alternate": texi_alternate,
> > +                   "event": texi_event}
> > +            try:
> > +                fmt = fmt[kind]
> > +            except KeyError:
> > +                raise ValueError("Unknown expression kind '%s'" % kind)
> > +            res.append(fmt(expr, doc))
> > +        except:
> > +            print >>sys.stderr, "error at @%s" % qapi
> > +            raise
>
> What errors do you expect here?  I'm asking because catching exceptions
> indiscrimatingly feels problematic.
>

yeah, I think it's mostly KeyError or ValueError, I can restrict it, but at
the same time, since we re-raise, I don't think it's problematic, it's a
top-level function.


> > +
> > +    return '\n'.join(res)
> > +
> > +
> > +def parse_schema(fname):
> > +    """
> > +    Parse the given schema file and return the exprs
> > +    """
> > +    try:
> > +        schema = QAPISchemaParser(open(fname, "r"))
> > +        check_exprs(schema.exprs)
> > +        check_docs(schema.docs)
> > +        return schema.docs
>
> You don't actually "return the exprs", you return a list of QAPIDoc.
>
>
Yeah, last iteration change,


> Either return schema and let the caller retrieve its .docs, or rename
> the function to parse_docs().  In either case, fix up the doc string.
>
> > +    except (QAPISchemaError, QAPIExprError), err:
> > +        print >>sys.stderr, err
> > +        exit(1)
>
> The function duplicates parts of QAPISchema.__init__().  The parts it
> doesn't duplicate would be useful here, because without them, you might
> be working on a broken schema.
>
> Recommend to drop the function, and ...
>
> > +
> > +
> > +def main(argv):
> > +    """
> > +    Takes schema argument, prints result to stdout
> > +    """
> > +    if len(argv) != 2:
> > +        print >>sys.stderr, "%s: need exactly 1 argument: SCHEMA" %
> argv[0]
> > +        sys.exit(1)
> > +
> > +    docs = parse_schema(argv[1])
>
> ... instead do
>
>        schema = QAPISchema(argv[1])
>        docs = schema.docs
>

done


>
> with the obvious patch to make docs an attribute of schema:
>
> diff --git a/scripts/qapi.py b/scripts/qapi.py
> index ed52ee4..b17e7de 100644
> --- a/scripts/qapi.py
> +++ b/scripts/qapi.py
> @@ -1424,7 +1424,9 @@ class QAPISchemaEvent(QAPISchemaEntity):
>  class QAPISchema(object):
>      def __init__(self, fname):
>          try:
> -            self.exprs = check_exprs(QAPISchemaParser(open(fname,
> "r")).exprs)
> +            parser = QAPISchemaParser(open(fname, "r"))
> +            self.exprs = check_exprs(parser.exprs)
> +            self.docs = parser.docs
>              self._entity_dict = {}
>              self._predefining = True
>              self._def_predefineds()
>
>
ok


> You don't use the common parse_command_line().  Probably the right
> choice right now, as reusing it would require modifications (we have no
> use for -c and -h at least), and wouldn't buy us much.
>
> > +    print texi(docs)
> > +
> > +
> > +if __name__ == "__main__":
> > +    main(sys.argv)
>
> Ignorant question: why is this __name__ == "__main__" raindance better
> than the stupid way the qapi-{commands,event,introspect,types,visit}.py
> work?
>

I have always used __name__ == "__main__", see:

http://stackoverflow.com/questions/419163/what-does-if-name-main-do


>
> > diff --git a/docs/qapi-code-gen.txt b/docs/qapi-code-gen.txt
> > index 2841c51..d82e251 100644
> > --- a/docs/qapi-code-gen.txt
> > +++ b/docs/qapi-code-gen.txt
> > @@ -45,16 +45,13 @@ QAPI parser does not).  At present, there is no
> place where a QAPI
> >  schema requires the use of JSON numbers or null.
> >
> >  Comments are allowed; anything between an unquoted # and the following
> > -newline is ignored.  Although there is not yet a documentation
> > -generator, a form of stylized comments has developed for consistently
> > -documenting details about an expression and when it was added to the
> > -schema.  The documentation is delimited between two lines of ##, then
> > -the first line names the expression, an optional overview is provided,
> > -then individual documentation about each member of 'data' is provided,
> > -and finally, a 'Since: x.y.z' tag lists the release that introduced
> > -the expression.  Optional members are tagged with the phrase
> > -'#optional', often with their default value; and extensions added
> > -after the expression was first released are also given a '(since
> > +newline is ignored.  The documentation is delimited between two lines
> > +of ##, then the first line names the expression, an optional overview
> > +is provided, then individual documentation about each member of 'data'
> > +is provided, and finally, a 'Since: x.y.z' tag lists the release that
> > +introduced the expression.  Optional members are tagged with the
> > +phrase '#optional', often with their default value; and extensions
> > +added after the expression was first released are also given a '(since
> >  x.y.z)' comment.  For example:
> >
> >      ##
> > @@ -73,12 +70,39 @@ x.y.z)' comment.  For example:
> >      #           (Since 2.0)
> >      #
> >      # Since: 0.14.0
> > +    #
> > +    # Notes: You can also make a list:
> > +    #        - with items
> > +    #        - like this
> > +    #
> > +    # Example:
> > +    #
> > +    # -> { "execute": ... }
> > +    # <- { "return": ... }
>
> Err, 'BlockStats' is a *type*, not a command.
>
>
It could be returned/used in an example, it fits also this example.



> > +    #
> >      ##
> >      { 'struct': 'BlockStats',
> >        'data': {'*device': 'str', 'stats': 'BlockDeviceStats',
> >                 '*parent': 'BlockStats',
> >                 '*backing': 'BlockStats'} }
> >
> > +It's also possible to create documentation sections, such as:
> > +
> > +    ##
> > +    # = Section
> > +    # == Subsection
> > +    #
> > +    # Some text foo with *emphasis*
> > +    # 1. with a list
> > +    # 2. like that
> > +    #
> > +    # And some code:
> > +    # | $ echo foo
> > +    # | -> do this
> > +    # | <- get that
> > +    #
> > +    ##
> > +
>
> This suggests various kinds of markup, including section headers,
> emphasis, numbered lists.  But what exactly is recognized?
>
>
Ok updated


> >  The schema sets up a series of types, as well as commands and events
> >  that will use those types.  Forward references are allowed: the parser
> >  scans in two passes, where the first pass learns all type names, and
>
> As discussed in my review of the cover letter, we have some additional
> documentation work to do.
>
>
sure, but it's already 175 patches to review ;)


> Also missing: tests.  A first cut should have a negative test for each
> error qapi2texi.py can report, plus a modest suite of positive tests.  A
> negative test consists of tests/qapi-schema/NAME.{json,out,err,exit}.
> Positive tests can be added to tests/qapi-schema/qapi-schema-test.json,
> or to a separate test schema, in case that's more convenient.
>
> To get the most mileage out of positive tests, test-qapi.py should be
> extended to show the resulting QAPIDoc.  Sketch appended.
>
>
Thanks a lot for the feedback, I have done many changes based on it.

I'll work on adding some tests and send the new version, hopefully this
week.

If you have further comments based on this reply, I'll try to take them as
well with the next iteration.


> diff --git a/tests/qapi-schema/test-qapi.py
> b/tests/qapi-schema/test-qapi.py
> index ef74e2c..93cc709 100644
> --- a/tests/qapi-schema/test-qapi.py
> +++ b/tests/qapi-schema/test-qapi.py
> @@ -55,3 +55,13 @@ class QAPISchemaTestVisitor(QAPISchemaVisitor):
>
>  schema = QAPISchema(sys.argv[1])
>  schema.visit(QAPISchemaTestVisitor())
> +
> +for doc in schema.docs:
> +    expr_elem = doc.expr_elem
> +    expr_info = expr_elem and expr_elem['info']
> +    print 'doc %s %s' % (doc.symbol, expr_info)
> +    for arg, text in doc.args.iteritems():
> +        print '    arg=%s %s' % (arg, text)
> +    for key, text in doc.meta:
> +        print '    meta %s %s' % (key, text)
> +    print '    body=%s' % doc.body
>
> --
Marc-André Lureau
[Prev in Thread]
Current Thread
[Next in Thread]
Re: [Qemu-devel] [PATCH v3 07/14] qapi: use one symbol per line, (continued)
- [Qemu-devel] [PATCH v3 09/14] qapi: add some sections in docs, Marc-André Lureau, 2016/11/07
  - Re: [Qemu-devel] [PATCH v3 09/14] qapi: add some sections in docs, Markus Armbruster, 2016/11/07
- [Qemu-devel] [PATCH v3 10/14] docs: add master qapi texi files, Marc-André Lureau, 2016/11/07
  - Re: [Qemu-devel] [PATCH v3 10/14] docs: add master qapi texi files, Markus Armbruster, 2016/11/08
- [Qemu-devel] [PATCH v3 08/14] qapi: add missing colon-ending for section name, Marc-André Lureau, 2016/11/07
  - Re: [Qemu-devel] [PATCH v3 08/14] qapi: add missing colon-ending for section name, Markus Armbruster, 2016/11/07
- [Qemu-devel] [PATCH v3 12/14] texi2pod: learn quotation, deftp and deftypefn, Marc-André Lureau, 2016/11/07
- [Qemu-devel] [PATCH v3 11/14] qapi: add qapi2texi script, Marc-André Lureau, 2016/11/07
  - Re: [Qemu-devel] [PATCH v3 11/14] qapi: add qapi2texi script, Markus Armbruster, 2016/11/10
    - Re: [Qemu-devel] [PATCH v3 11/14] qapi: add qapi2texi script, Marc-André Lureau <=
- [Qemu-devel] [PATCH v3 14/14] build-sys: add qapi doc generation targets, Marc-André Lureau, 2016/11/07
- [Qemu-devel] [PATCH v3 13/14] qmp-commands: (SQUASHED) move doc to schema, Marc-André Lureau, 2016/11/07
Prev by Date: [Qemu-devel] [Bug 1640525] Re: -net socket, connect/listen does not work in 2.7.0
Next by Date: Re: [Qemu-devel] QEMU postcopy-test failing on ppc64
Previous by thread: Re: [Qemu-devel] [PATCH v3 11/14] qapi: add qapi2texi script
Next by thread: [Qemu-devel] [PATCH v3 14/14] build-sys: add qapi doc generation targets
Index(es):
- Date
- Thread