bug-bison



From: Akim Demaille
Subject: Re: %nterm directive incorrectly accepts character literals and quoted strings
Date: Sun, 25 Nov 2018 10:51:51 +0100

Hi Rici,

I’m back on the issues you raised about %nterm, %type and the like.

> On 15 Oct 2018, at 18:02, Rici Lake <address@hidden> wrote:
> 
> (Note that the Posix grammar allows %type declarations without a tag.)

Well, what POSIX wants is not so obvious to me.

http://pubs.opengroup.org/onlinepubs/009604599/utilities/yacc.html

reads:

> The following declares that union member names are non-terminals, and thus it 
> is required to have a tag field at its beginning:
> 
> %type <tag> name
> ...

which clearly states that there must be a tag.  However, I agree
that the grammar they provide accepts that there are no tags:

> def   : rword tag nlist
>       ;
> rword : TOKEN
>       | LEFT
>       | RIGHT
>       | NONASSOC
>       | TYPE
>       ;
> tag   : /* Empty: union tag ID optional. */
>       | '<' IDENTIFIER '>'
>       ;
> nlist : nmno
>       | nlist nmno
>       ;
> nmno  : IDENTIFIER         /* Note: literal invalid with % type. */
>       | IDENTIFIER NUMBER  /* Note: invalid with % type. */
>       ;

Unfortunately, AFAICT, there is no reference implementation of
POSIX Yacc we could use to decide what to do, unless we consider
Plan 9’s implementation of Yacc (which is really the direct
heir of Unix’s) to be that reference.  It’s available here:

https://github.com/brho/plan9/blob/89d43d2262ad43eb4b26c2a8d6a27cfeddb33828/sys/src/cmd/yacc.c

In this implementation, %type is mapped to TYPEDEF 
(https://github.com/brho/plan9/blob/89d43d2262ad43eb4b26c2a8d6a27cfeddb33828/sys/src/cmd/yacc.c#L326):

> struct
> {
>       char*   name;
>       long    value;
> } resrv[] =
> {
>       "binary",       BINARY,
>       "left",         LEFT,
>       "nonassoc",     BINARY,
>       "prec",         PREC,
>       "right",        RIGHT,
>       "start",        START,
>       "term",         TERM,
>       "token",        TERM,
>       "type",         TYPEDEF,
>       "union",        UNION,
>       0,
> };

(Amusingly, %term is accepted as an alias for %token, but there is no %nterm.)

Then the parser for %type _requires_ that tag 
(https://github.com/brho/plan9/blob/89d43d2262ad43eb4b26c2a8d6a27cfeddb33828/sys/src/cmd/yacc.c#L1267):


>       case TYPEDEF:
>               if(gettok() != TYPENAME)
>                       error("bad syntax in %%type");

There’s no doubt that TYPENAME is <tag> 
(https://github.com/brho/plan9/blob/89d43d2262ad43eb4b26c2a8d6a27cfeddb33828/sys/src/cmd/yacc.c#L1692):

>       case '<':
>               /* get, and look up, a type name (union member name) */
>               i = 0;
>               while((c=Bgetrune(finput)) != '>' && c >= 0 && c != '\n') {
>                       rune = c;
>                       c = runetochar(&tokname[i], &rune);
>                       if(i < NAMESIZE)
>                               i += c;
>               }
>               if(c != '>')
>                       error("unterminated < ... > clause");
>               tokname[i] = 0;
>               for(i=1; i<=ntypes; i++)
>                       if(!strcmp(typeset[i], tokname)) {
>                               numbval = i;
>                               return TYPENAME;
>                       }
>               ntypes++;
>               numbval = ntypes;
>               typeset[numbval] = cstash(tokname);
>               return TYPENAME;
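
The second half of that excerpt is a plain look-up-or-add over the
table of known tags.  As a minimal sketch of just that step, assuming
a fixed-size table (the names intern_tag, tag_names, MAX_TAGS and
NAME_SIZE are made up for illustration and are not Plan 9's):

```c
#include <string.h>

#define MAX_TAGS  64   /* hypothetical capacity, not from Plan 9 */
#define NAME_SIZE 32   /* hypothetical maximum tag length, including NUL */

/* Slot 0 is unused so that indices are 1-based, as in the excerpt. */
static char tag_names[MAX_TAGS + 1][NAME_SIZE];
static int tag_count = 0;

/* Return the 1-based index of `name`, adding it on first sight.
   Return 0 if the name is too long or the table is full. */
int intern_tag(const char *name)
{
    if (strlen(name) >= NAME_SIZE)
        return 0;
    for (int i = 1; i <= tag_count; i++)
        if (strcmp(tag_names[i], name) == 0)
            return i;               /* already known: reuse its index */
    if (tag_count == MAX_TAGS)
        return 0;
    tag_count++;
    strcpy(tag_names[tag_count], name);
    return tag_count;               /* newly registered tag */
}
```

Calling intern_tag twice with the same name yields the same index,
which is what lets the scanner hand back a single TYPENAME token
carrying the tag's number.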


So now, it’s less obvious to me what POSIX expects.  Well, actually
the ambiguity might follow from a common pattern in parsers:
keep the grammar simple, and reject what you don't like elsewhere
(actions with YYERROR, type checker, etc.).  So I tend to think
that POSIX does mean the <tag> to be mandatory.
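
To illustrate that pattern, here is a minimal sketch in C, assuming
the parser has already accepted the declaration with an optional tag
and filled in a small record.  The names struct decl and check_decl
are hypothetical; this is neither Plan 9's nor Bison's code:

```c
#include <stddef.h>

enum directive { DIR_TOKEN, DIR_TYPE };

/* What a permissive grammar (tag optional) would hand over. */
struct decl {
    enum directive kind;
    const char *tag;      /* NULL when no <tag> was given */
    size_t n_names;       /* number of symbols declared */
};

/* Semantic check run after parsing.
   Return NULL if the declaration is fine, else an error message. */
const char *check_decl(const struct decl *d)
{
    if (d->n_names == 0)
        return "declaration needs at least one name";
    if (d->kind == DIR_TYPE && d->tag == NULL)
        return "%type requires a <tag>";
    return NULL;
}
```

With this split, "%token FOO" and "%type <ival> expr" both pass,
while "%type expr" is parsed by the grammar and then rejected by the
check.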


I think that a key feature of %nterm was to let one clearly
communicate that « this is a non-terminal ».  Unfortunately, in its
current form in Bison, « %type » can only mean « this is a typed
non-terminal ».

So I think we should generalize %type to be like %nterm was, i.e.:
- accept declarations with no tag
- accept declarations with several tags

both of which would be extensions to POSIX.

WDYT?

You wrote:

> The fact that bison's %token and %nterm declarations do allow multiple tags
> is probably the only justification for the existence of %nterm, since there
> is no need to predeclare non-terminals. (It could be considered superior to
> %type because it explicitly states that the targets are non-terminals. On
> the other hand, it is generally more useful IMHO to group terminals and
> non-terminals with the same type tag together.)

The last sentence is unclear to me.  It seems to mean that %type
can be used for tokens, but POSIX clearly states that it is
for non-terminals only:

> The following declares that union member names are non-terminals, and thus it 
> is required to have a tag field at its beginning:
> 
> %type <tag> name
> ...

even if, again, the grammar is more relaxed.


FTR, all the occurrences of %nterm I saw out there are
copies of Bison itself:

https://github.com/search?p=1&q=%22%25nterm%22&type=Code

