Hi Eric and Austin Group folks,
I apologize for the delay in replying. Real Life(tm) gets in the way
of these things.
I am cc'ing Brian Kernighan for his opinion on these issues as well.
Date: Thu, 03 Apr 2014 10:18:54 -0600
From: Eric Blake <address@hidden>
To: address@hidden
Cc: Austin Group <address@hidden>
Subject: [bug-gawk] use of ;; as terminator, request for grammar help
Hello GNU awk readers,
On today's Austin Group call (the people in charge of POSIX), we visited
http://austingroupbugs.net/view.php?id=226.
This is in regards to the POSIX awk specification at:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
Among other things, there were two action items pointed out that this
list might be able to help with:
1. GNU awk has a bug regarding ;; as a terminator. The POSIX grammar
allows for:
awk '{print};;{print}'
but gawk rejects this case. This was deemed to be a bug in gawk, since
POSIX was based on the nawk behavior at the time POSIX was standardized,
and nawk has always supported this.
I'm not convinced this is a real bug. In particular, accidents of the
Unix awk implementation should not necessarily be formally codified
in the standard. mawk, which was written based on the 1988 awk book,
also does not support this.
If there are awk programs that use this, they would best be changed to
have only one ';', in my humble opinion; there's no real added value
in codifying this into the language.
2. Based on existing implementations, there is consensus that the POSIX
grammar is overly restrictive, and that we should change it to permit:
awk '{print} {print}'
and:
awk '/foo/; {print}'
since existing implementations all support it. But to do that, we need
someone with help in writing grammars to propose the changes to the one
appearing on the POSIX page. Any input would be appreciated.
I disagree with the first desired change (permitting awk '{print} {print}').
The ground I'm standing on here is firmer. The 1988 awk book disallowed rules
without any separators, on the grounds that rules and statements within them
should be syntactically consistent (a semicolon is required when multiple Xs
[rules or statements] appear on one line). And the very early released
versions of nawk in fact enforced this rule. (I remember testing against it.)
Later on, after the awk book, Brian changed his awk. If you look at his FIXES
file, you will see:
Nov 27, 1988:
With fear and trembling, modified the grammar to permit
multiple pattern-action statements on one line without
an explicit separator. By definition, this capitulation
to the ghost of ancient implementations remains undefined
and thus subject to change without notice or apology.
DO NOT COUNT ON IT.
The sentiment here is quite clear - while it might work, it should
not be formalized.
The gawk documentation follows this example, documenting clearly that
a semicolon is required between multiple rules on one line, and NOT
documenting that it can be left off. I do not plan to change this, either.
The second change (awk '/foo/; { print }') should be supported by the POSIX
grammar, since that is clearly two different rules.
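For anyone who wants to compare implementations on the forms discussed
above, here is a minimal sketch. It only reports whether whatever awk is
first in PATH parses each program; it makes no claim about what any
particular implementation will answer. The helper name "probe" is made up
for this illustration, and input comes from /dev/null so the rules are
parsed but never run.

```shell
#!/bin/sh
# probe PROGRAM: report whether the local awk accepts PROGRAM.
# "probe" is a hypothetical helper name, not part of any awk distribution.
probe() {
    # Empty input: main rules are parsed but never executed.
    if awk "$1" </dev/null >/dev/null 2>&1; then
        echo "accepted: $1"
    else
        echo "rejected: $1"
    fi
}

probe '{print};;{print}'    # doubled semicolon between two rules
probe '{print} {print}'     # two rules with no separator at all
probe '/foo/; {print}'      # pattern-only rule, semicolon, action-only rule
probe '{print}; {print}'    # the documented form: one semicolon between rules
```

Setting PATH so that a different awk comes first (gawk, mawk, Brian's awk)
lets the same script be used to compare them.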
As an aside, there are one or two other areas where gawk implements
undocumented (= unspecified) behavior for compatibility with Unix awk,
but those remain purposely undocumented in the gawk manual; the case
I'm thinking about even has this comment in the code:
/*
* A simple_stmt exists to satisfy a constraint in the POSIX
* grammar allowing them to occur as the 1st and 3rd parts
* in a `for (...;...;...)' loop. This is a historical oddity
* inherited from Unix awk, not at all documented in the AK&W
* awk book. We support it, as this was reported as a bug.
* We don't bother to document it though. So there.
*/
In my humble opinion, the ';;' issue is so trivial that it's not even worth
the effort I put in for simple statements in for loops.
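To make the for loop case concrete, the sketch below asks the local awk
whether it parses a non-expression simple statement (here a delete) in the
first part of a for loop. The function name and the array "a" are made up
for this illustration, and the result depends on which awk is installed.

```shell
#!/bin/sh
# check_for_stmt: report whether the local awk parses a "delete" statement
# in the initializer position of a for loop.
# The function name is hypothetical; "a" and "i" are throwaway variables.
check_for_stmt() {
    prog='{ for (delete a[1]; i < 2; i++) ; }'
    # Empty input: the rule is parsed but its action never runs.
    if awk "$prog" </dev/null >/dev/null 2>&1; then
        echo "accepted: $prog"
    else
        echo "rejected: $prog"
    fi
}

check_for_stmt
```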
I hope all this helps. Further discussion is welcome.
Arnold