bug-gnu-utils

Re: bug in latest awk release


From: Aharon Robbins
Subject: Re: bug in latest awk release
Date: Thu, 15 Sep 2005 23:28:01 +0300

Paul,

It would help if you would check your facts before you post garbage
like this.

*I* am the one responsible for the POSIX text you quoted being there
in the first place.  It was intended, since it makes the behavior
considerably more consistent and logical.

Gawk is compliant with the POSIX spec in this regard, and Stepan's
explanation of what's going on is correct. Older versions of Unix awk
are not POSIX compliant; they would split $0 with whatever value of FS
was current at the time a field was requested.  Many people came to rely
on this behavior and are surprised when gawk doesn't do it.

This is all very clearly documented in the gawk manual and has been
for YEARS.

The correct thing to do is to set FS to "\t" in a BEGIN block and
then all versions of awk will work correctly.
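
As a minimal sketch of the difference (the sample input and the shell
variable names here are made up for illustration, not taken from the
original report):

```shell
# Two tab-separated records whose first field contains a space.
# (Hypothetical sample data.)
input=$(printf 'a b\tc\nd e\tf\n')

# Portable form: FS is assigned in BEGIN, before any record is read,
# so every record, including the first, is split on tabs.
portable=$(printf '%s\n' "$input" | awk 'BEGIN { FS = "\t" } { print $1 }')

# Fragile form: FS is assigned in a pattern, i.e. only after the first
# record has already been read.  What $1 is for that first record
# depends on whether the awk in use splits with the FS that was current
# when the record was read or with the FS current when the field is
# first requested.
fragile=$(printf '%s\n' "$input" | awk 'FS = "\t" { print $1 }')

printf 'portable:\n%s\n' "$portable"
printf 'fragile:\n%s\n' "$fragile"
```

The portable form should print "a b" and "d e" everywhere.  For the
fragile form, only the first record is in question: an awk that follows
the POSIX text splits it on whitespace, while the older Unix awks
described above split it on the tab.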

In the meantime, I recommend that anyone wishing to learn how
awk works *read* the gawk documentation instead of relying on
the drivel in comp.lang.awk or on many of the replies I see in
bug-gnu-utils.

Arnold

> Date: Thu, 15 Sep 2005 12:06:59 -0700
> From: Paul Eggert <address@hidden>
> Subject: Re: bug in latest awk release
> To: Mirco Meniconi <address@hidden>
> Cc: address@hidden
>
> Stepan Kasal <address@hidden> writes:
>
> > So in your example:
> >     FS="\t" {print $1}
> > the "condition" is the assignment to FS.  It is always true, so all
> > lines are printed.  But FS is changed only when the condition is
> > evaluated.  This means that the first line is split according to
> > the default FS.
>
> Unfortunately that is a minor POSIX-conformance bug in gawk.
>
> The POSIX spec for awk
> <http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html>
> under EXTENDED DESCRIPTION states:
>
>    Before the first reference to a field in the record is evaluated,
>    the record shall be split into fields, according to the rules in
>    Regular Expressions, using the value of FS that was current at the
>    time the record was read.  Each pattern in the program then shall
>    be evaluated in the order of occurrence,...
>
> Therefore, if the pattern changes FS, the input record must still be
> split according to the previous FS.  Since Gawk doesn't do this, it
> doesn't conform to POSIX here.
>
> I doubt whether this departure from Unix practice and from POSIX was
> intended, so it's just a minor gawk bug.
>
> Incidentally, I tested this example with Solaris 10 /bin/awk, Solaris
> 10 /bin/nawk, Solaris 10 /usr/xpg4/bin/awk, and Gawk 3.1.5.  Only
> /bin/nawk conformed to POSIX.  So, even though the script conforms to
> POSIX, I wouldn't use constructs like that in awk scripts that are
> intended to be portable.
>
>



