Re: [bug-gawk] Unstated differences between gawk and POSIX

From: Ed Morton
Subject: Re: [bug-gawk] Unstated differences between gawk and POSIX
Date: Tue, 7 Aug 2018 11:18:22 -0500

So, per the current trouble ticket (http://austingroupbugs.net/view.php?id=1198) it looks like the Austin Group will change the comparison type of numeric string vs numeric string from String to Numeric in the POSIX spec to match what gawk does, so all good there.

It looks like they will NOT, however, change the expected behavior when comparing an empty field to zero. Today gawk treats an empty or absent field as a null string, not as a zero-or-null numeric-string:

    $ echo 'a,,c' | gawk -F, '{print typeof($2), $2, ($2==0 ? "==" : "!="), typeof(0), 0}'
    string  != number 0

    $ echo 'a,,c' | gawk -F, '{print typeof($8), $8, ($8==0 ? "==" : "!="), typeof(0), 0}'
    unassigned  != number 0

POSIX, by contrast, requires the behavior that some other awks exhibit (apparently including all awks on Solaris), which is to treat an empty or absent field as it would an uninitialized variable:

    $ echo 'a,,c' | gawk -F, '{print typeof(foo), foo, (foo==0 ? "==" : "!="), typeof(0), 0}'
    untyped  == number 0

    $ echo 'a,,c' | /usr/xpg4/bin/awk -F, '{print ($2==0 ? "==" : "!=")}'

I understand why gawk behaves as it does, and I think it gives more intuitive results (especially for the mid-record empty field case). Still, it might be a good idea for one of you to comment on the Austin Group ticket (http://austingroupbugs.net/view.php?id=1198) to persuade them to define the POSIX behavior the way gawk currently behaves. Otherwise gawk would have to behave differently when invoked with --posix to really be POSIX-compliant, and that difference should be documented in the gawk manual.
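
In the meantime, a script that needs the same answer from both kinds of awk can say explicitly which comparison it wants. The following is only a sketch: it assumes an `awk` on the PATH, and the variable name `result` is just for illustration. Adding 0 forces a numeric comparison, and comparing against "" tests for an empty field.

```shell
# Hypothetical check, not from the ticket: state the intended comparison
# explicitly so the result does not depend on how an empty field is typed.
#   ($2 + 0) == 0  forces a numeric comparison (an empty field converts to 0)
#   $2 == ""       compares against a string constant, so it tests for emptiness
result=$(echo 'a,,c' | awk -F, '{
    print (($2 + 0) == 0 ? "==" : "!="), ($2 == "" ? "empty" : "non-empty")
}')
echo "$result"
```

I would expect this to print "== empty" whether the awk treats the empty field as a null string or as an uninitialized value, since neither test relies on the field's comparison type.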


On 8/7/2018 8:25 AM, Ed Morton wrote:
FYI we now have related tickets at the Open Group (https://help.opengroup.org/hc/en-us/requests/193457) and the Austin Group (http://austingroupbugs.net/view.php?id=1198). Only staff can see the Open Group one; the Austin Group one is visible to anyone but requires a login to comment.


On 8/5/2018 7:05 AM, Ed Morton wrote:
Arnold - Thanks for getting back to me. I don’t think anyone’s getting excited about this in the slightest and I’ll google how to file an interpretation request with the Open Group, thanks for the suggestion.

Ed Morton

On Aug 5, 2018, at 2:53 AM, address@hidden wrote:

Hi Ed.

I saw your earlier note also but have not had time to read the comp.lang.awk
thread in detail.

Ed Morton <address@hidden> wrote:

OK, so apparently gawk really doesn't behave per the POSIX standard when
comparing numeric-string to numeric-string:

In the "Expressions in awk" section, POSIX says:

    Comparisons (with the '<', "<=", "!=", "==", '>', and ">=" operators) shall
    be made numerically if both operands are numeric, if one is numeric and the
    other has a string value that is a numeric string, or if one is numeric and
    the other has the uninitialized value. Otherwise, operands shall be
    converted to strings as required and a string comparison shall be made

The text in POSIX is bogus. The intent and prior art are that as soon as one
operand is a numeric string then a numeric comparison is done. Otherwise
something as basic as

    echo 5.0 10.0 | awk '{ print ($1 < $2) }'

would print 0.
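
To see the two comparison modes side by side, here is a small sketch. It assumes an `awk` on the PATH, and the variable name `result` is only for illustration. Concatenating "" onto a field is the usual awk idiom for forcing a string comparison.

```shell
# First value:  fields compared as they come from the input (numeric strings).
# Second value: each field has "" appended, which makes both operands plain
#               strings, so "5.0" and "10.0" are compared character by character.
result=$(echo '5.0 10.0' | awk '{ print ($1 < $2), (($1 "") < ($2 "")) }')
echo "$result"
```

With gawk, and as far as I know with the other commonly used awks, this prints "1 0": the field comparison is numeric, while the forced string comparison fails because "5" sorts after "1".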

You might want to file an interpretation request with the Open Group.

I see no reason to:

- get excited
- change gawk's behavior
- issue any warnings
- or update any documentation (except maybe POSIX.STD)


