Re: [bug-gawk] gawk printf misunderstand

From: Aharon Robbins
Subject: Re: [bug-gawk] gawk printf misunderstand
Date: Thu, 01 Nov 2012 22:24:34 +0200
User-agent: Heirloom mailx 12.5 6/20/10

Hello Nelson.

> Date: Thu, 1 Nov 2012 10:29:21 -0600 (MDT)
> From: "Nelson H. F. Beebe" <address@hidden>
> To: address@hidden
> Subject: Re: [bug-gawk] gawk printf misunderstand
> Yesterday, in response to a posting, Arnold reported on this list that
> the printf 0 flag handling in gawk had been changed to NOT provide
> zero fill with a format "%04s".
> In general, awk (and other scripting language) implementations of
> printf() have usually followed the native C implementations.  

That is because they simply pass through to the C library. This is the
cause of many mistaken impressions, and it leads to inconsistencies: the
same awk program run with the same interpreter (bwk awk, mawk) can
behave differently on different systems.

Gawk, by contrast, deliberately implements printf on its own (except
for the low-level parts of the floating-point formats) in order to
provide well-defined and consistent behavior everywhere.
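
One way to see the difference on a given machine is to feed the same
format to every awk that happens to be installed. The following is only
a sketch: the helper name show_fmt and the list of interpreter names are
assumptions made for illustration, and no particular output is claimed
for "%04s", since that is exactly the case that varies.

```shell
# show_fmt FORMAT VALUE: print VALUE with FORMAT under every awk found.
# The helper name and the interpreter list are illustrative assumptions.
show_fmt() {
    for interp in gawk mawk awk; do
        if command -v "$interp" >/dev/null 2>&1; then
            # Brackets make any padding visible in the output.
            "$interp" -v name="$interp" -v fmt="$1" -v val="$2" \
                'BEGIN { printf "%s: [" fmt "]\n", name, val }'
        fi
    done
}

# The format under discussion: a 0 flag on a string conversion.
show_fmt '%04s' ab
```

If the lines printed differ between interpreters, or between systems for
the same interpreter, that is the pass-through effect described above.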

> Here is what Section of the ISO C99 Standard (technical
> change 3) has to say about that flag:
>     0         For d, i, o, u, x, X, a, A, e, E, f, F, g, and G 
>               conversions, leading zeros (following any indication of
>               sign or base) are used to pad to the field width rather
>               than performing space padding, except when converting an
>               infinity or NaN. If the 0 and - flags both appear, the 0
>               flag is ignored. For d, i, o, u, x, and X conversions,
>               if a precision is specified, the 0 flag is ignored. For
>               other conversions, the behavior is undefined.
> Notice in particular the last sentence.
> The 1989 and 2011 ISO C Standards say essentially the same thing.
> Because leading zero filling ensures that a numeric output field
> remains numeric, it seems most sensible to ignore it for nonnumeric
> output fields, which should then retain leading space filling instead.

Yep.  The point is that some standard somewhere (possibly POSIX) at one
point said that the 0 flag applies to %s.  That was later changed.
I remember long ago making 0 apply to %c and %s on purpose because some
standard mandated it. I don't care to research what it was.  When it
was pointed out that it isn't mandated, I changed it back.

> [...]
> It MIGHT be worth changing the gawk user manual to note that gawk's
> handling of printf() format items is intended to follow the behavior
> mandated by the ISO C Standards, and when those standards say
> "undefined behavior", to then fall back to the behavior of the glibc
> implementations.  That would seem to be better than to have gawk try
> to supply its own implementation choices for "undefined behavior".

Not quite.

First, the relevant standard is this:


But it says largely the same thing:

        For d, i, o, u, x, X, e, E, f, g, and G conversion specifiers,
        leading zeros (following any indication of sign or base)
        shall be used to pad to the field width; no space padding is
        performed. If the '0' and '-' flags both appear, the '0' flag
        shall be ignored. For d, i, o, u, x, and X conversion specifiers,
        if a precision is specified, the '0' flag shall be ignored. For
        other conversion specifiers, the behavior is undefined.
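
As a rough illustration of those rules for the d conversion, the
commands below print one line per rule. These cases are all defined by
the text above, so any conforming awk should agree on them. The variable
name out is just for this sketch, and the trailing | marks the end of
each field.

```shell
# Each case uses a numeric conversion, where the 0 flag is well defined.
out=$(awk 'BEGIN {
    printf "%05d|\n", 42      # zero padded to width 5: 00042|
    printf "%05d|\n", -42     # zeros follow the sign:  -0042|
    printf "%-05d|\n", 42     # the - flag wins over 0: 42   |
    printf "%05.3d|\n", 42    # a precision wins over 0:   042|
}')
printf '%s\n' "$out"
```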

The gawk change log says this:

Tue Aug  4 06:04:23 2009  Arnold D. Robbins  <address@hidden>

        * builtin.c (format_tree): zero_flag does not apply to
        %c and %s conversions. Thanks to Mike Brennan and Thomas Dickey
        for the bug report.

So the change was made well over three years ago.  Finally, the doc
says this:

@item 0
A leading @samp{0} (zero) acts as a flag that indicates that output should be
padded with zeros instead of spaces.
This applies only to the numeric output formats.
This flag only has an effect when the field width is wider than the
value to print.
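
The last sentence of that doc entry can be checked with a small
example. The variable names wide and narrow are only for illustration.

```shell
# Field width 8 is wider than the five digits, so zeros are added.
wide=$(awk 'BEGIN { printf "%08d", 12345 }')
# Field width 3 is narrower than the value, so the flag has no effect.
narrow=$(awk 'BEGIN { printf "%03d", 12345 }')
printf '%s\n%s\n' "$wide" "$narrow"
# prints 00012345, then 12345
```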

I don't particularly wish to tie gawk to what glibc does, since it's not
relying on glibc, and glibc isn't always the best example.

Overall, I think things are OK as they are, and the correct thing
is to just avoid the 0 flag with %c and %s.
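
For a program that really does want a string padded with zeros, doing
the padding explicitly sidesteps the question entirely. What follows is
a minimal sketch, not anything taken from gawk: the function name zpad
and its arguments are made up for this example.

```shell
# zpad STRING WIDTH: left-pad STRING with zeros up to WIDTH characters.
# Hypothetical helper; it uses no printf flags at all, only concatenation.
zpad() {
    awk -v s="$1" -v w="$2" '
    function zpad(str, width,    n) {
        # Prepend one zero for each missing character.
        for (n = width - length(str); n > 0; n--)
            str = "0" str
        return str
    }
    BEGIN { print zpad(s, w) }'
}

zpad ab 4       # prints 00ab
zpad abcdef 4   # already wide enough: prints abcdef
```

Because the padding is built by hand, the result should not depend on
how a particular awk or C library treats the 0 flag for %s.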


