bug-gawk

Bugs in printf/sprintf formatted output


From: Maciej W. Rozycki
Subject: Bugs in printf/sprintf formatted output
Date: Fri, 7 Jun 2024 11:55:12 +0100 (BST)

Hi,

 Please let me know if you need this bug report sent differently.

 I have been following the guidelines from the top-level README file and 
the gawk(1) manual page.  I believe the bugs are generic; however, for the 
record, they have been observed with gawk 4.2.1 on POWER9/Linux, gawk 5.1.0 
on RISC-V/Linux and gawk 4.1.4 on x86-64/Linux systems (as distributed), 
and then with the upstream master and a selection of earlier gawk checkouts 
built with GCC 14.0.1 on POWER9/Linux (in particular while bisecting a 
problematic commit; see a note below).

 I have recently been working on improving test coverage for formatted 
output verification in glibc.  Due to the humongous amount of data 
processed, at least an order of magnitude larger than the whole glibc 
repository, I chose, rather than the usual approach of having reference 
test data pregenerated in the repository, to generate it on the fly using 
gawk as an independent implementation, in particular in the bignum mode 
(with an exception I have been aware of for floating-point input; see a 
note below).  The resulting glibc test improvements will be submitted 
upstream soon.

 In the course of writing the test cases I have checked various released 
versions of gawk as well as the upstream master, and have come across 
numerous corner cases that gawk does not handle correctly (which, for the 
record, I have worked around with explicit handling).  Some apply only to 
versions up to 4.2.1 (see a note below), but I have listed them for 
completeness, as that might be useful in the assessment.

 In particular, the issues, each with a reproducer, are as follows:

- extraneous leading 0 produced for the alternative form with the o 
  conversion, e.g. { printf "%#.2o", 1 } produces "001" rather than "01",

- unexpected 0 produced where no characters are expected for the input of 
  0 and the alternative form with the precision of 0 and the integer 
  hexadecimal conversions, e.g. { printf "%#.x", 0 } produces "0" rather 
  than "",

- missing + character in the non-bignum mode only for the input of 0 with 
  the + flag, precision of 0 and the signed integer conversions, e.g.
  { printf "%+.i", 0 } produces "" rather than "+",

- missing space character in the non-bignum mode only for the input of 0 
  with the space flag, precision of 0 and the signed integer conversions, 
  e.g. { printf "% .i", 0 } produces "" rather than " ",

- for released gawk versions up to 4.2.1, missing - character for the 
  input of -NaN with the floating-point conversions, e.g. { printf "%e", 
  "-nan" } produces "nan" rather than "-nan",

- for released gawk versions from 5.0.0 onwards, + character output for 
  the input of -NaN with the floating-point conversions, e.g. { printf 
  "%e", "-nan" } produces "+nan" rather than "-nan",

- for released gawk versions from 5.0.0 onwards, + character output for 
  the input of Inf or NaN in the absence of the + or space flags with the 
  floating-point conversions, e.g. { printf "%e", "inf" } produces "+inf" 
  rather than "inf",

- for released gawk versions up to 4.2.1, missing + character for the 
  input of Inf or NaN with the + flag and the floating-point conversions, 
  e.g. { printf "%+e", "inf" } produces "inf" rather than "+inf",

- for released gawk versions up to 4.2.1, missing space character for the 
  input of Inf or NaN with the space flag and the floating-point 
  conversions, e.g. { printf "% e", "nan" } produces "nan" rather than 
  " nan",

- for released gawk versions from 5.0.0 onwards, + character output for 
  the input of Inf or NaN with the space flag and the floating-point 
  conversions, e.g. { printf "% e", "inf" } produces "+inf" rather than 
  " inf",

- for released gawk versions from 5.0.0 onwards, the field width is 
  ignored for the input of Inf or NaN with the floating-point conversions, 
  e.g. { printf "%20e", "-inf" } produces "-inf" rather than
  "                -inf".

 NB: for released gawk versions up to 4.2.1, the floating-point conversion 
issues apply to the bignum mode only, as the system sprintf(3) is used in 
the non-bignum mode.  As from version 5.0.0, specialized handling has been 
added for [-]Inf and [-]NaN inputs with commit 8dba5f4c9002 ("Output +inf, 
+nan etc. also, so that output can be input. Doc, tests, fixed.") and the 
issues listed apply to both modes.  All the unmarked issues, as well as 
those marked as present from 5.0.0 onwards, are also present in the 
upstream master.

 The `--posix' flag makes gawk versions from 5.0.0 onwards avoid the field 
width issue and the unconditional + character output for the input of Inf 
or NaN, but not the remaining issues.  I realise there are some 
limitations in Inf/NaN handling stemming from gawk's legacy, so, for 
example, the space flag may or may not be reasonably supported in the 
non-POSIX mode.  However, I think the field width ought to always be 
respected, as it will often be used to format tables, etc., and ignoring 
it is a regression from 4.2.1 too.

 For the avoidance of doubt, I have used 
<https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html> as 
the normative reference.

 Please let me know if you have any questions or comments or need any 
further information.  I'll be happy to verify any potential fixes before 
you push them to the upstream master.

  Maciej
