Bugs in printf/sprintf formatted output
From: Maciej W. Rozycki
Subject: Bugs in printf/sprintf formatted output
Date: Fri, 7 Jun 2024 11:55:12 +0100 (BST)
Hi,
Please let me know if you need this bug report sent differently.
I have been following the guidelines from the top-level README file and
the gawk(1) manual page. I believe the bugs are generic, however for the
record they have been observed with gawk 4.2.1 on POWER9/Linux, gawk 5.1.0
on RISC-V/Linux and gawk 4.1.4 on x86-64/Linux systems (as distributed)
and then the upstream master and a choice of earlier checkouts of gawk
built with GCC 14.0.1 on POWER9/Linux (in particular while bisecting a
problematic commit; see a note below).
I have been recently working on improving test coverage for formatted
output verification in glibc and due to the humongous amount of data
processed, at least an order of magnitude larger than the whole glibc
repository takes, rather than the usual approach to have reference test
data pregenerated in the repository I chose to generate it on the fly
using gawk as an independent implementation, in particular in the bignum
mode (with one exception I am aware of, for floating-point input; see a
note below). The resulting glibc test improvements will be
submitted upstream soon.
In the course of writing the test cases I have checked various released
versions of gawk as well as upstream master and have come across numerous
corner cases that gawk does not handle correctly (which for the record I
have worked around by explicit handling). Some apply to versions of up to
4.2.1 only (see a note below), but I have listed them for completeness as
that might be useful in the assessment.
The issues with reproducers are in particular:
- extraneous leading 0 produced for the alternative form with the o
conversion, e.g. { printf "%#.2o", 1 } produces "001" rather than "01",
- unexpected 0 produced where no characters are expected for the input of
0 and the alternative form with the precision of 0 and the integer
hexadecimal conversions, e.g. { printf "%#.x", 0 } produces "0" rather
than "",
- missing + character in the non-bignum mode only for the input of 0 with
the + flag, precision of 0 and the signed integer conversions, e.g.
{ printf "%+.i", 0 } produces "" rather than "+",
- missing space character in the non-bignum mode only for the input of 0
with the space flag, precision of 0 and the signed integer conversions,
e.g. { printf "% .i", 0 } produces "" rather than " ",
- for released gawk versions of up to 4.2.1 missing - character for the
input of -NaN with the floating-point conversions, e.g. { printf "%e",
"-nan" } produces "nan" rather than "-nan",
- for released gawk versions from 5.0.0 onwards + character output for the
input of -NaN with the floating-point conversions, e.g. { printf "%e",
"-nan" } produces "+nan" rather than "-nan",
- for released gawk versions from 5.0.0 onwards + character output for the
input of Inf or NaN in the absence of the + or space flags with the
floating-point conversions, e.g. { printf "%e", "inf" } produces "+inf"
rather than "inf",
- for released gawk versions of up to 4.2.1 missing + character for the
input of Inf or NaN with the + flag and the floating-point conversions,
e.g. { printf "%+e", "inf" } produces "inf" rather than "+inf",
- for released gawk versions of up to 4.2.1 missing space character for
the input of Inf or NaN with the space flag and the floating-point
conversions, e.g. { printf "% e", "nan" } produces "nan" rather than
" nan",
- for released gawk versions from 5.0.0 onwards + character output for the
input of Inf or NaN with the space flag and the floating-point
conversions, e.g. { printf "% e", "inf" } produces "+inf" rather than
" inf",
- for released gawk versions from 5.0.0 onwards the field width is ignored
for the input of Inf or NaN and the floating-point conversions, e.g.
{ printf "%20e", "-inf" } produces "-inf" rather than
"                -inf",
NB for released gawk versions of up to 4.2.1 floating-point conversion
issues apply to the bignum mode only, as in the non-bignum mode system
sprintf(3) is used. As from version 5.0.0 specialized handling has been
added for [-]Inf and [-]NaN inputs with commit 8dba5f4c9002 ("Output +inf,
+nan etc. also, so that output can be input. Doc, tests, fixed.") and the
issues listed apply to both modes. All the unmarked issues as well as
ones marked as present from 5.0.0 onwards are also there in the upstream
master.
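
For comparison, the expected outputs of the integer reproducers above can
be cross-checked against the C library. The following is only a minimal
sketch, assuming a C99-conforming snprintf(3) such as the one in glibc;
the names mismatch and check_integer_cases are made up for illustration
and are not part of gawk or glibc. It encodes the four outputs this report
expects and returns the number of disagreements.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: report a case whose output differs from the
   expected text.  Returns 1 on a mismatch and 0 otherwise. */
static int mismatch(const char *label, const char *got, const char *want)
{
    if (strcmp(got, want) == 0)
        return 0;
    fprintf(stderr, "%s: got \"%s\", expected \"%s\"\n", label, got, want);
    return 1;
}

/* Returns how many of the four integer cases from the report disagree
   with the expected output when formatted by the C library. */
int check_integer_cases(void)
{
    char buf[16];
    int bad = 0;
    int n;

    /* Alternative form, o conversion: precision 2 already yields a
       leading 0, so no further 0 is added. */
    n = snprintf(buf, sizeof buf, "%#.2o", 1u);
    assert(n >= 0);
    bad += mismatch("%#.2o", buf, "01");

    /* Alternative form, x conversion, value 0, precision 0:
       no characters at all. */
    n = snprintf(buf, sizeof buf, "%#.x", 0u);
    assert(n >= 0);
    bad += mismatch("%#.x", buf, "");

    /* + flag, value 0, precision 0: only the sign remains. */
    n = snprintf(buf, sizeof buf, "%+.i", 0);
    assert(n >= 0);
    bad += mismatch("%+.i", buf, "+");

    /* Space flag, value 0, precision 0: only the space remains. */
    n = snprintf(buf, sizeof buf, "% .i", 0);
    assert(n >= 0);
    bad += mismatch("% .i", buf, " ");

    return bad;
}
```

Calling check_integer_cases() from main() and using its return value as
the exit status gives a quick pass/fail check on a given system.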
The `--posix' flag makes gawk versions from 5.0.0 onwards avoid the issue
with field width and the + character unconditionally output for the input
of Inf or NaN, however not the remaining issues. I realise there are some
limitations in Inf/NaN handling coming from gawk's legacy, so for example
the space flag may or may not be reasonably supported in the non-POSIX
mode, however I think the field width ought to be always respected, as it
will often be used to format tables, etc. and it's a regression from 4.2.1
too.
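
The sign, space and field width expectations for Inf can be illustrated
with the C library in the same way. Again this is only a sketch, assuming
C99 snprintf(3) semantics; the names inf_mismatch and check_infinity_cases
are hypothetical. NaN is left out on purpose to keep the sketch portable.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: report a case whose output differs from the
   expected text.  Returns 1 on a mismatch and 0 otherwise. */
static int inf_mismatch(const char *label, const char *got, const char *want)
{
    if (strcmp(got, want) == 0)
        return 0;
    fprintf(stderr, "%s: got \"%s\", expected \"%s\"\n", label, got, want);
    return 1;
}

/* Returns how many of the Inf cases disagree with the expected output
   when formatted by the C library. */
int check_infinity_cases(void)
{
    const double inf = INFINITY;
    char buf[64];
    int bad = 0;
    int n;

    /* No flags: no sign is output for positive infinity. */
    n = snprintf(buf, sizeof buf, "%e", inf);
    assert(n >= 0);
    bad += inf_mismatch("%e", buf, "inf");

    /* + flag: an explicit plus sign is output. */
    n = snprintf(buf, sizeof buf, "%+e", inf);
    assert(n >= 0);
    bad += inf_mismatch("%+e", buf, "+inf");

    /* Space flag: a space takes the place of the sign. */
    n = snprintf(buf, sizeof buf, "% e", inf);
    assert(n >= 0);
    bad += inf_mismatch("% e", buf, " inf");

    /* Field width 20: "-inf" is right-justified, padded with 16 spaces. */
    n = snprintf(buf, sizeof buf, "%20e", -inf);
    assert(n >= 0);
    if (n != 20 || strspn(buf, " ") != 16 || strcmp(buf + 16, "-inf") != 0) {
        fprintf(stderr, "%%20e: got \"%s\"\n", buf);
        bad++;
    }

    return bad;
}
```

The width case checks the length and the padding separately, so that a
failure shows whether the width or the text itself is wrong.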
FAOD I have used
<https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html> as
the normative reference.
Please let me know if you have any questions or comments or need any further
information. I'll be happy to verify any potential fixes before you have
pushed them to the upstream master.
Maciej