printf incompatibilities with POSIX, ksh93


From: Paul Eggert
Subject: printf incompatibilities with POSIX, ksh93
Date: Fri, 26 Sep 2003 13:30:32 -0700
User-agent: Gnus/5.1002 (Gnus v5.10.2) Emacs/21.2 (gnu/linux)

Configuration Information [Automatically generated, do not change]:
Machine: sparc
OS: solaris2.8
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='sparc' 
-DCONF_OSTYPE='solaris2.8' -DCONF_MACHTYPE='sparc-sun-solaris2.8' 
-DCONF_VENDOR='sun' -DSHELL  -DHAVE_CONFIG_H  -I.  -I. -I./include -I./lib  -g 
-O2 -Wall -W -Wno-sign-compare -Wpointer-arith -Wstrict-prototypes 
-Wmissing-prototypes -Wmissing-noreturn -Wmissing-format-attribute
uname output: SunOS sic.twinsun.com 5.8 Generic_108528-23 sun4u sparc 
SUNW,UltraSPARC-IIi-Engine
Machine Type: sparc-sun-solaris2.8

Bash Version: 2.05b
Patch Level: 0
Release Status: release

Description:
Bash's printf command has some incompatibilities with POSIX
1003.1-2001 and with ksh93.  Here is the POSIX incompatibility:

$ printf '(\0007)'
(^G)

(Here, "^G" represents the character with octal code 7.)
This output doesn't conform to POSIX 1003.1-2001.
The output should be "(^@7)", where "^@" represents a null byte.
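To make the expected behavior concrete, here is a minimal sketch of the
POSIX rule for octal escapes in a format string.  The function name
decode_octal_escape is hypothetical and is not taken from the Bash
sources; it only illustrates that at most three octal digits are
consumed after the backslash, so that '\0007' is a null byte followed
by a literal '7'.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Bash.  S points just past the
   backslash.  Consume at most three octal digits, store the resulting
   byte in *OUT, and return the number of characters consumed.  Return
   0, leaving *OUT untouched, if S does not start with an octal digit.  */
size_t
decode_octal_escape (const char *s, unsigned char *out)
{
  unsigned int value = 0;
  size_t n = 0;

  /* POSIX format strings: backslash plus one, two, or three digits.  */
  while (n < 3 && s[n] >= '0' && s[n] <= '7')
    {
      value = value * 8 + (unsigned int) (s[n] - '0');
      n++;
    }
  if (n > 0)
    *out = (unsigned char) (value & 0xFF);
  return n;
}
```

Called on the text "0007)", this sketch consumes "000" and yields a
null byte, leaving "7)" to be copied literally.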

The remaining issues are not POSIX-conformance issues, but they are
incompatibilities with ksh93 and with what Standard C programmers
would expect.  Here's the first one:

$ printf '(\x07e)'
(^Ge)

The C standard says that a hexadecimal escape can have any positive
number of digits, and ksh93 agrees with this, so it reads \x07e as the
single character with code 0x7e and outputs "(~)" for this example.
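A minimal sketch of the C-style rule follows.  The name
decode_hex_escape is hypothetical, and keeping only the low eight bits
of the value is an assumption made for this illustration; the point is
only that every hex digit after \x is consumed, not just the first two.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of Bash.  S points just past the 'x'.
   Consume every hex digit that follows, store the low eight bits of
   the value in *OUT (an assumption of this sketch), and return the
   number of digits consumed.  Return 0 if there is no hex digit.  */
size_t
decode_hex_escape (const char *s, unsigned char *out)
{
  unsigned int value = 0;
  size_t n = 0;

  while (isxdigit ((unsigned char) s[n]))
    {
      int c = tolower ((unsigned char) s[n]);
      unsigned int digit = isdigit (c) ? (unsigned int) (c - '0')
                                       : (unsigned int) (c - 'a' + 10);
      /* Masking at each step keeps the low eight bits of the result.  */
      value = (value * 16 + digit) & 0xFF;
      n++;
    }
  if (n > 0)
    *out = (unsigned char) value;
  return n;
}
```

Called on the text "07e)", this sketch consumes all three digits and
yields the byte 0x7e, which is '~' in ASCII.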

Here's the second ksh93 incompatibility:

$ printf '(\"\?)'
(\"\?)

Again, the C standard says that \" and \? are escapes for " and ?, and
ksh93 outputs "("?)" for this example.
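As a rough sketch of the intended handling, the function below maps
\', \" and \? to the bare character in a format string, and keeps the
backslash when the escape appears in a %b argument, which is how the
proposed patch treats them.  The name decode_quote_escape and its
in_b_argument parameter are hypothetical and do not appear in Bash.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Bash.  C is the character after the
   backslash.  In a format string (IN_B_ARGUMENT == 0) the backslash is
   dropped; in a %b argument it is kept.  Write the result to OUT and
   return its length, or return 0 if C is not one of ' " ?.  */
size_t
decode_quote_escape (char c, int in_b_argument, char out[2])
{
  if (c != '\'' && c != '"' && c != '?')
    return 0;

  if (in_b_argument)
    {
      out[0] = '\\';
      out[1] = c;
      return 2;
    }

  out[0] = c;
  return 1;
}
```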

Repeat-By:
printf '(\0007)'
printf '(\x07e)'
printf '(\"\?)'
(Please see "Description" section for discussion.)

Fix:
Here is a proposed patch, which implements these changes:

* At most three octal digits are allowed in printf string octal escapes,
  for compatibility with POSIX.  Previously, Bash allowed four digits
  if the first one was '0'.
  
* printf string hexadecimal escapes can now contain any positive
  number of digits, for compatibility with the C standard and with
  ksh93.  Previously, Bash allowed at most two digits.

* New escape sequences \" and \? are now recognized in printf strings,
  for compatibility with the C standard and with ksh93.

===================================================================
RCS file: builtins/printf.def,v
retrieving revision 2.5.2.0
retrieving revision 2.5.2.1
diff -pu -r2.5.2.0 -r2.5.2.1
--- builtins/printf.def 2002/05/13 18:36:04     2.5.2.0
+++ builtins/printf.def 2003/09/26 20:24:50     2.5.2.1
@@ -30,7 +30,9 @@ characters, which are simply copied to s
 sequences which are converted and copied to the standard output, and
 format specifications, each of which causes printing of the next successive
 argument.  In addition to the standard printf(1) formats, %b means to
-expand backslash escape sequences in the corresponding argument, and %q
+expand backslash escape sequences in the corresponding argument (except
+that \c terminates output, backslashes in \', \", and \? are not removed,
+and octal escapes that start with \0 can have up to four digits), and %q
 means to quote the argument in a way that can be reused as shell input.
 $END
 
@@ -105,7 +107,7 @@ extern int errno;
 
 static void printf_erange __P((char *));
 static void printstr __P((char *, char *, int, int, int));
-static int tescape __P((char *, int, char *, int *));
+static int tescape __P((char *, char *, int *));
 static char *bexpand __P((char *, int, int *, int *));
 static char *mklong __P((char *, char *, size_t));
 static int getchr __P((void));
@@ -186,9 +188,9 @@ printf_builtin (list)
          if (*fmt == '\\')
            {
              fmt++;
-             /* A NULL fourth argument to tescape means to not do special
-                processing for \c. */
-             fmt += tescape (fmt, 1, &nextch, (int *)NULL);
+             /* A NULL third argument to tescape causes it to bypass
+                the special processing for %b arguments.  */
+             fmt += tescape (fmt, &nextch, (int *)NULL);
              putchar (nextch);
              fmt--;    /* for loop will increment it for us again */
              continue;
@@ -531,6 +533,7 @@ printstr (fmt, string, len, fieldwidth, 
   
 /* Convert STRING by expanding the escape sequences specified by the
    POSIX standard for printf's `%b' format string.  If SAWC is non-null,
+   do the processing appropriate for %b arguments.  In particular,
    recognize `\c' and use that as a string terminator.  If we see \c, set
    *SAWC to 1 before returning.  LEN is the length of STRING. */
 
@@ -540,11 +543,11 @@ printstr (fmt, string, len, fieldwidth, 
    value.  *SAWC is set to 1 if the escape sequence was \c, since that means
    to short-circuit the rest of the processing.  If SAWC is null, we don't
    do the \c short-circuiting, and \c is treated as an unrecognized escape
-   sequence.  */
+   sequence; also we bypass the other processing that is needed only for
+   %b arguments.  */
 static int
-tescape (estart, trans_squote, cp, sawc)
+tescape (estart, cp, sawc)
      char *estart;
-     int trans_squote;
      char *cp;
      int *sawc;
 {
@@ -576,14 +579,13 @@ tescape (estart, trans_squote, cp, sawc)
 
       case 'v': *cp = '\v'; break;
 
-      /* %b octal constants are `\0' followed by one, two, or three
-        octal digits... */
-      case '0':
-      /* but, as an extension, the other echo-like octal escape
-        sequences are supported as well. */
-      case '1': case '2': case '3': case '4':
-      case '5': case '6': case '7':
-       for (temp = 2+(c=='0'), evalue = c - '0'; ISOCTAL (*p) && temp--; p++)
+      /* The octal escapes are \0 followed by up to 3 octal digits (if SAWC)
+        or \ followed by up to 3 octal digits (if !SAWC).  As an extension,
+        we allow the latter form even if SAWC.  */
+      case '0': case '1': case '2': case '3':
+      case '4': case '5': case '6': case '7':
+       evalue = OCTVALUE (c);
+       for (temp = 2 + (!evalue && !!sawc); ISOCTAL (*p) && temp--; p++)
          evalue = (evalue * 8) + OCTVALUE (*p);
        *cp = evalue & 0xFF;
        break;
@@ -591,9 +593,9 @@ tescape (estart, trans_squote, cp, sawc)
       /* And, as another extension, we allow \xNNN, where each N is a
         hex digit. */
       case 'x':
-       for (temp = 2, evalue = 0; ISXDIGIT ((unsigned char)*p) && temp--; p++)
+       for (evalue = 0; ISXDIGIT ((unsigned char)*p); p++)
          evalue = (evalue * 16) + HEXVALUE (*p);
-       if (temp == 2)
+       if (p == estart + 1)
          {
            builtin_error ("missing hex digit for \\x");
            *cp = '\\';
@@ -606,8 +608,9 @@ tescape (estart, trans_squote, cp, sawc)
        *cp = c;
        break;
 
-      case '\'':       /* TRANS_SQUOTE != 0 means \' -> ' */
-       if (trans_squote)
+      /* !SAWC means \' -> ', and similarly for \" and \?.  */
+      case '\'': case '"': case '?':
+       if (!sawc)
          *cp = c;
        else
          {
@@ -657,7 +660,7 @@ bexpand (string, len, sawc, lenp)
          continue;
        }
       temp = 0;
-      s += tescape (s, 0, &c, &temp);
+      s += tescape (s, &c, &temp);
       if (temp)
        {
          if (sawc)
===================================================================
RCS file: doc/bash.1,v
retrieving revision 2.5.2.0
retrieving revision 2.5.2.1
diff -pu -r2.5.2.0 -r2.5.2.1
--- doc/bash.1  2002/07/15 19:21:03     2.5.2.0
+++ doc/bash.1  2003/09/26 20:24:50     2.5.2.1
@@ -6939,7 +6939,10 @@ format specifications, each of which cau
 \fIargument\fP.
 In addition to the standard \fIprintf\fP(1) formats, \fB%b\fP causes
 \fBprintf\fP to expand backslash escape sequences in the corresponding
-\fIargument\fP, and \fB%q\fP causes \fBprintf\fP to output the corresponding
+\fIargument\fP (except that \fB\ec\fP terminates output, backslashes
+in \fB\e'\fP, \fB\e"\fP, and \fB\e?\fP are not removed, and octal
+escapes that start with \fB\e0\fP can have up to four digits),
+and \fB%q\fP causes \fBprintf\fP to output the corresponding
 \fIargument\fP in a format that can be reused as shell input.
 .sp 1
 The \fIformat\fP is reused as necessary to consume all of the \fIarguments\fP.
===================================================================
RCS file: doc/bashref.texi,v
retrieving revision 2.5.2.0
retrieving revision 2.5.2.1
diff -pu -r2.5.2.0 -r2.5.2.1
--- doc/bashref.texi    2002/07/15 19:21:24     2.5.2.0
+++ doc/bashref.texi    2003/09/26 20:24:50     2.5.2.1
@@ -3254,7 +3254,10 @@ format specifications, each of which cau
 @var{argument}.
 In addition to the standard @code{printf(1)} formats, @samp{%b} causes
 @code{printf} to expand backslash escape sequences in the corresponding
-@var{argument}, and @samp{%q} causes @code{printf} to output the
+@var{argument} (except that @samp{\c} terminates output, backslashes
+in @samp{\'}, @samp{\"}, and @samp{\?} are not removed, and octal
+escapes that start with @samp{\0} can have up to four digits),
+and @samp{%q} causes @code{printf} to output the
 corresponding @var{argument} in a format that can be reused as shell input.
 
 The @var{format} is reused as necessary to consume all of the @var{arguments}.



