bug-bash


From: Don Cragun
Subject: Re: Consume only up to 8 bit octal input for backslash-escaped chars (echo, printf)
Date: Tue, 7 Dec 2010 21:23:45 -0800

On Dec 7, 2010, at 6:02 PM, Eric Blake wrote:
> [adding the Austin Group]
> 
> On 12/07/2010 06:19 PM, Chet Ramey wrote:
>> On 12/7/10 11:12 AM, Roman Rakus wrote:
>>> This one is already reported on coreutils:
>>> http://debbugs.gnu.org/cgi/bugreport.cgi?msg=2;bug=7574
>>> 
>>> The problem is with numbers higher than \0377: echo and printf consume all
>>> three digits, even though the result is not an 8-bit number. For example:
>>> $ echo -e '\0610'; printf '\610 %b\n' '\610 \0610'
>>> Should output:
>>> 10
>>> 10 10 10
>>> instead of
>>> �
>>> � � �
>> 
>> No, it shouldn't.  This is a terrible idea.  All other shells I tested
>> behave as bash does*, bash behaves as Posix specifies, and the bash
>> behavior is how C character constants work.  Why would I change this?
>> 
>> (*That is, consume up to three octal digits and mask off all but the lower
>> 8 bits of the result.)
> 
> POSIX states for echo:

Note that the behavior of echo is implementation-defined if a
<backslash> character appears in any of echo's operands, unless the
X/Open System Interfaces (XSI) option is supported by the implementation.

> 
> "\0num Write an 8-bit value that is the zero, one, two, or three-digit
> octal number num."
> 
> It does not explicitly say what happens if a three-digit octal number is
> not an 8-bit value, so it is debatable whether the standard requires at
> most an 8-bit value (two characters, \0061 followed by 0) or whether the
> overflow is silently ignored (treated as one character \0210), or some
> other treatment.
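The masking reading that Chet describes for bash can be sketched in C: consume up to three octal digits, then keep only the low 8 bits. The function name echo_octal_masked is hypothetical, and this is not bash's actual source. It only models echo's "\0num" form and assumes the pointer is at the backslash. Under that rule, "\0610" becomes the single byte 0210 (0x88).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not bash's actual source): parse an echo-style
 * "\0num" escape by consuming up to three octal digits after "\0" and
 * then masking off all but the low 8 bits of the result.
 * `s` must point at the backslash.  Returns the resulting byte and
 * stores the number of input characters consumed in *consumed. */
unsigned char echo_octal_masked(const char *s, size_t *consumed)
{
    assert(s[0] == '\\' && s[1] == '0');   /* caller must pass "\0..." */

    unsigned int value = 0;
    size_t i = 2;                          /* skip the "\0" prefix */
    while (i < 5 && s[i] >= '0' && s[i] <= '7') {  /* at most 3 digits */
        value = value * 8 + (unsigned int)(s[i] - '0');
        i++;
    }
    *consumed = i;
    return (unsigned char)(value & 0xFF);  /* keep only the low 8 bits */
}
```

With this rule, echo -e '\0610' consumes all five characters and emits one byte, 0210. That matches the single unprintable character in Roman's example.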

If the XSI option is supported, the behavior of \0xxx is implicitly
unspecified if xxx is 400 (octal) or larger.  If you believe this is
not enough, we could change the text to make this situation explicitly
unspecified by changing:
        If more than two hexadecimal digits immediately follow \x,
        the results are unspecified.
in the current resolution of bug ID 249 to:
        If more than two hexadecimal digits immediately follow \x
        or if the octal value specified by \XXX will not fit in a
        byte, the results are unspecified.
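The condition in the proposed wording, an octal value that "will not fit in a byte," reduces to a simple range check. The sketch below uses a hypothetical function name. It assumes the caller has already collected the one to three octal digits that follow the backslash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: does the octal value written by `n` octal digits
 * (1 to 3, starting at `digits`) fit in an 8-bit byte? */
bool octal_digits_fit_in_byte(const char *digits, size_t n)
{
    assert(n >= 1 && n <= 3);

    unsigned int value = 0;
    for (size_t i = 0; i < n; i++) {
        assert(digits[i] >= '0' && digits[i] <= '7');
        value = value * 8 + (unsigned int)(digits[i] - '0');
    }
    return value <= 0377;   /* 0377 == 255, the largest 8-bit value */
}
```

For three digits, this is equivalent to requiring that the first digit be 0 through 3. So \400 through \777 are exactly the escapes the proposed text would make unspecified.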

> 
> The C99 standard states (at least in 6.4.4.4 of the draft N1256 document):
> 
> "The value of an integer character constant containing more than one
> character (e.g., 'ab'), or containing a character or escape sequence
> that does not map to a single-byte execution character, is
> implementation-defined."

The specification of \0xxx in echo with the XSI option is intended to
behave as UNIX System V behaved, as specified by the System V Interface
Definition, version 3 (SVID3).  SVID3 was one of the base documents of
the original POSIX.2 standard, but since BSD systems and UNIX System V
systems had different behavior for echo, the original POSIX.2 standard
left the behavior unspecified (allowing the then-current implementations
to continue to behave as they had).  The X/Open Portability Guide, Issue 3
(XPG3) continued to require the behavior as specified in SVID3.  When the
original POSIX.2 and XPG3 were merged in a later revision of the standards,
we got the XSI option (which requires the System V behavior); when XSI
option support is not claimed, either the old BSD or the System V behavior
is allowed.

 - Don

> 
> leaving '\610' as an implementation-defined character constant.
> 
> The Java language specifically requires "\610" to parse as "\061"
> followed by "0", and this can be a very useful property to rely on in
> this day and age where 8-bit bytes are prevalent.
> 
> http://austingroupbugs.net/view.php?id=249 is standardizing $'' in the
> shell, and also states:
> 
> "\XXX yields the byte whose value is the octal value XXX (one to three
> octal digits)"
> 
> and while it is explicit that $'\xabc' is unspecified (as to whether it
> maps to $'\xab'c or to $'\u0abc' or to something else), it does not have
> any language talking about what happens when an octal escape does not
> fit in a byte.
> 
> Personally, I would love it if octal escapes were required to stop
> parsing after two digits if the first digit is > 3, but given that C99
> leaves it implementation-defined, I think we need a POSIX interpretation
> to resolve the issue.  Also, I think this report means that we need to
> tweak the wording of bug 249 (adding $'') to deal with the case of an
> octal escape where three octal digits do not fit in 8 bits (either by
> explicitly declaring it unspecified, as is the case with \x escapes; or
> by requiring implementation-defined behavior, as in C99; or by requiring
> explicit end-of-escape after two digits, as in Java).
> 
> -- 
> Eric Blake   eblake@redhat.com    +1-801-349-2682
> Libvirt virtualization library http://libvirt.org
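The Java-style rule Eric prefers can be sketched in C: a third octal digit is consumed only when the first digit is 0 through 3. The function name is hypothetical. The sketch models a general \XXX escape, such as the one in printf formats or the proposed $'' quoting, not echo's \0num form. It assumes the pointer is at a backslash followed by at least one octal digit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: parse a "\XXX" octal escape, allowing a third
 * digit only when the first digit is 0-3, so the value always fits in
 * 8 bits and no masking is needed.
 * `s` must point at a backslash followed by at least one octal digit.
 * Returns the byte; *consumed receives the number of characters used. */
unsigned char octal_escape_java_style(const char *s, size_t *consumed)
{
    assert(s[0] == '\\' && s[1] >= '0' && s[1] <= '7');

    size_t max_digits = (s[1] <= '3') ? 3 : 2;
    unsigned int value = 0;
    size_t i = 1;                 /* index of the first digit */
    while (i <= max_digits && s[i] >= '0' && s[i] <= '7') {
        value = value * 8 + (unsigned int)(s[i] - '0');
        i++;
    }
    *consumed = i;
    assert(value <= 0377);        /* guaranteed by the digit limit above */
    return (unsigned char)value;
}
```

Under this rule, '\610' yields the byte '1' (octal 061) and leaves the '0' as an ordinary character, giving the "10" output Roman expected. The value can never exceed 0377, so no masking step is needed.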


