bug-bash

Re: read fails on null-byte: v4.1.7 FreeBSD 8.0 (amd64)


From: Matthew Story
Subject: Re: read fails on null-byte: v4.1.7 FreeBSD 8.0 (amd64)
Date: Wed, 23 Nov 2011 21:44:58 -0500

On Nov 23, 2011, at 7:09 PM, Chet Ramey wrote:

> On 11/23/11 6:54 PM, Matthew Story wrote:
>> On Nov 23, 2011, at 4:47 PM, Chet Ramey wrote:
>> 
>>> On 11/23/11 9:03 AM, Matthew Story wrote:
>>>> [... snip]
> 
> Yes, sorry.  That's what the "bash treats the line read as a C string"
> was intended to imply.  Since the line read is a C string, the NUL
> terminates it and what remains is assigned to the named variables.  I
> should have used `line' in my explanation instead of `foo'.

I understand that the underlying implementation of the bash builtins is `C', 
and I understand that `C' strings are NUL terminated.  It seems unreasonable to 
me to expect users to understand this implementation detail when using bash to 
read streams into variables via the `read' builtin.  Furthermore, neither the 
man-page nor the gnu website documents this behavior of bash:

read
          read [-ers] [-a aname] [-d delim] [-i text] [-n nchars] [-N nchars]
               [-p prompt] [-t timeout] [-u fd] [name ...]
One line is read from the standard input, or from the file descriptor fd 
supplied as an argument to the -u option, and the first word is assigned to the 
first name, the second word to the second name, and so on, with leftover words 
and their intervening separators assigned to the last name. If there are fewer 
words read from the input stream than names, the remaining names are assigned 
empty values. The characters in the value of the IFS variable are used to split 
the line into words. The backslash character ‘\’ may be used to remove any 
special meaning for the next character read and for line continuation. If no 
names are supplied, the line read is assigned to the variable REPLY. The return 
code is zero, unless end-of-file is encountered, read times out (in which case 
the return code is greater than 128), or an invalid file descriptor is supplied 
as the argument to -u.

I personally do not read "One line" as meaning "One string of characters 
terminated either by a NUL byte or a newline"; I read it as "One string of 
characters terminated by a newline".  But even "One string of characters 
terminated either by a NUL byte or a newline" is not the actual functionality.  
The actual functionality is:

"One line is read from the standard input, or from the file descriptor fd 
supplied as an argument to the -u option, then read byte-wise up to the first 
contained NUL, or end of string, ..."
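
To check which behavior a given bash actually exhibits, a small probe along 
these lines may help.  This is only a sketch: the function name 
`probe_read_nul' is made up for illustration, and the result may differ 
between bash versions, so no output is shown here.

```shell
#!/usr/bin/env bash
# Probe how this shell's `read' treats a NUL byte inside an input line.
# probe_read_nul is a hypothetical helper name, not part of bash.
probe_read_nul() {
    # Feed "foo", a NUL byte, "bar" and a newline to read, then print
    # whatever ended up in the variable (without a trailing newline).
    printf 'foo\0bar\n' | {
        IFS= read -r line
        printf '%s' "$line"
    }
}

result=$(probe_read_nul)
# A length of 3 means the value was cut at the NUL; 6 means the NUL was dropped.
printf 'read assigned: "%s" (length %d)\n' "$result" "${#result}"
```

If the value is cut at the NUL, the trailing "bar" is silently lost, which is 
the case described above.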

Furthermore, I do not see the use-case for this behavior ... I simply cannot 
fathom a case of I/O redirection in shell where I would choose to inject a NUL 
byte to coerce this sort of behavior from the read builtin, and can't imagine 
that anyone is relying on this `C string' feature of read currently in bash, 
especially considering that it is not consistent with NUL handling in other 
assignments in bash:

[matt@matt0 ~]$ foo=`printf 'foo\0bar'`; echo "$foo" | od -a
0000000    f   o   o   b   a   r  nl                                    
0000007
[bash ~]$ foo=$(printf 'foo\0bar'); echo "$foo" | od -a
0000000    f   o   o   b   a   r  nl                                    
0000007

Both forms of command substitution strip the NUL.

I see one of three possible resolutions here:

1. NUL bytes do not terminate variable assignment from `read', and the behavior 
of echo/variable assignments persists as is.
2. NUL bytes are stripped by read on assignment, and this functionality is 
documented as expected.
3. The existing functionality is documented in the man-page and on gnu.org as 
expected.

I would prefer the first, and would be happy to attempt to provide a patch, 
if that's useful.
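
Independent of which resolution is chosen, data that is deliberately 
NUL-separated can be read one record at a time with the `-d delim' option 
from the synopsis quoted above, using an empty delimiter.  The following is a 
minimal sketch, not something proposed elsewhere in this thread; it assumes 
bash (arrays and process substitution), and the names `records' and `rec' are 
illustrative only.

```shell
#!/usr/bin/env bash
# Split a NUL-separated stream into records using `read -d ""'.
# With an empty delimiter string, read stops at each NUL byte instead of
# at each newline, so the NUL is consumed as a separator.
records=()
while IFS= read -r -d '' rec; do
    records+=("$rec")
done < <(printf 'foo\0bar\0')

printf '%d records\n' "${#records[@]}"
for rec in "${records[@]}"; do
    printf '[%s]\n' "$rec"
done
```

This does not help with a stray NUL inside a newline-terminated line, which is 
the case at issue here; it only covers input where the NUL is the intended 
record separator.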

cheers,
-matt

> 
> Chet
> -- 
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
>                ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, ITS, CWRU    chet@case.edu    http://cnswww.cns.cwru.edu/~chet/

Additional Notes:

The only occurrence of the pattern `NUL' in the FreeBSD man-page for bash is:

       Pattern Matching

       Any character that appears in a pattern, other than the special pattern
       characters described below, matches itself.  The NUL character may  not
       occur  in  a pattern.  A backslash escapes the following character; the
       escaping backslash is discarded when  matching.   The  special  pattern
       characters must be quoted if they are to be matched literally.

All other references in the man-page are to the null string (empty string), not 
to an explicit NUL byte (i.e. ASCII 0).  The same is true of the gnu.org 
documentation.



