Re: BashPitfall 65, read reading past the delimiter on records ending in
From: Greg Wooledge
Subject: Re: BashPitfall 65, read reading past the delimiter on records ending in truncated characters
Date: Sun, 20 Apr 2025 18:58:32 -0400
User-agent: Mutt/1.10.1 (2018-07-13)
On Sun, Apr 20, 2025 at 17:31:56 -0400, Chet Ramey wrote:
> On 4/20/25 3:08 AM, Stephane Chazelas wrote:
>
> > $ printf '%b\0' winter 'spring\0315' summer automn |
> > bash -c 'while IFS= read -rd "" season; do printf "<%q>\n" "$season";
> > done'
> > <winter>
> > <$'spring\315'>
> > <automn>
> >
> > skipping summer, or maybe worse:
> >
> > $ printf '%b\n' winter 'spring\0315' summer automn |
> > bash -c 'while IFS= read -r season; do printf "<%q>\n" "$season"; done'
> > <winter>
> > <$'spring\315\nsummer'>
> > <automn>
> >
> > bundling spring with summer (all with bash-5.2 on Debian for instance)
>
> This has been fixed since last July, and the fix is in bash-5.3. The bug
> concerns unicode combining characters introducing invalid unicode character
> sequences that happen to contain the delimiter, and was reported privately.
That one may be fixed, but:
bash-5.3$ printf 'FOO\0\315\0\226\0' |
  while IFS= read -rd '' f; do printf '<%q>\n' "$f"; done
<FOO>
<$'\315'>
<''>
<''>
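The input has three NUL-terminated records (FOO, \315 and \226), yet read
returns four, and \226 never shows up.
The context for all of this was someone on IRC who was reading a chunk
of data from /dev/urandom and got different results with LC_CTYPE=C vs.
LC_CTYPE=en_US.utf8 (or another UTF-8 locale). This is a simplified
reproducer.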
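To compare the two locales side by side, something like the following sketch could be used. It only counts how many records read returns, so the result does not depend on how %q renders the bytes. The helper name count_records is made up for illustration, and en_US.utf8 is assumed to be installed; if it is not, bash prints a setlocale warning and the second run is not meaningful.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the NUL-delimited records that read sees on
# stdin when a child bash runs under the locale named in $1.
count_records() {
    LC_ALL=$1 bash -c '
        n=0
        while IFS= read -rd "" rec; do
            n=$((n + 1))
        done
        printf "%s\n" "$n"
    '
}

# Same three-record input as the reproducer above.
# Assumption: en_US.utf8 exists on this system; substitute any UTF-8 locale.
for loc in C en_US.utf8; do
    printf '%s: ' "$loc"
    printf 'FOO\0\315\0\226\0' | count_records "$loc"
done
```

LC_ALL is used here rather than LC_CTYPE so that nothing else in the environment overrides the choice. If the two counts differ, the locale is what changes read's behavior.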
In real-life scripts, this kind of thing could arise if someone reads
a NUL-delimited stream of pathnames from find -print0, or equivalent.
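For the find -print0 case, one possible precaution is to run the read loop in the C locale. This is a sketch, not a confirmed fix: it assumes, as the LC_CTYPE=C observation above suggests, that the problem only shows up in multibyte locales. The function name list_files is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every regular file under directory $1, one per
# line, reading find's NUL-delimited output in the C locale.
list_files() {
    find "$1" -type f -print0 | (
        # Assumption: with LC_ALL=C, read treats the input as plain bytes,
        # so a truncated multibyte sequence cannot swallow the delimiter.
        LC_ALL=C
        while IFS= read -rd '' path; do
            printf '%s\n' "$path"
        done
    )
}
```

The assignment sits inside the subshell, so the caller's locale is left alone. Anything run inside the loop also sees the C locale, which matters if the loop body sorts, compares, or displays text.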
Since nobody seems to have reported it officially yet, I'm going to
add a Cc: bug-bash on this one.