bug-bash

Re: Some byte combinations affect UTF-8 string reading


From: Olga Ustuzhanina
Subject: Re: Some byte combinations affect UTF-8 string reading
Date: Tue, 26 Feb 2019 05:42:08 +0700

On Mon, 25 Feb 2019 12:59:38 -0800
L A Walsh <bash@tlinx.org> wrote:

> In this case, the decode of \xc2 doesn't swallow the following
> character.

I want to clarify that \xc2 (and other characters in the range
mentioned above) can only swallow a \0 that immediately follows it.
Any other following character is unaffected.

> 
> But in 4.4.12, using IFS='':
> 
> ntc() {  while IFS='' read -r input; do printf "$input;" ; done ; }

Looks like `-d ''` is necessary to get `read` to process anything:

$ ntc() {  while IFS='' read -r  input; do printf "$input;" ; done ; }
$ printf "\xc2\0\0\0\0" | ntc | xxd

$ ntc() {  while IFS='' read -r -d '' input; do printf "$input;" ; done ; }
$ printf "\xc2\0\0\0\0" | ntc | xxd
00000000: c23b 3b3b                                .;;;

On bash 4.4.19 I get different output:

$ ntc() {  while IFS='' read -r -d ''  input; do printf "$input;" ; done ; }
$ printf "\xc2\0\0\0\0" | ntc | xxd
00000000: c23b 3b3b 3b                             .;;;;
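
For anyone who wants to check their own build without reading xxd output, here is a minimal sketch that just counts the NUL-delimited records `read` returns for the same input. `count_records` is a hypothetical helper name, not anything from bash itself. The input has four NUL delimiters, so I would expect a count of 4 when nothing is swallowed; a count of 3 would match the 4.4.12 output above, where one `;` is missing.

```shell
#!/usr/bin/env bash
# count_records: hypothetical helper that counts NUL-delimited records on stdin.
count_records() {
    local n=0 rec
    # IFS= keeps the record verbatim; -d '' makes NUL the record delimiter.
    while IFS= read -r -d '' rec; do
        n=$((n + 1))
    done
    printf '%s\n' "$n"
}

# Same input as in the examples above: one 0xC2 byte followed by four NULs.
printf '\xc2\0\0\0\0' | count_records
```

Unlike the `ntc` function, this does not pass the data through `printf` as a format string, so the result depends only on how `read` splits the input.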



