help-bash

Re: How to make `read` fail if there is not enough fields in the input?


From: Stephane Chazelas
Subject: Re: How to make `read` fail if there is not enough fields in the input?
Date: Thu, 5 Dec 2019 15:41:54 +0000
User-agent: NeoMutt/20180716

2019-12-05 08:44:05 -0500, Greg Wooledge:
> > On 12/4/19 10:19 PM, Peng Yu wrote:
> > >> $ IFS=$'\t' read -r -a array <<< x && (( "${#array[@]}" == 2 ))
> 
> The fundamental problem here is that tab is a whitespace character.
> When you use whitespace characters in IFS to delimit fields, multiple
> consecutive instances of IFS whitespace characters are considered ONE
> delimiter.
> 
> In other words:
> 
> wooledg:~$ IFS=$'\t' read -ra array <<< $'foo\t\t\tbar\tbaz\t'
> wooledg:~$ declare -p array
> declare -a array=([0]="foo" [1]="bar" [2]="baz")
[...]

Note that in ksh93 and zsh, the IFS-whitespace special treatment
of the TAB character can be removed by including it twice in
IFS:

$ printf 'a\t\tb\n' | ksh93 -c $'IFS="\t\t" read -A a; typeset -p a'
typeset -a a=(a '' b)
$ printf 'a\t\tb\n' | zsh -c $'IFS="\t\t" read -A a; typeset -p a'
typeset -a a=( a '' b )

I don't think bash supports it yet.

> The "solution" to this, if you can call it that, is to use something
> other than whitespace as your delimiter.  You could, for example,
> replace all of the tab characters with $'\003' characters, and then
> use $'\003' as your delimiter.
[...]

That still won't work if the last field is empty: in bash, as in
ksh93 (and as POSIX requires, but unlike in zsh), IFS is treated
as a field delimiter, not a field separator. Both "a\3" and "a"
are split into a one-element ("a") array (instead of ("a" "")
for the former).
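
To make the delimiter-vs-separator point concrete, here is a small
bash sketch. count_fields is a hypothetical helper name introduced
only for this illustration; it prints how many array elements read
produces when \003 is the only character in IFS.

```shell
#!/usr/bin/env bash
# count_fields STRING: hypothetical helper, not from the thread.
# Prints the number of fields bash's read produces for STRING when
# the non-whitespace byte \003 is the only IFS character.
count_fields() {
  local -a f
  IFS=$'\3' read -r -a f <<< "$1"
  echo "${#f[@]}"
}

count_fields $'a\3b'    # prints 2
count_fields $'a\3'     # prints 1: the empty last field is dropped
count_fields 'a'        # prints 1: indistinguishable from the line above
count_fields $'a\3\3'   # prints 2: an extra delimiter keeps the empty field
```

The last call hints at the workaround: a second trailing delimiter
is what lets the empty last field survive.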

awk -F '\t' or perl -F'\t' -la would have the opposite problem:
an empty line is split into an array of zero elements instead of
an array of one empty element.

Compare:

$ printf '%b\n' 'a\tb' 'a\t' 'a' '' | zsh -c $'while IFS="\t\t" read -A a; do typeset -p a; done'
typeset -a a=( a b )
typeset -a a=( a '' )
typeset -a a=( a )
typeset -a a=( '' )
$ printf '%b\n' 'a\tb' 'a\t' 'a' '' | ksh93 -c $'while IFS="\t\t" read -A a; do typeset -p a; done'
typeset -a a=(a b)
typeset -a a=(a)
typeset -a a=(a)
typeset -a a=('')
$ printf '%b\n' 'a\tb' 'a\t' 'a' '' | awk -F'\t' '{print NF}'
2
2
1
0

For a bash "solution", you'd need to append an extra delimiter.
Something like:

sed $'s/\t/\3/g; s/$/\3/' < input.tsv | while IFS=$'\3' read -ra array...

Or:

paste input.tsv /dev/null | tr '\t' '\3' | ...
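
As a rough sketch of the same idea done entirely in bash (no sed,
paste or tr), the line can be rewritten with parameter expansion
before splitting. The names split_tsv_line, has_two_fields and the
global array "fields" are hypothetical, chosen for this example; it
also assumes the data never contains a \003 byte.

```shell
#!/usr/bin/env bash
# split_tsv_line LINE: hypothetical helper, not from the thread.
# Splits LINE on tabs into the global array "fields", keeping empty
# fields, including a trailing one.
# Assumption: LINE contains no \003 byte.
split_tsv_line() {
  local line=$1
  # Turn every tab into \003 and append one extra \003 so that an
  # empty last field is not dropped by read.
  line=${line//$'\t'/$'\3'}$'\3'
  IFS=$'\3' read -r -a fields <<< "$line"
}

# Succeeds only if LINE has exactly two tab-separated fields.
has_two_fields() {
  split_tsv_line "$1" && (( ${#fields[@]} == 2 ))
}

has_two_fields $'a\tb' && echo "ok: ${#fields[@]} fields"
has_two_fields $'a\t'  && echo "ok: ${#fields[@]} fields (last one empty)"
has_two_fields 'x'     || echo "rejected: ${#fields[@]} field(s)"
```

The field-count test is what makes the read step "fail" on short
input, which is what the original question asked for.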

-- 
Stephane



