help-bash

Re: tokenize honoring quotes


From: Greg Wooledge
Subject: Re: tokenize honoring quotes
Date: Fri, 5 Aug 2022 16:01:56 -0400

On Fri, Aug 05, 2022 at 02:32:38PM -0400, Chet Ramey wrote:
> On 8/5/22 1:43 PM, Robert E. Griffith wrote:
> > Is there an efficient native bash way to tokenize a string into an array
> > honoring potentially nested double and single quotes?
> > 
> > For example...
> > 
> >     $ str='echo "hello world"'
> 
> Some variant of this:
> 
> str='echo "hello world"'
> 
> declare -a a
> eval a=\( "$str" \)
> 
> declare -p a

The biggest problem here is that there's no way to prevent command
substitutions and other code injections from occurring when all you
actually wanted was the word splitting/parsing.

unicorn:~$ str='"hello world" "$(date 1>&2)"'
unicorn:~$ declare -a a
unicorn:~$ eval a=\( "$str" \)
Fri Aug  5 15:44:11 EDT 2022

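Simply disallowing $() and backticks in the input isn't sufficient
either, as there are code injections hiding all over.  In the example
below, the input string contains neither.  The subscript of an indexed
array is evaluated as an arithmetic expression, so the value of x is
evaluated in turn, and the command substitution stored in it runs.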

unicorn:~$ x='y[$(date 1>&2)0]'
unicorn:~$ str='"hello world" ${a[x]}'
unicorn:~$ declare -a a
unicorn:~$ eval a=\( "$str" \)
Fri Aug  5 15:46:26 EDT 2022

The second biggest problem is unwanted globbing.  One might argue that
one can disable this with set -f before, and set +f after.  (Or variants
involving a lambda function and "local -".)  Nevertheless, it's a concern
that must be addressed.

unicorn:~$ str='"hello world" *.txt'
unicorn:~$ declare -a a
unicorn:~$ eval a=\( "$str" \)
unicorn:~$ declare -p a
declare -a a=([0]="hello world" [1]="37a.txt" [2]="68x68.txt" [3]="Application.txt" [4]="bldg.txt" [5]="burger15.txt" [...]
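
For illustration, here is a minimal sketch of the "local -" variant
mentioned above.  The function name split_noglob and the array name
tokens are made up for this example, and "local -" needs bash 4.4 or
later.  It only addresses the globbing problem; the input is still passed
to eval, so every code injection shown earlier still applies.

```shell
#!/bin/bash
# Hypothetical helper: eval-based splitting with globbing turned off.
# WARNING: this still executes any code embedded in the input string.
split_noglob() {
  local -      # bash 4.4+: shell options are restored when the function returns
  set -f       # disable pathname expansion for the duration of the function
  eval "tokens=( $1 )"
}

split_noglob '"hello world" *.txt'
declare -p tokens
# declare -a tokens=([0]="hello world" [1]="*.txt")
```

Because of "local -", the caller's noglob setting is left as it was, so
there is no need for a matching set +f.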

The only way to safeguard against code injections is to write an actual
parser, and not rely on shell tricks like eval, tempting as they may be.

Here's an extremely simplistic one, that only handles properly balanced
double quotes, without any kind of nesting.  Quotes, if present, must
enclose the entire word, not sit partially embedded inside a word.  It
also only handles spaces, not tabs or other arbitrary whitespace, but it
could easily be extended for that if desired.

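#!/bin/bash
shopt -s extglob    # needed for the +( ) pattern below

str=' one   two  "hello world"     three "four"'
a=()

# Strip leading spaces.
str=${str##+( )}

# Keep going while there is a space left to split on, or a quoted word.
while [[ $str = *\ * || $str = \"* ]]; do
  if [[ $str = \"* ]]; then
    # Quoted word: take everything between the first pair of quotes.
    word=${str:1}
    word=${word%%\"*}
    a+=("$word")
    str=${str#\"*\"}
  else
    # Bare word: take everything up to the first space.
    word=${str%% *}
    a+=("$word")
    str=${str#* }
  fi
  # Strip the spaces in front of the next word.
  str=${str##+( )}
done
# Whatever is left is the last (unquoted) word.
if [[ $str ]]; then a+=("$str"); fi

declare -p a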
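
As a rough sketch of the extension mentioned above, the same approach
can be wrapped in a function that also treats tabs and newlines as
separators.  The names tokenize, tokens and ws are made up for this
example.  Unlike the script above, it gives up with a nonzero status when
a double quote is never closed, rather than assuming balanced input.  The
other limitations are unchanged: no nesting, no single quotes, and quotes
must enclose the whole word.

```shell
#!/bin/bash
shopt -s extglob

# tokenize STRING: fill the global array "tokens" (hypothetical name).
# Returns 1 if a double quote is left unclosed.
tokenize() {
  local str=$1 word
  local ws=$' \t\n'                    # characters treated as separators
  tokens=()

  str=${str##+([$ws])}                 # strip leading whitespace
  while [[ $str ]]; do
    if [[ $str = \"* ]]; then
      [[ $str = \"*\"* ]] || return 1  # no closing quote: give up
      word=${str:1}
      word=${word%%\"*}                # text up to the closing quote
      str=${str#\"*\"}                 # drop the quoted word
    else
      word=${str%%[$ws]*}              # text up to the next separator
      str=${str:${#word}}              # drop the word
    fi
    tokens+=("$word")
    str=${str##+([$ws])}               # strip whitespace before the next word
  done
}

tokenize $' one \t two  "hello world"\n three "four"'
declare -p tokens
# declare -a tokens=([0]="one" [1]="two" [2]="hello world" [3]="three" [4]="four")
```

Nothing in the input is ever evaluated, so command substitutions, array
subscripts and glob characters all come through as literal text.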


