[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: test '-v' - associative vs. normal array discrepancy - a bug ?

From: konsolebox
Subject: Re: test '-v' - associative vs. normal array discrepancy - a bug ?
Date: Sat, 22 Nov 2014 10:29:25 +0800

On Sat, Nov 22, 2014 at 5:43 AM, Greg Wooledge <wooledg@eeg.ccf.org> wrote:
> Well, I did overstate the problem slightly.  There are some scenarios
> where you can successfully pass the name of a variable as an argument
> to a function, but they are limited, and you must handle everything
> precisely right.
> None of the obvious or elegant approaches will work without extra safety
> checks and byzantine constructions.

Most of the time, these safety checks are all just about a reference to a
parameter.  This may not be needed on runtime to those who write their scripts
properly.  Something like a well-tested production script may no longer need
it.  The ones that always need checking are those that produce dynamic variable
names on runtime.

>> This is just one:
>> https://sourceforge.net/p/playshell/code/ci/master/tree/source/filter.sh
> OK, right off the bat, you are using an undocumented HACK:
> function filter {
>         local REF="$1[@]" X
> I don't remember whether we ever got official word from Chet on whether
> this syntax would continue to be supported in the future.  It seems to
> me more of an accidental misfeature that people have latched onto because
> there is no other way to get things done.  (Besides eval.)

Ok, if you consider that as a hack all because it's undocumented you are still
free to use eval.  It's just a more efficient shortcut to:

eval "TEMP=(\"\${1[@]}\")"
for X in "${TEMP[@]}"; do ...

> The fact that one must use this undocumented hack (which may or may not
> magically vanish the next time Chet tightens up the grammar) in order
> to retrieve values from an array whose name is passed as an argument is
> already one strike against it.

I'm placing my bet that it won't change.  If it does, I'd just revert back to
using eval - if I really need to.  And for something that works from 3.0
to current snapshot of Bash, I think I'm already quite be happy having it.

Of course it's best if the behavior doesn't change and that the documentation
is just updated.  It's not only me who finds it useful.  I've also seen people
use it as a solution in SO.

> In my previous message I demonstrated why declare -n is just as insecure
> as eval.  In fact it's worse, because:
>  1) There's always one name that you cannot pass as an argument (the same
>     name that the ref is planning to use).

Again declare -n is not my general concern but, if you're still using a local
variable to hold the reference, wouldn't that apply to eval as well?  Of course
if you mean about just using the positional parameters to hold the variable's
name then I wouldn't disagree with that.

>  2) It gives the ILLUSION that it works like ksh's nameref, and thus the
>     ILLUSION of safety, which lulls the user into thinking he won't have
>     to worry about security.

Which doesn't apply to everyone.  And not every bash user even knows about
ksh's nameref so most of the time you'd really check first how declare -n
really behaves before using it.

And I also don't think the description is that explicit:

    -n    make NAME a reference to the variable named by its value

Take special attention to the phrase "by its value".  There's no explicit
part of the description that says it's a hard reference.

>> > This is precisely why I tell people not to try to write "libraries" of
>> > reusable code in bash.  You just can't do it.  It's pointless to try.
>> Sorry but it's not.  Or at least most of the time it's not always.
> There are some simple things you can write reusably, without doing
> backflips.  For example, I have a function that "returns" a random number:

Yes like what I've been saying you can you use a volatile shared variable to
hold the return value.  And if you're capable of properly using both methods,
you get the opportunity of writing simpler or better functions.  And sometimes
referring to variables is really just the only way.

> If you try to
> implement a complex function in bash (or almost any other shell), the
> odds are the user can BREAK THINGS (cause arbitrary code execution) by
> passing the "wrong thing" as an argument.

These stuffs are main concerns about evaluating a dynamic code in general but
they're not always directly applicable to problems in referencing variable
names I believe.  I just don't see how that can be a valuable argument to not
passing variables by reference.  Sure there are many cases how a data when
eval'd could be exploited (I have my doubts how often these stuffs could give
unexpected troubles and you can open a new topic for that) but the way to check
it for variable names is rather really very simple.

And if that's about eval, there are correct ways to use eval.  "Eval is evil."
is really an overused and overrated statement.

Also, checking variable data that's generated on runtime with a source that's
arguably exploitable and is subject for evaluation as a code is a basic
practice.  Anyone not doing that is probably just not doing it right; and not
because using declare -n, eval, printf -v or indirect variable expansion is

> There is a whole extra class of issues that one must worry about, and
> it's REALLY difficult to write a function that not only performs the
> desired task correctly, but also does so without introducing new attack
> vectors against the operating system.

Seriously it's not that always difficult.  And with proper handling of syntax
whether eval'd or not, the one thing you'd really concern yourself most of the
time are the external parameters you try to refer.  Also, a good editor and
editing skill or practice can help.

> I gave this example in my previous message, but it bears repeating:
> blackbox() {
>   declare -n arg=$1
>   printf '%s\n' "$arg"
> }
> This looks completely innocuous, but it's NOT.  declare -n is a trap
> for the poor unwary programmer.

Not because it's possibly deceiving to a less-experience programmer, doesn't
mean it's no longer helpful or that that should be completely avoided.  That
doesn't apply to everyone.

> imadev:~$ blackbox 'a[b=$(date)]'
> bash: b=Fri Nov 21 15:55:58 EST 2014: syntax error in expression (error token 
> is "Nov 21 15:55:58 EST 2014")
> If you want to use declare -n, you have to write several lines of safety
> checking every time:
> blackbox() {
>   if [[ $1 = arg ]]; then
>     echo "blackbox: 'arg' is not permitted, because reasons" >&2
>     return 1
>   elif [[ $1 != [[:alpha:]_]*([[:alnum:]_]) ]]; then
>     echo "blackbox: argument must be a non-indexed identifier" >&2
>     return 1
>   fi
>   declare -n arg=$1
>   ...
> }

Unless blackbox() is really meant to reference dynamically produced variables,
this is really just enough:

[[ $1 != [[:alpha:]_]*([[:alnum:]_]) ]] || {
    echo "Fatal error: Invalid variable name: $1"
    exit 1

Or even simpler:

[[ $1 != [[:alpha:]_]*([[:alnum:]_]) ]] || fatal_error "Invalid
variable name: $1"

And again, that's just optional.  These stuffs are not really about runtime.

> There's simply no other way to be secure.  Every feature that bash has
> which even comes close to this has the same issues.
> imadev:~$ var='a[b=$(date)]'
> imadev:~$ printf -v "$var" %s foo
> bash: b=Fri Nov 21 16:00:51 EST 2014: syntax error in expression (error token 
> is "Nov 21 16:00:51 EST 2014")
> So, printf -v isn't safe either.  You have to sanitize the argument of -v
> for exactly the same reason you have to sanitize the variable pointed to
> by declare -n, or any variable that you expand with eval.

Basically anything evaluated to run dynamically in runtime is dangerous if not
used properly so there's no surprise to that.

> If you don't, you introduce an opportunity for arbitrary code execution.

Not always.  Sometimes these possible problems are just as serious as having
typos unless again if variables are dynamic.

> So, if you want to write a function that populates a (scalar!) variable
> of the caller's choosing, here's one way to do it:
> # Add a bunch of integers and store the result
> # varname may not be __i or __sum
> addemup() {
>   (($#)) || { echo "usage: addemup varname [int ...]" >&2; return 1; }
>   # optionally check for $1 = __i or __sum
>   if [[ $1 != [[:alpha:]_]*([[:alnum:]_]) ]]; then
>     echo "addemup: varname must be a non-indexed identifier" >&2; return 1
>   fi
>   local __sum=0 __i
>   for __i in "${@:2}"; do ((__sum+=__i)); done
>   eval "$1=\$__sum"
> }
> And sure enough, you mustn't pass __i or __sum to it:
> imadev:~$ addemup __i 1 3
> imadev:~$ echo $__i

Yes that's not different from the idea I mentioned earlier:

On Fri, Nov 21, 2014 at 10:45 AM, konsolebox <konsolebox@gmail.com> wrote:
> I believe naming conflicts can easily be avoided by having naming rules.  For
> example, if a function that's meant to be shared refers to a variable, every
> variable in it should start with two underscores e.g. __a, __b, __r.

On Sat, Nov 22, 2014 at 5:43 AM, Greg Wooledge <wooledg@eeg.ccf.org> wrote:
> The problem is bash has no way to refer to a variable in the caller's
> scope.  As soon as you have any local variable, it shadows the caller's
> variable of the same name.

Like I said earlier:

On Fri, Nov 21, 2014 at 10:45 AM, konsolebox <konsolebox@gmail.com> wrote:
> ...
> These
> forms of variables must be declared local and are only allowed to be used
> inside the function.  They are not allowed to be used by other functions or
> passed by reference to another.  If a referring function must call another
> referring function, the local variable that it should pass should be prefixed
> with its name e.g. my_function__var.

On Sat, Nov 22, 2014 at 5:43 AM, Greg Wooledge <wooledg@eeg.ccf.org> wrote:
>             eval $1=\"\$2\"          # Return single value
>         else
>             eval $1=\(\"\${@:2}\"\)  # Return array

Off-topic but that form of eval for me is wrong.  A good practice is to always
avoid word splitting and pathname expansion if it's not needed even if
unexpected - unless your software explicitly does set -f (for pathname exp.).
Also, when using eval the safest form is to always wrap-up codes inside

    eval "$1=\$2"  ## No need to place $2 around double-quotes.
    eval "$1=(\"\${@:2}\")"

And the best basic way to test eval is to run echo in front of it.  The output
is exactly what would execute.

Exceptions are just when you're just trying to hide a code in a generic script
and there's nothing in it that expands.  With that you can just use
single-quotes instead if appplicable or still needed.

On Sat, Nov 22, 2014 at 5:43 AM, Greg Wooledge <wooledg@eeg.ccf.org> wrote:
> addemup() {
>     (($#)) || { echo "usage: addemup varname [int ...]" >&2; return 1; }
>     if [[ $1 != [[:alpha:]_]*([[:alnum:]_]) ]]; then
>         echo "addemup: varname must be a non-indexed identifier" >&2; return 1
>     fi
>     local sum=0 i
>     for i in "${@:2}"; do ((sum+=i)); done
>     local "$1" && upvar "$1" "$sum"
> }

shopt -s extglob

# Usage: final_error msg
# A function that's not always necessary but is helpful if called by many.
fatal_error() {
    echo "${FUNCNAME[1]}: $1"
    exit 1

# Usage: addemup varname [int ...]
addemup() {
    # Again this is just optional.  It's mostly needed if variable is dynamic.
    # If not, the problem you could have would have would just be as bad as
    # having a typo and is nothing about runtime or anything that gives
    # difference to having arbitrary code execution whether you run a dry
    # script or not.
    [[ $1 == [[:alpha:]_]*([[:alnum:]_]) ]] || fatal_error "varname
must be a non-indexed identifier"

    local __sum=0 __i
    for __i in "${@:2}"; do
        (( __sum += __i ))
    eval "$1=\$__sum"

> It's pure voodoo! You need a dozen lines of code, including a function
> that no normal person can understand that comes with its own warning
> label, just to put a value into a variable safely!

You seriously exaggerate.  I guess this is more of just a pessimistic or
conservative resent towards the practice.

> Why would you do this to yourself?  Just switch to a real programming
> language.

Everyone always considers that if it's a general choice of language for the
application as a whole, but if it's all just because you don't like using
indirect references then that I'm not sure about.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]