help-bash

Re: All bash variables that can be used as temp variables?


From: Greg Wooledge
Subject: Re: All bash variables that can be used as temp variables?
Date: Fri, 27 Jan 2023 07:46:34 -0500

On Thu, Jan 26, 2023 at 11:59:23PM -0500, Roger wrote:
> Isn't the following the usual recommended syntax?
> 
> _my_var
> 
> Prefixed underscored variable, in lowercase.
> 
> Uppercase is usually used for Bash and system variables.

*Sigh* ... this thread just won't die, will it.  OK.  Conventions.

By convention, an ALL_UPPER_CASE variable name is reserved for environment
variables (used across multiple programs) or special internal shell use.
Examples:

  HOME
  USER
  LOGNAME
  PATH
  BASH_VERSION
  RANDOM

Of course there are names which violate this convention, like http_proxy
(a lower-case environment variable) and histchars (an internal bash
variable).  Rules just wouldn't be any good without stupid exceptions.

One should avoid using all-upper-case variable names in one's scripts,
unless one is building an environment variable intended to be shared
among multiple programs, the script being just one of those.
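
A minimal sketch of that split, assuming a made-up application: the
names input_dir, count and MYAPP_LOG_LEVEL are hypothetical and exist
only for illustration.

```bash
#!/usr/bin/env bash
# Script-private variables: lower case, so they cannot clash with
# environment variables or special shell variables.
input_dir=/tmp
count=0

# Hypothetical variable that child programs are meant to read as well,
# so it is exported and spelled in upper case.
export MYAPP_LOG_LEVEL=debug

# The child process sees the exported variable, but not the private one.
bash -c 'echo "level=${MYAPP_LOG_LEVEL-unset} count=${count-unset}"'
```

The child shell only inherits what was exported, which is the whole
point of reserving upper case for names that cross program boundaries.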

The leading underscore thing... that's a lot more complex.  At this point,
we're talking about the private use of the variable namespace within one's
own script, so the rules are quite individual.

Some people use _leading_underscore variable names to indicate a variable
that is in some way "special".  Using a leading underscore makes it less
likely that someone will stomp on this variable by accident.

For example, someone might be foolish enough to try to write reusable
"library" functions in bash.  No matter how many times we warn people
that this is a fool's errand, they think it's needed for their project,
because their project is just *so* special and unique.

Let's say one is attempting to write a reusable random number generator
library.  This will consist of two functions, one to set
the RNG seed, and one to retrieve the next number from the generator.
An RNG of this type needs to store the seed value in some place that
will persist for the duration of the application using the generator.
There are no "classes" or "objects" or "namespaces" in bash, so the only
place you can store such a value would be a global variable, or a file.

Let's say that the global variable is chosen (the file would be just
ridiculous, really).

The author of the library will therefore have to come up with a name
for this global variable.  It should be one that's unlikely to be used
by any application, so something like "seed" won't do.  An application
might use that, not knowing that it's special to the RNG library.

This is where the leading underscore convention comes in.  The RNG
library author might decide to use the name "_seed".  Or perhaps
something containing the library's name, like "_myrandlib_seed", as a
sort of pseudo-namespace identifying that this particular global variable
belongs to the "myrandlib" library.

Nothing actually stops an application from using this variable and
breaking the library, but the odds of a collision are greatly reduced.
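
Here is a rough sketch of what such a library might look like.  The
function names myrandlib_seed and myrandlib_next, and the choice of a
simple linear congruential generator, are assumptions made up for this
illustration; only the "_myrandlib_seed" variable name comes from the
discussion above.

```bash
# Hypothetical "myrandlib".  All generator state lives in one global
# variable carrying the library's pseudo-namespace prefix.
_myrandlib_seed=1

# Usage: myrandlib_seed integer
myrandlib_seed() {
  _myrandlib_seed=$1
}

# Usage: myrandlib_next outputVariableName
# Advances the generator and stores the new value in the named variable.
myrandlib_next() {
  # One LCG step.  The multiplier and increment are commonly cited
  # constants, picked only for illustration; the result stays below 2^31.
  _myrandlib_seed=$(( (_myrandlib_seed * 1103515245 + 12345) % 2147483648 ))
  printf -v "$1" '%d' "$_myrandlib_seed"
}

# Example use by an application:
myrandlib_seed 42
myrandlib_next first
myrandlib_next second
echo "$first $second"
```

The sketch writes to a named variable rather than printing the number,
because calling the function as $(myrandlib_next) would run it in a
subshell, and the updated seed would be lost when that subshell exits.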

Here's one more example: namerefs (declare -n).  Bash 4.3 and later
have variables that "reference" other variables by name.  This can
be useful, for example to pass variables to a function by reference,
but there's a huge flaw in the implementation: the nameref shares the
same namespace as all the other variables.

Let's say we've got a function that takes a variable name as an argument,
with the intent to place its result into that variable.  Like this:

  # Usage: sum outputVariableName input1 [...]
  sum() {
    local -n result=$1
    result=0
    shift

    local i
    for i; do
      ((result+=i))
    done

    return 0
  }

As long as the user conforms to expectations, the function "works" as
intended:

  unicorn:~$ sum foobar 1 2 3 4
  unicorn:~$ echo "$foobar"
  10

However, this function uses two local variables: result and i.  If the
application tries to use one of these as the output variable, we
may get a collision:

  unicorn:~$ sum result 1 2 3
  bash: local: warning: result: circular name reference
  bash: warning: result: circular name reference
  bash: warning: result: circular name reference
  bash: warning: result: circular name reference
  bash: warning: result: circular name reference
  bash: warning: result: circular name reference
  bash: warning: result: circular name reference
  bash: warning: result: circular name reference
  bash: warning: result: circular name reference
  bash: warning: result: circular name reference
  bash: warning: result: circular name reference
  bash: warning: result: circular name reference
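
The collision with "i" is quieter, since no warning is printed at all.
A small sketch to try it, reusing the same function unchanged (the
unset and the final echo are added here just for the demonstration):

```bash
# The function from above, unchanged.
sum() {
  local -n result=$1
  result=0
  shift

  local i
  for i; do
    ((result+=i))
  done

  return 0
}

unset i
sum i 1 2 3
# Inside the loop, "result" resolves to the function's own local "i",
# so the total never reaches the caller's variable.
echo "i=${i-unset}"
```

The caller asked for the sum of 1 2 3 but does not get 6 back, because
the nameref ends up pointing at the loop variable instead of the
caller's variable.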

As a workaround, a function that uses a name reference needs to ensure
that ALL of its local variables -- even iterators like "i" -- exist
in a pseudo-namespace that an application is unlikely to use.

Once again, the underscore convention can help.  The internal variables
of our function can be prefixed with "_sum_" or some similar tag that
marks them as "ours".  An application shouldn't use those names, and
we should be safe.

  # Usage: sum outputVariableName input1 [...]
  sum() {
    local -n _sum_result=$1
    _sum_result=0
    shift

    local _sum_i
    for _sum_i; do
      # Possibly add a sanity check here.  Only accept integers.
      ((_sum_result += _sum_i))
    done

    return 0
  }
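
The sanity check hinted at in the comment could look something like
the sketch below.  The regular expression, the error message and the
return status of 1 are assumptions for illustration, not the only
reasonable choices.

```bash
# Usage: sum outputVariableName input1 [...]
sum() {
  local -n _sum_result=$1
  _sum_result=0
  shift

  local _sum_i
  for _sum_i; do
    # Accept only optionally signed decimal integers without leading
    # zeros (a leading zero would be read as octal by the arithmetic
    # context).  Anything else is rejected before it is evaluated.
    if [[ ! $_sum_i =~ ^[-+]?(0|[1-9][0-9]*)$ ]]; then
      printf 'sum: not an integer: %s\n' "$_sum_i" >&2
      return 1
    fi
    ((_sum_result += _sum_i))
  done

  return 0
}

sum total 1 2 3 && echo "$total"
sum total 1 x 3 || echo "rejected"
```

Checking before the arithmetic matters because bash evaluates strings
inside ((...)) as expressions, so unvalidated input can do a lot more
than just fail to add up.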

Applications should therefore NOT use variable names that begin with
an underscore without a really good reason.  Doing so risks stepping
on special variables that "library" functions are using.

If you've been using leading underscores for ALL of your variable names,
then you've been doing it wrong this whole time.


