
Re: a saner bootstrap script


From: Gary V. Vaughan
Subject: Re: a saner bootstrap script
Date: Mon, 26 Sep 2011 14:08:07 +0700

Hi Stefano,

On 25 Sep 2011, at 22:55, Stefano Lattarini wrote:
> On Thursday 22 September 2011, Gary V wrote:
>> Anyone:
>> 
>> It's beginning to look as though all this work is, once again, in very
>> real danger of just slipping quietly through the cracks.
>> 
> Hi Gary.  While I don't pretend to really understand the scope and purposes
> of your script, I might anyway throw in some few observations, ideas and
> nits; hopefully they will be helpful.


Thanks for the reviews, all feedback much appreciated :)  I've applied
them upstream (the copy kept in Zile currently does double-duty as the
master copy while awaiting adoption into gnulib) wherever I didn't
comment otherwise below.

> What about copying here the autoconf-provided code fragment to improve
> the shell bourne-compatibility?
> 
> # Be more Bourne compatible
> DUALCASE=1; export DUALCASE # for MKS sh
> if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then :
>   emulate sh
>   NULLCMD=:
>   # Pre-4.2 versions of Zsh do word splitting on ${1+"$@"}, which
>   # is contrary to our usage.  Disable this feature.
>   alias -g '${1+"$@"}'='"$@"'
>   setopt NO_GLOB_SUBST
> else
>   case `(set -o) 2>/dev/null` in *posix*) set -o posix ;; esac
> fi
> 
> Or maybe this is too paranoid, and of no real practical importance?

Maybe too paranoid, but it doesn't hurt to err on the side of caution.
I started this script with a lot of shell portability boilerplate that
I put in all of my shell scripts, and these extra few lines can't do
any harm.

>> ## -------------------- ##
>> ## Shell normalisation. ##
>> ## -------------------- ##
>> 
>> # NLS nuisances.
>> LANGUAGE=C
>> export LANGUAGE
>> 
>> # Ensure file names are sorted consistently across platforms.
>> LC_ALL=C
>> export LC_ALL
>> 
>> # CDPATH.
>> (unset CDPATH) >/dev/null 2>&1 && unset CDPATH
>> 
> If you are not using `set -e`, you could just call "unset CDPATH" here IMHO.

In principle, I agree.  But for the sake of having shared boilerplate
code that won't blow up when adopted into a subsequent script, I'm
loath to tidy it up here and end up scratching my head later if things
change again.

>> # func_append VAR VALUE
>> # ---------------------
>> # Append VALUE onto the existing contents of VAR.
>> if (eval 'x=a; x+=" b"; test "x$x" = "xa b"') 2>/dev/null
>> then
>>  # This is an XSI compatible shell, allowing a faster implementation...
>>  eval 'func_append ()
>>  {
>>    $debug_cmd
>> 
>>    eval "$1+=\$2"
>>  }'
>> else
>>  # ...otherwise fall back to using expr, which is often a shell builtin.
>>  func_append ()
>>  {
>>    $debug_cmd
>> 
>>    eval "$1=\$$1\$2"
>>  }
>> fi
>> 
> Why the empty line after '$debug_cmd'?  IMHO it does not improve readability,
> but only wastes vertical space.  Ditto for other "one-liner" functions
> below (e.g., 'func_hookable' and 'func_remove_hook'), but *not* for longer
> functions (e.g., 'func_add_hook' and 'func_run_hooks')

For consistency, which I consider quite important.  I don't think it's a
good idea to add choke points that break one's flow when writing code - the
more often I have to stop and think "do I apply rule X or rule Y here", the
more likely I am to lose track of some other important context in my mind.

Note also that many snippets of code were donated from other scripts that
have been tested for years in other projects (such as libtoolize), so at
times something may look a little odd out of that context.  Nor can I claim
complete authorship of every line of code, since the bulk of my testing
involved making sure this bootstrap did a better job of bootstrapping a
moderate selection of projects that already use the existing gnulib
bootstrap, absorbing common features of their bootstrap.conf customisations
in the process.

>> # func_run_hooks FUNC_NAME [ARG]...
>> # ---------------------------------
>> # Run all hook functions registered to FUNC_NAME.
>> func_run_hooks ()
>> {
>>    $debug_cmd
>> 
>>    case " $hookable_funcs " in
>>      *" $1 "*) ;;
>>      *) func_fatal_error "Error: \`$1' does not support hook functions." ;;
>>    esac
>> 
> Code duplication with 'func_add_hook' above, and with many other functions
> throughout the script.  Couldn't the logic to determine whether an item is
> already present in a space-separated list be factored out in its own
> subroutine?  E.g.,
> 
>  # Usage: is_item_in_list ITEM [LIST ..]
>  is_item_in_list ()
>  {
>    item=$1; shift;
>    case " $* " in *" $item "*) return 0;; *) return 1;; esac
>  }
> 
> Such a refactoring could as well be done in a follow-up patch, though
> (in fact, I'd personally prefer to do it that way).

I've avoided using return under the impression that it is not entirely
portable, although I would be happy to see evidence that there are no Bourne
shells with function support and yet with broken return support.  Actually,
in a couple of places where the code was inspired by the incumbent bootstrap
script or some repeated bootstrap.conf snippets, there are still some latent
return statements that I didn't get around to factoring out.

In the hypothetical follow-up patch, I should definitely either eliminate
the last few return statements or, if they turn out to be a portable
construct after all, make better use of them throughout, where some of the
code written to avoid them is unnecessarily tortuous.
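
For example, the list-membership test could set a result variable rather
than relying on return status (a rough sketch only; `func_in_list' is an
illustrative name, not something already in the script):

  # func_in_list ITEM [LIST...]
  # ---------------------------
  # Set `$func_in_list_result' to `:' if ITEM occurs in the space-separated
  # LIST, otherwise to `false', so that callers can write:
  #   func_in_list "$1" $hookable_funcs
  #   $func_in_list_result || func_fatal_error "..."
  func_in_list ()
  {
    $debug_cmd

    func_in_list_item=$1; shift
    case " $* " in
      *" $func_in_list_item "*) func_in_list_result=: ;;
      *)                        func_in_list_result=false ;;
    esac
  }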

>>    eval hook_funcs="\$$1_hooks"
>> 
>>    # shift away the first argument (FUNC_NAME)
>>    shift
>>    func_run_hooks_result=${1+"$@"}
>> 
>>    for hook_func in $hook_funcs; do
>>      eval $hook_func '"$@"'
>> 
> Useless use of eval I think, since $hook_func is not expected to contain
> spaces nor metacharacters.  Simply using:
>  $hook_func "$@"
> should be enough.

You may be right, but the hook functions code is very delicate, and was
quite tricky to get right.  I arrived at the current implementation after
careful testing on many platforms and shells, and worry that changing it
now might upset one of those combinations that I won't discover without
going through the few days of rigorous testing all over again, which is
too much work to save a single eval IMHO.

>>      # store returned options list back into positional
>>      # parameters for next `cmd' execution.
>>      eval set dummy "$func_run_hooks_result"; shift
>> 
> This (together with the `func_run_hooks_result' assignment above) will
> not properly preserve spaces in positional arguments:
> 
>  $ bash -c '(set "x  y"; a="$@"; eval set dummy "$a"; shift; echo "$@")'
>  x y
> 
> I don't know whether this matters or not, but is something to consider.

I can't think of a situation where whitespace preservation would be
important in this case.
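
If one ever turns up, a rough and untested fix would be to re-quote each
argument with func_quote_for_eval before saving the combined result
(`my_hook_arg' is just a scratch variable for the sketch):

  # Rebuild $func_run_hooks_result with each argument individually
  # quoted, so that embedded whitespace survives the later `eval set'.
  func_run_hooks_result=
  for my_hook_arg
  do
    func_quote_for_eval "$my_hook_arg"
    func_append func_run_hooks_result " $func_quote_for_eval_result"
  done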

>> # func_update_translations
>> # ------------------------
>> # Update package po files and translations.
>> func_hookable func_update_translations
>> func_update_translations ()
>> {
>>    $debug_cmd
>> 
>>    $opt_skip_po || {
>>      test -d po && {
>>        $require_package
>> 
>>        func_update_po_files po $package || exit $?
>>      }
>> 
>>      func_run_hooks func_update_translations
>>    }
>> }
>> 
> I personally find the code flow here quite unclear.  Something like
> this would be clearer IMHO:
> 
>  $opt_skip_po && return
>  if test -d po; then
>    $require_package
>    func_update_po_files po $package || exit $?
>  fi
>  func_run_hooks func_update_translations
> 
> The same goes for similar usages throughout the script.

I suppose it's just a matter of which idioms your eye has become accustomed
to.  Personally, I find the additional noise of 'if', '; then' and 'fi'
distracting, and have been happily using the braces idiom (when there's no
'else' branch to consider) for many, many years as a result (e.g. the shell
code in libtool).

If all that code churn and retesting is a prerequisite for acceptance into
gnulib, then I'll reluctantly go through and change the style to match...
but that will also prevent cross-patching with donor code in libtool and
various other scripts I have that share the current idioms.

>> # require_autobuild_buildreq
>> # --------------------------
>> # Try to find whether the bootstrap requires autobuild.
>> require_autobuild_buildreq=func_require_autobuild_buildreq
>> func_require_autobuild_buildreq ()
>> {
>>    $debug_cmd
>> 
>>    printf "$buildreq"| func_grep_q '^[        ]*autobuild' || {
>> 
> Missing space before `|` here (cosmetic-only issue).

I've been using this style:

  produce \
  |filter1 \
  |filter2

productively for at least a couple of decades now (including my scripts
in libtool and m4), but going through the entire bootstrap script, I see that
I've been pretty inconsistent for some reason, so I normalized to this format
everywhere.  Thanks for pointing out the inconsistency.

>  Also, this use of
> printf can cause portability problems with grep, in case the string
> "$buildreq" is not newline-terminated.  I would suggest to use this instead:
> 
>  printf '%s\n' "$buildreq" | func_grep_q '^[   ]*autobuild'

Nice catch, thanks.  I found a handful of other cases of this, and fixed them
too.

>> # require_automake_buildreq
>> # -------------------------
>> # Try to find the minimum compatible version of automake required to
>> # bootstrap successfully, and add it to `$buildreq'.
>> require_automake_buildreq=func_require_automake_buildreq
>> func_require_automake_buildreq ()
>> {
>>    $debug_cmd
>> 
>>    # if automake is not already listed in $buildreq...
>>    printf "$buildreq"| func_grep_q automake || {
>>      func_extract_trace AM_INIT_AUTOMAKE
>> 
>>      # ...and AM_INIT_AUTOMAKE is declared...
>>      test -n "$func_extract_trace_result" && {
>>        automake_version=`echo "$func_extract_trace_result" \
>>           |$SED 's|[^0-9]*||; s| .*$||'`
>> 
> Missing space between '|' and '$SED' (cosmetic-only issue).

Deliberate.

>> # require_gnulib_submodule
>> # ------------------------
>> # Ensure that there is a current gnulib submodule at `$gnulib_path'.
>> require_gnulib_submodule=func_require_gnulib_submodule
>> func_require_gnulib_submodule ()
>> {
>>    $debug_cmd
>> 
>> [SNIP]
>> 
>>      trap func_cleanup_gnulib 1 2 13 15
>> 
>>      shallow=
>>      $GIT clone -h 2>&1 |func_grep_q -- --depth \
>>          && shallow='--depth 2'
>> 
>>      func_show_eval "$GIT clone $shallow '$gnulib_url' '$gnulib_path'" \
>>          func_cleanup_gnulib
>> 
>>      trap - 1 2 13 15
>> 
> Not portable to at least Solaris 10 /bin/sh; quoting the Autoconf manual:
> 
>  Posix says that "trap - 1 2 13 15" resets the traps for the specified
>  signals to their default values, but many common shells (e.g., Solaris
>  /bin/sh) misinterpret this and attempt to execute a "command" named '-'
>  when the specified conditions arise.  Posix 2008 also added a requirement
>  to support "trap 1 2 13 15" to reset traps, as this is supported by a
>  larger set of shells, but there are still shells like dash that mistakenly
>  try to execute 1 instead of resetting the traps.  Therefore, there is no
>  portable workaround, except for "trap - 0", for which "trap '' 0" is a
>  portable substitute.

You've lost me.  So, I should write:

  trap '' 0

instead of:

  trap - 1 2 13 15

?

That seems like I might as well just leave out the signal resetting
altogether (effectively what trap '' 0 seems to do anyway), and let the
gnulib cleanup function fire impotently if those signals are trapped long
after the window in which it would have been useful.
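
The only alternative that springs to mind (a sketch only, not what the
script does now) is to leave the trap installed and disarm the handler
with a flag, so that no trap reset is needed in the first place; the
`_guarded' wrapper below is hypothetical:

  # Arm the cleanup, and wrap the real handler in a guard function.
  gnulib_cleanup_armed=:
  func_cleanup_gnulib_guarded ()
  {
    $gnulib_cleanup_armed && func_cleanup_gnulib
  }
  trap func_cleanup_gnulib_guarded 1 2 13 15

  func_show_eval "$GIT clone $shallow '$gnulib_url' '$gnulib_path'" \
      func_cleanup_gnulib

  # Disarm the handler instead of resetting the trap.
  gnulib_cleanup_armed=false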

>> func_require_package_version ()
>> {
>>    $debug_cmd
>> 
>>    func_extract_trace AC_INIT
>> 
>>    save_ifs="$IFS"
>> 
> Useless use of quotes.

I think this is another style issue.  I'd rather be able to blindly write
foo="$bar" and know that it will always work the way I expect, to avoid
breaking the flow of my coding session with another stall point where I
have to stop and think about whether there is any whitespace in the literal
that needs to be protected, and whether the quote marks are spurious in
each case.

>> # func_unset VAR
>> # --------------
>> # Portably unset VAR.
>> func_unset ()
>> {
>>    { eval $1=; unset $1; }
>> }
>> unset=func_unset
>> 
> What is the point of this function, if bootstrap does not run under
> 'set -e'?  Using a simple 'unset' should be good enough, since even
> in shells where it returns a non-zero exit status for already-unset
> variables, it does not display any error message (at least not that
> I know of).  Or are you just "erring on the side of safety", and
> deliberately?

It's boilerplate code that I have in all my scripts so that I can use
func_unset without having to stop and think about whether or not the
script is running in set -e mode.

>> func_cmp_s ()
>> {
>>    $CMP "$@" >/dev/null 2>&1
>> }
>> func_grep_q ()
>> {
>>    $GREP "$@" >/dev/null 2>&1
>> 
> Why redirecting also the stderr of grep and cmp to /dev/null?  This will
> hide potential problems due to e.g., incorrect regular expressions or
> non-existent files ...

I could rename them to func_cmp_sshhh_dont_make_a_sound ;)
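
More seriously, if hiding the diagnostics ever bites us, a middle ground
(hypothetical, not what the script does now) would be to discard only
stdout, so that bad regular expressions and missing files still complain
on stderr:

  func_grep_q ()
  {
    $GREP "$@" >/dev/null
  }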

>> # func_version
>> # ------------
>> # Echo version message to standard output and exit.
>> func_version ()
>> {
>>    $debug_cmd
>> 
>>    cat <<EOF
>> $progname $scriptversion
>> EOF
>> 
> printf '%s\n' "$progname $scriptversion" ?

Good idea.

>>    $SED -n '/(C)/!b go
>>        :more
>>        /\./!{
>>          N
>>          s|\n# | |
>>          b more
>>        }
>>        :go
>>        /^# Written by /,/# warranty; / {
>>          s|^# ||
>>          s|^# *$||
>>          s|\((C)\)[ 0-9,-]*[ ,-]\([1-9][0-9]* \)|\1 \2|
>>          p
>>        }
>>        /^# Written by / {
>>          s|^# ||
>>          p
>>        }' < "$progpath"
>> 
>>    exit $?
>> }
>> 
> Is this complexity really warranted by a function whose purpose is
> simply to print a version message?

DRY.  All the meta-data is in the header comment, and anything else that
wants to make use of that meta-data should extract it rather than create
another copy.  I've been using this code as is in many scripts for many
years, so it deals with all the corner cases very well.

>> # func_usage [NOEXIT]
>> # -------------------
>> # Echo short help message to standard output and exit.
>> func_usage ()
>> {
>>    $debug_cmd
>> 
>>    echo "$progpath [OPTION]..."
>>    echo ""
>>    echo "$usage_message"
>> 
>>    my_ret=$?
>>    test -n "$1" || {
>>      echo "Run \`$progname --help | more' for full usage"
>>      exit $my_ret
>>    }
>> }
>> 
> What's the point of '$my_ret' here?  Is only for catching potential
> write errors?

I don't know.  I suppose I added it at some point to take care of a
corner case, but I don't recall what it was, and now it looks quite
redundant.  I wish I'd been better at commenting my shell code when
I started out :(

> Better again, I would rewrite both the 'func_usage' and 'func_help'
> subroutines not to call exit explicitly (let the callers do that
> if they want).

Better yet, rather than dancing around whether or not the usage function
should exit, I've factored the common code into its own function, so that
both func_help and func_usage can call it for the common output, and then
exit when they are done with the additional output they want to add.
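
Roughly along these lines (the helper name here is illustrative rather
than a promise of what the final script calls it):

  # func_usage_message
  # ------------------
  # Echo the short usage synopsis shared by func_usage and func_help.
  func_usage_message ()
  {
    $debug_cmd

    echo "$progpath [OPTION]..."
    echo ""
    echo "$usage_message"
  }

  # func_usage
  # ----------
  # Echo short help message to standard output and exit.
  func_usage ()
  {
    $debug_cmd

    func_usage_message
    echo "Run \`$progname --help | more' for full usage"
    exit 0
  }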

>> # func_quote_for_eval ARG...
>> # --------------------------
>> # Aesthetically quote ARGs to be evaled later.
>> # This function returns two values: FUNC_QUOTE_FOR_EVAL_RESULT
>> # is double-quoted, suitable for a subsequent eval, whereas
>> # FUNC_QUOTE_FOR_EVAL_UNQUOTED_RESULT has merely all characters
>> # which are still active within double quotes backslashified.
>> sed_quote_subst='s|\([`"$\\]\)|\\\1|g'
>> func_quote_for_eval ()
>> {
>>    $debug_cmd
>> 
>>    func_quote_for_eval_result=
>> 
>>    while test $# -gt 0; do
>>      case $1 in
>>        *[\\\`\"\$]*)
>>          my_unquoted_arg=`printf "$1" | $SED "$sed_quote_subst"` ;;
>> 
> Unportable use of sed, in case $1 is not newline-terminated.  Also,
> potential problems with printf, in case $1 contains `\' characters:
>  $ a='\t'; printf x"$a"x
>  $ x       x
> Use this instead:
>  printf '%s\n' "$1"
> 
> There might be similar problems in other part of the script?  I have not
> checked in great detail ...  You might want to take a better look.

Thanks, I found and fixed a few.

I inherited these problems from libtool, so I should go back and fix them
there too when I have time.

>> # func_show_eval CMD [FAIL_EXP]
>> # -----------------------------
>> # Unless opt_silent is true, then output CMD.  Then, if opt_dryrun is
>> # not true, evaluate CMD.  If the evaluation of CMD fails, and FAIL_EXP
>> # is given, then evaluate it.
>> func_show_eval ()
>> {
>>    $debug_cmd
>> 
>>    my_cmd="$1"
>>    my_fail_exp="${2-:}"
>> 
>>    ${opt_silent-false} || {
>>      func_quote_for_eval $my_cmd
>>      eval func_truncate_cmd $func_quote_for_eval_result
>>      func_echo "running: $func_truncate_cmd_result"
>>    }
>> 
>>    if ${opt_dry_run-false}; then :; else
>>      eval "$my_cmd"
>>      my_status=$?
>>      if test "$my_status" -eq 0;
>> 
> Useless use of quotes around $my_status.
> 
>>    then :; else
>>        eval "(exit $my_status); $my_fail_exp"
>>      fi
>>    fi
>> }
>> 

Actually, yuck for the mixed styles in that function.  Refactored the last
block to match the first:

    ${opt_dry_run-false} || {
      eval "$my_cmd"
      my_status=$?
      test 0 -eq $my_status || eval "(exit $my_status); $my_fail_exp"
    }

I should reflect that back into the upstream copy in libtool too at some
point.

>> # func_get_version APP
>> # --------------------
>> # echo the version number (if any) of APP, which is looked up along your
>> # PATH.
>> func_get_version ()
>> {
>>    $debug_cmd
>> 
>>    app=$1
>> 
>>    { $app --version || $app --version </dev/null; } >/dev/null 2>&1 \
>>      || return 1
>> 
> What's the point of the second `$app --version'?

To accommodate tools like git2cl, which error out with no stdin.

>>    $app --version 2>&1 |
>>    $SED -n '# extract version within line
>>             ...'
>> 
> Comments in sed scripts are not portable, sigh :-(

This was pasted from one of the GNU bootstrap.conf files I used to flesh
out the functionality of bootstrap... but I should have spotted it myself,
thanks for the reminder :)

> Still, instead of obfuscating all the script for the sake of some ancient
> sed implementation, you might want to check at the beginning of the script
> that $SED handles comments correctly, and abort if it is not the case
> (pointing the user to GNU sed maybe?).

Agreed.  I've added it to my TODO list.
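
The check itself ought to be small; perhaps something like this (untested
sketch) alongside the other early sanity checks:

  # Abort early unless $SED copes with comments in its scripts.
  sed_comments_ok=false
  test xok = "x`echo bad | $SED '# a comment
    s|bad|ok|' 2>/dev/null`" && sed_comments_ok=:

  $sed_comments_ok || func_fatal_error \
      "\`$SED' does not support script comments, please install GNU sed"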

>> # func_symlink_to_dir SRCDIR FILENAME [DEST_FILENAME]
>> # ---------------------------------------------------
>> # Make a symlink from FILENAME in SRCDIR to the current directory,
>> # optionally calling the symlink DEST_FILENAME.
>> func_symlink_to_dir ()
>> {
>>    $debug_cmd
>> [SNIP]
>> 
> The body of this function is IMVHO complex enough to deserve some more
> comments.

I rescued this implementation from one of Paul or Bruno's bootstrap.conf
files during testing of my bootstrap rewrite, and commented it as best I
could.  But I'll be the first to admit I don't really follow the slightly
tortuous logic every step of the way.  Better comments, or a clearer
implementation, would be gratefully accepted.

>> func_update_po_files ()
>> {
>>    $debug_cmd
>> 
>> [SNIP]
>> 
>>      if ! test -f "$cksum_file"
>> 
> Special `!' command is unportable to at least Solaris 10 /bin/sh:
> 
>  $ /bin/sh -c '! test -f /none'
> /bin/sh: !: not found
> 
> Possibly other similar usages throughout the script (I haven't checked
> in detail).

Only func_update_po_files() and func_symlink_to_dir() use that construct,
but these functions were both absorbed from other GNU project bootstrap.conf
files, and are delicate enough that I'm afraid of breaking them at this
stage.  Most likely the originals were written by Paul, Jim or Bruno, so
they might be able to suggest alternative implementations?  Otherwise I will
try to go back and find the original projects and pass the bug report back
upstream.
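
For the record, the portable spelling just inverts the sense of the test,
e.g. in func_update_po_files:

  if test -f "$cksum_file"; then :; else
    # ...original body unchanged...
  fi

or, where the body is a single compound command:

  test -f "$cksum_file" || {
    # ...original body unchanged...
  }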

>> dirname='s|/[^/]*$||'
>> basename='s|^.*/||'
>> 
> Maybe rename these to `$sed_dirname_cmd' and `$sed_basename_cmd', for
> clarity?  (cosmetic-only issue).

That's a lot of extra typing ;)  I think context makes it very clear
that these are sed snippets, so no need to use over-long variable names
IMHO.

>> # Work around backward compatibility issue on IRIX 6.5. On IRIX 6.4+, sh
>> # is ksh but when the shell is invoked as "sh" and the current value of
>> # the _XPG environment variable is not equal to 1 (one), the special
>> # positional parameter $0, within a function call, is the name of the
>> # function.
>> 
> Something similar holds for Zsh if I'm not mistaken (it would be worth
> mentioning that in this comment IMO).

Care to check, and suggest the additional comment text if necessary?

> Finally, a couple of more general observations:
> 
>  1. The bootstrap script is now complex enough to warrant the
>     introduction of a testsuite.

That's an excellent notion.  But after a year or more of prodding and
cajoling, I haven't even gotten the script itself accepted into gnulib.
I'm not ready to burn another month or two of my GNU hacking time yet, at
least until all the work I put into bootstrap itself is legitimized by its
acceptance.

Actually, I'd really like to see unit tests for at least the functions
this script has in common with libtoolize and some of libtool's other
m4sh generated scripts.  But I don't have the energy to pour into that
just now.

>  2. IMHO, since bootstrap is a maintainer tool anyway, we should
>     assume that a POSIX shell is used to run it, thus allowing
>     various simplifications and optimizations.  If we want to be
>     helpful towards developer having an inferior default shell,
>     we could run a sanity check on the shell itself early in the
>     script, warning the user if his shell isn't able to handle
>     the required constructs and usages.

Because of the amount of boilerplate in this script (from my own code,
much of which I also share with libtool and m4) I'd rather not optimize
any of it for bootstrap's particular use, since that reduces future
sharing opportunities.
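
That said, a shell sanity check along the lines you suggest could stay
self-contained and outside the shared boilerplate; a hypothetical sketch:

  # Bail out early if the current shell cannot cope with basic POSIX
  # parameter expansions.
  if (eval 'x=abc; test "${x#a}" = bc && test "${x%c}" = ab') 2>/dev/null
  then :
  else
    echo "$progname: a POSIX-compatible shell is required to run this script" >&2
    exit 1
  fi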

> HTH, and thanks for all your work on this!

I'm extremely happy to have another set of eyes on the code at last, so
thanks again for making the time to review.

Cheers,
-- 
Gary V. Vaughan (gary AT gnu DOT org)

