Re: Work around BSD `join` bug
From: Bruno Haible
Subject: Re: Work around BSD `join` bug
Date: Thu, 11 Jan 2024 12:43:07 +0100
Paul Eggert wrote:
> > - POSIX violation or not? Is it valid to pass lines with missing fields
> > to 'join', according to POSIX [1]?
>
> It should be valid, yes. POSIX 'join' defers to POSIX 'sort' for the
> definition of fields, and POSIX 'sort' says missing fields should be
> treated as empty.
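Here is a minimal sketch of what "treated as empty" means on the `sort` side. It assumes a POSIX shell and a POSIX-conforming `sort` (GNU coreutils `sort` behaves this way). A blank line has no field 1, so its key is the empty string, and in the C locale that sorts before any non-empty key.

```shell
#!/bin/sh
# Sort on field 1 only (-k1,1). The blank line has no field 1, so its
# key is treated as empty. In the C locale an empty key collates before
# any non-empty one, so the output is the blank line, then "a", then "b".
printf 'b\n\na\n' | LC_ALL=C sort -k1,1
```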
Thanks for explaining. Also, POSIX [1] says:
"Some historical implementations have been encountered where a blank line
in one of the input files was considered to be the end of the file; the
description in this volume of POSIX.1-2017 does not cite this as an
allowable case."
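The quoted requirement can be illustrated with a short sketch. It assumes a conforming `join` (GNU coreutils `join` behaves this way) and `mktemp -d`, which is widely available but is not part of POSIX.1-2017. The function name `join_blank_line_demo` is hypothetical. A conforming `join` prints `a 1 2`, while an implementation that treated the blank line as end of file would print nothing.

```shell
#!/bin/sh
# join_blank_line_demo (hypothetical helper): join a one-line file
# against a file whose first line is blank, i.e. has no join field.
# The missing field is an empty key, so the blank line is merely
# unpairable and the two "a" lines still pair up.
join_blank_line_demo () {
  dir=$(mktemp -d) || return 1
  printf 'a 1\n' > "$dir/left"
  # The empty key sorts before "a" in the C locale, so both inputs are sorted.
  printf '\na 2\n' > "$dir/right"
  LC_ALL=C join "$dir/left" "$dir/right"
  rc=$?
  rm -rf "$dir"
  return $rc
}

join_blank_line_demo
```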
> >> Then, would it make sense to document it in the GNU Autoconf manual? [2]
>
> Sure, I installed the attached patch to the Autoconf manual.
Thanks!
I see that macOS 12.6, FreeBSD 14.0, and NetBSD 9.3 have the bug, whereas
OpenBSD does not (and has not had it since at least OpenBSD 3.8, released
in 2005).
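To check a particular system by hand, the probe from the patch below can be run on its own. This is a sketch, not part of gnulib-tool: the function name `join_handles_missing_fields` is hypothetical, and it uses `mktemp -d` instead of gnulib-tool's `$tmp` directory. It succeeds only if `join` exists and pairs the "a" lines in both argument orders when one input starts with a blank line.

```shell
#!/bin/sh
# join_handles_missing_fields (hypothetical helper): same test as the
# gnulib-tool patch. Returns 0 if 'join' exists and handles a line with
# a missing join field, 1 otherwise.
join_handles_missing_fields () {
  (type join) >/dev/null 2>&1 || return 1
  dir=$(mktemp -d) || return 1
  echo a > "$dir/join-input-1"
  { echo; echo a; } > "$dir/join-input-2"
  # Try both argument orders; each pipeline's status is that of grep.
  if LC_ALL=C join "$dir/join-input-1" "$dir/join-input-2" | grep a >/dev/null \
     && LC_ALL=C join "$dir/join-input-2" "$dir/join-input-1" | grep a >/dev/null; then
    rc=0
  else
    rc=1
  fi
  rm -rf "$dir"
  return $rc
}

if join_handles_missing_fields; then
  echo "join: OK"
else
  echo "join: missing or buggy"
fi
```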
Now, back to gnulib-tool. I'm committing the patch below, which rejects
a broken 'join' program.
It would be possible to honor a variable named JOIN, by using "${JOIN-join}"
instead of 'join'. But that adds complexity, and gnulib-tool doesn't have
a variable named SED either.
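For reference, here is a sketch of how the "${JOIN-join}" idiom expands. It is purely illustrative: gnulib-tool does not read a JOIN variable, and both `pick_join` and the program name `gjoin` are hypothetical examples.

```shell
#!/bin/sh
# Hypothetical: gnulib-tool does NOT honor a JOIN variable.
# ${JOIN-join} yields $JOIN whenever JOIN is set (even to the empty
# string) and 'join' only when JOIN is unset. ${JOIN:-join} would also
# fall back to 'join' when JOIN is set but empty.
pick_join () {
  printf '%s\n' "${JOIN-join}"
}

unset JOIN
pick_join                  # prints: join
JOIN=gjoin; pick_join      # prints: gjoin
JOIN=; pick_join           # prints an empty line: JOIN is set, but empty
```

A caller would then run `"${JOIN-join}" file1 file2` wherever the script currently runs `join file1 file2`.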
[1] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/join.html
2024-01-11 Bruno Haible <bruno@clisp.org>
gnulib-tool: Reject broken 'join' program as seen in macOS, FreeBSD etc.
Reported by Avinash Sonawane <rootkea@gmail.com> in
<https://lists.gnu.org/archive/html/bug-gnulib/2024-01/msg00028.html>.
* gnulib-tool: Move the func_gnulib_dir and func_tmpdir invocations
ahead. If the 'join' program exists but does not handle missing fields,
bail out.
diff --git a/gnulib-tool b/gnulib-tool
index b909a81f7a..9facfd2be7 100755
--- a/gnulib-tool
+++ b/gnulib-tool
@@ -894,15 +894,6 @@ func_hardlink ()
}
}
-# The 'join' program does not exist on all platforms. Where it exists,
-# we can use it. Where not, bail out.
-if (type join) >/dev/null 2>&1; then
- :
-else
- echo "$progname: 'join' program not found. Consider installing GNU coreutils." >&2
- func_exit 1
-fi
-
# Ensure an 'echo' command that
# 1. does not interpret backslashes and
# 2. does not print an error message "broken pipe" when writing into a pipe
@@ -1071,6 +1062,38 @@ if test "X$1" = "X--no-reexec"; then
shift
fi
+func_gnulib_dir
+func_tmpdir
+trap 'exit_status=$?
+ if test "$signal" != EXIT; then
+ echo "caught signal SIG$signal" >&2
+ fi
+ rm -rf "$tmp"
+ exit $exit_status' EXIT
+for signal in HUP INT QUIT PIPE TERM; do
+ trap '{ signal='$signal'; func_exit 1; }' $signal
+done
+signal=EXIT
+
+# The 'join' program does not exist on all platforms, and
+# on macOS 12.6, FreeBSD 14.0, NetBSD 9.3 it is buggy, see
+# <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232405>.
+# In these cases, bail out. Otherwise, we can use it.
+if (type join) >/dev/null 2>&1; then
+ echo a > "$tmp"/join-input-1
+ { echo; echo a; } > "$tmp"/join-input-2
+ if LC_ALL=C join "$tmp"/join-input-1 "$tmp"/join-input-2 | grep a >/dev/null \
+ && LC_ALL=C join "$tmp"/join-input-2 "$tmp"/join-input-1 | grep a >/dev/null; then
+ :
+ else
+ echo "$progname: 'join' program is buggy. Consider installing GNU coreutils." >&2
+ func_exit 1
+ fi
+else
+ echo "$progname: 'join' program not found. Consider installing GNU coreutils." >&2
+ func_exit 1
+fi
+
# Unset CDPATH. Otherwise, output from 'cd dir' can surprise callers.
(unset CDPATH) >/dev/null 2>&1 && unset CDPATH
@@ -1690,19 +1713,6 @@ func_determine_path_separator
esac
}
-func_gnulib_dir
-func_tmpdir
-trap 'exit_status=$?
- if test "$signal" != EXIT; then
- echo "caught signal SIG$signal" >&2
- fi
- rm -rf "$tmp"
- exit $exit_status' EXIT
-for signal in HUP INT QUIT PIPE TERM; do
- trap '{ signal='$signal'; func_exit 1; }' $signal
-done
-signal=EXIT
-
# Note: The 'eval' silences stderr output in dash.
if (declare -A x && { x[f/2]='foo'; x[f/3]='bar'; eval test '${x[f/2]}' = foo; }) 2>/dev/null; then
# Zsh 4 and Bash 4 have associative arrays.