grep branch, master, updated. v2.25-98-g7c4c694

From: Jim Meyering
Subject: grep branch, master, updated. v2.25-98-g7c4c694
Date: Sun, 25 Sep 2016 00:24:08 +0000 (UTC)

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "grep".

The branch, master has been updated
       via  7c4c69400c6ab456f1fdea4d0960c5a49cb040eb (commit)
      from  0b7fae5850db49a4238511d10a7f0ebb7067abaa (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------

commit 7c4c69400c6ab456f1fdea4d0960c5a49cb040eb
Author: Jim Meyering <address@hidden>
Date:   Sat Sep 24 15:56:08 2016 -0700

    tests: revamp multibyte-white-space test to be more permissive

    This test elicits too many failures.  Whether a system has accurate
    unicode "whitespace" attributes should not influence whether grep's
    test suite passes.  In many cases, now you will see a warning that
    some multibyte characters do not pass whitespace-related tests, but
    this test no longer fails.  However, if you run this test on a modern
    enough system, it does require that \s and \S do work properly with
    most of the listed characters.

    * tests/multibyte-white-space: Confirm that Fedora 24's locale
    tables still declare those four Unicode code points *not* whitespace.
    Honor a new column telling how to handle failure.  Provide more
    information in each diagnostic.
    Reported by Nelson H. F. Beebe.

diff --git a/tests/multibyte-white-space b/tests/multibyte-white-space
index c2a493b..24b14cb 100755
--- a/tests/multibyte-white-space
+++ b/tests/multibyte-white-space
@@ -18,51 +18,74 @@ export LC_ALL
 # with the Unicode WSpace=Y character property,
 #, but that
 # would currently cause distracting failures everywhere I've tried.
+# Instead, I've listed each with an indicator column, telling what
+# this test should do if the system's locale/tools produce the
+# wrong answer.
-# FIXME: including any the following in the list below would
-# make this test fail on Fedora 19/glibc-2.17-18.fc19.
-# Restore them to the list once it is fixed.
-U+00A0 NO-BREAK SPACE:            c2 a0
-U+2007 FIGURE SPACE:              e2 80 87
-U+200B ZERO WIDTH SPACE:          e2 80 8b
-U+202F NARROW NO-BREAK SPACE:     e2 80 af
-U+000A Line feed:                 0a
-U+0085 Next line:                 85
+# The values in that column:
+# X required on all systems (fail if \s or \S fail to work as expected)
+# x required on "modern enough" systems
+# O optional: \s or \S misbehavior elicits a warning, but never failure
-utf8_space_characters=$(sed 's/.*://;s/  */\\x/g' <<\EOF
-U+0009 Horizontal Tab:            09
-U+000B Vertical Tab:              0b
-U+000C Form feed:                 0c
-U+000D Carriage return:           0d
-U+0020 SPACE:                     20
-U+1680 OGHAM SPACE MARK:          e1 9a 80
-U+2000 EN QUAD:                   e2 80 80
-U+2001 EM QUAD:                   e2 80 81
-U+2002 EN SPACE:                  e2 80 82
-U+2003 EM SPACE:                  e2 80 83
-U+2004 THREE-PER-EM SPACE:        e2 80 84
-U+2005 FOUR-PER-EM SPACE:         e2 80 85
-U+2006 SIX-PER-EM SPACE:          e2 80 86
-U+2008 PUNCTUATION SPACE:         e2 80 88
-U+2009 THIN SPACE:                e2 80 89
-U+200A HAIR SPACE:                e2 80 8a
-U+3000 IDEOGRAPHIC SPACE:         e3 80 80
+utf8_space_characters=$(sed 's/.*: *//;s/  */\\x/g' <<\EOF
+U+0009 Horizontal Tab:            X 09
+U+000A Line feed:                 O 0a
+U+000B Vertical Tab:              X 0b
+U+000C Form feed:                 X 0c
+U+000D Carriage return:           X 0d
+U+0020 SPACE:                     X 20
+U+0085 Next line:                 O 85
+U+00A0 NO-BREAK SPACE:            O c2 a0
+U+1680 OGHAM SPACE MARK:          x e1 9a 80
+U+2000 EN QUAD:                   x e2 80 80
+U+2001 EM QUAD:                   x e2 80 81
+U+2002 EN SPACE:                  x e2 80 82
+U+2003 EM SPACE:                  x e2 80 83
+U+2004 THREE-PER-EM SPACE:        x e2 80 84
+U+2005 FOUR-PER-EM SPACE:         x e2 80 85
+U+2006 SIX-PER-EM SPACE:          x e2 80 86
+U+2007 FIGURE SPACE:              O e2 80 87
+U+2008 PUNCTUATION SPACE:         x e2 80 88
+U+2009 THIN SPACE:                x e2 80 89
+U+200A HAIR SPACE:                x e2 80 8a
+U+200B ZERO WIDTH SPACE:          O e2 80 8b
+U+202F NARROW NO-BREAK SPACE:     O e2 80 af
+U+3000 IDEOGRAPHIC SPACE:         x e3 80 80
+# On systems that are not "modern enough," simply warn when an "x"-marked
+# character is not classified as white space.  Too many systems
+# have inadequate UTF-8 tables in this respect, and that lack should not
+# discourage/confuse those who consider whether to install grep.
+# As for what constitutes "modern enough", I've arbitrarily started
+# with "Fedora 20 or newer".  Tested additions welcome.
+grep -iE 'fedora release [2-9][0-9]+\b' /etc/redhat-release >/dev/null 2>&1 \
+  && modern_enough=1
 for i in $utf8_space_characters; do
+  eval 'fail() { fail=1; }'
+  m=ERROR
+  case $i in
+      X*) ;;
+      x*) test $modern_enough = 1 || { eval 'fail() { :; }'; m=warning; } ;;
+      O*) m=warning; eval 'fail() { :; }' ;;
+      *) warn_ "unexpected prefix: $i"; exit 1 ;;
+  esac
+  # Strip the prefix byte.
+  i=${i#?}
   hex_printf_ "$i" | grep -q '^\s$' \
-      || { warn_ "$i FAILED to match \\s"; fail=1; }
+      || { warn_ " $m: \\s failed to match $i in the $LC_ALL locale"; fail; }
   hex_printf_ "$i" | grep -q '\S'
   test $? = 1 \
-      || { warn_ "$i vs. \\S FAILED"; fail=1; }
+      || { warn_ " $m: \\S mistakenly matched $i in the $LC_ALL locale"; fail; }


Summary of changes:
 tests/multibyte-white-space |   91 +++++++++++++++++++++++++++----------------
 1 file changed, 57 insertions(+), 34 deletions(-)
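
For readers who want to see the indicator column in isolation, here is a
minimal sketch, not the test itself.  It reuses the sed expression from the
diff on a three-row excerpt of the table and maps each flag to a severity.
The helper name severity, the variable names tokens, t and bytes, and the
default of 0 for modern_enough are assumptions made for illustration.  The
real test uses hex_printf_ and warn_ from grep's test framework, which are
left out here.

```shell
#!/bin/sh
# Illustrative sketch only: a three-row excerpt of the table, no grep calls.

# Each "NAME:  FLAG hex hex" row becomes one token such as 'x\xe2\x80\x80'.
tokens=$(sed 's/.*: *//;s/  */\\x/g' <<\EOF
U+0020 SPACE:                     X 20
U+00A0 NO-BREAK SPACE:            O c2 a0
U+2000 EN QUAD:                   x e2 80 80
EOF
)

# severity TOKEN MODERN
# Hypothetical helper: print ERROR when a mismatch should fail the test,
# or warning when it should only be reported.
severity() {
  case $1 in
    X*) echo ERROR ;;
    x*) if test "$2" = 1; then echo ERROR; else echo warning; fi ;;
    O*) echo warning ;;
    *)  echo "unexpected prefix: $1" >&2; return 1 ;;
  esac
}

# Assumed default: treat the system as not "modern enough" unless told so.
modern_enough=${modern_enough:-0}

for t in $tokens; do
  bytes=${t#?}   # strip the one-character flag, as the test does
  printf '%s %s\n' "$(severity "$t" "$modern_enough")" "$bytes"
done
```

Run as is, this prints one line per row, giving the severity followed by
the escaped bytes.  Setting modern_enough=1 in the environment promotes the
"x" row from warning to ERROR, while the "O" row stays a warning either way.
The sketch quotes and defaults modern_enough so that test always receives
two operands.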

