grep branch, master, updated. v2.15-15-g178ed7c

grep-commit

From:	Jim Meyering
Subject:	grep branch, master, updated. v2.15-15-g178ed7c
Date:	Sat, 21 Dec 2013 18:58:54 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "grep".

The branch, master has been updated
       via  178ed7cc324bc2000c19a3f7a4be649dfa99b44a (commit)
      from  1a8b1b370eace41be892e9fef041f36b72baeefb (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=178ed7cc324bc2000c19a3f7a4be649dfa99b44a


commit 178ed7cc324bc2000c19a3f7a4be649dfa99b44a
Author: Santiago Ruano Rincón <address@hidden>
Date:   Fri Dec 13 07:53:37 2013 -0800

    pcre: tell grep -P to relax its stance on invalid multibyte chars
    
    Do not exit-2 for invalid UTF-8 characters.  Just prior to this
    change, this command would match no lines and fail like this:
      $ printf 'j\x82\nj\n'|LC_ALL=en_US.UTF-8 grep -P j|cat -A; echo $?
      grep: invalid UTF-8 byte sequence in input
      2
    After this change, the same command matches both lines, and succeeds:
      jM-^B$
      j$
      0
    * src/pcresearch.c (Pcompile): Use PCRE_NO_UTF8_CHECK, too, and
    add a comment.
    * tests/pcre-utf8: Add a test and a comment.
    This change did not work with Debian unstable pcre-8.31-2
    or with some 8.33 and 8.34-based versions, but does work with
    Fedora 20's 8.33 and with a built-from-latest source library.
    Based on a patch by Santiago Ruano Rincón.
    See http://bugs.gnu.org/15758/

diff --git a/src/pcresearch.c b/src/pcresearch.c
index 7e81a31..664070d 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -62,7 +62,11 @@ Pcompile (char const *pattern, size_t size)
 
 # if defined HAVE_LANGINFO_CODESET
   if (STREQ (nl_langinfo (CODESET), "UTF-8"))
-    flags |= PCRE_UTF8;
+    {
+      /* Enable PCRE's UTF-8 matching, but disable the check that would
+         make an invalid byte seqence *in the input* trigger a failure.   */
+      flags |= PCRE_UTF8 | PCRE_NO_UTF8_CHECK;
+    }
 # endif
 
   /* FIXME: Remove these restrictions.  */
diff --git a/tests/pcre-utf8 b/tests/pcre-utf8
index b8228d5..a3b9390 100755
--- a/tests/pcre-utf8
+++ b/tests/pcre-utf8
@@ -19,9 +19,15 @@ echo '$' | LC_ALL=en_US.UTF-8 grep -qP '\p{S}' \
 euro='\342\202\254 euro'
 printf "$euro\\n" > in || framework_failure_
 
+# The euro sign has the unicode "Symbol" property, so this must match:
 LC_ALL=en_US.UTF-8 grep -P '^\p{S}' in > out || fail=1
 compare in out || fail=1
 
+# This RE must *not* match in the C locale, because the first
+# byte is not a "Symbol".
+LC_ALL=C grep -P '^\p{S}' in > out && fail=1
+compare /dev/null out || fail=1
+
 LC_ALL=en_US.UTF-8 grep -P '^. euro$' in > out2 || fail=1
 compare in out2 || fail=1
 

-----------------------------------------------------------------------

Summary of changes:
 src/pcresearch.c |    6 +++++-
 tests/pcre-utf8  |    6 ++++++
 2 files changed, 11 insertions(+), 1 deletions(-)

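A minimal sketch for re-checking the behavior described in the log message,
assuming an installed en_US.UTF-8 locale and a grep built with PCRE support.
The helper describe_status is a hypothetical name used only for illustration
and is not part of grep or its test suite. It labels grep's documented exit
statuses (0 = a line was selected, 1 = none selected, 2 = error). The input
uses octal \202, the portable spelling of the \x82 byte from the log.

```shell
#!/bin/sh
# Hypothetical helper (not part of grep's test suite): give a readable
# name to grep's documented exit statuses.
describe_status() {
  case $1 in
    0) echo 'match' ;;      # at least one line was selected
    1) echo 'no match' ;;   # no line was selected
    *) echo 'error' ;;      # 2 (or greater): an error occurred
  esac
}

# Feed grep -P a line holding a lone 0x82 continuation byte, which is
# not valid UTF-8, followed by a plain "j" line.  The status reported is
# grep's, since grep is the last command in the pipeline.
# Assumption: en_US.UTF-8 is installed and grep supports -P; if either
# is missing, the label printed here says nothing about this change.
printf 'j\202\nj\n' | LC_ALL=en_US.UTF-8 grep -P j >/dev/null 2>&1
describe_status $?
```

Per the log message, a grep built before this change should report an error
here, and one built after it should report a match, subject to the PCRE
version caveats noted in the log.
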

hooks/post-receive
-- 
grep
