bug-grep
bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales


From: Norihiro Tanaka
Subject: bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales
Date: Sat, 29 Nov 2014 11:58:48 +0900

On Fri, 28 Nov 2014 16:50:29 +0100
Vincent Lefevre <address@hidden> wrote:
> What matters is whether a sequence corresponds to a valid UTF-8
> encoded Unicode character. My patch ensures that pcre_exec is called
> on a string with only such characters, which implies that this is
> also valid UTF-8 for PCRE (whether Unicode validity is also considered
> in valid_utf8() or not). So, there's no valid reason why grep would
> crash under such a condition.

It seems that PCRE treats, for example, the following byte sequences as
invalid.  This means we should not pass such sequences to pcre_exec with
the PCRE_NO_UTF8_CHECK option.

  0xE0 0xC2 0xFF
  0xED 0xA0 0xFF
  0xF0 0xBF 0xFF 0xFF
  0xF4 0xBF 0xBF 0xBF
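
To illustrate why these four sequences fail, here is a minimal sketch of a
strict UTF-8 check following the well-formedness table in RFC 3629.  It is
not PCRE's valid_utf8() and not grep's code; the function name
utf8_is_valid is hypothetical, and PCRE's own check may differ in detail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of PCRE or grep): return true if
   buf[0..len) is well-formed UTF-8 according to RFC 3629.  */
bool
utf8_is_valid (const unsigned char *buf, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = buf[i];
      size_t need;                 /* continuation bytes expected */
      unsigned char lo = 0x80;     /* allowed range for the first */
      unsigned char hi = 0xBF;     /* continuation byte */

      if (c < 0x80)
        {
          i++;                     /* ASCII */
          continue;
        }
      else if (c >= 0xC2 && c <= 0xDF)
        need = 1;                  /* 0xC0 and 0xC1 would be overlong */
      else if (c >= 0xE0 && c <= 0xEF)
        {
          need = 2;
          if (c == 0xE0)
            lo = 0xA0;             /* reject overlong encodings */
          else if (c == 0xED)
            hi = 0x9F;             /* reject surrogates U+D800..U+DFFF */
        }
      else if (c >= 0xF0 && c <= 0xF4)
        {
          need = 3;
          if (c == 0xF0)
            lo = 0x90;             /* reject overlong encodings */
          else if (c == 0xF4)
            hi = 0x8F;             /* reject values above U+10FFFF */
        }
      else
        return false;              /* stray continuation or 0xF5..0xFF */

      if (len - i - 1 < need)
        return false;              /* truncated sequence */
      if (buf[i + 1] < lo || buf[i + 1] > hi)
        return false;
      for (size_t k = 2; k <= need; k++)
        if ((buf[i + k] & 0xC0) != 0x80)
          return false;

      i += need + 1;
    }

  return true;
}
```

Under this check, 0xE0 0xC2 0xFF and 0xED 0xA0 0xFF fail on the second
byte, 0xF0 0xBF 0xFF 0xFF fails on the third byte, and 0xF4 0xBF 0xBF 0xBF
fails because it would encode a value above U+10FFFF.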






