bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales
From: Norihiro Tanaka
Subject: bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales
Date: Sat, 29 Nov 2014 11:58:48 +0900
On Fri, 28 Nov 2014 16:50:29 +0100
Vincent Lefevre <address@hidden> wrote:
> What matters is whether a sequence corresponds to a valid UTF-8
> encoded Unicode character. My patch ensures that pcre_exec is called
> on a string with only such characters, which implies that this is
> also valid UTF-8 for PCRE (whether Unicode validity is also considered
> in valid_utf8() or not). So, there's no valid reason why grep would
> crash under such a condition.
It seems that PCRE treats, for example, the following byte sequences as
invalid. That means we should not pass them to pcre_exec with the
PCRE_NO_UTF8_CHECK option.
0xE0 0xC2 0xFF
0xED 0xA0 0xFF
0xF0 0xBF 0xFF 0xFF
0xF4 0xBF 0xBF 0xBF
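
To make the point concrete, here is a minimal sketch of a strict check in the style of RFC 3629 that rejects all four sequences above. It is only an illustration: is_strict_utf8 is a hypothetical helper, not code from grep or from PCRE's valid_utf8(), and I am assuming that a pre-check of this kind is what would be needed before using PCRE_NO_UTF8_CHECK.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not grep's or PCRE's code): return true only if
   the LEN bytes at S form strictly valid UTF-8 per RFC 3629, i.e. no
   overlong forms, no surrogates, nothing above U+10FFFF.  */
static bool
is_strict_utf8 (unsigned char const *s, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      size_t n;                            /* continuation bytes expected */
      unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */

      if (c < 0x80)
        {
          i++;                             /* plain ASCII */
          continue;
        }
      else if (0xC2 <= c && c <= 0xDF)
        n = 1;
      else if (0xE0 <= c && c <= 0xEF)
        {
          n = 2;
          if (c == 0xE0)
            lo = 0xA0;                     /* rejects overlong 3-byte forms */
          if (c == 0xED)
            hi = 0x9F;                     /* rejects surrogates D800..DFFF */
        }
      else if (0xF0 <= c && c <= 0xF4)
        {
          n = 3;
          if (c == 0xF0)
            lo = 0x90;                     /* rejects overlong 4-byte forms */
          if (c == 0xF4)
            hi = 0x8F;                     /* rejects values above U+10FFFF */
        }
      else
        return false;                      /* 0x80..0xC1 and 0xF5..0xFF */

      if (len - i <= n)
        return false;                      /* truncated sequence */
      if (s[i + 1] < lo || hi < s[i + 1])
        return false;
      for (size_t k = 2; k <= n; k++)
        if ((s[i + k] & 0xC0) != 0x80)
          return false;                    /* not a continuation byte */
      i += n + 1;
    }
  return true;
}
```

With a check like this, a buffer would be handed to pcre_exec with PCRE_NO_UTF8_CHECK only when the function returns true. Any buffer containing one of the sequences listed above would fail the check.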