From: Paul Eggert
Subject: Re: [Grep-devel] [bug-gawk] GNU grep, awk, sed: support \u and \U for unicode
Date: Thu, 19 Jan 2017 18:48:59 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1

Assaf Gordon wrote:
> Currently, escape sequences are parsed and converted before
> being sent to re/dfa.
> Thus, '[\u0041]' is equivalent to '[A]'

POSIX requires [\u0041] to be equivalent to [u0041\], that is, it matches any of the characters '\', 'u', '0', '4', and '1'. This is true for grep, sed, and most other utilities that use regular expressions. (awk is an exception.) So except for awk, we can't simply translate \u escapes everywhere. At best we could translate them only if not POSIXLY_CORRECT.
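
To make the difference concrete, here is a rough Python sketch of the two readings. It is only a model, not code from grep, sed, or gawk: the helper names (posix_bracket_members, translate_u_escapes, effective_pattern) are made up for illustration, and the bracket handling covers only a plain list of characters with no ranges, classes, or negation.

```python
import re

def posix_bracket_members(body):
    """Characters matched by a simple POSIX bracket expression body.

    Inside brackets a backslash has no special meaning, so every
    character of the body, including the backslash, is a literal member.
    """
    return set(body)

def translate_u_escapes(text):
    """Hypothetical pre-pass: replace each \\uXXXX with the code point it names."""
    return re.sub(r'\\u([0-9A-Fa-f]{4})',
                  lambda m: chr(int(m.group(1), 16)),
                  text)

def effective_pattern(text, environ):
    """Assumed policy: translate \\u escapes only when POSIXLY_CORRECT is unset."""
    if 'POSIXLY_CORRECT' in environ:
        return text
    return translate_u_escapes(text)

body = r'\u0041'  # the text between the brackets of [\u0041]

# POSIX reading: the same set as [u0041\]
posix_set = posix_bracket_members(body)

# Reading after early translation: the same set as [A]
translated_set = posix_bracket_members(translate_u_escapes(body))

print(sorted(posix_set))
print(sorted(translated_set))
```

With POSIXLY_CORRECT present in the environment mapping, effective_pattern returns the text unchanged, so the POSIX reading is preserved; otherwise the escape is translated first.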

On another topic, if we can't implement \N escapes in general then I wouldn't bother with implementing only \N{U+nnnn}.
