bug-grep

Re: character ranges in regular expressions


From: Aharon Robbins
Subject: Re: character ranges in regular expressions
Date: Mon, 04 Oct 2010 22:43:57 +0200
User-agent: Heirloom mailx 12.4 7/29/08

Sorry for chiming in on this rather late...

> Date: Fri, 24 Sep 2010 16:27:53 -0600
> From: Eric Blake <address@hidden>
> To: Bruno Haible <address@hidden>
> Cc: Paolo Bonzini <address@hidden>, Paul Eggert <address@hidden>,
>         address@hidden, Jim Meyering <address@hidden>
> Subject: Re: character ranges in regular expressions
>
> On 09/24/2010 03:52 PM, Bruno Haible wrote:
> >
> > 1) Is there an agreement on what the result should be? Jim seems to prefer to
> > extrapolate the result of the "C" locale, i.e. 26.
>
> As do I.
>
> > For other people, the locale-dependent
> > behaviour is useful, that is, 51 is desired.
>
> Which is why my proposal is that glibc consider:
>
> [A-Z] => match C locale; 26 letters, regardless of locale
> [[.A.]-[.Z.]] => use collation rules, since we explicitly spelled things
> with collation symbols (26 letters in the POSIX locale, 51 or even more in
> other locales, since accented characters might be included in the
> collation range), so that we aren't completely losing CEO behavior (if
> someone seriously has a reason to use it)
> [[:upper:]] => per POSIX rules in all locales

This would be great.  In what must be close to (or more than) the
10 years since gawk started supporting locales, I have yet to meet
anyone who thinks that [a-z] matching [A-Y] is a feature!
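To make the numbers in this thread concrete, here is a small illustrative sketch in Python contrasting the two readings of a bracket range. It assumes a simplified collation order in which each lowercase letter sorts immediately before its uppercase partner (a A b B ... z Z), which is roughly the behavior described above for non-C locales. Real locale tables also interleave accented letters, so their ranges can be larger still. The names COLLATION_ORDER, range_by_codepoint and range_by_collation are hypothetical and do not come from any regex library.

```python
import string

# ASSUMPTION: a simplified, illustrative collation order in which each
# lowercase letter sorts just before its uppercase partner: a A b B ... z Z.
# Real locale collation tables are larger (accented letters, etc.).
COLLATION_ORDER = [
    c
    for pair in zip(string.ascii_lowercase, string.ascii_uppercase)
    for c in pair
]


def range_by_codepoint(lo: str, hi: str) -> list[str]:
    """Hypothetical helper: [lo-hi] as in the C/POSIX locale (code point order)."""
    return [chr(cp) for cp in range(ord(lo), ord(hi) + 1)]


def range_by_collation(lo: str, hi: str, order: list[str] = COLLATION_ORDER) -> list[str]:
    """Hypothetical helper: [lo-hi] by position in the assumed collation order."""
    start, end = order.index(lo), order.index(hi)
    return order[start:end + 1]


if __name__ == "__main__":
    # C-locale reading of [A-Z]: exactly the 26 uppercase letters.
    print(len(range_by_codepoint("A", "Z")))
    # Collation reading of [A-Z]: A, then b B through z Z, i.e. 1 + 25*2 = 51.
    print(len(range_by_collation("A", "Z")))
    # Collation reading of [a-z]: picks up uppercase A through Y, but not Z.
    print("".join(c for c in range_by_collation("a", "z") if c.isupper()))
```

Under this model, [A-Z] yields the 26 vs. 51 split discussed above, and [a-z] ends at lowercase z, so it sweeps in every uppercase letter except Z. That is the "[a-z] matching [A-Y]" surprise.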

Thanks,

Arnold


