Re: character ranges in regular expressions

bug-grep

From:	Aharon Robbins
Subject:	Re: character ranges in regular expressions
Date:	Mon, 04 Oct 2010 22:43:57 +0200
User-agent:	Heirloom mailx 12.4 7/29/08

Sorry for chiming in on this rather late...

> Date: Fri, 24 Sep 2010 16:27:53 -0600
> From: Eric Blake <address@hidden>
> To: Bruno Haible <address@hidden>
> Cc: Paolo Bonzini <address@hidden>, Paul Eggert <address@hidden>,
>         address@hidden, Jim Meyering <address@hidden>
> Subject: Re: character ranges in regular expressions
>
> On 09/24/2010 03:52 PM, Bruno Haible wrote:
> >
> > 1) Is there an agreement of what the result should be? Jim seems to
> > prefer to extrapolate the result of the "C" locale, i.e. 26.
>
> As do I.
>
> > For other people, the locale
> > dependent behaviour is useful, that is, 51 is desired.
>
> Which is why my proposal is that glibc consider:
>
> [A-Z] => match C locale; 26 letters, regardless of locale
> [[.A.]-[.Z.]] => use collation rules, since we explicitly spelled things
> with collation symbols (26 letters in the POSIX locale, 51 or even more
> in other locales, since accented characters might be included in the
> collation range), so that we aren't completely losing CEO behavior (if
> someone seriously has a reason to use it)
> [[:upper:]] => per POSIX rules in all locales

This would be great.  In the ten or so years since gawk started
supporting locales, I have yet to meet anyone who thinks that [a-z]
matching [A-Y] is a feature!
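
For anyone wondering where the 26 versus 51 comes from, here is a rough
Python sketch.  It does not call any real locale or regex machinery; it
just assumes a hypothetical dictionary-style collation order
"aAbBcC...zZ" (the names COLLATION, C_ORDER and range_members are made
up for illustration) and lists what falls between the two endpoints of
a range under that order versus plain codepoint order.

```python
import string

# ASSUMPTION: a simplified dictionary-style collation order, aAbBcC...zZ.
# Real locales may order letters differently and include accented characters.
COLLATION = "".join(
    lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)
)

# Codepoint ("C" locale) order for the ASCII letters: A..Z then a..z.
C_ORDER = string.ascii_uppercase + string.ascii_lowercase


def range_members(start, end, order):
    """Return the characters sorting from start to end (inclusive) in `order`."""
    i, j = order.index(start), order.index(end)
    return order[i:j + 1]


az_collated = range_members("a", "z", COLLATION)  # [a-z] under collation order
AZ_collated = range_members("A", "Z", COLLATION)  # [A-Z] under collation order
az_c = range_members("a", "z", C_ORDER)           # [a-z] under codepoint order

print(len(az_c), len(az_collated), len(AZ_collated))
print("Y" in az_collated, "Z" in az_collated)
```

Under this assumed order, [a-z] picks up the uppercase letters A through
Y but not Z, and [A-Z] picks up b through z but not a, which is the
surprise being complained about.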

Thanks,

Arnold
