bug-gnu-utils

Re: Case insensitivity seems to ignore lower bound of interval


From: Eric Bischoff
Subject: Re: Case insensitivity seems to ignore lower bound of interval
Date: Wed, 4 May 2011 09:43:45 +0200
User-agent: KMail/1.13.6 (Linux/2.6.38-8-generic; KDE/4.6.2; x86_64; ; )

On Friday 29 April 2011 09:55:23, Aharon Robbins wrote:
> Davide Brini states:
> > You seem to think this is gawk-specific, but in fact any locale-aware
> > tool that uses regular expressions behaves the same (try eg with sed or
> > grep).
> 
> And this too is correct.

It isn't. See the result of tests that followed: sed, awk and grep just don't 
behave the same, at least in the versions shipped with distributions.

> POSIX locales (in my not-so-humble opinion) are
> a total and utter botch.

On that we all agree ;-).

> [[:lower:]], [[:upper:]] and so on exist to mitigate this issue. They are
> not perfect solutions.

Yes.
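
To make the difference between a range and a class concrete, here is a 
minimal sketch in Python. It rests on two assumptions: Python's re module has 
no POSIX bracket classes, so str.islower() stands in for [[:lower:]] (it 
consults the Unicode character database, not the POSIX locale), and the 
helper names are made up for this illustration.

```python
def in_ascii_range(ch: str) -> bool:
    """Rough analogue of [a-z] when ranges follow code positions."""
    # Python compares single-character strings by code point.
    return "a" <= ch <= "z"


def is_lower(ch: str) -> bool:
    """Stand-in for [[:lower:]]; uses Unicode data, not the POSIX locale."""
    return ch.islower()


samples = ["a", "B", "é", "z"]

# Map each sample to (matches the range, matches the class).
results = {ch: (in_ascii_range(ch), is_lower(ch)) for ch in samples}

for ch, (by_range, by_class) in results.items():
    print(ch, by_range, by_class)
```

The accented letter is lower case according to the class but falls outside 
the range, which is why neither notation can simply replace the other.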

> > Collation [...]
> 
> Collation has to do with sorting order, and less so with regular expression
> matching.  Gawk doesn't support [[=e=]] which is supposed to match all
> versions of the letter 'e'.

OK, I did not know that.
 
> I agree, which is why I've clarified the doc and changed the code, but
> again, this is not a gawk-specific issue but a general locale issue.

Have the library writers been contacted? Since the problems seem to lie 
there, wouldn't that be the logical thing to do?
 
> > One technical possibility would be to simply use Unicode code positions.
> 
> Unfortunately, no.  Gawk is used in many parts of the world where Unicode
> is not the standard character set (Japan, China, etc.)

I was suggesting converting internally to Unicode from other character sets 
before doing anything else. I'm not sure this is a good idea in the case of 
awk, though. But working internally in Unicode is a common technique.
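
A minimal sketch of that technique in Python, only to illustrate the idea. 
The helper names to_unicode and in_range are hypothetical and have nothing to 
do with gawk's actual code; the sketch also assumes that the source encoding 
of each input is known in advance.

```python
def to_unicode(raw: bytes, encoding: str) -> str:
    """Decode input bytes from their source character set into Unicode."""
    return raw.decode(encoding)


def in_range(ch: str, low: str, high: str) -> bool:
    """Range test by Unicode code position, ignoring locale collation."""
    return ord(low) <= ord(ch) <= ord(high)


# The same letter arrives in two different encodings.
from_latin1 = to_unicode("é".encode("iso-8859-1"), "iso-8859-1")
from_utf8 = to_unicode("é".encode("utf-8"), "utf-8")

# Once decoded, both inputs are the same code position, U+00E9.
same_char = from_latin1 == from_utf8

# A Japanese legacy encoding goes through the same conversion step.
hiragana_a = to_unicode("あ".encode("shift_jis"), "shift_jis")

# Ranges are then unambiguous: "B" (U+0042) lies below "a" (U+0061).
upper_b_in_az = in_range("B", "a", "z")
lower_q_in_az = in_range("q", "a", "z")
```

After the conversion step, every comparison works on code positions, so the 
result of a range test no longer depends on the locale of the input.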

Also, Unicode is becoming standard everywhere in the world, replacing all 
older encodings. That includes China and Japan.

> and restricting gawk to just Unicode would not be a good idea. 

That was not what I suggested. Sorry if I wasn't clear.

> If you still disagree, then I'm sorry, there's nothing else I can do
> to help.

I'm sorry that I did not understand your initial message in the first place, 
which said it was already solved. Please accept my apologies for that.

-- 
Éric Bischoff - Bureau Cornavin
Technical writing and translations
http://www.bureau-cornavin.com
(+33) 3 68 46 00 85
sip:address@hidden


