Re: Case insensitivity seems to ignore lower bound of interval

From: Aharon Robbins
Subject: Re: Case insensitivity seems to ignore lower bound of interval
Date: Wed, 04 May 2011 22:49:41 +0300
User-agent: Heirloom mailx 12.4 7/29/08


> > I agree, which is why I've clarified the doc and changed the code, but
> > again, this is not a gawk-specific issue but a general locale issue.
> Have the library writers been contacted? Since the problems seem to lie
> there, wouldn't that be the logical thing to do?

Yes it would be.  I am 110% certain that the answer will be that the lib
implements POSIX.  You can try to open a bug on glibc though.

> > > One technical possibility would be to simply use Unicode code positions.
> > 
> > Unfortunately, no.  Gawk is used in many parts of the world where Unicode
> > is not the standard character set (Japan, China, etc.)
> I was suggesting converting internally to Unicode from other character sets
> before doing anything else. I'm not sure this is a good idea in the case of
> awk, though. But it's a common technique to work internally in Unicode.

Truth to tell, it's a lot of work to do things this way, since you have
to convert on input and again on output, and do everything in wide characters.
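
To make the "convert on input, convert on output" cost concrete, here is a
minimal sketch in Python. It is not gawk code: the function name
chars_in_range and its parameters are hypothetical, chosen only for
illustration. It decodes bytes from a legacy encoding, compares characters by
Unicode code position, and encodes the result back.

```python
def chars_in_range(raw: bytes, encoding: str, lo: str, hi: str) -> bytes:
    """Keep only characters whose Unicode code point lies in [lo, hi].

    Hypothetical helper for illustration; not part of gawk or any library.
    """
    # Step 1: convert on input, from the external encoding to Unicode.
    text = raw.decode(encoding)
    # Step 2: do the actual work on code points, independent of locale.
    kept = "".join(ch for ch in text if ord(lo) <= ord(ch) <= ord(hi))
    # Step 3: convert on output, back to the external encoding.
    return kept.encode(encoding)


# Usage: the same range test works for Latin-1 and Shift_JIS input.
latin = chars_in_range("aBcD".encode("latin-1"), "latin-1", "a", "z")
kana = chars_in_range("あaい".encode("shift_jis"), "shift_jis", "ぁ", "ん")
```

Every piece of data passes through two conversions around the real work,
which is the overhead described above.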

Worse - the two regexp engines that gawk uses don't provide interfaces
for working with wide characters!  That is the biggest problem.

Essentially, while it's a good idea in theory, for gawk it wouldn't work.

> Also, Unicode is becoming standard everywhere in the world, replacing all 
> older encodings. That includes China and Japan.

But it hasn't happened yet.  That's why I said that in maybe 10 years
I could work just with Unicode.

> I'm sorry I did not understand your initial message in the first place,
> saying it was already solved. Please accept my apologies for that.

No problem at all.  Reasonable conversations with understanding users
are quite pleasant, actually.


