Re: Case insensitivity seems to ignore lower bound of interval
From: Aharon Robbins
Subject: Re: Case insensitivity seems to ignore lower bound of interval
Date: Wed, 04 May 2011 22:49:41 +0300
User-agent: Heirloom mailx 12.4 7/29/08
Hi.
> > I agree, which is why I've clarified the doc and changed the code, but
> > again, this is not a gawk-specific issue but a general locale issue.
>
> Have the library writers been contacted? Since the problems seem to lie
> there, wouldn't that be the logical thing to do?
Yes, it would be. I am 110% certain that the answer will be that the lib
implements POSIX. You can try to open a bug on glibc though.
> > > One technical possibility would be to simply use Unicode code positions.
> >
> > Unfortunately, no. Gawk is used in many parts of the world where Unicode
> > is not the standard character set (Japan, China, etc.)
>
> I was suggesting to convert internally to Unicode from other character sets
> before doing anything else. I'm not sure this is a good idea though in the
> case of awk. But it's a common technique to work internally in Unicode.
Truth to tell, it's a lot of work to do things this way, since you have
to convert on input and again on output, and do everything in wide characters.
Worse - the two regexp engines that gawk uses don't provide interfaces
for working with wide characters! That is the biggest problem.
Essentially, while it's a good idea in theory, for gawk it wouldn't work.
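
To make the "convert on input and again on output" pattern concrete, here is
a minimal sketch. It is in Python rather than gawk's C, and it is not how gawk
works; the function name upcase_legacy and the choice of Latin-1 as the legacy
encoding are assumptions made purely for illustration.

```python
def upcase_legacy(data: bytes, encoding: str) -> bytes:
    """Upper-case text stored in a legacy encoding (hypothetical helper).

    Illustrates the three steps described above:
      1. convert to Unicode on input,
      2. do the real work on code points,
      3. convert back to the original encoding on output.
    """
    text = data.decode(encoding)      # step 1: bytes -> Unicode string
    result = text.upper()             # step 2: operate on code points
    return result.encode(encoding)    # step 3: Unicode string -> bytes


# 0xE9 is "é" in Latin-1 (assumed encoding for this example).
converted = upcase_legacy(b"\xe9t\xe9", "latin-1")
print(converted)
```

Every operation pays for two conversions, and any component that only
understands bytes (such as a byte-oriented regexp matcher) cannot take part in
step 2 at all, which is the obstacle described above.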
> Also, Unicode is becoming standard everywhere in the world, replacing all
> older encodings. That includes China and Japan.
But it hasn't happened yet. That's why I said that in maybe 10 years
I could work just with Unicode.
> I'm sorry I did not understand your initial message in the first place, saying
> it was already solved. Please accept my apologies for that.
No problem at all. Reasonable conversations with understanding users
are quite pleasant, actually.
Thanks,
Arnold