Re: [bug-grep] Re: grep: -i option not working i cronjobs

From: Aharon Robbins
Subject: Re: [bug-grep] Re: grep: -i option not working i cronjobs
Date: Sun, 14 Nov 2004 14:09:22 +0200

> > Careful here.  As I just recently learned, there are languages where
> > a lower case character is one byte and the upper case equivalent is a
> > multibyte character.  (Or vice versa, I don't remember.)  Thus, the
> > 'a' -> '[aA]' solution is fine for ASCII, but doesn't generalize for other
> > character sets.  Or at least not simply.
> Having a single-byte character and a multi-byte character in the same
> character class works fine here in UTF-8.  Why do you think there
> would be problems with this approach?
> Tim.

I don't know if there would be problems or if there wouldn't be, but
the code doing this can't be naive and just do

        if (ignoring_case) {
                buffer[i++] = '[';
                buffer[i++] = c;
                buffer[i++] = toupper(c);       /* assumes c is one byte */
                buffer[i++] = ']';
        }

It has to be somewhat smarter.  Also, UTF-8 isn't the only multibyte
encoding that GLIBC and thus GNU can handle...
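
To make "somewhat smarter" concrete, here is a rough sketch of what I
have in mind. It is not code from grep or gawk; emit_case_class() and
its parameters are names made up for illustration. It decodes one
character with mbrtowc(), asks towlower()/towupper() for both cases,
and re-encodes each with wcrtomb(), so the two forms may have different
byte lengths. It assumes the current locale's encoding is stateless and
that the regex matcher itself understands multibyte characters inside
a bracket expression.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: write a case-insensitive pattern fragment for
 * the multibyte character at `src` into `out` (NUL-terminated).
 * Returns the number of bytes consumed from `src`, or 0 on an invalid
 * or incomplete sequence, a NUL character, or too little room in `out`.
 * On success *outlen is the number of bytes written, excluding the NUL.
 */
size_t emit_case_class(const char *src, size_t srclen,
                       char *out, size_t outsize, size_t *outlen)
{
    mbstate_t state;
    wchar_t wc;
    wint_t lo, up;
    char lower[MB_LEN_MAX], upper[MB_LEN_MAX];
    size_t used, nlower, nupper, n = 0;

    assert(src != NULL && out != NULL && outlen != NULL);

    /* Decode one character in the current locale's encoding. */
    memset(&state, 0, sizeof state);
    used = mbrtowc(&wc, src, srclen, &state);
    if (used == (size_t)-1 || used == (size_t)-2 || used == 0)
        return 0;

    lo = towlower((wint_t)wc);
    up = towupper((wint_t)wc);

    if (lo == up) {
        /* No case distinction: copy the original bytes unchanged. */
        if (used + 1 > outsize)
            return 0;
        memcpy(out, src, used);
        out[used] = '\0';
        *outlen = used;
        return used;
    }

    /* Re-encode each case separately; the lengths may differ. */
    memset(&state, 0, sizeof state);
    nlower = wcrtomb(lower, (wchar_t)lo, &state);
    memset(&state, 0, sizeof state);
    nupper = wcrtomb(upper, (wchar_t)up, &state);
    if (nlower == (size_t)-1 || nupper == (size_t)-1)
        return 0;

    /* Need room for '[', both encodings, ']' and the NUL. */
    if (nlower + nupper + 3 > outsize)
        return 0;

    out[n++] = '[';
    memcpy(out + n, lower, nlower);
    n += nlower;
    memcpy(out + n, upper, nupper);
    n += nupper;
    out[n++] = ']';
    out[n] = '\0';
    *outlen = n;
    return used;
}
```

A caller would run setlocale(LC_ALL, "") first and then call this in a
loop, advancing by the return value. Even this ignores characters
with more than two case forms and encodings that use shift states,
which is rather the point: it doesn't generalize simply.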

I'm a parochial American and thus find all the multibyte stuff to
be a pain, but that's just me personally. :-)  Gawk still isn't
really multibyte aware.  For example, the length() function returns
bytes, not characters, and I have no idea as to whether index()
really works correctly on multibyte characters.  Likewise for substr().
(If anyone here is a guru and wants to help out with these things,
let me know! :-)
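
For anyone wondering what the bytes-versus-characters distinction looks
like in C, here is a minimal sketch. It is not gawk code, and
mb_char_count() is a made-up name. It walks the string with mbrtowc()
and counts decoded characters, where strlen() would count bytes.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: count characters (not bytes) in a
 * NUL-terminated string, using the current locale's encoding.
 * Returns (size_t)-1 on an invalid or incomplete sequence.
 */
size_t mb_char_count(const char *s)
{
    mbstate_t state;
    size_t remaining, count = 0;

    assert(s != NULL);
    memset(&state, 0, sizeof state);
    remaining = strlen(s);

    while (remaining > 0) {
        /* A NULL first argument means: decode, but discard the result. */
        size_t n = mbrtowc(NULL, s, remaining, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;
        if (n == 0)
            break;
        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

For pure ASCII text the result equals strlen(). After
setlocale(LC_ALL, "") in a UTF-8 locale, a two-byte character such as
"é" would count once here while strlen() reports 2. index() and
substr() would need the same kind of character-wise walk.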


