Re: [bug-grep] Re: grep: -i option not working i cronjobs
From: Aharon Robbins
Subject: Re: [bug-grep] Re: grep: -i option not working i cronjobs
Date: Sun, 14 Nov 2004 14:09:22 +0200
> > Careful here. As I just recently learned, there are languages where
> > a lower case character is one byte and the upper case equivalent is a
> > multibyte character. (Or vice versa, I don't remember.) Thus, the
> > 'a' -> '[aA]' solution is fine for ASCII, but doesn't generalize for other
> > character sets. Or at least not simply.
>
> Having a single-byte character and a multi-byte character in the same
> character class works fine here in UTF-8. Why do you think there
> would be problems with this approach?
>
> Tim.
I don't know whether there would be problems or not, but the code
doing this can't be naive and just do:

    if (ignoring_case) {
        /* assumes c and toupper(c) are each exactly one byte */
        buffer[i++] = '[';
        buffer[i++] = c;
        buffer[i++] = toupper(c);
        buffer[i++] = ']';
    }

It has to be somewhat smarter. Also, UTF-8 isn't the only multibyte
encoding that GLIBC and thus GNU can handle...
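
One way to be "somewhat smarter" is to do the case mapping on wide
characters and let the C library encode each result in whatever the
current locale's multibyte encoding happens to be. The following is only
a sketch under assumptions, not what grep actually does: the names
put_wc() and put_case_class() are hypothetical, the caller is assumed to
have already decoded the pattern into wchar_t, and the sketch ignores
regex metacharacters, stateful encodings, and characters whose case
mapping is not one-to-one.

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: encode one wide character into out using the
 * current locale.  Returns the byte count, or (size_t)-1 on error. */
static size_t put_wc(char *out, wchar_t wc)
{
    mbstate_t st;

    memset(&st, 0, sizeof st);      /* start from the initial shift state */
    return wcrtomb(out, wc, &st);
}

/* Hypothetical helper: write "[xX]" for wc into out, or just the
 * character itself if it has no case distinction.  out needs room for
 * 2 * MB_CUR_MAX + 3 bytes.  The result is NUL-terminated.
 * Returns the length in bytes, or (size_t)-1 on an encoding error. */
size_t put_case_class(char *out, wchar_t wc)
{
    wchar_t lo = (wchar_t) towlower((wint_t) wc);
    wchar_t up = (wchar_t) towupper((wint_t) wc);
    size_t n, i = 0;

    if (lo == up) {                 /* caseless: no bracket needed */
        n = put_wc(out, wc);
        if (n == (size_t) -1)
            return n;
        out[n] = '\0';
        return n;
    }

    out[i++] = '[';
    n = put_wc(out + i, lo);        /* may be one byte or several */
    if (n == (size_t) -1)
        return n;
    i += n;
    n = put_wc(out + i, up);        /* need not be the same length as lo */
    if (n == (size_t) -1)
        return n;
    i += n;
    out[i++] = ']';
    out[i] = '\0';
    return i;
}
```

The point of the sketch is that the two halves of the bracket are
encoded independently, so it does not matter if the lower-case and
upper-case forms have different byte lengths.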
I'm a parochial American and thus find all the multibyte stuff to
be a pain, but that's just me personally. :-) Gawk still isn't
really multibyte aware. For example, the length() function returns
bytes, not characters, and I have no idea whether index() really
works correctly with multibyte characters. The same goes for substr().
(If anyone here is a guru and wants to help out with these things,
let me know! :-)
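
To illustrate the bytes-versus-characters difference, here is a rough
sketch of counting characters in a string with mbrlen(). This is not
gawk code; char_length() is a hypothetical name, and the result depends
entirely on the locale the program has selected with setlocale().

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count characters, not bytes, in s according to
 * the current locale's multibyte encoding.  Returns (size_t)-1 if s
 * contains an invalid or incomplete multibyte sequence. */
size_t char_length(const char *s)
{
    mbstate_t st;
    size_t left = strlen(s);        /* bytes still to be examined */
    size_t count = 0;

    memset(&st, 0, sizeof st);      /* start from the initial shift state */
    while (left > 0) {
        size_t n = mbrlen(s, left, &st);

        if (n == (size_t) -1 || n == (size_t) -2)
            return (size_t) -1;     /* invalid or truncated sequence */
        if (n == 0)
            break;                  /* NUL; not expected before left hits 0 */
        s += n;
        left -= n;
        count++;
    }
    return count;
}
```

In the default "C" locale every byte is a character, so this agrees
with strlen(). After setlocale(LC_ALL, "") in a UTF-8 locale, a
character such as e-acute occupies two bytes but should be counted
once, which is the behavior a multibyte-aware length() would want.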
Arnold