Re: [bug-grep] Re: grep: -i option not working i cronjobs

From:	Elliott Hughes
Subject:	Re: [bug-grep] Re: grep: -i option not working i cronjobs
Date:	Sun, 14 Nov 2004 10:20:24 -0800

i think the kind of thing Aharon was thinking of was something like the character ß (Latin small letter sharp s) in de_DE (as opposed to de_CH), which has no single-character upper-case form; its upper-case equivalent is the two *characters* "SS".

Java understands the conversion, but doesn't think they match. i'm not German, but that seems wrong to me. (but then, given Swiss German, i'd probably always want "ss" and "\u00df" to match in free-text search applications.)


$ cat > t.java
public class t {
 public static void main(String[] args) {
  String latinSmallLetterSharpS = "\u00df";
  System.out.println(latinSmallLetterSharpS);
  System.out.println(latinSmallLetterSharpS.toUpperCase());

  System.out.println("schliesslich".matches("(?i)SCHLIESSLICH"));
  System.out.println("schlie\u00dflich".matches("(?i)SCHLIESSLICH"));

  System.out.println("schlie\u00dflich".matches("(?i)SCHLIE\u00dfLICH")); // You wouldn't write this.

 }
}
$ javac t.java && java t
ß
SS
true
false
true
$
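
One way to get the free-text behaviour described above (a sketch only, not something the transcript tests) is to fold both strings before comparing them. The class FoldDemo and the helpers fold() and looseEquals() are hypothetical names; upper-casing and then lower-casing is only an approximation of real Unicode case folding, and Locale.ROOT is an assumption made to keep the result independent of the user's locale.

```java
import java.util.Locale;

public class FoldDemo {
    // Hypothetical helper: approximate case folding by upper-casing
    // (which expands \u00df to "SS") and then lower-casing the result.
    static String fold(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    // Hypothetical helper: compare two strings after folding both sides.
    static boolean looseEquals(String a, String b) {
        return fold(a).equals(fold(b));
    }

    public static void main(String[] args) {
        System.out.println(looseEquals("schlie\u00dflich", "SCHLIESSLICH")); // true
        System.out.println(looseEquals("schlie\u00dflich", "schliesslich")); // true
        System.out.println(looseEquals("schliesslich", "SCHLIESSLICH"));     // true
    }
}
```

This compares whole strings; it is not a drop-in replacement for the (?i) regex flag.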

--
http://www.jessies.org/~enh/

On Nov 14, 2004, at 04:09, Aharon Robbins wrote:

Careful here.  As I just recently learned, there are languages where
a lower case character is one byte and the upper case equivalent is a
multibyte character.  (Or vice versa, I don't remember.)  Thus, the
'a' -> '[aA]' solution is fine for ASCII, but doesn't generalize for other
character sets.  Or at least not simply.
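
The byte-length mismatch mentioned above can be checked in UTF-8 with a small Java sketch. The sample characters (dotless i, long s, sharp s) are my own picks for illustration, not ones named in this thread, and CaseByteLengths and utf8Length() are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CaseByteLengths {
    // Hypothetical helper: number of bytes the string occupies in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'a', dotless i (U+0131), long s (U+017F), sharp s (U+00DF)
        String[] samples = { "a", "\u0131", "\u017f", "\u00df" };
        for (String lower : samples) {
            String upper = lower.toUpperCase(Locale.ROOT);
            System.out.println(utf8Length(lower) + " byte(s) lower, "
                    + utf8Length(upper) + " byte(s) upper");
        }
    }
}
```

For dotless i and long s the lower-case form takes two bytes while the upper-case form ("I", "S") takes one, so here it is the "vice versa" case that shows up.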


Having a single-byte character and a multi-byte character in the same
character class works fine here in UTF-8.  Why do you think there
would be problems with this approach?

Tim.


I don't know if there would be problems or if there wouldn't be, but
the code doing this can't be naive and just do

        if (ignoring case) {
                buffer[i++] = '[';
                buffer[i++] = c;
                buffer[i++] = toupper(c);
                buffer[i++] = ']';
        }

It has to be somewhat smarter.  Also, UTF-8 isn't the only multibyte
encoding that GLIBC and thus GNU can handle...
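
As a rough illustration of what "somewhat smarter" might mean, here is a sketch in Java rather than C, and not grep's or gawk's actual code. It walks the pattern by code point instead of by byte and emits an alternation instead of a bracket expression, so that a case variant may be longer than one character. CaseClassSketch and caseInsensitivePattern() are hypothetical names, and Locale.ROOT is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

public class CaseClassSketch {
    // Hypothetical helper: expand each code point of a literal string
    // into an alternation of its case variants.
    static String caseInsensitivePattern(String literal) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < literal.length(); ) {
            int cp = literal.codePointAt(i);
            i += Character.charCount(cp);           // step by code point, not by byte
            String c = new String(Character.toChars(cp));
            String upper = c.toUpperCase(Locale.ROOT);

            // Collect distinct variants; a variant may be several characters long.
            Set<String> variants = new LinkedHashSet<>();
            variants.add(c);
            variants.add(upper);
            variants.add(c.toLowerCase(Locale.ROOT));
            variants.add(upper.toLowerCase(Locale.ROOT));

            if (variants.size() == 1) {
                out.append(Pattern.quote(c));       // caseless character: emit as-is
            } else {
                List<String> quoted = new ArrayList<>();
                for (String v : variants) {
                    quoted.add(Pattern.quote(v));
                }
                out.append("(?:").append(String.join("|", quoted)).append(')');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String p = caseInsensitivePattern("schlie\u00dflich");
        System.out.println(Pattern.matches(p, "SCHLIESSLICH"));     // true
        System.out.println(Pattern.matches(p, "schliesslich"));     // true
        System.out.println(Pattern.matches(p, "schlie\u00dflich")); // true
    }
}
```

The expansion is one-directional: a pattern built from "SCHLIESSLICH" would still not match "schlie\u00dflich", because each "S" is expanded on its own.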

I'm a parochial American and thus find all the multibyte stuff to
be a pain, but that's just me personally. :-)  Gawk still isn't
really multibyte aware.  For example, the length() function returns
bytes, not characters, and I have no idea as to whether index()
really works correctly in multibyte characters.  Similar for substr().
(If anyone here is a guru and wants to help out with these things,
let me know! :-)
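
To make the bytes-versus-characters distinction concrete, here is a small Java sketch (not gawk code) that measures the same string both ways. UTF-8 is assumed as the encoding, and ByteVsCharLength, byteLength(), charLength() and byteIndexOf() are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class ByteVsCharLength {
    // Hypothetical helper: length in bytes when encoded as UTF-8.
    static int byteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Hypothetical helper: length in characters (Unicode code points).
    static int charLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // Hypothetical helper: byte offset of the first occurrence of needle, or -1.
    static int byteIndexOf(String s, String needle) {
        int idx = s.indexOf(needle);
        return idx < 0 ? -1 : byteLength(s.substring(0, idx));
    }

    public static void main(String[] args) {
        String word = "schlie\u00dflich";
        System.out.println(byteLength(word));          // 12
        System.out.println(charLength(word));          // 11
        System.out.println(word.indexOf("lich"));      // 7 (character offset)
        System.out.println(byteIndexOf(word, "lich")); // 8 (byte offset)
    }
}
```

A length() or index() that counts bytes would report 12 and 8 here, where a character-aware one would report 11 and 7 (counting from zero).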

Arnold
