From: | Elliott Hughes |
Subject: | Re: [bug-grep] Re: grep: -i option not working i cronjobs |
Date: | Mon, 15 Nov 2004 09:20:02 -0800 |
On Nov 15, 2004, at 06:27, Tim Waugh wrote:
> On Mon, Nov 15, 2004 at 12:56:38PM +0000, Julian Foad wrote:
>> Hold on. Do you think converting to same-case would achieve that one-versus-two-character match? I don't.
>
> Hmm. Does the locale interface actually support multichar transliteration? In other words, does the de_DE locale data have anything to say about this special SS character? For me at least, in de_DE.UTF-8, "\303\237" is both upper-case and lower-case (towlower() and towupper() give the same result).

that's why Java has String.to(Lower|Upper)Case to be used in preference to the equivalent methods in Character. but, as i said, Java's regular expression code doesn't handle this case by itself.
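a minimal sketch of that difference, assuming a stock JDK. the class and method names are made up for illustration, and Locale.ROOT is my choice to keep the result independent of the default locale:

```java
import java.util.Locale;

class SharpSCaseDemo {
    // Per-character mapping: one char in, one char out, so it cannot expand.
    static char upperChar(char c) {
        return Character.toUpperCase(c);
    }

    // Whole-string mapping: the result may be longer than the input.
    static String upperString(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        char sharpS = '\u00df';
        // The sharp s has no single-char upper-case form, so it maps to itself.
        System.out.println(upperChar(sharpS) == sharpS);  // prints "true"
        // The String method applies the one-to-two mapping.
        System.out.println(upperString("\u00df"));        // prints "SS"
    }
}
```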
even String.equalsIgnoreCase works character-by-character. (and yet goes to the trouble of comparing the characters, the upper-cased characters, and finally the lower-cased characters.)

$ cat > t.java
public class t {
  public static void main(String[] args) {
    // ß (U+00DF) upper-cases to "SS", so the two strings differ in length
    System.out.println("schlie\u00dflich".equalsIgnoreCase("SCHLIESSLICH"));
  }
}
$ javac t.java && java t
false
$

i don't have my copy of the Rechtschreibung to see if it says anything about upper/lower equivalence and if so, what it says.

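for comparison, a hedged sketch of a check that does treat the two spellings as equal, by upper-casing both whole strings before comparing. the helper name is hypothetical, Locale.ROOT is an assumption on my part, and this is not full Unicode case folding (it says nothing about locale-specific rules such as the Turkish dotted/dotless i):

```java
import java.util.Locale;

class UpperCaseCompare {
    // Compare after whole-string upper-casing, so one-to-many mappings
    // such as sharp s to "SS" are applied to both sides first.
    static boolean equalsAfterUpperCasing(String a, String b) {
        return a.toUpperCase(Locale.ROOT).equals(b.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String lower = "schlie\u00dflich";
        String upper = "SCHLIESSLICH";
        // Character-by-character comparison: the lengths differ, so no match.
        System.out.println(lower.equalsIgnoreCase(upper));        // prints "false"
        // Whole-string upper-casing first: both become SCHLIESSLICH.
        System.out.println(equalsAfterUpperCasing(lower, upper)); // prints "true"
    }
}
```

the cost is two temporary strings per comparison, which may or may not matter for something like a matcher's inner loop.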
-- http://www.jessies.org/~enh/