discuss-gnustep

Re: NSString lowercaseString


From: Thomas Gamper
Subject: Re: NSString lowercaseString
Date: Wed, 01 Aug 2012 16:42:48 +0200
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20120713 Thunderbird/14.0

On 01.08.2012 16:28, Thomas Gamper wrote:
On 01.08.2012 13:20, Sebastian Reitenbach wrote:

2012-08-01 13:16:59.694 lowercase2[22437] t 116
2012-08-01 13:16:59.694 lowercase2[22437] � 195
2012-08-01 13:16:59.694 lowercase2[22437] � 182
2012-08-01 13:16:59.694 lowercase2[22437] � 195
2012-08-01 13:16:59.694 lowercase2[22437] � 150
2012-08-01 13:16:59.694 lowercase2[22437] s 115
2012-08-01 13:16:59.694 lowercase2[22437] t 116

Well, this is correct UTF-8 representing "töÖst". But I think it should be "tööst" in order to represent the correct lowercase string.
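To double-check the reading of those byte values, here is a minimal sketch in Python (not the original Objective-C test program, which is not shown in this thread). It decodes the logged bytes as UTF-8 and shows which bytes a correctly lowercased string would produce. Python's `str.lower()` is only a stand-in for a working lowercase conversion, and the variable names are illustrative.

```python
# Byte values copied from the log excerpt above, one per logged line.
logged_bytes = bytes([116, 195, 182, 195, 150, 115, 116])

# Interpreted as UTF-8 this is five characters: t, ö, Ö, s, t.
decoded = logged_bytes.decode("utf-8")

# Stand-in for a working lowercase conversion (assumption: not GNUstep code).
lowered = decoded.lower()

# UTF-8 bytes a correctly lowercased string would have been logged as.
expected_bytes = list(lowered.encode("utf-8"))
print(expected_bytes)  # [116, 195, 182, 195, 182, 115, 116]
```

The only difference is the fourth pair: 195 150 (Ö) would have to become 195 182 (ö).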

2012-08-01 13:17:12.792 lowercase2[22441] t 116
2012-08-01 13:17:12.792 lowercase2[22441] Ã 195
2012-08-01 13:17:12.792 lowercase2[22441] ¶ 182
2012-08-01 13:17:12.792 lowercase2[22441] Ã 195
2012-08-01 13:17:12.792 lowercase2[22441]  150
2012-08-01 13:17:12.792 lowercase2[22441] s 115
2012-08-01 13:17:12.792 lowercase2[22441] t 116
Same here: correct UTF-8 representing "töÖst". Because of the de_DE locale setting, characters above 127 are displayed differently than with the en_EN locale. The underlying problem is the same: the conversion to lowercase did not work.

2012-08-01 13:18:25.502 lowercase2[5619] t 116
2012-08-01 13:18:25.502 lowercase2[5619] � 246
2012-08-01 13:18:25.502 lowercase2[5619] � 246
2012-08-01 13:18:25.502 lowercase2[5619] s 115
2012-08-01 13:18:25.502 lowercase2[5619] t 116
Okay, here it gets tricky. These are not UTF-8 bytes but UTF-16 values representing "tööst", so the conversion to lowercase is actually correct. I do wonder why the string is not UTF-8 encoded, though.
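The difference between the two kinds of output can be illustrated with a small Python sketch (again an illustration, not the program from this thread). It lists the same string once as UTF-16 code units and once as UTF-8 bytes; the variable names are made up for the example.

```python
import struct

text = "t\u00f6\u00f6st"  # "tööst"

# Unicode code points, one per character.
code_points = [ord(ch) for ch in text]

# UTF-16 code units: two bytes each, unpacked as little-endian 16-bit values.
utf16 = text.encode("utf-16-le")
utf16_units = list(struct.unpack("<%dH" % (len(utf16) // 2), utf16))

# UTF-8 bytes: each ö takes two bytes (195, 182).
utf8_bytes = list(text.encode("utf-8"))

print(utf16_units)  # [116, 246, 246, 115, 116]
print(utf8_bytes)   # [116, 195, 182, 195, 182, 115, 116]
```

Five values with 246 in them match the code-unit view; seven values with 195 182 pairs would be the UTF-8 view.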

$ LC_CTYPE='de_DE.UTF-8' ./lowercase2
2012-08-01 13:18:32.743 lowercase2[16814] Lowercase: tööst
2012-08-01 13:18:32.744 lowercase2[16814] Lowercase: tööst
2012-08-01 13:18:32.744 lowercase2[16814] t 116
2012-08-01 13:18:32.745 lowercase2[16814] ö 246
2012-08-01 13:18:32.745 lowercase2[16814] ö 246
2012-08-01 13:18:32.745 lowercase2[16814] s 115
2012-08-01 13:18:32.745 lowercase2[16814] t 116
Now I am really confused: the terminal takes a UTF-16 value (246) and is able to display the character it represents correctly.
I am an idiot: Latin-1 uses 246 for 'ö', too.
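A quick way to see the overlap, sketched in Python for illustration: the first 256 Unicode code points coincide with Latin-1, so the value 246 means 'ö' in both, while a lone byte 246 is not valid UTF-8. The variable names are illustrative only.

```python
# Byte 246 decoded as Latin-1 is the same character as code point 246 (ö).
latin1_char = bytes([246]).decode("latin-1")
same_as_code_point = latin1_char == chr(246)

# This holds for every byte value: Latin-1 maps 0..255 to U+0000..U+00FF.
latin1_matches_unicode = all(
    bytes([n]).decode("latin-1") == chr(n) for n in range(256)
)

# By contrast, a single byte 246 cannot be decoded as UTF-8.
try:
    bytes([246]).decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(same_as_code_point, latin1_matches_unicode, valid_utf8)  # True True False
```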

Cheers,
TOM


