discuss-gnustep

Re: NSString lowercaseString


From: Ivan Vučica
Subject: Re: NSString lowercaseString
Date: Wed, 1 Aug 2012 11:49:35 +0200

Which charset is your terminal configured to use on each operating system?

On 1. 8. 2012., at 10:50, "Sebastian Reitenbach" 
<sebastia@l00-bugdead-prods.de> wrote:

> 
> On Wednesday, August 1, 2012 05:16 CEST, Eric Wasylishen 
> <ewasylishen@gmail.com> wrote: 
> 
>> Hi,
>> 
>> A while ago I added code to NSString.m to use ICU for the -compare: and 
>> -rangeOfString: methods, so they're done correctly with respect to unicode 
>> and locales, as well as tests that verify the behaviour matches Cocoa for 
>> the most part.
>> 
>> The -lowercaseString/-uppercaseString methods should probably use 
>> u_strFoldCase if ICU is available.
>> 
>> I'm skimming through the NSString API looking for methods that we should use 
>> ICU for and currently don't (or don't implement), and there are only a 
>> handful:
>> 
>> -decomposedString* and -precomposedString* methods
>> -uppercase/lowercase/capitalized methods
>> -stringByFoldingWithOptions:locale:
>> -localizedStandardCompare:
>> -rangeOfComposedCharacterSequenceAtIndex:
>> -rangeOfComposedCharacterSequencesForRange:
>> -initWithFormat:locale: and friends perhaps? Maybe what we have now is fine 
>> though, I'm not too familiar with it.
>> 
>> I'd be willing to do the case folding ones at some point, for a start. :-)
> 
> I "enhanced" my test program a bit, and compared output when running on Linux 
> and OpenBSD:
> 
> #import <Foundation/Foundation.h>
> 
> 
> int main(int argc, char *argv[]) {
> NSLog(@"Lowercase: %@", [[NSString stringWithString:@"TöÖst"] 
> lowercaseString]);
> 
> }
> 
> running the test program on a Linux box in xterm (opensuse 11.3) without my 
> patch:
> sre@sre:~> LC_CTYPE='de_DE.UTF-8' ./lowercase 
> 2012-08-01 08:49:57.972 lowercase[16574] autorelease called without pool for 
> object (0x72db28) of class GSCInlineString in thread <NSThread: 0x6b0cc8>
> 2012-08-01 08:49:57.974 lowercase[16574] autorelease called without pool for 
> object (0x72dce8) of class GSCInlineString in thread <NSThread: 0x6b0cc8>
> 2012-08-01 08:49:57.974 lowercase[16574] Lowercase: töÃst
> sre@sre:~> LC_CTYPE='en_EN.UTF-8' ./lowercase 
> 2012-08-01 08:50:09.500 lowercase[16584] autorelease called without pool for 
> object (0x72d538) of class GSCInlineString in thread <NSThread: 0x6b06d8>
> 2012-08-01 08:50:09.501 lowercase[16584] autorelease called without pool for 
> object (0x72d6f8) of class GSCInlineString in thread <NSThread: 0x6b06d8>
> 2012-08-01 08:50:09.501 lowercase[16584] Lowercase: töÖst
> 
> logged in from the same Linux box, xterm, to the OpenBSD host I get (with and 
> without my patch):
> $ LC_CTYPE='de_DE.UTF-8' ./lowercase 
> 2012-08-01 10:38:52.850 lowercase[5483] autorelease called without pool for 
> object (0x20c403f88) of class GSUnicodeInlineString in thread <NSThread: 
> 0x20750be08>
> 2012-08-01 10:38:52.851 lowercase[5483] autorelease called without pool for 
> object (0x209c1c5c8) of class GSUnicodeInlineString in thread <NSThread: 
> 0x20750be08>
> 2012-08-01 10:38:52.852 lowercase[5483] Lowercase: tööst
> $ LC_CTYPE='en_EN.UTF-8' ./lowercase 
> 2012-08-01 10:38:46.754 lowercase[32569] autorelease called without pool for 
> object (0x20af26088) of class GSUnicodeInlineString in thread <NSThread: 
> 0x2028f9308>
> 2012-08-01 10:38:46.756 lowercase[32569] autorelease called without pool for 
> object (0x20444f248) of class GSUnicodeInlineString in thread <NSThread: 
> 0x2028f9308>
> 2012-08-01 10:38:46.756 lowercase[32569] Lowercase: t��st
> 
> The weird thing on Linux is that the second Ö is not lowercased, but on 
> OpenBSD it is. Also, on Linux it's linked against icu4c.
> Even weirder is the effect of LC_CTYPE: with DE it works on OpenBSD but not 
> on Linux, and with EN it is the other way around?
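A note on the output above: the logs show GSCInlineString (8-bit storage) on Linux and GSUnicodeInlineString (16-bit storage) on OpenBSD, so the two runs probably go through different lowercase paths. In UTF-8, Ö is the two bytes 0xC3 0x96 and ö is 0xC3 0xB6, so a per-byte tolower() cannot be expected to map one to the other. The sketch below only illustrates that point. The function name utf8_lowercase_latin1 is hypothetical, it is not GNUstep code, and it covers only ASCII and the Latin-1 Supplement capitals; real code should rely on a Unicode library.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only (not GNUstep code).
 * Lowercases, in place, ASCII A-Z and the Latin-1 Supplement capitals
 * U+00C0..U+00DE (except U+00D7 MULTIPLICATION SIGN) in a UTF-8 buffer.
 * Assumes an ASCII-compatible execution character set.
 * Every other byte is left untouched. */
static void utf8_lowercase_latin1(unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        if (s[i] >= 'A' && s[i] <= 'Z') {
            s[i] += 0x20;                 /* ASCII: 'A' -> 'a' */
            i += 1;
        } else if (s[i] == 0xC3 && i + 1 < len
                   && s[i + 1] >= 0x80 && s[i + 1] <= 0x9E
                   && s[i + 1] != 0x97) {
            /* 0xC3 0x80..0x9E encodes U+00C0..U+00DE; the lowercase
             * form is 0x20 higher, e.g. 0xC3 0x96 (Ö) -> 0xC3 0xB6 (ö). */
            s[i + 1] += 0x20;
            i += 2;
        } else {
            i += 1;                       /* pass everything else through */
        }
    }
}
```

Applied to the UTF-8 bytes of "TöÖst", this yields the bytes of "tööst", which is what the OpenBSD run with de_DE.UTF-8 printed.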
> 
> Sebastian
> 
> 
>> 
>> Eric
>> 
>> On Jul 31, 2012, at 3:40 PM, Stefan Bidi <stefanbidi@gmail.com> wrote:
>> 
>>> On Tue, Jul 31, 2012 at 12:27 PM, Sebastian Reitenbach
>>> <sebastia@l00-bugdead-prods.de> wrote:
>>>> 
>>>> On Tuesday, July 31, 2012 19:06 CEST, David Chisnall <theraven@sucs.org> 
>>>> wrote:
>>>> 
>>>>> Are you using GNUstep with or without ICU?  When you say skipped, is it 
>>>>> removed from the destination, or just passed through unmodified?  Is your 
>>>>> locale set to something that recognises letters with umlauts?
>>>> 
>>>> It's with ICU, and I run OGo with
>>>> LC_CTYPE='de_DE.UTF-8'
>>>> so it is supposed to recognize Umlauts.
>>>> 
>>>> I had some NSLog in GSString lowercase, and without my patch, tolower returns 0 
>>>> for an Umlaut, so it's not really skipped; instead
>>>> o->_contents.c[i] is set to 0 in the middle of a string :(
>>>> 
>>>> My patch just checks whether tolower returned 0, and if so passes the 
>>>> character it cannot handle through unchanged.
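The pass-through idea described above can be sketched in plain C as follows. This is a minimal illustration, not the GSString code: lowercase_bytes is a hypothetical name, and the buffer is assumed to be plain char. The cast matters because ISO C only defines tolower() for arguments representable as unsigned char (or EOF). Whether that plays any part in the behaviour reported here is not established in this thread.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not GNUstep code): lowercase len bytes from src
 * into dst, copying through unchanged any byte that tolower() does not
 * map to a usable non-zero byte value. */
static void lowercase_bytes(const char *src, char *dst, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        /* Convert to unsigned char first: passing a negative plain char
         * to tolower() is undefined behaviour. */
        unsigned char c = (unsigned char)src[i];
        int lowered = tolower(c);

        /* Fall back to the original byte if the result is 0 or does
         * not fit in a byte, so the string is never truncated. */
        dst[i] = (lowered > 0 && lowered <= 0xFF) ? (char)lowered : (char)c;
    }
}
```

Like the patch, this only keeps bytes intact. It does not lowercase multi-byte characters such as Ö.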
>>>> 
>>>> The following ICU is installed:
>>>> $ pkg_info | grep icu4c
>>>> icu4c-4.8.1.1       International Components for Unicode
>>> 
>>> Just FYI, GNUstep doesn't use ICU in NSString (David added a GSICUString
>>> class, but it isn't used very often).  I looked into it over a year
>>> ago but decided against implementing something.  The reason was
>>> because I didn't completely understand the code and at that point I
>>> had already started working on CFString, which I could freely break
>>> without anyone noticing.
>>> 
>>> Stef
>>> 
>>>> 
>>>> gnustep is from the latest releases, using libobjc from gcc 4.2.1, if that 
>>>> matters.
>>>> 
>>>> Sebastian
>>>> 
>>>> 
>>>>> 
>>>>> David
>>>>> 
>>>>> On 31 Jul 2012, at 18:02, Sebastian Reitenbach wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> with OGo, I convert a UTF-8 string to lowercase, using [NSString 
>>>>>> lowercaseString]
>>>>>> 
>>>>>> When there are Umlauts in the string, GNUstep just omits the 
>>>>>> character.
>>>>>> I actually have no idea whether this is right or wrong.
>>>>>> 
>>>>>> With the attached patch below to GSString it does not omit the character 
>>>>>> anymore.
>>>>>> 
>>>>>> 
>>>>>> gcc -fgnu-runtime -fconstant-string-class=NSConstantString 
>>>>>> -I/usr/local/include -L/usr/local/lib -l gnustep-base lowercase.m -o 
>>>>>> lowercase
>>>>>> 
>>>>>> cat lowercase.m
>>>>>> #import <Foundation/Foundation.h>
>>>>>> 
>>>>>> 
>>>>>> int main(int argc, char *argv[]) {
>>>>>>      NSLog(@"Lowercase: %@", [[NSString stringWithString:@"Töst"] 
>>>>>> lowercaseString]);
>>>>>> 
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Does running the program above on a Mac output the ö or omit it from the 
>>>>>> string?
>>>>>> 
>>>>>> Does it change when running with LC_CTYPE="C" or LC_CTYPE='de_DE.UTF-8'?
>>>>>> 
>>>>>> I don't have a Mac, so I cannot test this myself; maybe the approach used 
>>>>>> by OGo is wrong, too.
>>>>>> At least the Apple docs say nothing about skipped characters, 
>>>>>> only that e.g. a ß may change to SS when using uppercaseString.
>>>>>> Since they mention the ß in the documentation, I'd expect 
>>>>>> lowercaseString to handle Umlauts too, or is that just a plain wrong 
>>>>>> assumption?
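The ß to SS case mentioned above means the result can be longer than the input, which an in-place loop over a fixed-size buffer cannot handle. The sketch below illustrates this with 16-bit units. The name uppercase_expanding is hypothetical, it handles only ASCII letters and ß (U+00DF), and it is not how GNUstep or Cocoa implement uppercaseString.

```c
#include <stddef.h>

/* Hypothetical helper (illustration only): uppercase ASCII a-z and
 * expand U+00DF (ß) to "SS". dst must have room for 2 * len units.
 * Returns the number of units written, which may exceed len. */
static size_t uppercase_expanding(const unsigned short *src, size_t len,
                                  unsigned short *dst)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned short c = src[i];

        if (c == 0x00DF) {
            /* One input unit becomes two output units. */
            dst[n++] = 'S';
            dst[n++] = 'S';
        } else if (c >= 'a' && c <= 'z') {
            dst[n++] = (unsigned short)(c - 0x20);
        } else {
            dst[n++] = c;   /* everything else passes through */
        }
    }
    return n;
}
```

For the six units of "straße" this writes the seven units of "STRASSE".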
>>>>>> 
>>>>>> if someone could hit me with a cluestick please ;)
>>>>>> 
>>>>>> cheers,
>>>>>> Sebastian
>>>>>> 
>>>>>> the patch to not omit Umlauts.
>>>>>> $OpenBSD$
>>>>>> --- Source/GSString.m.orig  Tue Jul 31 18:31:36 2012
>>>>>> +++ Source/GSString.m       Tue Jul 31 18:32:24 2012
>>>>>> @@ -3699,6 +3700,8 @@ agree, create a new GSCInlineString otherwise.
>>>>>> while (i-- > 0)
>>>>>>   {
>>>>>>     o->_contents.c[i] = tolower(_contents.c[i]);
>>>>>> +      if (o->_contents.c[i] == 0)
>>>>>> +   o->_contents.c[i] = _contents.c[i];
>>>>>>   }
>>>>>> o->_flags.wide = 0;
>>>>>> o->_flags.owned = 1;      // Ignored on dealloc, but means we own buffer
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Discuss-gnustep mailing list
>>>>>> Discuss-gnustep@gnu.org
>>>>>> https://lists.gnu.org/mailman/listinfo/discuss-gnustep
>>>>> 
>>>>> --
>>>>> This email complies with ISO 3103
>>>>> 
>>>> 
>>> 
>> 
> 



