Re: NSString lowercaseString
From: Eric Wasylishen
Subject: Re: NSString lowercaseString
Date: Tue, 31 Jul 2012 23:16:42 -0400
Hi,
A while ago I added code to NSString.m that uses ICU for the -compare: and
-rangeOfString: methods, so they handle Unicode and locales correctly, along
with tests that verify the behaviour matches Cocoa for the most part.
The -lowercaseString/-uppercaseString methods should probably use ICU's
case-mapping functions (u_strToLower/u_strToUpper) if ICU is available.
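
For builds without ICU, the byte-wise loop discussed further down the thread
could at least avoid corrupting the string. The following is a minimal sketch
in plain C, not the GNUstep implementation: lower_ascii_passthrough is a
hypothetical helper that lowercases only ASCII bytes and copies every byte at
or above 0x80 unchanged, so a UTF-8 encoded ö survives even though it is not
case-mapped.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of GNUstep: lowercase the ASCII bytes of
 * an 8-bit buffer and copy every other byte through unchanged. */
static void
lower_ascii_passthrough(unsigned char *dst, const unsigned char *src,
                        size_t len)
{
  size_t i;

  assert(dst != NULL && src != NULL);
  for (i = 0; i < len; i++)
    {
      unsigned char c = src[i];

      /* Bytes >= 0x80 may belong to a multi-byte sequence or be a
       * locale-specific letter; leave them alone rather than trust
       * tolower() with them. */
      dst[i] = (c < 0x80) ? (unsigned char)tolower(c) : c;
    }
}
```

Given the five bytes 'T' 0xC3 0xB6 'S' 'T' (the UTF-8 encoding of "TöST"), the
output buffer holds 't' 0xC3 0xB6 's' 't'. Correct mapping of non-ASCII letters
would still need a Unicode-aware routine.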
I'm skimming through the NSString API looking for methods that we should use
ICU for and currently don't (or don't implement), and there are only a handful:
-decomposedString* and -precomposedString* methods
-uppercase/lowercase/capitalized methods
-stringByFoldingWithOptions:locale:
-localizedStandardCompare:
-rangeOfComposedCharacterSequenceAtIndex:
-rangeOfComposedCharacterSequencesForRange:
-initWithFormat:locale: and friends perhaps? Maybe what we have now is fine
though, I'm not too familiar with it.
I'd be willing to do the case folding ones at some point, for a start. :-)
Eric
On Jul 31, 2012, at 3:40 PM, Stefan Bidi <stefanbidi@gmail.com> wrote:
> On Tue, Jul 31, 2012 at 12:27 PM, Sebastian Reitenbach
> <sebastia@l00-bugdead-prods.de> wrote:
>>
>> On Tuesday, July 31, 2012 19:06 CEST, David Chisnall <theraven@sucs.org>
>> wrote:
>>
>>> Are you using GNUstep with or without ICU? When you say skipped, is it
>>> removed from the destination, or just passed through unmodified? Is your
>>> locale set to something that recognises letters with umlauts?
>>
>> It's with ICU, and I run OGo with
>> LC_CTYPE='de_DE.UTF-8'
>> so, supposed to recognize Umlauts.
>>
>> I had some NSLog calls in the GSString lowercase method, and without my
>> patch tolower returns 0 for an Umlaut, so it's not really skipped; rather,
>> o->_contents.c[i] is set to 0 in the middle of the string :(
>>
>> My patch just checks whether tolower returned 0 and, if so, passes the
>> character it cannot handle through unchanged.
>>
>> following ICU is installed:
>> $ pkg_info | grep icu4c
>> icu4c-4.8.1.1 International Components for Unicode
>
> Just FYI, GNUstep doesn't use ICU in NSString (David added a GSICUString
> class, but it isn't used very often). I looked into it over a year
> ago but decided against implementing anything, because I didn't
> completely understand the code and at that point I had already started
> working on CFString, which I could freely break without anyone noticing.
>
> Stef
>
>>
>> gnustep is from the latest releases, using libobjc from gcc 4.2.1, if that
>> matters.
>>
>> Sebastian
>>
>>
>>>
>>> David
>>>
>>> On 31 Jul 2012, at 18:02, Sebastian Reitenbach wrote:
>>>
>>>> Hi,
>>>>
>>>> with OGo, I convert a UTF-8 string to lowercase, using -[NSString
>>>> lowercaseString]
>>>>
>>>> when there are Umlauts in the string, then GNUstep just omits the
>>>> character.
>>>> I've no idea, whether this is right or wrong actually.
>>>>
>>>> With the attached patch below to GSString it does not omit the character
>>>> anymore.
>>>>
>>>>
>>>> gcc -fgnu-runtime -fconstant-string-class=NSConstantString
>>>> -I/usr/local/include -L/usr/local/lib -l gnustep-base lowercase.m -o
>>>> lowercase
>>>>
>>>> cat lowercase.m
>>>> #import <Foundation/Foundation.h>
>>>>
>>>>
>>>> int main(int argc, char *argv[]) {
>>>>   NSAutoreleasePool *pool = [NSAutoreleasePool new];
>>>>   NSLog(@"Lowercase: %@",
>>>>         [[NSString stringWithString:@"Töst"] lowercaseString]);
>>>>   [pool release];
>>>>   return 0;
>>>> }
>>>>
>>>>
>>>>
>>>> Does running the program above on a Mac output the ö or omit it from the
>>>> string?
>>>>
>>>> Does it change when running with LC_CTYPE="C" or LC_CTYPE='de_DE.UTF-8'?
>>>>
>>>> I don't have a Mac, so cannot test myself, maybe also the approach used by
>>>> OGo could be wrong.
>>>> At least the Apple docs say nothing about skipped characters, only that
>>>> e.g. a ß may change to SS when using uppercaseString.
>>>> Since they mention the ß in the documentation, I'd expect
>>>> lowercaseString to handle Umlauts too, or is that just a plainly wrong
>>>> assumption?
>>>>
>>>> if someone could hit me with a cluestick please ;)
>>>>
>>>> cheers,
>>>> Sebastian
>>>>
>>>> the patch to not omit Umlauts.
>>>> $OpenBSD$
>>>> --- Source/GSString.m.orig Tue Jul 31 18:31:36 2012
>>>> +++ Source/GSString.m Tue Jul 31 18:32:24 2012
>>>> @@ -3699,6 +3700,8 @@ agree, create a new GSCInlineString otherwise.
>>>> while (i-- > 0)
>>>> {
>>>> o->_contents.c[i] = tolower(_contents.c[i]);
>>>> + if (o->_contents.c[i] == 0)
>>>> + o->_contents.c[i] = _contents.c[i];
>>>> }
>>>> o->_flags.wide = 0;
>>>> o->_flags.owned = 1; // Ignored on dealloc, but means we own buffer
>>>>
>>>> _______________________________________________
>>>> Discuss-gnustep mailing list
>>>> Discuss-gnustep@gnu.org
>>>> https://lists.gnu.org/mailman/listinfo/discuss-gnustep
>>>
>>> --
>>> This email complies with ISO 3103
>>>
>>