discuss-gnustep

Re: NSString lowercaseString


From: Eric Wasylishen
Subject: Re: NSString lowercaseString
Date: Tue, 31 Jul 2012 23:16:42 -0400

Hi,

A while ago I added code to NSString.m to use ICU for the -compare: and 
-rangeOfString: methods, so they behave correctly with respect to Unicode and 
locales. I also added tests that verify the behaviour matches Cocoa for the 
most part.

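The -lowercaseString/-uppercaseString methods should probably use ICU's 
u_strToLower/u_strToUpper if ICU is available (u_strFoldCase is intended for 
caseless matching rather than for case conversion).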

I'm skimming through the NSString API looking for methods that we should use 
ICU for and currently don't (or don't implement), and there are only a handful:

-decomposedString* and -precomposedString* methods
-uppercase/lowercase/capitalized methods
-stringByFoldingWithOptions:locale:
-localizedStandardCompare:
-rangeOfComposedCharacterSequenceAtIndex:
-rangeOfComposedCharacterSequencesForRange:
-initWithFormat:locale: and friends perhaps? Maybe what we have now is fine 
though, I'm not too familiar with it.

I'd be willing to do the case folding ones at some point, for a start. :-)

Eric
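
One reason the case methods listed above need more than a per-byte 
tolower/toupper loop is that a case mapping may change the length of the 
string; the ß to SS example from the Apple documentation, mentioned later in 
this thread, is such a case. The following is a minimal sketch in plain C, 
assuming the 8-bit buffer holds ISO 8859-1 (Latin-1) text. upper_latin1 is a 
hypothetical helper, not part of GNUstep or ICU, and a real implementation 
would presumably delegate to ICU as proposed above.

```c
#include <stddef.h>

/* Hypothetical helper (not GNUstep or ICU code): uppercase n bytes of
 * Latin-1 text from src into dst, expanding 0xDF (sharp s) to "SS".
 * dst must have room for 2 * n bytes. Returns the number of bytes written. */
size_t upper_latin1(char *dst, const char *src, size_t n)
{
    size_t i, j = 0;

    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)src[i];

        if (c == 0xDF) {
            /* One input byte becomes two output bytes. */
            dst[j++] = 'S';
            dst[j++] = 'S';
        } else if (c >= 'a' && c <= 'z') {
            dst[j++] = (char)(c - 0x20);
        } else if (c >= 0xE0 && c <= 0xFE && c != 0xF7) {
            /* Accented lowercase letters; 0xF7 is the division sign. */
            dst[j++] = (char)(c - 0x20);
        } else {
            /* Everything else (including 0xFF, whose uppercase form lies
             * outside Latin-1) is copied through unchanged. */
            dst[j++] = (char)c;
        }
    }
    return j;
}
```

With this sketch, the six Latin-1 bytes of "straße" come out as the seven 
bytes "STRASSE", which is why the output buffer cannot simply be the same size 
as the input.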

On Jul 31, 2012, at 3:40 PM, Stefan Bidi <stefanbidi@gmail.com> wrote:

> On Tue, Jul 31, 2012 at 12:27 PM, Sebastian Reitenbach
> <sebastia@l00-bugdead-prods.de> wrote:
>> 
>> On Tuesday, July 31, 2012 19:06 CEST, David Chisnall <theraven@sucs.org> 
>> wrote:
>> 
>>> Are you using GNUstep with or without ICU?  When you say skipped, is it 
>>> removed from the destination, or just passed through unmodified?  Is your 
>>> locale set to something that recognises letters with umlauts?
>> 
>> It's with ICU, and I run OGo with
>> LC_CTYPE='de_DE.UTF-8'
>> so it is supposed to recognize umlauts.
>> 
>> I had some NSLog calls in the GSString lowercase code, and without my patch 
>> tolower returns 0 for an umlaut, so the character is not really skipped; 
>> instead o->_contents.c[i] is set to 0 in the middle of the string :(
>> 
>> My patch just checks whether tolower returned 0 and, if so, passes the 
>> character it cannot handle through unchanged.
>> 
>> following ICU is installed:
>> $ pkg_info | grep icu4c
>> icu4c-4.8.1.1       International Components for Unicode
> 
> Just FYI, GNUstep doesn't use ICU in NSString (David added a GSICUString
> class, but it isn't used very often).  I looked into it over a year
> ago but decided against implementing anything.  The reason was that
> I didn't completely understand the code, and at that point I
> had already started working on CFString, which I could freely break
> without anyone noticing.
> 
> Stef
> 
>> 
>> gnustep is from the latest releases, using libobjc from gcc 4.2.1, if that 
>> matters.
>> 
>> Sebastian
>> 
>> 
>>> 
>>> David
>>> 
>>> On 31 Jul 2012, at 18:02, Sebastian Reitenbach wrote:
>>> 
>>>> Hi,
>>>> 
>>>> with OGo, I convert a UTF-8 string to lowercase, using -[NSString 
>>>> lowercaseString].
>>>> 
>>>> When there are umlauts in the string, GNUstep just omits the 
>>>> character.
>>>> I have no idea whether this is actually right or wrong.
>>>> 
>>>> With the attached patch below to GSString it does not omit the character 
>>>> anymore.
>>>> 
>>>> 
>>>> gcc -fgnu-runtime -fconstant-string-class=NSConstantString 
>>>> -I/usr/local/include -L/usr/local/lib -l gnustep-base lowercase.m -o 
>>>> lowercase
>>>> 
>>>> cat lowercase.m
>>>> #import <Foundation/Foundation.h>
>>>> 
>>>> 
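>>>> int main(int argc, char *argv[]) {
>>>>       NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
>>>>       // Lowercase a string containing an umlaut and log the result.
>>>>       NSLog(@"Lowercase: %@", [[NSString stringWithString:@"Töst"] lowercaseString]);
>>>>       [pool release];
>>>>       return 0;
>>>> }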
>>>> 
>>>> 
>>>> 
>>>> Does running the above program on a Mac output the ö or omit it from the 
>>>> string?
>>>> 
>>>> Does it change when running with LC_CTYPE="C" or LC_CTYPE='de_DE.UTF-8'?
>>>> 
>>>> I don't have a Mac, so I cannot test this myself; the approach used by 
>>>> OGo could also be wrong.
>>>> The Apple docs say nothing about skipped characters, 
>>>> only that, e.g., a ß may change to SS when using uppercaseString.
>>>> Since they mention the ß in the documentation, I'd expect 
>>>> lowercaseString to handle umlauts too, or is that just a plain wrong 
>>>> assumption?
>>>> 
>>>> if someone could hit me with a cluestick please ;)
>>>> 
>>>> cheers,
>>>> Sebastian
>>>> 
>>>> the patch to not omit Umlauts.
>>>> $OpenBSD$
>>>> --- Source/GSString.m.orig  Tue Jul 31 18:31:36 2012
>>>> +++ Source/GSString.m       Tue Jul 31 18:32:24 2012
>>>> @@ -3699,6 +3700,8 @@ agree, create a new GSCInlineString otherwise.
>>>>  while (i-- > 0)
>>>>    {
>>>>      o->_contents.c[i] = tolower(_contents.c[i]);
>>>> +      if (o->_contents.c[i] == 0)
>>>> +   o->_contents.c[i] = _contents.c[i];
>>>>    }
>>>>  o->_flags.wide = 0;
>>>>  o->_flags.owned = 1;      // Ignored on dealloc, but means we own buffer
>>>> 
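
For illustration, the guard in the patch above can be sketched in plain C. 
This is a minimal sketch, not GNUstep code: lower_bytes is a hypothetical 
helper name, and the cast to unsigned char is an assumption about one common 
source of tolower surprises. The C standard only defines tolower for values 
representable as unsigned char (or EOF), while a plain char holding an umlaut 
byte may be negative. The thread does not establish that this is what caused 
the 0 result.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not GNUstep code): lowercase n bytes from src into
 * dst using the current C locale. Bytes that tolower() cannot map are
 * copied through unchanged instead of being overwritten. */
void lower_bytes(char *dst, const char *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        /* Assumption: convert to unsigned char first, because passing a
         * negative plain char to tolower() is undefined behaviour. */
        unsigned char in = (unsigned char)src[i];
        int out = tolower(in);

        /* Same idea as the patch: never replace a non-zero byte with 0. */
        dst[i] = (out == 0 && in != 0) ? (char)in : (char)out;
    }
}
```

In the default "C" locale the byte 0xF6 (ö in Latin-1) has no lowercase 
mapping, so lower_bytes applied to the four bytes "T\xF6st" yields "t\xF6st" 
with the umlaut byte left intact.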
>>>> _______________________________________________
>>>> Discuss-gnustep mailing list
>>>> Discuss-gnustep@gnu.org
>>>> https://lists.gnu.org/mailman/listinfo/discuss-gnustep
>>> 
>>> --
>>> This email complies with ISO 3103
>>> 
>> 
> 
