Re: NSString lowercaseString
From: Stefan Bidi
Subject: Re: NSString lowercaseString
Date: Tue, 31 Jul 2012 14:40:56 -0500
On Tue, Jul 31, 2012 at 12:27 PM, Sebastian Reitenbach
<sebastia@l00-bugdead-prods.de> wrote:
>
> On Tuesday, July 31, 2012 19:06 CEST, David Chisnall <theraven@sucs.org>
> wrote:
>
>> Are you using GNUstep with or without ICU? When you say skipped, is it
>> removed from the destination, or just passed through unmodified? Is your
>> locale set to something that recognises letters with umlauts?
>
> It's with ICU, and I run OGo with
> LC_CTYPE='de_DE.UTF-8'
> so it is supposed to recognize Umlauts.
>
> I put some NSLog calls into the GSString lowercase code. Without my patch,
> tolower returns 0 for an Umlaut, so the character is not really skipped;
> instead o->_contents.c[i] is set to 0 in the middle of the string :(
>
> My patch just checks whether tolower returned 0 and, if so, passes the
> character it cannot handle through unchanged.
>
> The following ICU is installed:
> $ pkg_info | grep icu4c
> icu4c-4.8.1.1 International Components for Unicode
Just FYI, GNUstep doesn't use ICU in NSString (David added a GSICUString
class, but it isn't used very often). I looked into it over a year
ago but decided against implementing anything. The reason was that I
didn't completely understand the code, and at that point I had already
started working on CFString, which I could freely break without anyone
noticing.
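
For reference, the guard described in the quoted patch can be sketched
outside GSString as plain C. This is only an illustration under stated
assumptions: lower_bytes_guarded and lower_cstring_guarded are hypothetical
names, not GNUstep functions, and the sketch works byte by byte, so it never
lowercases a multi-byte UTF-8 character such as ö. It only avoids writing a
0 into the middle of the string when tolower() cannot handle a byte.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of GNUstep): lowercase len bytes from src
 * into dst. If tolower() yields 0 for a non-zero input byte, the original
 * byte is kept, mirroring the check in the quoted patch.
 */
static void
lower_bytes_guarded(unsigned char *dst, const unsigned char *src, size_t len)
{
  size_t i = len;

  assert(dst != NULL && src != NULL);
  while (i-- > 0)
    {
      /* src[i] is an unsigned char, so it is a valid argument to tolower(). */
      int lowered = tolower(src[i]);

      dst[i] = (lowered == 0 && src[i] != 0) ? src[i] : (unsigned char)lowered;
    }
}

/*
 * Hypothetical convenience wrapper for NUL-terminated strings.
 * dst must have room for strlen(src) + 1 bytes.
 */
static void
lower_cstring_guarded(char *dst, const char *src)
{
  size_t len = strlen(src);

  lower_bytes_guarded((unsigned char *)dst, (const unsigned char *)src, len);
  dst[len] = '\0';
}
```

Calling lower_cstring_guarded(out, "Töst") with a large enough out buffer
should leave the two bytes of the ö in place and lowercase only the ASCII
letters. Lowercasing non-ASCII characters properly would still need a
Unicode-aware routine.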
Stef
>
> gnustep is from the latest releases, using libobjc from gcc 4.2.1, if that
> matters.
>
> Sebastian
>
>
>>
>> David
>>
>> On 31 Jul 2012, at 18:02, Sebastian Reitenbach wrote:
>>
>> > Hi,
>> >
>> > with OGo, I convert a UTF-8 string to lowercase using [NSString
>> > lowercaseString].
>> >
>> > When there are Umlauts in the string, GNUstep just omits the
>> > character.
>> > I have no idea whether this is actually right or wrong.
>> >
>> > With the attached patch below to GSString it does not omit the character
>> > anymore.
>> >
>> >
>> > gcc -fgnu-runtime -fconstant-string-class=NSConstantString
>> > -I/usr/local/include -L/usr/local/lib -l gnustep-base lowercase.m -o
>> > lowercase
>> >
>> > cat lowercase.m
>> > #import <Foundation/Foundation.h>
>> >
>> >
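>> > int main(int argc, char *argv[]) {
>> > // A pool is needed because the strings below are autoreleased.
>> > NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
>> > NSLog(@"Lowercase: %@", [[NSString stringWithString:@"Töst"]
>> > lowercaseString]);
>> > [pool release];
>> > return 0;
>> > }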
>> >
>> >
>> >
>> > Does running the program above on a Mac output the ö or omit it from the
>> > string?
>> >
>> > Does it change when running with LC_CTYPE="C" or LC_CTYPE='de_DE.UTF-8'?
>> >
>> > I don't have a Mac, so I cannot test this myself; the approach used by
>> > OGo could also be wrong.
>> > The Apple docs say nothing about skipped characters,
>> > only that e.g. a ß may change to SS when using uppercaseString.
>> > Since they mention the ß in the documentation, I'd expect
>> > lowercaseString to handle Umlauts too, or is that just a plain wrong
>> > assumption?
>> >
>> > if someone could hit me with a cluestick please ;)
>> >
>> > cheers,
>> > Sebastian
>> >
>> > the patch to not omit Umlauts.
>> > $OpenBSD$
>> > --- Source/GSString.m.orig Tue Jul 31 18:31:36 2012
>> > +++ Source/GSString.m Tue Jul 31 18:32:24 2012
>> > @@ -3699,6 +3700,8 @@ agree, create a new GSCInlineString otherwise.
>> > while (i-- > 0)
>> > {
>> > o->_contents.c[i] = tolower(_contents.c[i]);
>> > + if (o->_contents.c[i] == 0)
>> > + o->_contents.c[i] = _contents.c[i];
>> > }
>> > o->_flags.wide = 0;
>> > o->_flags.owned = 1; // Ignored on dealloc, but means we own buffer
>> >
>> > _______________________________________________
>> > Discuss-gnustep mailing list
>> > Discuss-gnustep@gnu.org
>> > https://lists.gnu.org/mailman/listinfo/discuss-gnustep
>>
>> --
>> This email complies with ISO 3103
>>