Re: NSString lowercaseString
From: Sebastian Reitenbach
Subject: Re: NSString lowercaseString
Date: Tue, 31 Jul 2012 19:27:59 +0200
User-agent: SOGoMail 1.3.17
On Tuesday, July 31, 2012 19:06 CEST, David Chisnall <theraven@sucs.org> wrote:
> Are you using GNUstep with or without ICU? When you say skipped, is it
> removed from the destination, or just passed through unmodified? Is your
> locale set to something that recognises letters with umlauts?
It's with ICU, and I run OGo with
LC_CTYPE='de_DE.UTF-8'
so it is supposed to recognize umlauts.
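For anyone who wants to check what the C library does under a given LC_CTYPE, here is a minimal C sketch. It is not part of GNUstep; the helper names probe_tolower and print_probe are made up for illustration. It also assumes the relevant bytes are 0xC3 0xB6, the UTF-8 encoding of "ö"; which bytes GSString actually holds in its 8-bit buffer is not stated in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: switch LC_CTYPE to locale_name and return what
 * tolower() yields for one byte, or -1 if the locale is unavailable. */
static int
probe_tolower(const char *locale_name, unsigned char byte)
{
  assert(locale_name != NULL);
  if (setlocale(LC_CTYPE, locale_name) == NULL)
    {
      return -1;
    }
  /* byte is unsigned, so the argument to tolower() is never negative. */
  return tolower(byte);
}

/* Print the tolower() result for the two UTF-8 bytes of "ö" (assumed). */
static void
print_probe(const char *locale_name)
{
  printf("%s: tolower(0xC3)=%d tolower(0xB6)=%d\n", locale_name,
    probe_tolower(locale_name, 0xC3), probe_tolower(locale_name, 0xB6));
}
```

Calling print_probe("C") and print_probe("de_DE.UTF-8") from a small main() should show whether tolower really returns 0 for those bytes on a given system. The results are platform dependent, so none are given here.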
I put some NSLog calls into the lowercase method in GSString. Without my patch,
tolower returns 0 for an umlaut, so the character is not really skipped;
instead, o->_contents.c[i] is set to 0 in the middle of the string :(
My patch just checks whether tolower returned 0 and, if so, passes the
character it cannot handle through unchanged.
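The fallback can be sketched in plain C outside of GNUstep. The names lowercase_bytes and ascii_only_tolower below are hypothetical. ascii_only_tolower is only a stand-in that mimics the behaviour reported above (0 for every byte above 0x7F), so the fallback can be exercised on any platform; it is not a claim about how any particular libc behaves.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Stand-in for the reported behaviour (an assumption, for demonstration):
 * lowercases ASCII and returns 0 for every byte above 0x7F. */
static int
ascii_only_tolower(int c)
{
  return (c > 0x7F) ? 0 : tolower(c);
}

/* Hypothetical helper, not GNUstep code: lowercase n bytes from src into
 * dst using the mapping function map (for example tolower). If map returns
 * 0, the original byte is copied instead, so no NUL is written into the
 * middle of the result. */
static void
lowercase_bytes(unsigned char *dst, const unsigned char *src, size_t n,
                int (*map)(int))
{
  size_t i = n;

  assert(dst != NULL && src != NULL && map != NULL);
  while (i-- > 0)
    {
      int lowered = map(src[i]);  /* src[i] is unsigned, never negative */

      dst[i] = (lowered == 0) ? src[i] : (unsigned char)lowered;
    }
}
```

With the five bytes 'T', 0xC3, 0xB6, 'S', 'T' and ascii_only_tolower as the mapping, the two non-ASCII bytes come through unchanged and the ASCII letters are lowercased. This only avoids the embedded 0; it does not lowercase non-ASCII letters such as "Ö".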
The following ICU is installed:
$ pkg_info | grep icu4c
icu4c-4.8.1.1 International Components for Unicode
GNUstep is from the latest releases, using libobjc from gcc 4.2.1, if that
matters.
Sebastian
>
> David
>
> On 31 Jul 2012, at 18:02, Sebastian Reitenbach wrote:
>
> > Hi,
> >
> > With OGo, I convert a UTF-8 string to lowercase using -[NSString
> > lowercaseString].
> >
> > When there are umlauts in the string, GNUstep just omits the character.
> > I have no idea whether this is actually right or wrong.
> >
> > With the attached patch below to GSString it does not omit the character
> > anymore.
> >
> >
> > gcc -fgnu-runtime -fconstant-string-class=NSConstantString
> > -I/usr/local/include -L/usr/local/lib -l gnustep-base lowercase.m -o
> > lowercase
> >
> > cat lowercase.m
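> > #import <Foundation/Foundation.h>
> >
> > int main(int argc, char *argv[]) {
> >   // Pool for the autoreleased strings (non-ARC build with gcc).
> >   NSAutoreleasePool *pool = [NSAutoreleasePool new];
> >   NSLog(@"Lowercase: %@", [[NSString stringWithString:@"Töst"] lowercaseString]);
> >   [pool release];
> >   return 0;
> > }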
> >
> >
> >
> > Does running the above program on a Mac output the ö or omit it from the
> > string?
> >
> > Does it change when running with LC_CTYPE="C" or LC_CTYPE='de_DE.UTF-8'?
> >
> > I don't have a Mac, so cannot test myself, maybe also the approach used by
> > OGo could be wrong.
> > At least the Apple docs say nothing about skipped characters, only that,
> > e.g., a ß may change to SS when using uppercaseString.
> > Since they mention the ß in the documentation, I'd expect
> > lowercaseString to handle umlauts too, or is that just a plain wrong
> > assumption?
> >
> > if someone could hit me with a cluestick please ;)
> >
> > cheers,
> > Sebastian
> >
> > the patch to not omit Umlauts.
> > $OpenBSD$
> > --- Source/GSString.m.orig Tue Jul 31 18:31:36 2012
> > +++ Source/GSString.m Tue Jul 31 18:32:24 2012
> > @@ -3699,6 +3700,8 @@ agree, create a new GSCInlineString otherwise.
> > while (i-- > 0)
> > {
> > o->_contents.c[i] = tolower(_contents.c[i]);
> > + if (o->_contents.c[i] == 0)
> > + o->_contents.c[i] = _contents.c[i];
> > }
> > o->_flags.wide = 0;
> > o->_flags.owned = 1; // Ignored on dealloc, but means we own buffer
> >
>
> --
> This email complies with ISO 3103
>