bug-gnustep

Re: patch to gnustep-base (Unicode and others)


From: Richard Frith-Macdonald
Subject: Re: patch to gnustep-base (Unicode and others)
Date: Mon, 8 Apr 2002 07:09:12 +0100

On Sunday, April 7, 2002, at 11:15 PM, Serg Stoyan wrote:

Hello, Richard Frith-Macdonald.

 RFM> > Here is a patch to the gnustep-base, with additions such as:
 RFM> > - fixes NSString's initWithCString* methods behaviour by commenting out
 RFM> >   GSString's. Without it initWithCString* methods don't convert C
 RFM> >   string into Unicode and this is not OpenStep compliant;
 RFM>
 RFM> Perhaps you can explain more ... as far as I can see the above is simply
 RFM> wrong. Certainly initWithCString* methods are not supposed to convert to
 RFM> unicode (as a general rule), and OpenStep doesn't say they should - so
 RFM> I'm guessing you have some meaning in mind that is not immediately
 RFM> obvious to me.

Here is the citation from "OpenStep Specification" (c) 1994 NeXT Computer
  Inc. Class NSString, page 2-127:
  "- (id)initWithCString:(const char *)byteString

  Initializes the receiver, a newly allocated NSString, by converting the
  one-byte characters in byteString into Unicode characters. byteString must
  be a null-terminated C string in the default C string encoding."

OK ... guess I was wrong about that ... it *does* seem to say strings should be
converted to unicode ... but that's incorrect/misleading documentation.

If you look in the class description documentation, it tells you that -

'While the actual representation of character strings stored in NSString and
NSMutableString is independent of any particular implementation, you can in
general think of the contents of NSString and NSMutableString objects as being,
canonically, Unicode characters (defined by the unichar data type)'

Really, this means that you should not take the method descriptions too
literally; they are describing an API, not particular internal implementation
details.
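
To illustrate the difference between the API and the storage, here is a minimal sketch in plain C (not Objective-C, so that it stands alone). It is not how GSString is implemented. The names `ByteString`, `byte_string_init` and `byte_string_char_at` are hypothetical. It assumes the default C string encoding is KOI8-R and maps only the lowercase Cyrillic range, so other high bytes come back as U+FFFD. The string keeps the original one-byte data and produces a unichar only when a caller asks for one, so callers can still think of the contents as Unicode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned short unichar;   /* 16-bit Unicode value, as in OpenStep */

/* Hypothetical string type: keeps the C string bytes unchanged. */
typedef struct {
    const unsigned char *bytes;   /* KOI8-R data, not owned by the struct */
    size_t length;                /* one character per byte */
} ByteString;

/* KOI8-R bytes 0xC0..0xDF: lowercase Cyrillic letters in KOI8 order. */
static const unichar koi8r_lower[32] = {
    0x044E, 0x0430, 0x0431, 0x0446, 0x0434, 0x0435, 0x0444, 0x0433,
    0x0445, 0x0438, 0x0439, 0x043A, 0x043B, 0x043C, 0x043D, 0x043E,
    0x043F, 0x044F, 0x0440, 0x0441, 0x0442, 0x0443, 0x0436, 0x0432,
    0x044C, 0x044B, 0x0437, 0x0448, 0x044D, 0x0449, 0x0447, 0x044A
};

/* "Initialise with C string": no conversion happens here at all. */
ByteString byte_string_init(const char *c_string)
{
    ByteString s;
    assert(c_string != NULL);
    s.bytes = (const unsigned char *)c_string;
    s.length = strlen(c_string);
    return s;
}

/* The API still answers in Unicode, converting one character on demand. */
unichar byte_string_char_at(const ByteString *s, size_t index)
{
    unsigned char b;
    assert(s != NULL && index < s->length);
    b = s->bytes[index];
    if (b < 0x80) return b;                                  /* ASCII */
    if (b >= 0xC0 && b <= 0xDF) return koi8r_lower[b - 0xC0];
    return 0xFFFD;                      /* not mapped in this sketch */
}
```

Calling `byte_string_char_at` on a string made from the KOI8-R bytes 0xC4 0xC1 yields U+0434 and U+0430, although the stored data is still two bytes.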

 RFM> > - adds 2 languages into Resources/Languages: Russian and Ukrainian;
 RFM>
 RFM> Thanks, but I can't use them ... as I don't know what encoding you have
 RFM> created them in. I have added a README file to the Resources/Languages
 RFM> subdirectory to say what format language files *should* be in (and
 RFM> corrected some errors in the existing files).

It's ok. I've just updated from CVS and created these files by cvtenc'ing
  them, just like the README says. But... when I start any app I get this
  message:

File NSDictionary.m: 458. In [GSDictionary -initWithContentsOfFile:] Contents of file '/home/stoyan/GNUstep/System/Libraries/Resources/Languages/Russian' does not contain a dictionary

All I can suggest here is making sure you have the latest code installed.
I fixed a bug in loading 16-bit unicode property lists a day or two ago.

  Here are some of my environment vars:

  [stoyan@localhost]$ echo $GNUSTEP_STRING_ENCODING; echo $LANG
  NSKOI8RStringEncoding
  ru_RU.KOI8-R

  I've attached Russian and UkraineRussian (conforming to Locale.aliases)
  files as well.

Thanks, I've added them (I converted to ascii with \u escapes for consistency
with the other files, but that should make no difference).
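
For reference, a rough sketch of that kind of conversion in plain C. This is not the cvtenc source. The function name `escape_to_ascii` is hypothetical, and the sketch only deals with non-ascii values: it does not quote backslashes or double quotes that are already in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef unsigned short unichar;   /* 16-bit Unicode value */

/* Copies src[0..len) into dst as ascii, writing every value above 0x7F
 * as a \uXXXX escape. Returns the number of bytes written (without the
 * terminating NUL), or -1 if dst is too small. */
int escape_to_ascii(const unichar *src, size_t len, char *dst, size_t dst_size)
{
    size_t out = 0;
    size_t i;
    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++) {
        unichar c = src[i];
        if (c < 0x80) {
            if (out + 1 >= dst_size) return -1;   /* keep room for NUL */
            dst[out++] = (char)c;
        } else {
            if (out + 6 >= dst_size) return -1;   /* 6 chars plus NUL */
            snprintf(dst + out, dst_size - out, "\\u%04x", (unsigned)c);
            out += 6;
        }
    }
    if (out >= dst_size) return -1;
    dst[out] = '\0';
    return (int)out;
}
```

Because the output is pure ascii, it reads the same whatever the default C string encoding is.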

I guess we can use 2 types of language files -- plain text property list,
  with encoding in its file name and non-printable unicode file. For example,
  in case of russian:

  Languages/Russian.KOI8-R        <-- plain proplist in KOI8-R encoding
  Languages/Russian.WindowsCP1251 <-- plain proplist in Windows 1251 encoding
  Languages/Russian               <-- Unicode file, created with 'cvtenc'

Property lists should be ascii ... so I prefer to keep an ascii property list
containing \u escape sequences for non-ascii characters, and create the other
files temporarily (for editing) using cvtenc.

In this case we use the Unicode file, and the proplist files remain for editors.

But keeping multiple copies in different formats could let them get out of
sync with each other if you are not careful.

Or we can use proplist files with an appropriate encoding scheme, if we have
  to use it (no unicode file for some reason).

Property list files are ascii.
Strictly speaking, anything non-ascii is not a legal property-list file, so while unicode files are also portable, I'd still prefer to stick to ascii files with \u escape sequences. That is, if we are sticking to one portable format
for consistency, I'd prefer it to be the ascii.
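
A quick way to check that a file sticks to that rule is to look for any byte above 0x7F. A minimal sketch in plain C, where `first_non_ascii` is a hypothetical helper and not part of gnustep-base:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the offset of the first byte outside 7-bit ascii,
 * or -1 if the whole buffer is ascii. */
long first_non_ascii(const unsigned char *data, size_t len)
{
    size_t i;
    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (data[i] > 0x7F) return (long)i;
    }
    return -1;
}
```

A property list that uses only \u escapes returns -1, while one holding a raw KOI8-R byte returns the position of that byte.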


PS: Another thing I've mentioned (and I guess it should be somewhere in the
Documentation) is about using non-ascii characters when initializing an NSString
variable. I mean that using a definition such as:

NSString  *some_string = @"some non-ascii characters";

is deprecated. In this case the string is not converted into Unicode and
the result is unpredictable, or something.

Well, OpenStep spec simply tells you not to do it (I'd say that's closer to
'illegal' than 'deprecated') in the NSString class description.
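
One reason such literals are unreliable is that the compiler copies whatever bytes the source file holds, so the same visible text gives different data depending on the encoding the file was saved in. A small sketch in plain C, with the bytes written out as hex escapes. The names `da_koi8r`, `da_utf8` and `literals_match` are made up for the illustration.

```c
#include <assert.h>
#include <string.h>

/* The Russian word for "yes" as its bytes would appear in a literal
 * typed into a KOI8-R source file and into a UTF-8 source file. */
const char da_koi8r[] = "\xC4\xC1";
const char da_utf8[]  = "\xD0\xB4\xD0\xB0";

/* Returns 1 if both literals hold identical bytes, 0 otherwise. */
int literals_match(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

The two arrays spell the same word but differ in both length and content, so code that receives such a literal cannot know how to interpret it.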

Where do you think this should be documented in GNUstep ?



