

Re: [bug #4658], and also [bug #4624]

From: Pete French
Subject: Re: [bug #4658], and also [bug #4624]
Date: Fri, 05 Sep 2003 13:32:32 +0100

> To change the name for the iconv conversion you just need to change the 
> function internal_unicode_enc(), which is also in the Unicode.m file. 
> And of course make sure that you use names that can be found on all 
> machines out there on the net. This was the hard part the last time, 
> which explains why we don't have a hardcoded name there.

O.K., taking a look at this, we attempt UNICODE_INT followed by
UCS-2-INTERNAL and then UCS-2. Looking at the manual pages for iconv,
it appears that this needs changing to a test for UTF-16,
followed by either UTF-16BE or UTF-16LE if that fails.

What's puzzling me is the existing UNICODE_INT test - I can't find any
documentation for this in iconv. Googling for it only leads to one
match, which is a discussion of its use in GNUstep.

I would suggest that the UTF-16 tests be put in at the start, with the
code falling back to the existing tests if they fail. This should
work under the majority of circumstances, after all. What do you think?
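A minimal C sketch of the probing order suggested above, for illustration only: try UTF-16 first, then the UTF-16 variant matching the host byte order, then the existing UNICODE_INT, UCS-2-INTERNAL and UCS-2 names, keeping the first name that iconv_open() accepts. The helper name pick_internal_unicode_name(), the use of UTF-8 as the other side of the test conversion, and choosing BE or LE from the host byte order are all assumptions; this is not the actual internal_unicode_enc() from Unicode.m. Whether plain UTF-16 suits in-memory unichar data (byte order, byte-order marks) depends on the iconv implementation, so the ordering is a starting point rather than a settled answer.

```c
#include <iconv.h>
#include <stddef.h>

/* Return 1 if this host stores multi-byte integers little-endian, else 0. */
static int host_is_little_endian(void)
{
    const unsigned short probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Hypothetical helper: return the first candidate encoding name that
   iconv_open() accepts for a conversion from UTF-8 (an assumed test
   direction), or NULL if none of them is supported. */
static const char *pick_internal_unicode_name(void)
{
    const char *candidates[] = {
        "UTF-16",
        NULL,               /* replaced below by the host-order variant */
        "UNICODE_INT",      /* existing fallbacks, in their current order */
        "UCS-2-INTERNAL",
        "UCS-2",
    };
    size_t i;

    /* Assumption: the BE/LE fallback should match the host byte order. */
    candidates[1] = host_is_little_endian() ? "UTF-16LE" : "UTF-16BE";

    for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        iconv_t cd = iconv_open(candidates[i], "UTF-8");
        if (cd != (iconv_t)-1) {
            iconv_close(cd);    /* only probing; the caller opens its own */
            return candidates[i];
        }
    }
    return NULL;
}
```

Returning the name rather than an open descriptor leaves the caller free to open one descriptor per conversion direction; a NULL result would mean that no usable Unicode encoding name was found on that machine.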

(And is falling back to the BE/LE variants when the generic name fails
actually necessary? I note that we don't do this with the UCS-2 encoding.)

> For the other conversions we need at least to make sure that they ignore 
> the complex characters and not just ignore the start of one. I don't 
> know what is needed for this.

A complex character is encoded as two 16-bit words (a surrogate pair), and
neither of these words is a valid BMP character on its own: both fall in the
surrogate range 0xD800-0xDFFF. So none of the conversion routines should have
a mapping for anything in the surrogate range.

...or, in other words, if it ignores the first it will ignore the second
as well.
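To make the point concrete, here is a hedged C sketch of a lossy conversion from UTF-16 code units to ISO-8859-1, which is chosen only as an example target. The names is_surrogate(), latin1_map() and utf16_to_latin1_lossy() are hypothetical and are not GNUstep's conversion code; latin1_map() stands in for a real mapping table. Because no value in 0xD800-0xDFFF has a mapping, a converter that drops unmappable units drops both halves of a pair together, rather than leaving half a character behind.

```c
#include <stddef.h>
#include <stdint.h>

/* Surrogate code units: 0xD800-0xDBFF (high) and 0xDC00-0xDFFF (low).
   Neither half of a pair is a character on its own. */
static int is_surrogate(uint16_t unit)
{
    return unit >= 0xD800 && unit <= 0xDFFF;
}

/* Stand-in for a real mapping table: the ISO-8859-1 byte for unit,
   or -1 if there is none. No surrogate value has a mapping. */
static int latin1_map(uint16_t unit)
{
    return unit <= 0xFF ? (int)unit : -1;
}

/* Copy every unit that has a mapping and silently skip the rest.
   out must have room for len bytes; returns the number of bytes written. */
static size_t utf16_to_latin1_lossy(const uint16_t *in, size_t len,
                                    unsigned char *out)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        int b = latin1_map(in[i]);
        if (b >= 0)
            out[n++] = (unsigned char)b;
        /* unmappable units, including both surrogate halves, are dropped */
    }
    return n;
}
```

For the input 'A', U+1F600 (encoded as 0xD83D 0xDE00), 'B', the function writes the two bytes "AB": both halves of the pair are skipped.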

