Re: [DotGNU]UCS4 support in Xml
From: Rhys Weatherley
Subject: Re: [DotGNU]UCS4 support in Xml
Date: Mon, 10 Mar 2003 19:01:33 +1000
User-agent: KMail/1.4.3
On Monday 10 March 2003 10:19 am, minddog wrote:
> Hey,
> I just added the internal XmlStreamReader class that will complement the
> normal IO StreamReader for UCS4 support. Heres my question though, should
> we make all handling of the encoding portations of XmlStreamReader,
> UCS4Encoding instead of Encoding? I'm not very educated on this subject,
> but UCS4 is basically a larger set of characters opposed to UCS2. Is it
> 16bit for UCS2 and 32bit for UCS4 ? Some help here might answer my own
> questions =) Thanks.
UCS-2 is the 16-bit character format that is used by most of C# (strictly
speaking it is UTF-16 once surrogates are involved), and that should be the
standard way to process characters internally within the XML code.

All of the UCS-4 characters that matter in practice (everything up to
U+10FFFF) can be represented in this 16-bit form, either directly as a single
16-bit value, or as a pair of 16-bit values (called surrogates). This gives an
effective character set size of a little over 20 bits, which is pretty huge
(over 1 million characters).
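To make the surrogate arithmetic concrete, here is a minimal sketch in Python. It is not code from the DotGNU tree, and the function name to_utf16_units is made up for illustration. It assumes the standard UTF-16 scheme: values below 0x10000 are stored directly, and anything above has 0x10000 subtracted, leaving a 20-bit offset that is split into two 10-bit halves.

```python
def to_utf16_units(code_point):
    """Return the 16-bit code units for one 32-bit character value.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("value cannot be reached with surrogates")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not characters themselves")
    if code_point < 0x10000:
        # Fits directly in a single 16-bit value.
        return [code_point]
    offset = code_point - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)           # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)          # bottom 10 bits
    return [high, low]


print([hex(u) for u in to_utf16_units(0x41)])      # ['0x41']
print([hex(u) for u in to_utf16_units(0x1D11E)])   # ['0xd834', '0xdd1e']
```

The 20-bit offset is where the "about 20 bits" figure comes from: 2^20 values reachable through surrogate pairs, plus the ones that fit in 16 bits directly.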
The UCS4Encoding class already takes care of converting 32-bit sequences into
16-bit UCS-2 on the fly, inserting surrogates where necessary. You should
stick to UCS-2 everywhere else in the XML code, including in XmlStreamReader.
It isn't worth using UCS-4 as the standard character set elsewhere.
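As a rough picture of what "converting on the fly" means, the following Python sketch decodes a UCS-4 byte buffer into 16-bit values, inserting surrogates where needed. This is only an illustration and not the actual UCS4Encoding implementation; the function name, the big_endian parameter, and the choice to substitute U+FFFD for invalid values are all assumptions.

```python
import struct


def ucs4_bytes_to_utf16_units(data, big_endian=True):
    """Decode UCS-4 bytes into a list of 16-bit code units.

    Hypothetical sketch; not taken from the DotGNU sources.
    """
    if len(data) % 4 != 0:
        raise ValueError("UCS-4 input must be a multiple of 4 bytes")
    count = len(data) // 4
    fmt = (">" if big_endian else "<") + "%dI" % count
    units = []
    for cp in struct.unpack(fmt, data):
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            # Assumption: replace unrepresentable values with U+FFFD.
            units.append(0xFFFD)
        elif cp < 0x10000:
            units.append(cp)                       # fits in one unit
        else:
            offset = cp - 0x10000
            units.append(0xD800 + (offset >> 10))  # high surrogate
            units.append(0xDC00 + (offset & 0x3FF))  # low surrogate
    return units


data = "A\U0001D11E".encode("utf-32-be")
print([hex(u) for u in ucs4_bytes_to_utf16_units(data)])
# ['0x41', '0xd834', '0xdd1e']
```

Once the input has passed through a step like this, everything downstream only ever sees 16-bit values, which is why the rest of the XML code does not need to know about UCS-4 at all.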
Cheers,
Rhys.