Re: [DotGNU]UCS4 support in Xml

From: Michal Moskal
Subject: Re: [DotGNU]UCS4 support in Xml
Date: Mon, 10 Mar 2003 09:32:07 +0100
User-agent: Mutt/1.4i

On Sun, Mar 09, 2003 at 05:19:29PM -0700, minddog wrote:
> Hey,
>       I just added the internal XmlStreamReader class that will complement
> the normal IO StreamReader for UCS4 support.  Here's my question though,
> should we make all handling of the encoding portions of XmlStreamReader
> UCS4Encoding instead of Encoding?  I'm not very educated on this subject,
> but UCS4 is basically a larger set of characters as opposed to UCS2.  Is it
> 16bit for UCS2 and 32bit for UCS4 ?  Some help here might answer my own
> questions =)

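One can encode any Unicode character in UTF-8, UTF-16 and UCS-4; some
characters simply need more than one byte or word. UTF-8 uses bytes,
UTF-16 uses 16-bit words and UCS-4 uses 32-bit words. Strict UCS-2 is
always exactly one 16-bit word, so it only reaches the first 65536 code
points; UTF-16 adds surrogate pairs to cover the rest.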
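
To make the unit sizes concrete, here is a rough sketch in Python (not
DotGNU code; the helper name encoded_sizes is made up for illustration).
It counts how many bytes or words one character needs in each encoding,
with Python's utf-16-le and utf-32-le codecs standing in for the 16-bit
and 32-bit forms:

```python
def encoded_sizes(ch):
    """Return how many code units one character needs in each encoding.

    Hypothetical helper for illustration only.
    """
    return {
        "utf8_bytes": len(ch.encode("utf-8")),            # 8-bit units
        "utf16_words": len(ch.encode("utf-16-le")) // 2,  # 16-bit units
        "ucs4_words": len(ch.encode("utf-32-le")) // 4,   # 32-bit units
    }


# ASCII letter, Latin-1 letter, euro sign, and a character above U+FFFF.
for ch in ["A", "\u00e9", "\u20ac", "\U0001d11e"]:
    print("U+%04X" % ord(ch), encoded_sizes(ch))
```

Only the last character needs two 16-bit words (a surrogate pair), and
every one of them fits in a single 32-bit word.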

UCS-4 is good as an internal representation since it gives constant-time
indexing (each and every character takes 4 bytes, period), but for
mostly-ASCII text it causes a 4x space blowup compared to UTF-8.
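
A small sketch of both points, again in Python and only as an
illustration (code_point_at is a hypothetical helper, and little-endian
byte order is an assumption): the n-th character of a UCS-4 buffer sits
at byte offset 4*n, and an ASCII string takes four times the space it
would in UTF-8.

```python
def code_point_at(ucs4_bytes, index):
    """Fetch the code point at a character index from a little-endian
    UCS-4 buffer. No scanning is needed: the offset is simply 4 * index.

    Hypothetical helper for illustration only.
    """
    start = 4 * index
    return int.from_bytes(ucs4_bytes[start:start + 4], "little")


text = "hello, xml"               # pure ASCII sample
utf8 = text.encode("utf-8")       # one byte per character here
ucs4 = text.encode("utf-32-le")   # four bytes per character, no BOM

print(len(utf8), len(ucs4))           # byte counts of the two encodings
print(chr(code_point_at(ucs4, 7)))    # character at index 7
```

With UTF-8 the same lookup would in general have to walk the string from
the start, since characters have different lengths.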

XML files mostly use UTF-8 since it is the most compact for mostly-ASCII
text.

Visit for more info.

: Michal Moskal ::::: malekith/at/ :  GCS {C,UL}++++$ a? !tv
: PLD Linux ::::::: Wroclaw University, CS Dept :  {E-,w}-- {b++,e}>+++ h
