Re: [Gnu-arch-users] Re: How does arch/tla handle encodings?

From: Marcus Sundman
Subject: Re: [Gnu-arch-users] Re: How does arch/tla handle encodings?
Date: Sun, 29 Aug 2004 00:41:34 +0300
User-agent: KMail/1.7

On Sunday 29 August 2004 00:09, Tom Lord wrote:
> > From: Marcus Sundman <address@hidden>
> >
> > > > That said, I agree that UTF-8 is much better than UTF-16 in
> > > > most cases,
> > >
> > > Doesn't that depend on what (human) languages you are using most?
> >
> > Indeed it does. And also what tools you use. On the one hand most
> > Chinese characters are 2 bytes in UTF-16 while 3 in UTF-8, but on
> > the other hand UTF-16 contains null bytes. And then it also depends
> > on the access patterns. If you always want sequential access then
> > UTF-8 might be OK, but if you often want to jump to specific
> > locations then UTF-8 sucks big time.
> In what way, if any, is UTF-16 better?

I don't think there is a "way" in which UTF-16 is better, but there are a
few such *cases*. One example is in what I quoted above: it's more compact
for Chinese.
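
To make the size difference concrete, here is a minimal Python sketch. The
sample string is only an illustration (it is not from this thread), and it
assumes every character lies in the Basic Multilingual Plane, which holds
for most commonly used Chinese characters.

```python
# Illustrative sample: four common Chinese characters, all in the BMP.
sample = "中文编码"

# UTF-8 needs 3 bytes for each of these characters.
utf8_bytes = sample.encode("utf-8")

# UTF-16 needs one 2-byte code unit for each of them.
# "utf-16-le" is used so that no byte order mark is prepended.
utf16_bytes = sample.encode("utf-16-le")

print(len(sample), len(utf8_bytes), len(utf16_bytes))  # 4 12 8

# The flip side mentioned in the quoted text: ASCII text in UTF-16
# contains null bytes, which breaks tools that expect C-style strings.
print(b"\x00" in "arch".encode("utf-16-le"))  # True
print(b"\x00" in "arch".encode("utf-8"))      # False
```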

> (Are you familiar with "surrogate character" codepoints?

Yes, and for strings containing them, UTF-16 is as bad as UTF-8 when it
comes to random access. The same split exists within UTF-8: strings
containing only single-byte chars allow direct indexing, while ones with
multi-byte chars do not. It's just that multi-byte UTF-8 chars are much
more common than UTF-16 surrogates.

> or with the distinction between codepoints and glyphs?)

Um.. yes, but I don't see what that has to do with this. Please enlighten me
if I'm missing something. (I probably am.)

- Marcus Sundman
