Which Encoding? (was Re: Unicode and Guile)

From: Stephen Compall
Subject: Which Encoding? (was Re: Unicode and Guile)
Date: 26 Oct 2003 12:34:47 +0000
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Tom Lord <address@hidden> writes:

> It's culturally discriminatory to regard utf-16 as worse than utf-8
> in those regards.
> Or, put differently, for many potential users, utf-16 is the best of
> both worlds: it optimizes the size of the most common characters
> (for some users), and it can also handle any Unicode character.

That's the thing -- it can't, at least not thinking in fixed-width
terms, which was my goal in suggesting UCS-4.  It may be able to
handle all *current* Unicode characters, but what about those in the
future?  Unicode supports code points above 0xFFFF, which do not fit in
16 bits.

I say it's the worst of both worlds (from the C API user's point of
view), because you have to deal with breaking ASCII compatibility for
7-bit code points, *and* you still need surrogate pairs (i.e. variable
width) for code points above 65535 (the difference between UTF-16 and
UCS-2).
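
To make the variable-width point concrete, here is a rough sketch in C
of how a single code point turns into UTF-16 code units.  The function
name and signature are made up for illustration (this is not a Guile or
libc API), and it assumes the range U+0000..U+10FFFF that UTF-16 can
represent:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-16 code units.
   Returns the number of 16-bit units written (1 or 2), or 0 if cp
   cannot be encoded. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                 /* reserved for surrogates */
    if (cp > 0x10FFFF)
        return 0;                 /* outside the UTF-16 range */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;    /* fits in one 16-bit unit */
        return 1;
    }
    cp -= 0x10000;                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

With this sketch, 'A' (U+0041) comes out as one unit, while U+1D11E
comes out as the two units 0xD834 0xDD1E, so the caller cannot assume
one unit per character.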

UTF-16 suffers the same problem as UTF-8: programmers may be tempted
to simply treat the data block as fixed-width 16-bit strings (8-bit
for UTF-8, of course), which will break on the surrogate pairs (or,
for UTF-8, on the multi-byte sequences).
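
As an illustration of that temptation, the following C sketch counts
code points in a UTF-16 buffer.  The function name is hypothetical, and
for simplicity it counts an unpaired surrogate as one code point rather
than rejecting it:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count code points in n UTF-16 code units.
   A high surrogate followed by a low surrogate counts once. */
size_t utf16_count_code_points(const uint16_t *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF &&
            i + 1 < n &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            i++;    /* skip the low half of a well-formed pair */
        count++;
    }
    return count;
}
```

For the four units 0x0041 0xD834 0xDD1E 0x0042 this returns 3, whereas
code that treats the buffer as fixed-width would report 4 characters
and could split the pair when indexing or truncating.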

If you want to assume that Unicode will never grow out of the 16-bit
set, then UCS-2 would be a much better choice than UTF-16, IMHO.  That
way, it is clear that C programs only need deal with fixed-width,
16-bit characters.

Stephen Compall or s11 or sirian

Since a politician never believes what he says, he is surprised
when others believe him.
                -- Charles DeGaulle

