Re: Unicode and Guile

From: Stephen Compall
Subject: Re: Unicode and Guile
Date: 25 Oct 2003 18:08:45 +0100
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Andy Wingo <address@hidden> writes:

> If there is no plan, may I suggest that we move our internal
> representation of strings to UTF-8. There's an interesting
> introductory article written on, although I
> don't have the link ATM. This has the advantage that ASCII
> characters up to 127 are represented the same.

I think this may be a disadvantage.  As you say, UTF-8 strings are
still not ASCII-compatible in general, but since casting their data
blocks to char* still works for ASCII-only strings, people might be
tempted to simply do that, on the grounds that other languages "don't
matter enough to bother with".
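
To make the temptation concrete, here is a minimal sketch in Python
(used only for illustration; the helper name naive_length is
hypothetical and stands in for any byte-oriented C code that treats
one byte as one character).  It works for ASCII-only input and goes
wrong, without any error, as soon as a non-ASCII character appears:

```python
def naive_length(data: bytes) -> int:
    """Count 'characters' the way byte-oriented code would: one per byte."""
    return len(data)


ascii_text = "hello"
mixed_text = "h\u00e9llo"  # U+00E9 (e with acute) takes two bytes in UTF-8

ascii_utf8 = ascii_text.encode("utf-8")
mixed_utf8 = mixed_text.encode("utf-8")

# For ASCII-only text the UTF-8 bytes are identical to the ASCII bytes,
# so byte-oriented code appears to work.
ascii_matches = ascii_utf8 == ascii_text.encode("ascii")
ascii_count_ok = naive_length(ascii_utf8) == len(ascii_text)

# With one non-ASCII character the byte count no longer equals the
# character count, and nothing signals the mistake.
mixed_count_ok = naive_length(mixed_utf8) == len(mixed_text)

print(ascii_matches, ascii_count_ok, mixed_count_ok)
```

The failure is silent, which is the point: such code passes every
test written with ASCII data.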

> Of course, above that characters might take up to eight bytes, which
> means that all code that processes user-input strings has to be
> changed. Painful, eh?  But if we hope to write apps that deal with
> all languages of the world, that's the only way.
> So, reactions on that would be appreciated.

As a result, UCS-4 strings have the advantage of breaking code that
tries to merely interpret the data block as char*.  UCS-4 is what
wchar_t is in glibc.  I'd debate the virtues of treating all code
points equally, versus their status in UTF-8, but I'm sure that's
better done (and has been done) in another forum.  UCS-2 shouldn't
even be considered an option, and UTF-16 seems to offer the worst of
both worlds.
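
A rough sketch of both claims, again in Python for illustration only.
The helper c_strlen is hypothetical and merely mimics what C's strlen
would see; little-endian byte order is an assumption (with big-endian
data the scan would stop even earlier, at byte 0):

```python
def c_strlen(data: bytes) -> int:
    """Mimic C strlen: count the bytes before the first NUL byte."""
    end = data.find(b"\x00")
    return len(data) if end == -1 else end


text = "hi"
utf8 = text.encode("utf-8")       # two bytes, same as ASCII
ucs4 = text.encode("utf-32-le")   # four bytes per code point, no BOM

# Treating the data block as char*: UTF-8 survives for ASCII text,
# while UCS-4 breaks at once because of the embedded zero bytes.
strlen_utf8 = c_strlen(utf8)
strlen_ucs4 = c_strlen(ucs4)

bmp_char = "A"
astral_char = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

# UTF-16 is variable-width (surrogate pairs) yet not ASCII-compatible.
utf16_widths = [len(c.encode("utf-16-le")) for c in (bmp_char, astral_char)]
# UCS-4 spends four bytes on every code point.
ucs4_widths = [len(c.encode("utf-32-le")) for c in (bmp_char, astral_char)]

print(strlen_utf8, strlen_ucs4, utf16_widths, ucs4_widths)
```

So UCS-4 fails loudly on the very first ASCII test string, and UTF-16
gives up fixed width without gaining byte compatibility with ASCII.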

As for the semantics, I submit the way Emacs does it: node (elisp)Text
Representations, or

Stephen Compall or s11 or sirian

I think your opinions are reasonable, except for the one about my mental
                -- Psychology Professor, Fairfield University

