Re: i18n? unicode?
14 Feb 2002 00:46:38 +0900
Gnus/5.0808 (Gnus v5.8.8) Emacs/21.1
>>>>> "Simon" == Simon Josefsson <address@hidden> writes:
Simon> On 13 Feb 2002, Alex Shinn wrote:
>> One of the big catches is that Guile wants to both replace
>> Emacs-Lisp and extend well with C. For efficient multi-byte
>> strings, Emacs-Lisp has its own string-representation, and the
>> obvious idea would be to do likewise (probably using unicode
>> instead of mule), but then you don't play well with C libraries
>> and have to do conversions everywhere.
Simon> Is "automatic" conversions really necessary? The "automatic"
Simon> (guessing) logic of Emacs MULE seems to cause unexpected
Simon> behaviour at some times.
Automatic in what sense? We have to do type-checking no matter what,
why not dispatch at the same time? And unless you mean guessing of file
coding system (which, granted, is handled very poorly), I don't see
where there's any guessing involved - an existing string has a fixed
>> The only Scheme I know of that has decent multibyte support is
>> Gauche, and that is at the expense of performance on string-ref
>> and the like. To make up for this it provides string pointers to
>> loop through strings. A C API for extensions would presumably
>> need to do explicit conversions.
Simon> Seems like a hack...
Depends how you look at it. Gauche is a native Japanese Scheme. In a
Japanese environment, you really want to work with euc-jp and/or sjis.
In those (variable byte-width) encodings, string-ref is not constant
time without some notion of a string pointer. Multibyte strings are just
something not taken into account in traditional western string
representations. From the Asian perspective, treating strings as
character arrays could be perceived as a premature optimization.
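
To make the cost concrete, here is a minimal sketch in Python (used only
for illustration; this is not Gauche's implementation). The names
euc_jp_char_width, string_ref and StringPointer are hypothetical, and the
code assumes well-formed EUC-JP input. Indexing has to scan from the start
of the buffer, while a pointer that remembers its byte offset advances in
constant time per character.

```python
def euc_jp_char_width(lead: int) -> int:
    """Byte width of the EUC-JP character that starts with this lead byte."""
    if lead < 0x80:
        return 1  # ASCII
    if lead == 0x8F:
        return 3  # JIS X 0212: prefix byte plus two more bytes
    return 2      # JIS X 0208, or half-width katakana after the 0x8E prefix


def string_ref(buf: bytes, k: int) -> str:
    """Return the k-th character by scanning from the start: O(k) work."""
    pos = 0
    for _ in range(k):
        pos += euc_jp_char_width(buf[pos])
    width = euc_jp_char_width(buf[pos])
    return buf[pos:pos + width].decode("euc_jp")


class StringPointer:
    """Hypothetical cursor that remembers its byte offset in the buffer."""

    def __init__(self, buf: bytes):
        self.buf = buf
        self.pos = 0

    def next(self):
        """Return the next character, or None at the end: O(1) per step."""
        if self.pos >= len(self.buf):
            return None
        width = euc_jp_char_width(self.buf[self.pos])
        ch = self.buf[self.pos:self.pos + width].decode("euc_jp")
        self.pos += width
        return ch
```

Looping with string_ref over a string of n characters costs O(n^2) byte
steps in total; looping with the pointer costs O(n).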
>> Bigloo has limited ucs2 support, but not really unified - you
>> have to know what strings you're working with.
Simon> Internally I think this approach seems best -- if you don't
Simon> know what strings you're working with, you can't expect
Simon> things to work. Of course, users can't be expected to know
Simon> these things, but I don't see why users need to concern
Simon> themselves with the low-level interface..
But Bigloo requires an entirely separate collection of procedures for ucs2
strings, so you have to call either string-length or ucs2-string-length
depending on the type of the string. That's like having to check the
result of exact? on every number before dispatching on the appropriate
numeric procedure, which to me seems pretty much worthless.
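
A rough sketch of the alternative, again in Python purely for illustration:
a single string_length that dispatches in the same step as the type check
it has to do anyway. The classes ByteString and Ucs2String are hypothetical
stand-ins for two internal representations, not Bigloo's or Guile's actual
types.

```python
from dataclasses import dataclass


@dataclass
class ByteString:
    """Hypothetical representation: one byte per character."""
    data: bytes


@dataclass
class Ucs2String:
    """Hypothetical representation: two bytes per character."""
    data: bytes


def string_length(s) -> int:
    """One entry point; the type check doubles as the dispatch."""
    if isinstance(s, ByteString):
        return len(s.data)
    if isinstance(s, Ucs2String):
        return len(s.data) // 2
    raise TypeError("string_length: not a string: %r" % (s,))
```

The caller writes string_length either way and never needs to know which
representation it was handed.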
Simon> I think basing this on the character set stuff available in
Simon> GNU libc and iconv would make it behave like "other"
Simon> applications, which is a good thing:
Alas, other applications behave in different ways, and we also want
clean multi-byte strings within Guile itself without having to do
explicit conversions. Iconv may be the right way to go, but I really
think we want Guile to keep track of the encodings for us.
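
As a minimal sketch of what "keeping track of the encodings" could look
like, assuming each string carries an encoding tag next to its bytes:
TaggedString and string_append are hypothetical names, and Python's
built-in codecs stand in here for iconv. This is an illustration of the
idea, not a proposal for Guile's actual representation.

```python
class TaggedString:
    """Hypothetical string object: raw bytes plus the encoding they are in."""

    def __init__(self, data: bytes, encoding: str):
        self.data = data
        self.encoding = encoding

    def to_encoding(self, target: str) -> "TaggedString":
        """Convert only when the encodings differ (iconv's job in C)."""
        if target == self.encoding:
            return self
        text = self.data.decode(self.encoding)
        return TaggedString(text.encode(target), target)


def string_append(a: TaggedString, b: TaggedString) -> TaggedString:
    """Append b to a, converting b to a's encoding behind the scenes."""
    b = b.to_encoding(a.encoding)
    return TaggedString(a.data + b.data, a.encoding)
```

The user never names an encoding when appending; a C extension that needs
a specific one would call to_encoding once at the boundary. Conversion can
still fail when the target encoding cannot represent a character, which
this sketch leaves as an unhandled exception.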