guile-devel

Re: %default-port-conversion-strategy and string ports


From: David Kastrup
Subject: Re: %default-port-conversion-strategy and string ports
Date: Thu, 31 May 2012 16:48:30 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.1.50 (gnu/linux)

address@hidden (Ludovic Courtès) writes:

> Hi,
>
> David Kastrup <address@hidden> writes:
>
>> Shouldn't strings be in "internal encoding" anyway?  The whole point of
>> a string is to be an array of characters.  Not an array of arbitrarily
>> encoded bytes.
>
> Yes, but I was referring to “string ports”, which may actually be fed
> arbitrary binary data, not just characters.

How so?  A string is an array of characters.  Arbitrary binary data is
an array of bytes.  Merrily mixing the two is not going to lead to
consistent results: you are going to have things accidentally decoded
more than once or not at all, and accidentally encoded more than once or
not at all.
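The following is a minimal, hypothetical sketch of the "decoded more than once or not at all" problem. It uses Python rather than Guile, because Python's `str` is an array of characters and its `bytes` is an array of bytes. All variable names are illustrative. It shows that once UTF-8 bytes are mistaken for characters and encoded again, recovering the text takes exactly as many decodes as there were encodes. It also shows that skipping the decode entirely gets even the length wrong.

```python
# Hypothetical sketch in Python (standing in for Guile): str is an array of
# characters, bytes is an array of bytes.
text = "Courtès"                      # 7 characters
once = text.encode("utf-8")           # 8 bytes: b'Court\xc3\xa8s'

# Mistake: the UTF-8 bytes are taken for characters (Latin-1 maps byte n to
# code point n) and then encoded again, i.e. the text is encoded twice.
twice = once.decode("latin-1").encode("utf-8")   # b'Court\xc3\x83\xc2\xa8s'

# Decoding the doubly encoded data only once does not give the text back...
wrong = twice.decode("utf-8")         # 'CourtÃ¨s' (mojibake)

# ...it takes exactly as many decodes as there were encodes.
right = twice.decode("utf-8").encode("latin-1").decode("utf-8")

# Not decoding at all is wrong too: bytes and characters do not even agree
# on the length.
print(len(text), len(once))           # 7 8
print(ascii(wrong), right == text)    # 'Court\xc3\xa8s' True
```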

Emacs _does_ have unibyte strings, a data structure of raw bytes kept
for efficiency reasons, but it is not clear that the hassle is worth it.
You _can_ read binary data into multibyte strings: byte sequences that
are not valid UTF-8 are then mapped to special code points so that they
can be recovered unchanged when encoding again.  So Emacs has two kinds
of strings: unibyte (raw data) and multibyte (conceptually an array of
Unicode characters in some hidden multibyte encoding that is
incidentally quite close to UTF-8).
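Emacs reserves its own code points for such raw bytes. As an analogue only, not Emacs's actual mechanism, here is a small sketch using Python's `surrogateescape` error handler (PEP 383). That handler maps each undecodable byte b to the lone surrogate U+DC00 + b, so arbitrary binary data can live inside a character string and still come back byte-for-byte on encoding. The sample data is made up for illustration.

```python
# Python's "surrogateescape" error handler (PEP 383): an analogue of, not the
# same as, Emacs's raw-byte characters.
data = b"caf\xc3\xa9 \xff\xfe raw"    # valid UTF-8 followed by two stray bytes

try:
    data.decode("utf-8")              # strict decoding rejects the stray bytes
except UnicodeDecodeError as err:
    print("strict:", err.reason)

# Each undecodable byte b becomes the lone surrogate U+DC00 + b ...
text = data.decode("utf-8", errors="surrogateescape")
print(ascii(text))                    # 'caf\xe9 \udcff\udcfe raw'

# ... which encodes back to exactly the original byte.
restored = text.encode("utf-8", errors="surrogateescape")
print(restored == data)               # True
```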

And that is all.  Everything else is decoded (typically from unibyte)
into multibyte, and encoded back when it is written somewhere.  How are
you even supposed to combine strings when they can be encoded
differently?
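The decode-on-input, encode-on-output discipline can be sketched as follows. Again Python stands in for Guile, and the two input encodings are assumptions chosen for illustration. Concatenating differently encoded raw bytes gives data that no single decoding turns back into the intended text. Decoding each source with its own encoding first makes combining trivial, and the result is encoded exactly once when written out.

```python
# Hypothetical sketch: decode at the input boundary, work on characters,
# encode once at the output boundary.
latin1_input = "naïve".encode("latin-1")   # b'na\xefve'
utf8_input = "café".encode("utf-8")        # b'caf\xc3\xa9'

# Combining the raw bytes yields data no single decoding gets right.
mixed = latin1_input + b" " + utf8_input
try:
    mixed.decode("utf-8")                  # fails on the Latin-1 byte 0xEF
    mixed_ok = True
except UnicodeDecodeError:
    mixed_ok = False
mixed_as_latin1 = mixed.decode("latin-1")  # succeeds, but 'café' is garbled

# Decoding each source with its own encoding first makes combining trivial.
combined = latin1_input.decode("latin-1") + " " + utf8_input.decode("utf-8")

# Encode exactly once, when writing the result somewhere.
output = combined.encode("utf-8")
print(mixed_ok, combined == "naïve café", output)
```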

-- 
David Kastrup



