
Re: [bug-libunistring] roundtrippable encoding support


From: Daiki Ueno
Subject: Re: [bug-libunistring] roundtrippable encoding support
Date: Wed, 15 Oct 2014 15:53:43 +0900
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4.50 (gnu/linux)

Ben Pfaff <address@hidden> writes:

> On Thu, Oct 09, 2014 at 06:04:02PM +0200, David Kastrup wrote:
>> What I am actually more interested in is in having libunistring offer
>> "roundtrippable" encodings as a fallback for decoding errors.
>> Basically, I want an option for decoding where libunistring announces
>> "what you have here is not valid utf-8 but I know how to deal with it".
>> Including reencoding.  And delivering unique "character codes" and
>> string length calculations.  The application would either keep track of
>> having received "dirty utf-8" and would reencode when putting out utf-8
>> (where reencoding "internal utf-8" to "external utf-8" means replacing
>> the 2-byte sequences representing a wild byte by their original byte),
>> or it would reencode into "external" utf-8 when writing anyway which
>> would not change anything for originally valid utf-8.
>
> It sounds like a reasonable philosophy to me.  I don't think I'd want
> this to become the only option for libunistring, but if there's a
> practical way to add alternate interfaces, etc., then I think that would
> be valuable.
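The "internal utf-8" scheme in the quoted proposal can be sketched as follows.  This is a hypothetical illustration, not a libunistring interface: it assumes (as Emacs does) that each wild byte 0x80..0xFF is stored as an overlong 2-byte sequence 0xC0/0xC1 + continuation byte, which valid UTF-8 never contains.  Python's "surrogateescape" error handler is used only as a convenient way to locate the undecodable bytes.

```python
# Hypothetical sketch of "dirty utf-8" -> "internal utf-8" -> "external utf-8".
# Assumption: wild byte b (0x80..0xFF) is stored as the overlong pair that
# encodes b - 0x80, i.e. C0 80..C0 BF and C1 80..C1 BF.

def to_internal(raw: bytes) -> bytes:
    out = bytearray()
    # surrogateescape maps every undecodable byte b to U+DC00 + b
    for ch in raw.decode("utf-8", "surrogateescape"):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:       # a wild byte
            v = (cp - 0xDC00) - 0x80     # 0x00..0x7F
            out += bytes((0xC0 | (v >> 6), 0x80 | (v & 0x3F)))
        else:                            # valid character: copy its UTF-8 bytes
            out += ch.encode("utf-8")
    return bytes(out)

def to_external(internal: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(internal):
        b = internal[i]
        if b in (0xC0, 0xC1):            # overlong pair: restore the wild byte
            v = ((b & 0x01) << 6) | (internal[i + 1] & 0x3F)
            out.append(0x80 + v)
            i += 2
        else:                            # any other byte passes through unchanged
            out.append(b)
            i += 1
    return bytes(out)

def char_length(internal: bytes) -> int:
    # Every character, including an escaped wild byte, starts with exactly
    # one byte that is not a continuation byte (10xxxxxx).
    return sum(1 for b in internal if (b & 0xC0) != 0x80)
```

Valid UTF-8 passes through both directions unchanged, and a wild byte counts as one character: `to_internal(b"a\xa0\xc2\xa0")` yields `b"a\xc0\xa0\xc2\xa0"`, whose `char_length` is 3.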

I don't have anything to add.  I think it would be nice if Guile had
transparent support for "raw-bytes" and UTF-8 sequences[1], but I don't
think it is a good idea to expose the internal "character codes" or the
"internal utf-8" representation through the library interface.

[1] For example, decoding the external byte sequences "\xC2\xA0" and
    "\xA0" should report the same character code in the REPL, but the
    two results are distinguished internally and converted back to their
    original bytes when written, as Emacs does.
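A small sketch of the behaviour described in footnote [1].  The `Char` type and the rule that a stray byte b reports code point b are hypothetical assumptions chosen so that "\xC2\xA0" and "\xA0" report the same code; the original bytes are kept alongside and used when writing.

```python
from typing import NamedTuple

class Char(NamedTuple):
    code: int    # character code reported to the user (e.g. in a REPL)
    raw: bytes   # original external bytes, written back unchanged

def decode(data: bytes) -> list[Char]:
    chars = []
    for ch in data.decode("utf-8", "surrogateescape"):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            b = cp - 0xDC00
            # Assumption: a stray byte reports the code point equal to its value
            chars.append(Char(b, bytes([b])))
        else:
            # Strict UTF-8 is canonical, so re-encoding gives the original bytes
            chars.append(Char(cp, ch.encode("utf-8")))
    return chars

def encode(chars: list[Char]) -> bytes:
    return b"".join(c.raw for c in chars)
```

With this design the reported codes collide on purpose, while the `raw` field keeps the two decodings distinct and lets writing reproduce the input byte for byte.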

Regards,
--
Daiki Ueno
