Re: [bug-libunistring] roundtrippable encoding support
From: Ben Pfaff
Subject: Re: [bug-libunistring] roundtrippable encoding support
Date: Fri, 10 Oct 2014 08:47:03 -0700
User-agent: Mutt/1.5.23 (2014-03-12)
On Thu, Oct 09, 2014 at 06:04:02PM +0200, David Kastrup wrote:
> What I am actually more interested in is in having libunistring offer
> "roundtrippable" encodings as a fallback for decoding errors.
> Basically, I want an option for decoding where libunistring announces
> "what you have here is not valid utf-8 but I know how to deal with it".
> Including reencoding. And delivering unique "character codes" and
> string length calculations. The application would either keep track of
> having received "dirty utf-8" and would reencode when putting out utf-8
> (where reencoding "internal utf-8" to "external utf-8" means replacing
> the 2-byte sequences representing a wild byte by their original byte),
> or it would reencode into "external" utf-8 when writing anyway which
> would not change anything for originally valid utf-8.
It sounds like a reasonable philosophy to me. I don't think I'd want
this to become the only option for libunistring, but if there's a
practical way to add alternate interfaces, etc., then I think that would
be valuable.
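
To make the quoted idea concrete, here is a rough sketch of what such a
round trip could look like. None of this is libunistring API: the function
names to_internal, to_external and count_chars are hypothetical, and the
mapping of a wild byte to an overlong C0/C1 two-byte sequence is only my
assumption about what "2-byte sequences representing a wild byte" might
mean. Those sequences can never occur in well-formed UTF-8, so they should
not collide with real characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length (1..4) of the well-formed UTF-8 sequence at s, or 0 if the
   bytes at s do not start one.  Requires n >= 1. */
static size_t valid_seq_len(const unsigned char *s, size_t n)
{
    unsigned char c = s[0], lo = 0x80, hi = 0xBF;
    size_t len, i;

    if (c < 0x80)
        return 1;
    if (c >= 0xC2 && c <= 0xDF) {
        len = 2;
    } else if (c >= 0xE0 && c <= 0xEF) {
        len = 3;
        if (c == 0xE0) lo = 0xA0;   /* reject overlong forms */
        if (c == 0xED) hi = 0x9F;   /* reject surrogates */
    } else if (c >= 0xF0 && c <= 0xF4) {
        len = 4;
        if (c == 0xF0) lo = 0x90;   /* reject overlong forms */
        if (c == 0xF4) hi = 0x8F;   /* reject values above U+10FFFF */
    } else {
        return 0;
    }
    if (n < len || s[1] < lo || s[1] > hi)
        return 0;
    for (i = 2; i < len; i++)
        if (s[i] < 0x80 || s[i] > 0xBF)
            return 0;
    return len;
}

/* "Dirty" external UTF-8 -> internal form.  Valid sequences are copied
   unchanged; each wild byte becomes a 2-byte C0/C1 sequence.
   out must have room for 2 * n bytes.  Returns the bytes written. */
size_t to_internal(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;

    while (i < n) {
        size_t len = valid_seq_len(in + i, n - i);
        if (len > 0) {
            memcpy(out + o, in + i, len);
            i += len;
            o += len;
        } else {
            unsigned char b = in[i++];
            assert(b >= 0x80);      /* ASCII is always valid */
            out[o++] = (unsigned char) (0xC0 | ((b >> 6) & 1));
            out[o++] = (unsigned char) (0x80 | (b & 0x3F));
        }
    }
    return o;
}

/* Internal form -> external bytes.  Each C0/C1 pair turns back into its
   original byte; everything else is copied unchanged.
   out must have room for n bytes.  Returns the bytes written. */
size_t to_external(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;

    while (i < n) {
        if ((in[i] == 0xC0 || in[i] == 0xC1)
            && i + 1 < n && (in[i + 1] & 0xC0) == 0x80) {
            out[o++] = (unsigned char)
                (0x80 | ((in[i] & 1) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            out[o++] = in[i++];
        }
    }
    return o;
}

/* Number of "characters" in the internal form: every lead byte counts
   once, so a wild byte counts as exactly one character. */
size_t count_chars(const unsigned char *s, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if ((s[i] & 0xC0) != 0x80)
            count++;
    return count;
}
```

With this scheme, originally valid UTF-8 passes through both directions
untouched, so an application could call to_external unconditionally on
output, which matches the second alternative described in the quoted text.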
(I am not the libunistring maintainer and don't intend to speak for
him.)