bug-libunistring

Re: [bug-libunistring] UTF-8 backward iteration proposal for libunistring


From: Ben Pfaff
Subject: Re: [bug-libunistring] UTF-8 backward iteration proposal for libunistring
Date: Sat, 13 Nov 2010 06:29:18 -0800
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)

Paolo Bonzini <address@hidden> writes:

> On 11/13/2010 06:30 AM, Ben Pfaff wrote:
>>    (c) For an invalid (but complete) sequence, it reports each
>>        byte as a separate code point U+FFFD.
>
> Maybe this is what you want to change... It seems to me that:
>
>>          c0 (U+FFFD) 41 (U+0041)
>>
>>            (c0 never appears in UTF-8)
>
> This is ok.
>
>>          ed (U+FFFD) a0 (U+FFFD) 80 (U+FFFD)
>>
>>            (This would be a UTF-16 surrogate if it were allowed.)
>
> This should be a single U+FFFD.
>
>>          e0 (U+FFFD) a0 (U+FFFD) 00 (U+0000)
>>
>>            (e0 starts a 3-byte sequence but 00 is invalid as the
>>            third byte.)
>
> This should be U+FFFD U+0000.
>
> Thoughts?  This should make it possible to implement backwards
> iteration satisfying these properties.
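A minimal C sketch of one possible reading of Paolo's proposal, for illustration only. The names `decode_one`, `decode_all`, `claimed_length` and `REPLACEMENT_CHAR` are hypothetical and are not libunistring API. The sketch assumes that a lead byte claims a length from its bit pattern alone (C0..DF two bytes, E0..EF three, F0..F7 four), that it absorbs only continuation bytes (80..BF), that a complete but invalid sequence (overlong, surrogate, or above U+10FFFF) becomes a single U+FFFD, and that a truncated prefix becomes one U+FFFD, with decoding resuming at the byte that cut it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFD

/* How many bytes a sequence starting with B claims, judged by its bit
   pattern alone (an assumption of this sketch): C0..DF -> 2,
   E0..EF -> 3, F0..F7 -> 4.  ASCII, stray continuation bytes (80..BF)
   and F8..FF count as a single byte.  */
static size_t
claimed_length (uint8_t b)
{
  if (b < 0xc0 || b >= 0xf8)
    return 1;
  if (b < 0xe0)
    return 2;
  if (b < 0xf0)
    return 3;
  return 4;
}

static int
is_continuation (uint8_t b)
{
  return (b & 0xc0) == 0x80;
}

/* Decodes the unit at the start of S (N > 0 bytes available) into *UC
   and returns the number of bytes it occupies.  */
static size_t
decode_one (const uint8_t *s, size_t n, uint32_t *uc)
{
  static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  size_t want, len = 1;
  uint32_t cp;

  assert (n > 0);
  want = claimed_length (s[0]);
  if (want == 1)
    {
      /* ASCII stands for itself; any other lone byte is invalid.  */
      *uc = s[0] < 0x80 ? s[0] : REPLACEMENT_CHAR;
      return 1;
    }

  /* Absorb continuation bytes only, so a byte that could start the
     next character is never swallowed.  */
  while (len < want && len < n && is_continuation (s[len]))
    len++;
  if (len < want)
    {
      /* Truncated sequence: one U+FFFD; resume at the offending byte.  */
      *uc = REPLACEMENT_CHAR;
      return len;
    }

  /* Complete sequence: assemble the value, then map overlong forms,
     surrogates and values above U+10FFFF to a single U+FFFD.  */
  cp = s[0] & (0x7f >> want);
  for (size_t i = 1; i < want; i++)
    cp = (cp << 6) | (s[i] & 0x3f);
  if (cp < min_cp[want] || (cp >= 0xd800 && cp <= 0xdfff) || cp > 0x10ffff)
    cp = REPLACEMENT_CHAR;
  *uc = cp;
  return want;
}

/* Decodes all N bytes of S into OUT, which must have room for N code
   points, and returns how many code points were stored.  */
static size_t
decode_all (const uint8_t *s, size_t n, uint32_t *out)
{
  size_t pos = 0, count = 0;

  while (pos < n)
    pos += decode_one (s + pos, n - pos, &out[count++]);
  return count;
}
```

With these rules, C0 41 decodes to U+FFFD U+0041, ED A0 80 (which would encode the surrogate U+D800) to a single U+FFFD, and E0 A0 00 to U+FFFD U+0000. This differs from the Unicode Standard's recommended "maximal subpart" practice, under which ED A0 80 yields three U+FFFD.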

I'd be happy to do it that way, too (or to show why it cannot work, if
there is some reason that it cannot).  My goal is robust backward
iteration, not particular semantics for ill-formed sequences.
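A sketch of how backward iteration could stay consistent with forward decoding, assuming rules like those Paolo proposes: a lead byte C0..DF, E0..EF or F0..F7 claims 2, 3 or 4 bytes and absorbs only continuation bytes (80..BF); every other byte is a unit of its own. Because a sequence never absorbs a non-continuation byte, every such byte starts a unit, so locating the unit that ends at a boundary needs at most four bytes of lookback. The names `unit_length` and `prev_unit_start` are hypothetical. The sketch only finds unit boundaries, since decoding a unit once its start is known is the same in either direction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed structural rule: C0..DF claim 2 bytes, E0..EF claim 3,
   F0..F7 claim 4; every other byte is a unit of its own.  */
static size_t
claimed_length (uint8_t b)
{
  if (b < 0xc0 || b >= 0xf8)
    return 1;
  if (b < 0xe0)
    return 2;
  if (b < 0xf0)
    return 3;
  return 4;
}

static int
is_continuation (uint8_t b)
{
  return (b & 0xc0) == 0x80;
}

/* Forward step: length of the unit at the start of S (N > 0 bytes).
   A lead byte absorbs continuation bytes up to its claimed length.  */
static size_t
unit_length (const uint8_t *s, size_t n)
{
  size_t want, len = 1;

  assert (n > 0);
  want = claimed_length (s[0]);
  while (len < want && len < n && is_continuation (s[len]))
    len++;
  return len;
}

/* Backward step: returns the start of the unit that ends at END, where
   END > 0 is a unit boundary (the end of the buffer, or the start of a
   unit found earlier).  */
static size_t
prev_unit_start (const uint8_t *s, size_t end)
{
  assert (end > 0);
  if (!is_continuation (s[end - 1]))
    return end - 1;

  /* No unit holds more than 3 continuation bytes, so examine at most
     the 4 bytes before END for the nearest non-continuation byte.  */
  for (size_t k = 1; k <= 3 && k < end; k++)
    {
      size_t i = end - 1 - k;   /* K continuation bytes lie in (I, END). */
      if (!is_continuation (s[i]))
        /* The unit starts at I only if I's claimed length covers all K
           continuation bytes; otherwise s[END - 1] is a stray byte.  */
        return claimed_length (s[i]) > k ? i : end - 1;
    }

  /* No lead byte close enough: s[END - 1] is a stray byte.  */
  return end - 1;
}
```

Walking a buffer from its end with `prev_unit_start` should visit exactly the unit starts that `unit_length` visits going forward, in reverse order; for C0 41 ED A0 80, the backward walk returns 2, 1, 0.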

Thanks,

Ben.
-- 
Ben Pfaff 
http://benpfaff.org


