Re: [bug-libunistring] UTF-8 backward iteration proposal for libunistring
From: Paolo Bonzini
Subject: Re: [bug-libunistring] UTF-8 backward iteration proposal for libunistring
Date: Sat, 13 Nov 2010 13:18:25 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101027 Fedora/3.1.6-1.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.6

On 11/13/2010 06:30 AM, Ben Pfaff wrote:
> (c) For an invalid (but complete) sequence, it reports each
>     byte as a separate code point U+FFFD.

Maybe this is what you want to change... It seems to me that:

>     c0 (U+FFFD) 41 (U+0041)
>     (c0 never appears in UTF-8)

This is ok.

>     ed (U+FFFD) a0 (U+FFFD) 80 (U+FFFD)
>     (This would be a UTF-16 surrogate if it was allowed.)

This should be a single U+FFFD.

>     e0 (U+FFFD) a0 (U+FFFD) 00 (U+0000)
>     (e0 starts a 3-byte sequence but 00 is invalid as the
>     third byte.)

This should be U+FFFD U+0000.

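To make the three cases concrete, here is a rough sketch of a forward decoder that groups bytes this way. It is only an illustration: lenient_u8_next is a made-up name, not a libunistring function, and it assumes that a lead byte claims following continuation bytes (80..bf) up to the length it announces, and that the resulting chunk becomes one U+FFFD when it is truncated, overlong, a surrogate, or out of range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of libunistring): decode one code point
   from s[0..n-1], n > 0.  Stores the result in *puc and returns the
   number of bytes consumed.  A malformed chunk yields a single U+FFFD. */
static size_t
lenient_u8_next (uint32_t *puc, const uint8_t *s, size_t n)
{
  uint8_t c;
  size_t want;          /* length announced by the lead byte */
  size_t len = 1;       /* bytes claimed so far */
  uint32_t uc, min;     /* value so far; smallest non-overlong value */

  assert (n > 0);
  c = s[0];

  if (c < 0x80)
    { *puc = c; return 1; }
  else if (c >= 0xc0 && c < 0xe0)
    { want = 2; uc = c & 0x1f; min = 0x80; }
  else if (c >= 0xe0 && c < 0xf0)
    { want = 3; uc = c & 0x0f; min = 0x800; }
  else if (c >= 0xf0 && c < 0xf8)
    { want = 4; uc = c & 0x07; min = 0x10000; }
  else
    { *puc = 0xfffd; return 1; }  /* stray continuation byte, or f8..ff */

  /* Claim continuation bytes, but never more than the lead announced. */
  while (len < want && len < n && (s[len] & 0xc0) == 0x80)
    {
      uc = (uc << 6) | (s[len] & 0x3f);
      len++;
    }

  /* Truncated, overlong, out of range, or surrogate: one U+FFFD. */
  if (len < want || uc < min || uc > 0x10ffff
      || (uc >= 0xd800 && uc <= 0xdfff))
    uc = 0xfffd;

  *puc = uc;
  return len;
}
```

With this grouping, c0 41 decodes as U+FFFD U+0041, an encoded surrogate such as ed a0 80 as one U+FFFD, and e0 a0 00 as U+FFFD U+0000.
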
Thoughts? This should make it possible to implement backwards iteration
satisfying these properties.
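
For what it's worth, a backward step under the same grouping could look roughly like the sketch below. The names announced_length and lenient_u8_prev are hypothetical, not libunistring API, and the function assumes that pos is already a chunk boundary (the end of the string, or a position returned by a previous step). It walks back over at most three continuation bytes and checks whether the byte before them is a lead byte that announces at least that many.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length announced by lead byte c, or 0 if c cannot start a multibyte
   sequence (ASCII, continuation byte, or f8..ff). */
static size_t
announced_length (uint8_t c)
{
  if (c >= 0xc0 && c < 0xe0) return 2;
  if (c >= 0xe0 && c < 0xf0) return 3;
  if (c >= 0xf0 && c < 0xf8) return 4;
  return 0;
}

/* Hypothetical helper: step back over the chunk that ends just before
   s[pos].  Requires pos > 0 and pos on a chunk boundary.  Stores the
   code point in *puc and returns the chunk length in bytes. */
static size_t
lenient_u8_prev (uint32_t *puc, const uint8_t *s, size_t pos)
{
  static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  size_t t = 0;         /* continuation bytes seen while walking back */
  size_t want;
  uint32_t uc = 0xfffd;

  assert (pos > 0);
  while (t < 3 && t < pos && (s[pos - 1 - t] & 0xc0) == 0x80)
    t++;

  if (t == 0)
    {
      /* ASCII byte, or a lead/invalid byte standing alone. */
      uint8_t c = s[pos - 1];
      *puc = c < 0x80 ? c : 0xfffd;
      return 1;
    }

  want = t < pos ? announced_length (s[pos - 1 - t]) : 0;
  if (want < t + 1)
    {
      /* No lead byte claims the last continuation byte: it is stray. */
      *puc = 0xfffd;
      return 1;
    }

  if (want == t + 1)
    {
      /* Complete chunk: decode it, then reject bad values. */
      const uint8_t *p = s + pos - want;
      uc = p[0] & (0xff >> (want + 1));
      for (size_t i = 1; i < want; i++)
        uc = (uc << 6) | (p[i] & 0x3f);
      if (uc < min_for_len[want] || uc > 0x10ffff
          || (uc >= 0xd800 && uc <= 0xdfff))
        uc = 0xfffd;
    }
  /* Otherwise the chunk is truncated and uc stays U+FFFD. */
  *puc = uc;
  return t + 1;
}
```

Stepping back from the end of e0 a0 00 gives U+0000 (1 byte) and then U+FFFD (2 bytes), the same chunks as going forward, which is the property that matters here.
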
Paolo