Re: [bug-libunistring] UTF-8 backward iteration proposal for libunistring


From: Paolo Bonzini
Subject: Re: [bug-libunistring] UTF-8 backward iteration proposal for libunistring
Date: Sat, 13 Nov 2010 13:18:25 +0100

On 11/13/2010 06:30 AM, Ben Pfaff wrote:
   (c) For an invalid (but complete) sequence, it reports each
       byte as a separate code point U+FFFD.

Maybe this is what you want to change... It seems to me that:

         c0 (U+FFFD) 41 (U+0041)

           (c0 never appears in UTF-8)

This is ok.

         ed (U+FFFD) a0 (U+FFFD) 80 (U+FFFD)

           (This would be a UTF-16 surrogate if it was allowed.)

This should be a single U+FFFD.

         e0 (U+FFFD) a0 (U+FFFD) 00 (U+0000)

           (e0 starts a 3-byte sequence but 00 is invalid as the
           third byte.)

This should be U+FFFD U+0000.

Thoughts? This should make it possible to implement backwards iteration satisfying these properties.
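
To make this concrete, here is a rough sketch in C of one rule that I think would give the results above. It is only an illustration, not libunistring code: sketch_next and sketch_prev are hypothetical names, and the rule itself is my assumption. A lead byte claims as many following continuation bytes (80..bf) as its length calls for; whatever it managed to claim becomes one code point, or one U+FFFD if the result is truncated, overlong, a surrogate (such as ed a0 80), or above U+10FFFF. Any byte that cannot start a sequence is a U+FFFD on its own. Because a unit only ever extends over continuation bytes, stepping backwards can find the start by skipping at most three of them and then checking with the forward decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT 0xFFFDu

/* True for continuation bytes 80..bf. */
static int is_cont(uint8_t b) { return (b & 0xC0) == 0x80; }

/* Length announced by a lead byte, or 0 if the byte cannot start a
   sequence (continuation bytes, c0, c1, f5..ff). */
static size_t expected_len(uint8_t b)
{
    if (b < 0x80) return 1;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}

/* Decode one unit starting at s[0], with n > 0 bytes available.
   Stores the code point (or U+FFFD) in *cp, returns bytes consumed. */
size_t sketch_next(const uint8_t *s, size_t n, uint32_t *cp)
{
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t want, have = 1, i;
    uint32_t c;

    assert(n > 0);
    want = expected_len(s[0]);
    if (want == 0) { *cp = REPLACEMENT; return 1; }
    if (want == 1) { *cp = s[0]; return 1; }

    /* Claim continuation bytes, but never more than announced. */
    while (have < want && have < n && is_cont(s[have]))
        have++;

    *cp = REPLACEMENT;
    if (have < want)
        return have;            /* truncated: one U+FFFD for the prefix */

    c = s[0] & (0xFF >> (want + 1));
    for (i = 1; i < want; i++)
        c = (c << 6) | (s[i] & 0x3F);

    /* Reject overlong forms, surrogates, and values past U+10FFFF;
       the whole complete-but-invalid sequence stays one U+FFFD. */
    if (c >= min_cp[want] && c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF))
        *cp = c;
    return want;
}

/* Decode the unit that ends just before s[end], with end > 0 and end on
   a unit boundary. Returns the number of bytes stepped back. */
size_t sketch_prev(const uint8_t *s, size_t end, uint32_t *cp)
{
    size_t i;

    assert(end > 0);
    i = end - 1;
    /* Skip back over at most three continuation bytes. */
    while (i > 0 && end - i < 4 && is_cont(s[i]))
        i--;

    /* If the forward decoder consumes exactly s[i..end), that is the unit. */
    if (sketch_next(s + i, end - i, cp) == end - i)
        return end - i;

    /* Otherwise the last byte is a stray continuation byte. */
    *cp = REPLACEMENT;
    return 1;
}
```

With this rule c0 41 gives U+FFFD U+0041, e0 a0 00 gives U+FFFD U+0000, and walking backwards from the end of a buffer visits the same units as walking forwards, in reverse order.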

Paolo
