Re: [bug-libunistring] UTF-8 backward iteration proposal for libunistring

From:	Paolo Bonzini
Subject:	Re: [bug-libunistring] UTF-8 backward iteration proposal for libunistring
Date:	Sat, 13 Nov 2010 13:18:25 +0100
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101027 Fedora/3.1.6-1.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.6

On 11/13/2010 06:30 AM, Ben Pfaff wrote:

   (c) For an invalid (but complete) sequence, it reports each
       byte as a separate code point U+FFFD.


Maybe this is what you want to change... It seems to me that:

         c0 (U+FFFD) 41 (U+0041)

           (c0 never appears in UTF-8)


This is ok.

         ed (U+FFFD) a0 (U+FFFD) 80 (U+FFFD)

           (This would be a UTF-16 surrogate if it was allowed.)


This should be a single U+FFFD.

         e0 (U+FFFD) a0 (U+FFFD) 00 (U+0000)

           (e0 starts a 3-byte sequence but 00 is invalid as the
           third byte.)


This should be U+FFFD U+0000.

Thoughts? This should make it possible to implement backwards iteration satisfying these properties.
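
To make the rule concrete, here is a minimal sketch of one possible reading of it, written in Python for brevity. It is not libunistring code, and every function name in it is hypothetical. The assumption is that a unit is a lead byte plus as many continuation bytes (80..bf) as the lead byte's bit pattern asks for, or fewer if a non-continuation byte or the end of the buffer comes first. A unit that is not valid UTF-8 becomes a single U+FFFD. Treating c0..df as 2-byte leads and f0..f7 as 4-byte leads purely by shape is also an assumption.

```python
REPLACEMENT = 0xFFFD

def is_cont(b):
    """True for a UTF-8 continuation byte (0x80..0xBF)."""
    return 0x80 <= b <= 0xBF

def shape_len(b):
    """Length implied by the byte's bit pattern alone (an assumption)."""
    if 0xC0 <= b <= 0xDF:
        return 2
    if 0xE0 <= b <= 0xEF:
        return 3
    if 0xF0 <= b <= 0xF7:
        return 4
    return 1  # ASCII, a stray continuation byte, or 0xF8..0xFF

def decode_unit(unit):
    """Code point of one unit, or U+FFFD if the unit is not valid UTF-8."""
    try:
        return ord(bytes(unit).decode("utf-8"))
    except UnicodeDecodeError:
        return REPLACEMENT

def next_unit(data, pos):
    """Return (code_point, new_pos) for the unit starting at pos."""
    end = pos + 1
    limit = min(len(data), pos + shape_len(data[pos]))
    while end < limit and is_cont(data[end]):
        end += 1
    return decode_unit(data[pos:end]), end

def prev_unit(data, end):
    """Return (code_point, new_end) for the unit ending just before end.

    Assumes end is a unit boundary, e.g. len(data) or an earlier result.
    """
    start = end - 1
    # Step back over at most three continuation bytes to find a lead byte.
    while start > 0 and end - start < 4 and is_cont(data[start]):
        start -= 1
    if shape_len(data[start]) < end - start:
        start = end - 1  # no lead byte reaches this far: the last byte stands alone
    return decode_unit(data[start:end]), start

def decode_forward(data):
    out, pos = [], 0
    while pos < len(data):
        cp, pos = next_unit(data, pos)
        out.append(cp)
    return out

def decode_backward(data):
    out, end = [], len(data)
    while end > 0:
        cp, end = prev_unit(data, end)
        out.append(cp)
    return out

if __name__ == "__main__":
    for sample in (b"\xc0\x41", b"\xed\xa0\x80", b"\xe0\xa0\x00"):
        fwd = decode_forward(sample)
        # Backward iteration sees the same units in reverse order.
        assert decode_backward(sample) == fwd[::-1]
        print(sample.hex(" "), "->", " ".join(f"U+{cp:04X}" for cp in fwd))
```

Under these assumptions, c0 41 gives U+FFFD U+0041, a surrogate encoding such as ed a0 80 gives a single U+FFFD, and e0 a0 00 gives U+FFFD U+0000. The backward step only needs to look at the four bytes before the current position, because a non-continuation byte always starts a unit.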


Paolo
