Re: [bug-libunistring] UTF-8 backward iteration proposal for libunistring

From:	Ben Pfaff
Subject:	Re: [bug-libunistring] UTF-8 backward iteration proposal for libunistring
Date:	Sat, 13 Nov 2010 06:29:18 -0800
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)

Paolo Bonzini <address@hidden> writes:

> On 11/13/2010 06:30 AM, Ben Pfaff wrote:
>>    (c) For an invalid (but complete) sequence, it reports each
>>        byte as a separate code point U+FFFD.
>
> Maybe this is what you want to change... It seems to me that:
>
>>          c0 (U+FFFD) 41 (U+0041)
>>
>>            (c0 never appears in UTF-8)
>
> This is ok.
>
>>          ed (U+FFFD) a0 (U+FFFD) 80 (U+FFFD)
>>
>>            (This would be a UTF-16 surrogate if it was allowed.)
>
> This should be a single U+FFFD.
>
>>          e0 (U+FFFD) a0 (U+FFFD) 00 (U+0000)
>>
>>            (e0 starts a 3-byte sequence but 00 is invalid as the
>>            third byte.)
>
> This should be U+FFFD U+0000.
>
> Thoughts?  This should make it possible to implement backwards
> iteration satisfying these properties.

I'd be happy to do it that way, too (or show why it cannot work, if
there is some reason that it cannot).  My goal is robust backward
iteration, not particular semantics for ill-formed sequences.

Thanks,

Ben.
-- 
Ben Pfaff 
http://benpfaff.org
