Re: guarantees of u8_mbtouc/u8_strmbtouc

From: Bruno Haible
Subject: Re: guarantees of u8_mbtouc/u8_strmbtouc
Date: Sat, 31 Jul 2010 23:01:56 +0200

Paolo Bonzini wrote:
> "u8_mbtouc will never access more than N bytes.  However, as an 
> additional guarantee, u8_mbtouc only accesses as many bytes as necessary 
> to decode the first Unicode character, or to ascertain that S does not 
> begin with a valid UTF-8 sequence."

This is hard to understand, because it requires the programmer to
know how a Unicode character is decoded from its UTF-8 bytes.
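
To make concrete what the proposed wording would ask a reader to know, here
is a minimal sketch of a bounded UTF-8 decoder that also reports how many
bytes it looked at. It is not libunistring's implementation: the name
sketch_u8_decode, the "examined" out-parameter, and the choice to return 1
with U+FFFD on any invalid or truncated input are assumptions made for this
illustration only. It also does not try to be minimal in every case: an
overlong or surrogate encoding is rejected only after all its continuation
bytes have been read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libunistring.
   Decodes the first character of s[0..n-1] into *puc and returns the
   number of bytes consumed.  On invalid or truncated input it stores
   U+FFFD and returns 1 (a simplification chosen for this sketch).
   *examined receives the number of bytes that were actually read.  */
size_t
sketch_u8_decode (uint32_t *puc, const uint8_t *s, size_t n, size_t *examined)
{
  assert (s != NULL && n > 0);

  uint8_t c = s[0];
  size_t len;    /* sequence length announced by the lead byte */
  uint32_t uc;   /* code point accumulated so far */
  uint32_t min;  /* smallest code point allowed for this length */

  *examined = 1;
  if (c < 0x80)
    {
      *puc = c;
      return 1;
    }
  else if (c >= 0xC2 && c <= 0xDF)
    { len = 2; uc = c & 0x1F; min = 0x80; }
  else if ((c & 0xF0) == 0xE0)
    { len = 3; uc = c & 0x0F; min = 0x800; }
  else if (c >= 0xF0 && c <= 0xF4)
    { len = 4; uc = c & 0x07; min = 0x10000; }
  else
    {
      /* Stray continuation byte or invalid lead byte.  */
      *puc = 0xFFFD;
      return 1;
    }

  for (size_t i = 1; i < len; i++)
    {
      if (i >= n)
        {
          /* Truncated sequence: stop without reading s[n].  */
          *puc = 0xFFFD;
          return 1;
        }
      *examined = i + 1;
      if ((s[i] & 0xC0) != 0x80)
        {
          /* Not a continuation byte: stop at the first offender.  */
          *puc = 0xFFFD;
          return 1;
        }
      uc = (uc << 6) | (s[i] & 0x3F);
    }

  /* Reject overlong encodings, surrogates, and values above U+10FFFF.  */
  if (uc < min || uc > 0x10FFFF || (uc >= 0xD800 && uc <= 0xDFFF))
    {
      *puc = 0xFFFD;
      return 1;
    }

  *puc = uc;
  return len;
}
```

In this sketch the number of bytes examined depends on the lead byte and on
where the first non-continuation byte appears, which is exactly the parsing
detail a reader of the proposed sentence would have to keep in mind.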

> > The code may be changed in the future. If a guarantee is not documented AND
> > checked by the test suite, you cannot rely on it.
> Of course, that's why I'm suggesting a modification to the specification.

What's the use case that would benefit from such a guarantee?
libunistring supports two string data types: one where the length of the
string (number of units) is known, and one which is U+0000 terminated.
Are you suggesting that these two data types are not sufficient to cover
the users' needs?
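
As a rough illustration of the two string data types, the sketch below
counts characters once in a length-delimited buffer and once in a
zero-terminated one. The helper names (sketch_seq_len, sketch_count_n,
sketch_count_z) are hypothetical and are not libunistring functions, and
the code assumes the input is valid UTF-8 apart from a possibly truncated
tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes announced by lead byte c.
   Assumes c is a valid UTF-8 lead byte.  */
size_t
sketch_seq_len (uint8_t c)
{
  if (c < 0x80) return 1;
  if (c < 0xE0) return 2;
  if (c < 0xF0) return 3;
  return 4;
}

/* Length-delimited string: count the characters in s[0..n-1].
   The bound n is the only thing that stops the loop; s[n] is never read.  */
size_t
sketch_count_n (const uint8_t *s, size_t n)
{
  assert (s != NULL || n == 0);

  size_t count = 0;
  size_t i = 0;
  while (i < n)
    {
      size_t len = sketch_seq_len (s[i]);
      if (len > n - i)
        len = n - i;          /* clamp a truncated final sequence */
      i += len;
      count++;
    }
  return count;
}

/* Zero-terminated string: count the characters before the U+0000 unit.
   The terminator is the only thing that stops the loop.  */
size_t
sketch_count_z (const uint8_t *s)
{
  assert (s != NULL);

  size_t count = 0;
  while (*s != 0)
    {
      size_t len = sketch_seq_len (*s);
      /* Advance unit by unit so the terminator is never skipped.  */
      for (size_t k = 0; k < len && *s != 0; k++)
        s++;
      count++;
    }
  return count;
}
```

In both loops the caller already supplies a stopping condition (a length or
a terminator), so neither needs to know how many bytes the decoder touched.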

If your only point is to save a couple of instructions, then it's too
small a benefit, in my opinion.


