bug#24425: [PATCH] Don’t cast Unicode to 8-bit when casing unibyte strings
Thu, 15 Sep 2016 21:55:20 +0300
> From: Michal Nazarewicz <address@hidden>
> Cc: address@hidden
> Date: Thu, 15 Sep 2016 16:23:54 +0200
> On Tue, Sep 13 2016, Eli Zaretskii wrote:
> > Currently, case changes in unibyte characters and strings are only
> > well defined for pure ASCII text; if the input or the result is not
> > pure ASCII, we produce "undefined behavior".
> Would the following (not tested) make sense then:
AFAIU, it would disallow handling unibyte text by setting up case
tables for 8-bit characters in their multibyte representation,
i.e. the raw-byte characters #x3FFF80..#x3FFFFF above #x3FFF00.  I'd
rather not lose that, although I don't think I've ever seen it used.
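For readers unfamiliar with the "multibyte representation" of 8-bit
bytes mentioned above, here is a minimal Python sketch of the mapping,
assuming the arithmetic of Emacs's BYTE8_TO_CHAR macro in
src/character.h (a raw byte b in #x80..#xFF becomes character
b + #x3FFF00, while ASCII bytes map to themselves).  The function names
unibyte_to_char and char_to_unibyte are hypothetical illustrations, not
Emacs APIs.

```python
# Hypothetical model of how Emacs maps unibyte bytes to multibyte
# character codes, based on the BYTE8_TO_CHAR offset (0x3FFF00).
BYTE8_OFFSET = 0x3FFF00


def unibyte_to_char(b: int) -> int:
    """Return the multibyte character code for a single unibyte byte."""
    if not 0 <= b <= 0xFF:
        raise ValueError(f"{b!r} is not a byte")
    # ASCII bytes are the same character in both representations;
    # 8-bit bytes become "raw-byte" characters #x3FFF80..#x3FFFFF.
    return b if b < 0x80 else b + BYTE8_OFFSET


def char_to_unibyte(c: int) -> int:
    """Inverse mapping, handling only ASCII and raw-byte characters.

    This sketch deliberately rejects every other character; it does not
    model what Emacs does for those.
    """
    if 0 <= c < 0x80:
        return c
    if 0x3FFF80 <= c <= 0x3FFFFF:
        return c - BYTE8_OFFSET
    raise ValueError(f"#x{c:X} is neither ASCII nor a raw-byte character")


print(hex(unibyte_to_char(0xFF)))  # 0x3fffff
```

Under this model, a case table entry for, say, byte #xFF in unibyte
text would have to be keyed on character #x3FFFFF, which is the
capability the proposed change would remove.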
> > Properly means that upcasing "istanbul" in the above example will
> > produce "İSTANBUL", not "iSTANBUL", and downcasing "IRMA" will produce
> > "ırma".
> I thought about that but then another corner case is "istanbul\xff"
> which is a unibyte string with 8-bit bytes.
And what is the problem in that case?
> I have no strong feelings either way so I’m happy just leaving it as is
> as well.
That is fine with me.
Was there some real-life use case where you bumped into this?  If so,
maybe we should discuss that use case; perhaps the solution, if we
need one, is something other than what we have talked about so far.