Re: LC_COLLATE in the C locale

From: Paul Eggert
Subject: Re: LC_COLLATE in the C locale
Date: Wed, 18 Dec 2019 08:27:02 -0800

On 12/18/19 2:29 AM, Bruno Haible wrote:
> Hi Paul,
>> I do have a qualm in that coreutils (and I assume others) interpret
>> !hard_locale (LC_COLLATE) as meaning that the locale is unibyte and uses
>> native byte comparison.
> Isn't this warranted by section "LC_COLLATE Category in the POSIX Locale" in
> <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html> ?

I don't see where that section requires unibyte.

>> As I recall on some platforms (macOS maybe?), the C locale uses
>> UTF-8 so this interpretation isn't correct.
> UTF-8 has the nice property that byte-per-byte comparison and codepoint-per-
> codepoint comparison are equivalent.

True, so the code that assumes strcmp == strcoll should work. But I think some
code specifically assumes unibyte. Presumably that code should also check
MB_CUR_MAX, which should be enough in practice (even though it doesn't suffice
in theory).
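
A minimal sketch of the combined check described above, under stated
assumptions: is_hard_locale() here is a hypothetical stand-in that only
compares the locale name against "C" and "POSIX" (it is not gnulib's
hard_locale implementation), and unibyte_byte_collation() and
compare_strings() are illustrative names, not coreutils functions. Note that
MB_CUR_MAX follows LC_CTYPE rather than LC_COLLATE, which is why the two
tests are separate.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for gnulib's hard_locale(): report a "hard" locale
   unless the category's current locale name is "C" or "POSIX".  */
static bool
is_hard_locale (int category)
{
  const char *name = setlocale (category, NULL);
  if (name == NULL)
    return true;                /* Unknown category state: be conservative.  */
  return !(strcmp (name, "C") == 0 || strcmp (name, "POSIX") == 0);
}

/* True if it looks safe to treat strings as single-byte characters that
   collate in native byte order.  MB_CUR_MAX depends on LC_CTYPE, so this
   also rejects LC_COLLATE=C combined with a multibyte LC_CTYPE.  */
static bool
unibyte_byte_collation (void)
{
  return !is_hard_locale (LC_COLLATE) && MB_CUR_MAX == 1;
}

/* Compare A and B, using strcmp as a shortcut when collation is not hard.
   This relies only on byte order, not on the locale being unibyte.  */
static int
compare_strings (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  return is_hard_locale (LC_COLLATE) ? strcoll (a, b) : strcmp (a, b);
}
```

Code that only needs ordering could call compare_strings(); code that indexes
or splits strings by byte would gate its fast path on
unibyte_byte_collation() instead.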

