Re: LC_COLLATE in the C locale

From: Paul Eggert
Subject: Re: LC_COLLATE in the C locale
Date: Wed, 18 Dec 2019 08:27:02 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2

On 12/18/19 2:29 AM, Bruno Haible wrote:
> Hi Paul,
>> I do have a qualm in that coreutils (and I assume others) interpret
>> !hard_locale (LC_COLLATE) as meaning that the locale is unibyte and uses
>> native byte comparison.
> Isn't this warranted by section "LC_COLLATE Category in the POSIX Locale" in
> <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html> ?

I don't see where that section requires unibyte.

>> As I recall on some platforms (macOS maybe?), the C locale uses
>> UTF-8 so this interpretation isn't correct.
> UTF-8 has the nice property that byte-per-byte comparison and codepoint-per-
> codepoint comparison are equivalent.

True, so the code that assumes strcmp == strcoll should work. But I think some
code specifically assumes unibyte. Presumably that code should also check
MB_CUR_MAX, which should be enough in practice (even though it doesn't suffice
in theory).
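
To make that concrete, here is a minimal sketch of what such a combined check
might look like. Both function names are hypothetical, and the locale-name test
is only a simplified stand-in for gnulib's hard_locale, not its actual
implementation. It assumes the caller has already called setlocale.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for !hard_locale (LC_COLLATE): treat only the
   locale names "C" and "POSIX" as the simple collation case.  */
static bool
collate_is_c_or_posix (void)
{
  const char *name = setlocale (LC_COLLATE, NULL);
  return name && (strcmp (name, "C") == 0 || strcmp (name, "POSIX") == 0);
}

/* Hypothetical helper: true only if collation is C/POSIX *and* the
   current LC_CTYPE encoding has no multibyte characters.  MB_CUR_MAX == 1
   is the practical test mentioned above; it is not a theoretical
   guarantee about the locale.  */
static bool
can_assume_unibyte_byte_order (void)
{
  return collate_is_c_or_posix () && MB_CUR_MAX == 1;
}
```

Code that only needs strcmp and strcoll to agree could use the first check
alone; code that indexes or classifies individual bytes as characters would
want the second.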
