Re: May strcoll return 0 if strcmp returns non-0
From: Eric Blake
Subject: Re: May strcoll return 0 if strcmp returns non-0
Date: Tue, 31 Mar 2015 06:36:54 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0

FYI: This thread on the Austin Group mailing list claims that coreutils
has a bug in at least uniq (although Stephane has not yet filed formal
bug reports against the standard, so we may instead be able to get the
standard relaxed to allow our behavior of collating rather than
comparing strings).
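
For illustration, the two notions of equality at issue can be sketched
in C. The helper names same_by_collation and same_by_bytes are
hypothetical and are not taken from coreutils; the sketch assumes the
caller has already selected a locale (without a setlocale() call the
"C" locale applies, where the two helpers agree).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when the two strings collate equally in
 * the current LC_COLLATE locale. Distinct byte sequences may still
 * be reported as equal here. */
bool same_by_collation(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcoll(a, b) == 0;
}

/* Hypothetical helper: true only when the two strings are identical
 * byte for byte, regardless of locale. */
bool same_by_bytes(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

A uniq-like filter built on the first helper would merge adjacent lines
that merely sort the same; one built on the second would merge only
identical lines.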
On 03/30/2015 04:07 PM, Stephane Chazelas wrote:
> Thanks guys, so if I sum up.
>
> Yes, POSIX explicitly allows different collating sequences to
> sort the same and strcoll(a, b) to return 0 when a != b.
>
> sort -u doesn't report unique lines, but the first of sequences
> of lines that sort the same. So GNU sort is conformant in that
> regard.
>
> sort|uniq reports unique lines (provided the input is valid
> text). GNU uniq is not conformant in that sort|uniq behaves like
> sort -u.
>
> There's a mention of LC_COLLATE in the uniq spec that is
> irrelevant and should be removed as uniq doesn't make use of it.
>
> comm and join will match lines/keys that sort the same as
> opposed to keys that match from a strict byte-to-byte
> comparison.
>
> expr "x$a" = "x$b"
> and
> awk 'ENVIRON["a"] == ENVIRON["b"]'
>
> are required to return true for values of $a and $b which are
> different but sort the same. So GNU expr is conformant, but GNU
> awk (or mawk) is not.
>
> The order of the lines in the output of "ls" or "printf '%s\n'
> *" is non-deterministic.
>
> It would be useful to add notes (in "APPLICATION USAGE"
> sections) about the implications of all that in the specs of
> utilities like sort, uniq, comm, join, awk, expr, test, ls at
> least.
>
> POSIX doesn't require nor forbid locales to have such collation
> sequences that sort the same, so I'm entitled to point out what
> I consider an issue in GNU libc's *.UTF-8 locales as a "poor
> design choice IMO", but not as a "POSIX conformance bug" as
> POSIX clearly allows that behaviour.
>
> If locales were changed to provide a "total order" (like
> what appears to be the case in the Solaris and FreeBSD locales
> I've tested), then all those problems (causing bugs and
> potential security vulnerabilities) would go away.
>
> Thanks,
> Stephane
>
>
>
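
A minimal sketch of how an application could recover a total order on
its own side, assuming it is acceptable to break collation ties with a
byte comparison. The names total_order_cmp and cmp_lines are
hypothetical; this does not describe what any libc locale or coreutils
utility actually does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical comparator: order by the locale's collation first and
 * fall back to strcmp() when strcoll() reports a tie, so that the
 * result is 0 only for byte-identical strings. */
int total_order_cmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    int c = strcoll(a, b);
    return c != 0 ? c : strcmp(a, b);
}

/* Adapter with the signature qsort() expects, for sorting an array
 * of C strings (each element is a pointer to char). */
int cmp_lines(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    return total_order_cmp(a, b);
}
```

With such a comparator, qsort(lines, n, sizeof lines[0], cmp_lines)
places lines that collate equally in a fixed relative order, and a
zero result can be used as an equality test.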
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org