bug-coreutils

bug#23665: spaces in keys: doc, --debug in LC_ALL=C


From: Assaf Gordon
Subject: bug#23665: spaces in keys: doc, --debug in LC_ALL=C
Date: Tue, 31 May 2016 15:11:10 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0

Hello Karl!

On 05/31/2016 02:32 PM, Karl Berry wrote:
> I run
>    LC_ALL=en_US.UTF-8 sort --debug -k 2 /tmp/foo  # or -k 2,2 et al.
> And get the nicely explanatory output for the "surprising" result:
> [...]

Just to verify, the surprising result is in C locale?

I'm seeing the following: with "en_US.UTF-8" I get the order I'd expect, but
the result with "C" is surprising:

    $ cat -A k.txt
    M  Build/zfile$
    M  Master/mfile$
    MM Build/afile$

    $ LC_ALL=en_US.UTF-8 sort -k2 k.txt
    MM Build/afile
    M  Build/zfile
    M  Master/mfile

    $ LC_ALL=C sort -k2 k.txt
    M  Build/zfile
    M  Master/mfile
    MM Build/afile

> But the information is just as valid in C as in UTF-8, so far as I can
> see.  Thus it would be nice for it to be present.

If I understand correctly, one could argue the warning is even more important
in the C locale than in UTF-8 locales, because the collation rules of UTF-8
locales make leading spaces less significant.

As in:

    $ cat -A s.txt
    M A$
    M  B$
    M   D$
    M  C$

In a UTF-8 locale, the collation rules make leading spaces less significant:

    $ LC_ALL=en_US.UTF-8 sort -k2 s.txt
    M A
    M  B
    M  C
    M   D

In the C locale, spaces (compared as plain bytes) do matter:

    $ LC_ALL=C sort -k2 s.txt
    M   D
    M  B
    M  C
    M A
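
To make the keys in the C-locale run above visible, here is a rough sketch of
a helper that prints what "sort -k2" compares for each line. The name
show_key2 is made up for illustration, and the sed expression only
approximates the default field splitting (no -t): field 1 is taken to be any
leading blanks plus the first run of non-blanks, and key 2 is everything
after it. Spaces are shown as '.' so they can be counted.

```shell
# Hypothetical helper (not part of coreutils): print the text that
# "sort -k2" would compare for each input line, with spaces shown as '.'.
# Assumes default field splitting, i.e. no -t option is in effect.
show_key2() {
  sed -e 's/^[[:blank:]]*[^[:blank:]]*//' -e 's/ /./g'
}

# The same four lines as s.txt, generated inline.
printf 'M A\nM  B\nM   D\nM  C\n' | show_key2
# prints:
# .A
# ..B
# ...D
# ..C
```

In the C locale a space (byte 0x20) compares lower than any letter, so the
key with the most leading blanks ("   D") sorts first and " A" sorts last,
which matches the output above.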

The 'b' modifier (here as -k2b) skips leading blanks in the key:

    $ LC_ALL=C sort -k2b s.txt
    M A
    M  B
    M  C
    M   D
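
For anyone who wants to reproduce the two C-locale runs side by side without
creating s.txt, a minimal sketch follows. The variable names are arbitrary;
only LC_ALL=C is used, so it does not depend on en_US.UTF-8 being installed.

```shell
# Same four lines as s.txt, generated inline so no file is needed.
input=$(printf 'M A\nM  B\nM   D\nM  C\n')

# Key 2 compared byte by byte, leading blanks included.
plain=$(printf '%s\n' "$input" | LC_ALL=C sort -k2)

# Key 2 with leading blanks skipped (the 'b' modifier).
skip_blanks=$(printf '%s\n' "$input" | LC_ALL=C sort -k2b)

# Show both orderings, separated by an empty line.
printf '%s\n\n%s\n' "$plain" "$skip_blanks"
```

The first block should list "M   D" first and "M A" last; the second should
list the lines in A, B, C, D order.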


> More importantly, I urge that the documentation for sort give an example
> of this.  The idea that following blanks after the first become part of
> the next field is highly counter-intuitive.

I agree. I can add the above example to the documentation (and possibly also
to the FAQ or Gotchas pages).
What do you think?

The condition to print this message is here:
 http://lingrok.org/xref/coreutils/src/sort.c#2435
I can try to suggest a patch that prints it in the C locale as well
(hopefully tonight).


> It would also be nice if the definition of "key 1" was stated.
> Awfully easy to misread that as "field 1".

How about "leading blanks are significant in sort key [...]" ?
(in http://lingrok.org/xref/coreutils/src/sort.c#2439 )


regards,
 - assaf









