Re: Does sort handle -t / correctly
From: Eric Blake
Subject: Re: Does sort handle -t / correctly
Date: Fri, 17 Apr 2015 10:26:44 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
On 04/17/2015 10:10 AM, Peng Yu wrote:
> Hi, I got the following results when I call sort with -t /. It seems
> that 'a/1.txt' should be right after 'a'. Is it the case? Or I am not
> using sort correctly?
Your second guess is correct: you are using sort incorrectly, by failing
to take locales into account, and by failing to limit each key being
compared to a single field.
>
> $ printf '%s\n' a 'a!' ab aB a/1.txt | sort -t / -k 1 -k 2 -k 3 -k 4
> a
> a!
> a/1.txt
> aB
> ab
sort --debug is your friend:
$ printf '%s\n' a 'a!' ab aB a/1.txt | sort --debug -t / -k 1 -k 2 -k 3 -k 4
sort: using ‘en_US.UTF-8’ sorting rules
a
_
 ^ no match for key
 ^ no match for key
 ^ no match for key
_
a!
__
  ^ no match for key
  ^ no match for key
  ^ no match for key
__
a/1.txt
_______
  _____
       ^ no match for key
       ^ no match for key
_______
ab
__
  ^ no match for key
  ^ no match for key
  ^ no match for key
__
aB
__
  ^ no match for key
  ^ no match for key
  ^ no match for key
__
As shown in the debug trace, the line 'a!' sorts prior to the line
'a/1.txt' because your first sort key is the entire line, and in the
locale you are using (where '!', '/', and '.' are all ignored in
collation orders), the collation string "a" really does come before
"a1txt".
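One way to convince yourself that -k 1 really spans the whole line is to
compare it against a plain sort with no keys at all. A small sketch,
forcing the C locale (my assumption, purely so the result does not depend
on which locales are installed on your machine):

```shell
# Sample input from this thread.
input=$(printf '%s\n' a 'a!' ab aB a/1.txt)

# -k 1 has no end position, so the key runs from field 1 to end of line.
with_key=$(printf '%s\n' "$input" | LC_ALL=C sort -t / -k 1)

# No key at all: the whole line is compared.
no_key=$(printf '%s\n' "$input" | LC_ALL=C sort)

# If -k 1 covers the entire line, both orderings are identical.
if [ "$with_key" = "$no_key" ]; then
    echo "same order"
fi
```

If the two results match, the -t / separator had no influence on the first
key, which is the situation described above.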
What you REALLY want is to limit each sort key to a single field
(-k1,1 rather than -k1), as in:
$ printf '%s\n' a 'a!' ab aB a/1.txt | sort --debug -t / -k 1,1 -k 2,2
sort: using ‘en_US.UTF-8’ sorting rules
a
_
 ^ no match for key
_
a/1.txt
_
  _____
_______
a!
__
  ^ no match for key
__
ab
__
  ^ no match for key
__
aB
__
  ^ no match for key
__
In addition, you can sort in a locale that does not discard punctuation
as unimportant, as in:
$ printf '%s\n' a 'a!' ab aB a/1.txt | LC_ALL=C sort --debug -t / -k 1,1 -k 2
sort: using simple byte comparison
a
_
 ^ no match for key
_
a/1.txt
_
  _____
_______
a!
__
  ^ no match for key
__
aB
__
  ^ no match for key
__
ab
__
  ^ no match for key
__
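The same idea extends to deeper paths: give each directory level its own
single-field key. A minimal sketch, using three made-up paths that are not
from the original report, and again assuming LC_ALL=C is acceptable for
your data:

```shell
# Hypothetical three-level paths, chosen so that '!' and '/' compete.
paths=$(printf '%s\n' a/b/2 'a/b!/1' a/b/1)

# One key per field: compare component by component.
per_field=$(printf '%s\n' "$paths" | LC_ALL=C sort -t / -k 1,1 -k 2,2 -k 3,3)

# Whole-line byte comparison, for contrast.
whole_line=$(printf '%s\n' "$paths" | LC_ALL=C sort)

printf 'per field:\n%s\n\nwhole line:\n%s\n' "$per_field" "$whole_line"
```

With per-field keys, everything under 'b' (a/b/1, a/b/2) comes before
'a/b!/1'. With whole-line comparison, 'a/b!/1' comes first, because the
byte '!' is smaller than the byte '/'.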
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org