Re: Does sort handle -t / correctly
From: Peng Yu
Subject: Re: Does sort handle -t / correctly
Date: Fri, 17 Apr 2015 12:03:25 -0500

On Fri, Apr 17, 2015 at 11:26 AM, Eric Blake <address@hidden> wrote:
> On 04/17/2015 10:10 AM, Peng Yu wrote:
>> Hi, I got the following results when I call sort with -t /. It seems
>> that 'a/1.txt' should be right after 'a'. Is it the case? Or I am not
>> using sort correctly?
>
> Your assumption is correct - you are using sort incorrectly, by failing
> to take locales into account, and by failing to limit the amount of data
> being compared to single field widths.
Thanks for the explanation.
If I don't know the number of fields, but I want to sort according to
all fields (from 1 to whatever the max number of fields), is there a
way to do it?
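
One possible approach, sketched here as an assumption rather than an answer given in this thread: count the largest number of fields with awk, then generate one single-field key (-k N,N) per field. The function name sort_all_fields is hypothetical, the input is buffered in a temporary file so it can be read twice, the delimiter is assumed to be a single character, and LC_ALL=C is pinned so that punctuation is compared byte by byte.

```shell
# Hypothetical helper (not part of coreutils): sort stdin on every
# delimiter-separated field in turn, however many fields there are.
# Usage: sort_all_fields DELIM < input
sort_all_fields() {
    delim=$1
    tmp=$(mktemp) || return 1
    cat > "$tmp"                      # buffer stdin; it is read twice

    # Largest field count found on any input line (0 for empty input).
    max=$(awk -F"$delim" 'NF > max { max = NF } END { print max + 0 }' "$tmp")

    # Build "-k 1,1 -k 2,2 ... -k max,max" in the positional parameters.
    set --
    i=1
    while [ "$i" -le "$max" ]; do
        set -- "$@" -k "$i,$i"
        i=$((i + 1))
    done

    # Assumption: byte-wise comparison (C locale) is the desired order.
    LC_ALL=C sort -t "$delim" "$@" "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, `printf '%s\n' a 'a!' ab aB a/1.txt | sort_all_fields /` would place 'a/1.txt' directly after 'a'.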
>> $ printf '%s\n' a 'a!' ab aB a/1.txt | sort -t / -k 1 -k 2 -k 3 -k 4
>> a
>> a!
>> a/1.txt
>> aB
>> ab
>
> sort --debug is your friend:
>
> $ printf '%s\n' a 'a!' ab aB a/1.txt | sort --debug -t / -k 1 -k 2 -k 3 -k 4
> sort: using ‘en_US.UTF-8’ sorting rules
> a
> _
> ^ no match for key
> ^ no match for key
> ^ no match for key
> _
> a!
> __
> ^ no match for key
> ^ no match for key
> ^ no match for key
> __
> a/1.txt
> _______
> _____
> ^ no match for key
> ^ no match for key
> _______
> ab
> __
> ^ no match for key
> ^ no match for key
> ^ no match for key
> __
> aB
> __
> ^ no match for key
> ^ no match for key
> ^ no match for key
> __
>
>
> As shown in the debug trace, the line 'a!' sorts prior to the line
> 'a/1.txt' because your first sort key is the entire line, and in the
> locale you are using (where both '!' and '/', and also '.', are ignored
> in collation orders), the collation string "a" really does come before
> "a1txt".
>
> What you REALLY want is to limit your sorting to a single field at a
> time (-k1,1 rather than -k1), as in:
>
> $ printf '%s\n' a 'a!' ab aB a/1.txt | sort --debug -t / -k 1,1 -k 2,2
> sort: using ‘en_US.UTF-8’ sorting rules
> a
> _
> ^ no match for key
> _
> a/1.txt
> _
> _____
> _______
> a!
> __
> ^ no match for key
> __
> ab
> __
> ^ no match for key
> __
> aB
> __
> ^ no match for key
> __
>
>
> Or additionally, to limit your sorting to a locale that does not discard
> punctuation as unimportant, as in:
>
> $ printf '%s\n' a 'a!' ab aB a/1.txt | LC_ALL=C sort --debug -t / -k 1,1 -k 2
> sort: using simple byte comparison
> a
> _
> ^ no match for key
> _
> a/1.txt
> _
> _____
> _______
> a!
> __
> ^ no match for key
> __
> aB
> __
> ^ no match for key
> __
> ab
> __
> ^ no match for key
> __
>
>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
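
To separate the two effects described in the quoted reply (key span versus locale), the following minimal sketch pins the locale to C and changes only the key specification. The helper name input and the variable names are illustrative and do not come from the thread.

```shell
# Same five lines as in the thread; "input" is an illustrative helper.
input() { printf '%s\n' a 'a!' ab aB a/1.txt; }

# -k 1 runs from field 1 to the end of the line, so the '/' is part of
# the key and 'a!' (0x21) sorts before 'a/1.txt' (0x2f).
whole_line=$(input | LC_ALL=C sort -t / -k 1)

# -k 1,1 stops at the end of field 1, so 'a' and 'a/1.txt' tie on the
# first key and the second key (empty versus '1.txt') breaks the tie.
per_field=$(input | LC_ALL=C sort -t / -k 1,1 -k 2,2)

printf 'whole line:\n%s\n\nper field:\n%s\n' "$whole_line" "$per_field"
```

With the locale fixed, the only difference between the two results is where 'a/1.txt' lands: third with -k 1, second with -k 1,1 -k 2,2.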
--
Regards,
Peng