Re: sort -V and accents
From: Jim Meyering
Subject: Re: sort -V and accents
Date: Wed, 01 Aug 2012 14:08:36 +0200

Pierre-Jean wrote:
> Jim Meyering <address@hidden> wrote:
>
>> Pierre-Jean wrote:
>
>> > I'm trying to sort a file containing accents and numbers,
>> > but can't find a way to do this correctly:
>
>
>> The trick is to specify sorting with "-f" for the first column
>> and "-V" for the second. Then it does what you seem to want:
>>
>> echo "
>> A 10
>> A 9
>> E 10
>> E 9
>> e 10
>> e 9
>> é 10
>> é 9
>> F 10
>> f 9" | sort -k1,1f -k2,2V
>>
>> A 9
>> A 10
>> e 9
>> E 9
>> e 10
>> E 10
>> é 9
>> é 10
>> f 9
>> F 10
>
> This is better, but still not perfect: "é 9" should be
> before "e 10", just as "E 9" is before "e 10", as in a
> dictionary, where "éa" is before "eb". That means that
> e=E=é=è=É=È whenever something later in the line makes a difference.
>
> Look at this example:
>
> echo "
> é 9
> e 10
> éa
> eb
> E 9" | sort -k1,1f -k2,2V
>
> E 9
> e 10
> é 9
> éa
> eb
>
> "E 9" is correctly moved to first place, but the placement of
> "é 9" doesn't follow the same rule.
>
> I can probably live with that imperfect order, because such
> situations are not frequent in real life, but if there's a
> solution, I'd be happy to know it.

I left out an important option: you should use -t' ' to tell sort
to use a space as the field separator:

  | sort -t' ' -k1,1f -k2,2V

Had I used sort's --debug option, I would have had a hint about that problem.

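
As a rough sketch of the corrected invocation (the helper name sort_pairs and the sample lines are only illustrative, not from the original report), the options can be wrapped in a small shell function and then re-run with --debug to see which part of each line sort treats as a key. The resulting order depends on the locale, so no output is shown here.

```shell
#!/bin/sh
# sort_pairs is a hypothetical helper name, used here for illustration only.
sort_pairs() {
  # -t' '   treat a single space as the field separator
  # -k1,1f  first field only, compared with case folded
  # -k2,2V  second field only, compared as a version number
  sort -t' ' -k1,1f -k2,2V "$@"
}

# Sample input; the final order depends on the current locale.
printf '%s\n' 'A 10' 'A 9' 'e 10' 'E 9' | sort_pairs

# --debug (GNU sort) annotates the part of each line used as a key.
printf '%s\n' 'A 10' 'A 9' | sort_pairs --debug
```

Extra arguments are passed straight through to sort, so the same function can take --debug or a file name.
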
Regarding accented and non-accented characters, I would not expect
sort's -f (--ignore-case) option to accomplish what you want.
Ignoring case does not ignore the presence of an accent.
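
One quick way to check this, sketched with made-up sample input: combining -f with -u collapses lines that compare equal once case is folded. "e" and "E" collapse into one line, while "é" stays separate, so two lines remain.

```shell
#!/bin/sh
# Count the distinct lines left after a case-folded unique sort.
# The three sample lines are illustrative.
count=$(printf '%s\n' 'e' 'E' 'é' | sort -f -u | wc -l)

# "e" and "E" fold together; "é" stays distinct, so 2 lines remain.
echo "distinct lines after sort -f -u: $count"
```
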

There is another option, -d (--dictionary-order), that you might
think would be relevant. Unfortunately it is not: it merely tells
sort to consider only blanks and alphanumeric characters, ignoring
everything else.
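
To illustrate what -d does, here is a small sketch with made-up ASCII input, run under LC_ALL=C so that the result does not depend on the locale. The hyphen is skipped when comparing, so "a-c" is compared as "ac" and lands after "ab". Nothing in this touches accents.

```shell
#!/bin/sh
# Plain byte order: "-" (0x2D) sorts before "b", so "a-c" comes first.
plain=$(printf '%s\n' 'ab' 'a-c' | LC_ALL=C sort)

# Dictionary order: "-" is ignored, so "a-c" compares as "ac" and comes last.
dict=$(printf '%s\n' 'ab' 'a-c' | LC_ALL=C sort -d)

printf 'plain:\n%s\n-d:\n%s\n' "$plain" "$dict"
```
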

I don't know of a way to make sort treat accented and unaccented
letters as equal.