Re: sort --ignore-case option changes underscore sort position
From: John Wiersba
Subject: Re: sort --ignore-case option changes underscore sort position
Date: Fri, 22 Aug 2008 12:56:04 -0400
Thanks for the quick and very clear explanation, Bob! I saw the
--ignore-case option definition, but the implications of it weren't
immediately apparent to me. It was especially confusing because I was
comparing with the output of a different tool which folds to lowercase when
doing comparisons and couldn't understand why there was a difference. Also,
the underscore character is particularly affected due to its heavy use in
filenames and program identifiers.
Maybe the documentation could be enhanced, something along the lines of:
The sort order of characters that have no case, such as punctuation, will
be affected if they sort differently relative to lowercase and uppercase
characters. For example, in the C locale, the underscore character sorts
between the uppercase and the lowercase characters, so `_' sorts before
`m' without the --ignore-case option, but after it with the option,
because `m' is then compared as `M'.
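A minimal sketch of that example, assuming a POSIX shell and a sort that accepts -f (the short form of --ignore-case). The variable names are only illustrative, and LC_ALL=C is set explicitly because other locales may collate punctuation differently:

```shell
# Sort the two one-character strings in the C locale, without case folding.
without_f=$(printf '%s\n' m _ | LC_ALL=C sort | paste -s -d ' ' -)

# Same input, but with lowercase folded to uppercase before comparing.
with_f=$(printf '%s\n' m _ | LC_ALL=C sort -f | paste -s -d ' ' -)

echo "without -f: $without_f"   # _ m   ('_' is 95, 'm' is 109)
echo "with -f:    $with_f"      # m _   ('m' compares as 'M', which is 77)
```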
On Fri, Aug 22, 2008 at 1:27 AM, Bob Proulx <address@hidden> wrote:
> ...
> `-f'
> `--ignore-case'
> Fold lowercase characters into the equivalent uppercase characters
> when comparing so that, for example, `b' and `B' sort as equal.
> The `LC_CTYPE' locale determines character types.
>
> Therefore your test case:
>
> { echo a_; echo ax; } | sort --ignore-case
>
> Is really the same as:
>
> $ { echo a_; echo ax; } | sort
> a_
> ax
>
> $ { echo A_; echo AX; } | sort
> AX
> A_
>
> $ { echo A_; echo AX; } | sort --ignore-case
> AX
> A_
>
> When using upper case you can see that it is equivalent to using the
> --ignore-case option. Perhaps this should have been more accurately
> called --convert-to-upper-case-before-sorting.
>
> The surprising part might be realizing that underscore collates
> between the upper and lower case letters when using the C/POSIX
> standard sort ordering. That is the standard legacy behavior. It
> does this along with [ \ ] ^ _ ` which all occur between Z and a in
> the US-ASCII code table.
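
For reference, a small sketch that prints the US-ASCII code of each of those characters, assuming a POSIX shell whose printf accepts the leading-quote form for character values. The helper name code_of is made up for this illustration:

```shell
# Print the numeric code of the first character of the argument.
# The leading single quote tells printf to use the character's value.
code_of() {
    printf '%d' "'$1"
}

# Z, the six characters in between, then a: codes 90 through 97.
for c in Z '[' '\' ']' '^' '_' '`' a; do
    printf '%s = %s\n' "$c" "$(code_of "$c")"
done
```

The six punctuation characters occupy codes 91 through 96, which is why folding to uppercase moves letters from after them to before them.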