bug#24601: Update


From: Assaf Gordon
Subject: bug#24601: Update
Date: Tue, 4 Oct 2016 23:29:41 -0400

Hello,

Attached is my test program for strcoll.
It is somewhat long, but mostly because of the extra help text and debug printing.

This will also become relevant later, when we deal with multibyte sort/join/uniq.

From brief testing, glibc with UTF-8 locales seems to be the only libc that
applies a special collation order to non-letter (punctuation) characters.
As Andreas Schwab explained:
  "They are not ignored, just considered only secondary, if the first order
   characters didn't provide an ordering.".
  http://lists.gnu.org/archive/html/bug-coreutils/2016-06/msg00005.html
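
To make the quoted explanation concrete, here is a minimal sketch of a
two-level comparison. It is NOT glibc's implementation: the names
primary_key and two_level_cmp are hypothetical, the 256-byte key buffers
are an arbitrary limit, and the strcmp tie-break is only a placeholder
for real secondary weights (the glibc results reported in this message put
'-02' before '+02', so glibc's secondary order is not plain byte order).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Copy only the letters and digits of S (as classified by isalnum in
   the current locale) into OUT, which holds OUTSZ bytes.  Longer input
   is silently truncated; this is a sketch, not production code.  */
static void
primary_key (const char *s, char *out, size_t outsz)
{
  size_t n = 0;
  assert (s != NULL && out != NULL && outsz > 0);
  for (; *s != '\0' && n + 1 < outsz; s++)
    if (isalnum ((unsigned char) *s))
      out[n++] = *s;
  out[n] = '\0';
}

/* Hypothetical two-level comparison: punctuation is skipped on the
   first pass and consulted only if the first pass finds no
   difference.  The tie-break uses byte order, which is an assumption
   made for illustration only.  */
static int
two_level_cmp (const char *a, const char *b)
{
  char ka[256], kb[256];
  int r;

  primary_key (a, ka, sizeof ka);
  primary_key (b, kb, sizeof kb);
  r = strcmp (ka, kb);
  return r != 0 ? r : strcmp (a, b);
}
```

With the made-up inputs '-0a' and '+0b', two_level_cmp orders '-0a' first
(the first pass compares '0a' with '0b'), while plain strcmp orders '+0b'
first because '+' precedes '-' in ASCII.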

Interestingly, "ja_JP.UTF-8" is the only locale in glibc that uses a different
order than all other UTF-8 locales, and its ordering is more "intuitive"
(closer to ASCII).

Testing results from various systems are below.

The program is also available for download here:
  wget http://files.housegordon.org/src/strcoll-test.c
Compilation is trivial:
  cc -o strcoll-test strcoll-test.c

Usage is:
    Usage: ./strcoll-test [-svl] [[-KM] | TEXT1 TEXT2 TEXTn...] 
    
    Sorts TEXT1 TEXT2 TEXTn... according to the currently
    active locale using strcoll(3) call.
    If no parameters are specified, assumes '-M' instead.
    
    Options:
     -s   print result of each strcoll(3) call
     -v   print uname/glibc version information (if available)
     -l   print active local name
    
     -K   use input from https://debbugs.gnu.org/23677
     -M   use input from https://debbugs.gnu.org/24601 (default)
    
    Use LC_ALL to set locale.
    
    Examples:
    
      $ ./strcoll-test -ls '!a' '$z' '#c'
      active locale: en_US.UTF-8
      strcoll('$z', '#c') = 23
      strcoll('!a', '#c') = -2
      !a
      #c
      $z
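
For readers who only want the core idea, the following is a minimal
sketch of sorting strings with strcoll(3) through qsort(3). It is not the
attached strcoll-test.c; the function names cmp_strcoll, sign_of and
sort_in_locale are made up for this illustration. Since the C standard
only defines the sign of strcoll's return value, sign_of reduces values
such as -12, 62 or 461 to -1/0/1 so results from different libcs can be
compared.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort(3) comparator: order two C strings with strcoll(3).
   Only the sign of the result is meaningful.  */
static int
cmp_strcoll (const void *a, const void *b)
{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcoll (*sa, *sb);
}

/* Reduce a strcoll(3) result to -1, 0 or 1.  */
static int
sign_of (int r)
{
  return (r > 0) - (r < 0);
}

/* Sort N strings in place using the collation rules of LOCALE_NAME
   ("" means: take the locale from LC_ALL/LC_COLLATE/LANG).
   Returns 0 on success, -1 if the locale is not available.  */
static int
sort_in_locale (const char *locale_name, char **strings, size_t n)
{
  assert (locale_name != NULL && (strings != NULL || n == 0));
  if (setlocale (LC_COLLATE, locale_name) == NULL)
    return -1;
  if (n > 1)
    qsort (strings, n, sizeof *strings, cmp_strcoll);
  return 0;
}
```

A caller could run sort_in_locale ("", argv + 1, argc - 1) from main and
then print each argument; in the "C" locale the result is plain byte order.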


Comments welcome,
 - assaf


Attachment: strcoll-test.c
Description: Binary data


====

### Ubuntu 14.04 with glibc 2.19

$ ./strcoll-test -svl -M
Linux 3.13.0-88-generic glibc 2.19 stable
active locale: en_US.UTF-8
strcoll('+00', '-0c') = -12
strcoll('+02', '-02') = 62
strcoll('+00', '-02') = -2
strcoll('-0c', '-02') = 10
strcoll('-0c', '+02') = 10
+00
-02
+02
-0c

### Fedora 24

$ ./strcoll-test -svl -M
Linux 4.6.3-300.fc24.x86_64 glibc 2.23 stable
active locale: en_US.UTF-8
strcoll('+00', '-0c') = -12
strcoll('+02', '-02') = 62
strcoll('+00', '-02') = -2
strcoll('-0c', '-02') = 10
strcoll('-0c', '+02') = 10
+00
-02
+02
-0c


### Glibc with locale ja_JP.UTF-8 - not the same collation as other UTF-8 locales

$ LC_ALL=ja_JP.UTF-8 ./strcoll-test-glibc -svl -M
Linux 3.13.0-88-generic glibc 2.19 stable
active locale: ja_JP.UTF-8
strcoll('+00', '-0c') = -2
strcoll('+02', '-02') = -2
strcoll('+00', '+02') = -2
strcoll('-0c', '+02') = 2
strcoll('-0c', '-02') = 49
+00
+02
-02
-0c


### Musl Libc 1.15 on Ubuntu 14.04

$ ./strcoll-test-musl -svl -M
Linux 3.13.0-88-generic not-glibc
active locale: en_US.UTF-8;en_US.UTF-8;en_US.UTF-8;en_US.UTF-8;en_US.UTF-8;en_US.UTF-8
strcoll('+02', '+00') = 2
strcoll('+02', '-0c') = -2
strcoll('+00', '-0c') = -2
strcoll('-0c', '-02') = 49
strcoll('-02', '+00') = 2
strcoll('-02', '+02') = 2
strcoll('+00', '+02') = -2
+00
+02
-02
-0c


### OpenBSD 6.0

$ LC_ALL=en_US.UTF-8 ./strcoll-test -svl -M
OpenBSD 6.0 not-glibc
active locale: C/en_US.UTF-8/C/C/C/en_US.UTF-8
strcoll('+00', '-0c') = -2
strcoll('-0c', '+02') = 2
strcoll('+00', '+02') = -2
strcoll('-0c', '-02') = 49
strcoll('+02', '-02') = -2
+00
+02
-02
-0c


### FreeBSD 10.3

$ LC_ALL=en_US.UTF-8 ./strcoll-test -svl -M
FreeBSD 10.3-RELEASE not-glibc
active locale: en_US.UTF-8
strcoll('+00', '-0c') = -2
strcoll('-0c', '+02') = 2
strcoll('+00', '+02') = -2
strcoll('-0c', '-02') = 49
strcoll('+02', '-02') = -2
+00
+02
-02
-0c


### Mac OS X 10.10.4

$ ./strcoll-test -svl -M
Darwin 14.4.0 not-glibc
active locale: en_US.UTF-8
strcoll('+00', '-0c') = -2
strcoll('-0c', '+02') = 2
strcoll('+00', '+02') = -2
strcoll('-0c', '-02') = 49
strcoll('+02', '-02') = -2
+00
+02
-02
-0c



### OpenSolaris 5.11/x86 - strange exception where the collation order of '+' and '-' is reversed

$ ./strcoll-test -sl -M
active locale: en_US.UTF-8
strcoll('+00', '-0c') = 461
strcoll('+00', '+02') = -39
strcoll('+02', '-02') = 461
strcoll('+00', '-02') = 461
strcoll('-0c', '-02') = 219
-02
-0c
+00
+02
