
[bug #48055] Regex ranges and locales in gnu-awk regextype

From: Piotr Jurkiewicz
Subject: [bug #48055] Regex ranges and locales in gnu-awk regextype
Date: Mon, 30 May 2016 06:12:43 +0000 (UTC)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0


                 Summary: Regex ranges and locales in gnu-awk regextype
                 Project: findutils
            Submitted by: piotrjurkiewicz
            Submitted on: Mon 30 May 2016 08:12:40 AM CEST
                Category: find
                Severity: 3 - Normal
              Item Group: Wrong result
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: 
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any
                 Release: 4.6.0
           Fixed Release: None



Starting with gawk 4.0, the traditional behaviour of regex ranges has been
restored. This means that [a-z] matches only lowercase letters and [A-Z]
matches only uppercase letters, regardless of the locale and collation settings.

See more:

This can be tested with the following command:

$ echo ABC | LC_COLLATE=pl_PL.utf8 gawk '$0 ~ /^[a-b]/' # gawk pre-4.0

$ echo ABC | LC_COLLATE=pl_PL.utf8 gawk '$0 ~ /^[a-b]/' # gawk 4.0+

Findutils, however, still emulates the old behaviour of gawk in gnu-awk mode.
That is, in certain locales, both the [a-z] and [A-Z] ranges match lowercase
as well as uppercase letters.

To reproduce, create two test files:

mkdir test
cd test
touch a.lower
touch b.UPPER

Then both commands:

LC_COLLATE=pl_PL.utf8 find -regextype gnu-awk -regex '.*[a-z]{5}$'
LC_COLLATE=pl_PL.utf8 find -regextype gnu-awk -regex '.*[A-Z]{5}$'

return both files, instead of just the one file with the appropriate case.
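
The reproduction above can be wrapped in a small script so the same check is
easy to repeat across locales. This is only a sketch: it assumes GNU find is on
PATH and that the locale under test is installed (`locale -a` lists them). The
helper names make_fixture, count_matches and verdict are hypothetical and are
not part of findutils.

```shell
#!/bin/sh
# Reproduction sketch for the range/locale behaviour described in this report.
# Assumptions: GNU find is available and the requested locale is installed.

# Create the two test files from the report in a fresh temporary directory
# and print the directory path.
make_fixture() {
    dir=$(mktemp -d) || return 1
    touch "$dir/a.lower" "$dir/b.UPPER"
    printf '%s\n' "$dir"
}

# Print the number of files under directory $1 that match the gnu-awk
# regex $2 when find runs with LC_COLLATE set to $3.
count_matches() {
    (cd "$1" && LC_COLLATE=$3 find . -regextype gnu-awk -regex "$2" | wc -l | tr -d ' ')
}

# Turn a match count into a short description. Exactly one match is the
# case-sensitive result the report expects; two means both files matched.
verdict() {
    case "$1" in
        1) echo "case-sensitive" ;;
        2) echo "both files matched" ;;
        *) echo "unexpected count: $1" ;;
    esac
}
```

A possible way to use it, with the C locale as a control:

    dir=$(make_fixture)
    verdict "$(count_matches "$dir" '.*[a-z]{5}$' pl_PL.utf8)"
    verdict "$(count_matches "$dir" '.*[a-z]{5}$' C)"

Note that if LC_ALL is set in the environment it overrides LC_COLLATE, so it
may need to be unset before running the check.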


  Message sent via/by Savannah
