[bug #48055] Regex ranges and locales in gnu-awk regextype
From: Piotr Jurkiewicz
Subject: [bug #48055] Regex ranges and locales in gnu-awk regextype
Date: Mon, 30 May 2016 06:12:43 +0000 (UTC)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
URL:
<http://savannah.gnu.org/bugs/?48055>
Summary: Regex ranges and locales in gnu-awk regextype
Project: findutils
Submitted by: piotrjurkiewicz
Submitted on: Mon 30 May 2016 08:12:40 AM CEST
Category: find
Severity: 3 - Normal
Item Group: Wrong result
Status: None
Privacy: Public
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Release: 4.6.0
Fixed Release: None
_______________________________________________________
Details:
Starting with gawk 4.0, the traditional behaviour of regex ranges has been
restored. This means that [a-z] matches only lowercase letters and [A-Z]
matches only uppercase letters, regardless of the locale and collation settings.
For details, see:
https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html
This can be tested with the following command:
$ echo ABC | LC_COLLATE=pl_PL.utf8 gawk '$0 ~ /^[a-b]/' # gawk pre-4.0
ABC
$ echo ABC | LC_COLLATE=pl_PL.utf8 gawk '$0 ~ /^[a-b]/' # gawk 4.0+
[nothing]
Findutils, however, still emulates the old behaviour of gawk in gnu-awk mode.
That is, in certain locales, the ranges [a-z] and [A-Z] each match both
lowercase and uppercase letters.
Test:
Prepare:
mkdir test
cd test
touch a.lower
touch b.UPPER
Then both of these commands:
LC_COLLATE=pl_PL.utf8 find -regextype gnu-awk -regex '.*[a-z]{5}$'
LC_COLLATE=pl_PL.utf8 find -regextype gnu-awk -regex '.*[A-Z]{5}$'
return:
./a.lower
./b.UPPER
instead of just the one file whose name has the matching case.
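The test above can be wrapped in a small script that compares the reported
locale against the C locale as a baseline. This is only a sketch and assumes
GNU find is installed. The helper name run_find and the TEST_LOCALE variable
are made up for illustration and are not part of findutils. If pl_PL.utf8 is
not installed, the locale run will most likely just behave like the C locale.
The last two calls use the POSIX classes [[:lower:]] and [[:upper:]], a
possible workaround that does not rely on range semantics.

```shell
#!/bin/sh
# Sketch: reproduce the report in a scratch directory (assumes GNU find).
workdir=$(mktemp -d)
touch "$workdir/a.lower" "$workdir/b.UPPER"

# Hypothetical helper: $1 = value for LC_ALL, $2 = regex passed to -regex.
run_find() {
    (cd "$workdir" && LC_ALL=$1 find . -regextype gnu-awk -regex "$2" | sort)
}

# Locale under test; TEST_LOCALE is an assumed override, not a find feature.
loc=${TEST_LOCALE:-pl_PL.utf8}

# Baseline: in the C locale, ranges follow byte order and are case-sensitive.
c_lower=$(run_find C '.*[a-z]{5}$')
c_upper=$(run_find C '.*[A-Z]{5}$')

# Locale under test: per the report, each range may match both files here.
loc_lower=$(run_find "$loc" '.*[a-z]{5}$')
loc_upper=$(run_find "$loc" '.*[A-Z]{5}$')

# Possible workaround: character classes instead of ranges.
class_lower=$(run_find C '.*[[:lower:]]{5}$')
class_upper=$(run_find C '.*[[:upper:]]{5}$')

printf 'C [a-z]: %s\n' "$c_lower"
printf 'C [A-Z]: %s\n' "$c_upper"
printf '%s [a-z]: %s\n' "$loc" "$loc_lower"
printf '%s [A-Z]: %s\n' "$loc" "$loc_upper"
printf '[[:lower:]]: %s\n' "$class_lower"
printf '[[:upper:]]: %s\n' "$class_upper"

rm -rf "$workdir"
```

If the two locale lines each list both files while the C lines list one file
each, the behaviour described in this report is present.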
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?48055>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/