Re: Case insensitivity seems to ignore lower bound of interval
From: Aharon Robbins
Subject: Re: Case insensitivity seems to ignore lower bound of interval
Date: Wed, 27 Apr 2011 21:48:41 +0300
User-agent: Heirloom mailx 12.4 7/29/08
Greetings. Re the below.
First, thank you for the bug report.
Second, it's not a bug, but rather the consequence of how locales behave.
This is documented somewhat in the released gawk manual and documented better
in the upcoming one.
I do agree that the behavior is surprising, disconcerting, undesirable,
and so on. For this reason, the upcoming version of gawk translates
ranges of the form [d-h] into '[defgh]' before compiling the regular
expression.
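To illustrate the idea (a rough sketch only, not the actual gawk code), a
range can be expanded by hand into an explicit list of characters, which
behaves the same in every locale. The expand_range helper below is
hypothetical, exists only for this example, and assumes ASCII letters or
digits as endpoints:

```shell
# expand_range is a hypothetical helper for illustration, not part of gawk.
# It prints a bracket expression listing every character from $1 to $2
# in plain byte order (7-bit ASCII endpoints assumed).
expand_range() {
    LC_ALL=C awk -v lo="$1" -v hi="$2" 'BEGIN {
        # Build a character -> byte value lookup table for 7-bit ASCII.
        for (i = 1; i < 128; i++)
            ord[sprintf("%c", i)] = i
        cls = "["
        for (c = ord[lo]; c <= ord[hi]; c++)
            cls = cls sprintf("%c", c)
        print cls "]"
    }'
}

class=$(expand_range R Z)    # yields [RSTUVWXYZ]
# An enumerated class involves no collation order, so only the listed
# uppercase letters can match and the lowercase input is left untouched.
echo "ijklmnopqrstuvwxyz" | awk -v re="$class" '{ gsub(re, "X"); print }'
```

With an older gawk, writing the class out this way, or running the command
with LC_ALL=C, should sidestep the locale-dependent range behavior.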
You can check out the development version from the git repository
on savannah.gnu.org, if you like, to try it.
Thanks,
Arnold
> From: Eric Bischoff <address@hidden>
> To: address@hidden
> Subject: Case insensitivity seems to ignore lower bound of interval
> Date: Tue, 26 Apr 2011 17:27:49 +0200
> Cc: address@hidden, Nicolas Parpandet <address@hidden>
>
> Hi all,
>
>
> $ echo "ijklmnopqrstuvwxyz" | awk '{ gsub(/[R-Z]/, "X"); print }'
> ijklmnopqrXXXXXXXX
>
> Please notice that "r" is not matched, i.e. case insensitivity is applied
> only to the [S-Z] interval.
>
> $ awk --version
> GNU Awk 3.1.7
> (...)
>
> $ echo $LANG
> fr_FR.UTF-8
>
> The problem does not appear when locale is C.
>
> The problem does not appear when the interval is specified as [r-z] (lower case).
>
> This contradicts http://www.gnu.org/software/gawk/manual/gawk.html#Locales
> which documents
> $ echo something1234abc | gawk '{ sub("[A-Z]*$", ""); print }'
> as returning
> something1234
> while it returns
> something1234a
>
> Bug reproduced both on Ubuntu Natty beta 2 and on Fedora 15.
>
>
> I hope that helps,
>
> --
> Éric Bischoff - Bureau Cornavin
> Technical writing and translations
> http://www.bureau-cornavin.com
> (+33) 3 68 46 00 85
> sip:address@hidden