From: Hermann Peifer
Subject: Re: match finds wrong space.
Date: Thu, 08 Jul 2010 10:17:03 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.10) Gecko/20100512 Thunderbird/3.0.5

On 07/07/2010 20:36, Davide Brini wrote:
> On Wed, 07 Jul 2010 21:34:25 +0300 Aharon Robbins <address@hidden> wrote:
>
>>> regards - Chris Willis in the UK
>>>
>>> # insjk.awk
>>> BEGIN {
>>>     s = "Mary Ann jane"
>>>     n = match( s, /\040[a-z]/ )
>>>     print n, s
>>> }
>>
>> Hi. Current gawk is correct, and 3.0.3 is wrong. You'll note that
>> following the \040 for a space you have [a-z]. This matches *lower
>> case letters*; the "A" following the first space is an upper case
>> letter.
>
> But it's matched in his example.
>
>> So, there's no bug.
>
> He is saying that match( s, /\040[a-z]/ ) on the line "Mary Ann jane"
> gives 5 (meaning [a-z] matches the "A"), whereas it should give 9. I
> explained the reason for that in my post.

Davide,

Someone else already explained that this is expected behaviour. Unless you are in the C locale, the character range [a-z] can expand to just about anything. Simplified examples are:

aBbCc...XxYyZz or aAbBcC...xXyYz

Your locale is probably similar to the latter example, which is why [a-z] matches an uppercase A. In non-C locales, use character classes like [:lower:] and [:upper:] (written [[:lower:]] and [[:upper:]] inside a regexp) instead of character ranges like [a-z] and [A-Z].

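As a rough sketch of the two workarounds, applied to the string from the original report: the first call uses a character class, the second keeps the [a-z] range but forces the C locale for that one command. The shell variable names are made up for illustration, and this assumes an awk that understands POSIX character classes (gawk does; some older awks may not).

```sh
# Sketch only: variable names are illustrative, not from the original report.
# Assumes an awk with POSIX character class support (e.g. gawk) on PATH.
s="Mary Ann jane"

# [[:lower:]] matches lowercase letters only, whatever the locale's collation order.
n_class=$(awk -v s="$s" 'BEGIN { print match(s, /\040[[:lower:]]/) }')

# With LC_ALL=C the range [a-z] covers just the ASCII letters a through z.
n_range_c=$(LC_ALL=C awk -v s="$s" 'BEGIN { print match(s, /\040[a-z]/) }')

# Both should report 9, the space before "jane", not 5.
echo "$n_class $n_range_c"
```

The character class is probably the safer choice in a script, since it does not depend on the locale of whoever runs it.
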
Hermann