Re: [bug-gawk] Is \x24 the literal dollar character?
From: Tom Gray
Subject: Re: [bug-gawk] Is \x24 the literal dollar character?
Date: Mon, 14 Oct 2019 20:06:48 +0000
You can also put the $ in a bracket expression:
gawk '/[$]Id: .*[$]/ {print}' <<< '$Id: rcsid$'
Most metacharacters lose their special meaning inside brackets,
so you can avoid all the escaping.
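A minimal, portable sketch of the bracket-expression approach, assuming a POSIX shell and any POSIX awk (the script prefers gawk if it is installed; the variable names AWK and prog are only for illustration). A second, non-matching input line shows that only the keyword line is printed:

```shell
#!/bin/sh
# Sketch: match a literal "$Id: ...$" line using bracket expressions.
# Assumption: any POSIX awk behaves the same here; gawk is preferred if present.
AWK=$(command -v gawk || command -v awk)

# Inside [...] the "$" is an ordinary character, not the end-of-line anchor.
# The program text also never contains the sequence "$Id", so RCS has
# nothing to expand on checkout.
prog='/[$]Id: .*[$]/ {print}'

printf '%s\n' '$Id: rcsid$' 'no keyword here' | "$AWK" "$prog"
# prints: $Id: rcsid$
```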
Tom
-----Original Message-----
From: bug-gawk <bug-gawk-bounces+tom_gray=address@hidden> On Behalf Of address@hidden
Sent: Saturday, October 12, 2019 7:46 PM
To: address@hidden; address@hidden
Subject: Re: [bug-gawk] Is \x24 the literal dollar character?
Hi.
I looked into this. Please see the sidebar in that same section of the manual
that you cited:
| @sidebar Escape Sequences for Metacharacters
| @cindex metacharacters @subentry escape sequences for
|
| Suppose you use an octal or hexadecimal escape to represent a regexp
| metacharacter.
| (See @ref{Regexp Operators}.)
| Does @command{awk} treat the character as a literal character or as a
| regexp operator?
|
| @cindex dark corner @subentry escape sequences @subentry for metacharacters
| Historically, such characters were taken literally.
| @value{DARKCORNER}
| However, the POSIX standard indicates that they should be treated as
| real metacharacters, which is what @command{gawk} does.
| In compatibility mode (@pxref{Options}), @command{gawk} treats the
| characters represented by octal and hexadecimal escape sequences
| literally when used in regexp constants. Thus, @code{/a\52b/} is
| equivalent to @code{/a\*b/}.
| @end sidebar
In short, with --traditional, you'll get the behavior you're looking for.
Otherwise, gawk is following POSIX and treating such characters as real
metacharacters.
To solve your problem, you can do something like:
gawk '$0 ~ ("\\$" "Id: .*" "\\$") {print}' <<< '$Id: rcsid$'
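The same idea as a standalone script, assuming a POSIX shell and any POSIX awk (gawk preferred if installed; AWK and prog are illustrative names). Because the regexp is assembled from string constants at run time (a dynamic regexp), the escapes are processed as awk string escapes, and the source text never contains "$Id" for RCS to expand:

```shell
#!/bin/sh
# Sketch: build the regexp at run time from string constants.
# Assumption: any POSIX awk behaves the same here; gawk is preferred if present.
AWK=$(command -v gawk || command -v awk)

# In an awk string, "\\$" is the two characters \$ ; used as a regexp,
# \$ matches a literal dollar sign. Splitting the pieces keeps "$Id"
# out of the program text.
prog='$0 ~ ("\\$" "Id: .*" "\\$") {print}'

printf '%s\n' '$Id: rcsid$' 'Id: no dollars' | "$AWK" "$prog"
# prints: $Id: rcsid$
```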
HTH,
Arnold
"Kozics Peter (FM)" <address@hidden> wrote:
> Hello,
>
>
> (1)
> this matches:
> $ gawk '/\$Id: .*\$/ {print}' <<< '$Id: rcsid$'
> $Id: rcsid$
>
> (2)
> I expected that this would match as well, but it didn't:
> $ gawk '/\x24Id: .*\x24/ {print}' <<< '$Id: rcsid$'
>
> The expectation was based on gawk manual section 3.2: \x24 should be
> the literal dollar character, not the dollar metacharacter.
>
> (3)
> Going further, this does not match either:
> $ gawk '/\\x24Id: .*\\x24/ {print}' <<< '$Id: rcsid$'
>
> (4)
> And this one still does not match:
> $ gawk '/\\\x24Id: .*\\\x24/ {print}' <<< '$Id: rcsid$'
>
> (5)
> At long last, this matches again:
> $ gawk '/\x5c\x24Id: .*\x5c\x24/ {print}' <<< '$Id: rcsid$'
> $Id: rcsid$
>
> which looks awkward to me and quite counterintuitive.
>
> -------------
> The problem with (1) is that when the regexp is in a file under RCS
> control, RCS destroys the regexp on checkout by performing keyword
> substitution. So the straightforward and seemingly manual-compliant
> solution would be (2), which unfortunately does not work.
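A rough way to check which of the program texts in this thread would be exposed to RCS keyword substitution. This is a hypothetical helper (check is an illustrative name), not RCS itself: it simply flags any text containing the literal sequence "$Id", a conservative approximation of the $Id$ / $Id: ...$ keyword that RCS expands on checkout:

```shell
#!/bin/sh
# Hypothetical helper (not part of RCS): report whether an awk program text
# contains "$Id", the start of the RCS keyword that checkout would expand.
check() {
    if printf '%s\n' "$1" | grep -F -q '$Id'; then
        printf 'exposed: %s\n' "$1"
    else
        printf 'safe:    %s\n' "$1"
    fi
}

check '/\$Id: .*\$/ {print}'                 # variant (1): exposed
check '/\x24Id: .*\x24/ {print}'             # variant (2): safe
check '/\x5c\x24Id: .*\x5c\x24/ {print}'     # variant (5): safe
check '/[$]Id: .*[$]/ {print}'               # bracket expression: safe
check '$0 ~ ("\\$" "Id: .*" "\\$") {print}'  # dynamic regexp: safe
```

Only variant (1) is flagged, which is why the other spellings were being explored in the first place.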
>
> I wonder whether I have found a gawk bug, a flaw in the regexp /
> literal / metacharacter concept, or a vague passage in the gawk
> manual. Or have I just misunderstood something?
>
> -------------
> OS:
> $ uname -a
> Linux gygv 5.2.18-200.fc30.x86_64 #1 SMP Tue Oct 1 13:14:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
>
> gawk:
> $ gawk --version
> GNU Awk 4.2.1, API: 2.0 (GNU MPFR 3.1.6-p2, GNU MP 6.1.2)
> Copyright (C) 1989, 1991-2018 Free Software Foundation.
>
> gawk manual:
> This is Edition 4.2 of GAWK: Effective AWK Programming
>
>
> yours
> KP
>