bug-gawk

Re: RS='.^' apparently ignores the RS setting


From: Ed Morton
Subject: Re: RS='.^' apparently ignores the RS setting
Date: Tue, 13 Jul 2021 07:42:05 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0

Response below...

On 7/13/2021 6:46 AM, arnold@skeeve.com wrote:
> Hi.
>
> Ed Morton <mortoneccc@comcast.net> wrote:
>
> > So I read
> > https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_09
> > which I thought was saying you could use any character before the `^`
> > and it wouldn't match which was supported by this test:
> >
> >       $ printf 'ax^b\nax^b\n' | awk 'BEGIN{RS="x^"}{print NR, $0}'
> >       1 ax^b
> >       ax^b
> >
> > but then I can't explain this where gawk is apparently completely
> > ignoring the RS setting:
> >
> >       $ printf 'a.^b\na.^b\n' | awk 'BEGIN{RS=".^"}{print NR, $0}'
> >       1 a.^b
> >       2 a.^b
> >
> > Is that a bug?
>
> No. ^ and $ are always metacharacters in EREs, even if that means
> you can create nonsense regexps. You have to escape them to get
> them to be treated literally:
>
> $ printf 'ax^b\nax^b\n' | ./gawk 'BEGIN{RS="x\\^"}{print NR, $0}'
> 1 a
> 2 b
> a
> 3 b
>
> $ printf 'a.^b\na.^b\n' | ./gawk 'BEGIN{RS=".\\^"}{print NR, $0}'
> 1 a
> 2 b
> a
> 3 b
>
> HTH,
>
> Arnold


I understand now that they are metachars in an ERE, and I don't want them treated literally in this case.

The question is: why, given `RS='.^'` (a regexp that cannot match anywhere in the input), does gawk seem to ignore RS and act as if I had set `RS='\n'`, when given `RS='x^'` (a different but similar regexp that also cannot match anywhere in the input) it just reads the whole input in at once, as you'd expect for a regexp that doesn't match the input?

$ printf 'ax^b\nax^b\n' | gawk 'BEGIN{RS="x^"}{print NR, $0}'
1 ax^b
ax^b

$ printf 'a.^b\na.^b\n' | gawk 'BEGIN{RS=".^"}{print NR, $0}'
1 a.^b
2 a.^b

Regards,

    Ed.

