From: Ed Morton
Subject: Re: RS='.^' apparently ignores the RS setting
Date: Tue, 13 Jul 2021 07:42:05 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0

Response below...

On 7/13/2021 6:46 AM, arnold@skeeve.com wrote:
> Hi.
>
> Ed Morton <mortoneccc@comcast.net> wrote:
>
> > So I read
> > https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_09
> > which I thought was saying you could use any character before the `^`
> > and it wouldn't match which was supported by this test:
> >
> >     $ printf 'ax^b\nax^b\n' | awk 'BEGIN{RS="x^"}{print NR, $0}'
> >     1 ax^b
> >     ax^b
> >
> > but then I can't explain this where gawk is apparently completely
> > ignoring the RS setting:
> >
> >     $ printf 'a.^b\na.^b\n' | awk 'BEGIN{RS=".^"}{print NR, $0}'
> >     1 a.^b
> >     2 a.^b
> >
> > Is that a bug?
>
> No. ^ and $ are always metacharacters in EREs, even if that means
> you can create nonsense regexps. You have to escape them to get them
> to be treated literally:
>
>     $ printf 'ax^b\nax^b\n' | ./gawk 'BEGIN{RS="x\\^"}{print NR, $0}'
>     1 a
>     2 b
>     a
>     3 b
>
>     $ printf 'a.^b\na.^b\n' | ./gawk 'BEGIN{RS=".\\^"}{print NR, $0}'
>     1 a
>     2 b
>     a
>     3 b
>
> HTH,
>
> Arnold

I understand now that they are metachars in an ERE and I don't want them treated literally in this case.
The question is: why does gawk, given `RS='.^'` (a regexp that cannot match anywhere in the input), seem to ignore RS and act as if I had set `RS='\n'`, when given `RS='x^'` (a different but similar regexp that also cannot match anywhere in the input) it just reads the whole input in at once, as you'd expect for a regexp that doesn't match the input?

    $ printf 'ax^b\nax^b\n' | gawk 'BEGIN{RS="x^"}{print NR, $0}'
    1 ax^b
    ax^b

    $ printf 'a.^b\na.^b\n' | gawk 'BEGIN{RS=".^"}{print NR, $0}'
    1 a.^b
    2 a.^b

Regards,

Ed.
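
For anyone wanting to compare the cases side by side, here is a minimal sketch that prints only the record count for a given RS. The helper name `count_records` is hypothetical and not part of the original messages. It assumes an awk that accepts a regexp RS (gawk does), and the counts for the two unmatchable regexps may differ between awk implementations and versions, so no expected values are shown for them.

```shell
#!/bin/sh
# count_records RS_REGEXP PRINTF_FORMAT
# Hypothetical helper: feeds the printf-expanded input to awk with the
# given RS and prints how many records awk saw (NR in the END block).
count_records() {
    printf "$2" | awk -v rs="$1" 'BEGIN { RS = rs } END { print NR }'
}

# Escaped caret: RS is the literal two-character string "x^".
count_records 'x\\^' 'ax^b\nax^b\n'

# Unescaped caret after an ordinary character: a regexp that cannot match.
count_records 'x^' 'ax^b\nax^b\n'

# Unescaped caret after ".": the case the question is about.
count_records '.^' 'a.^b\na.^b\n'
```

Passing RS through `-v` applies the same backslash processing as an awk string literal, so `'x\\^'` here corresponds to `RS="x\\^"` in the commands above.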