[bug-gawk] regexp RS mangling input
From: Jay Michael
Subject: [bug-gawk] regexp RS mangling input
Date: Sun, 20 May 2012 01:05:52 -0400
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113

I'm using a regular expression as RS to soak up everything I don't
want to see while parsing my input. I want the record terminator to
include stretches of text enclosed in braces, which may span several
lines.

The first problem I had was that gawk seemed to be returning the same
string for several consecutive internal records. When I tried to track
down what I was doing wrong, my reduced test case caused gawk to include
what should have been the first record in the first record's terminator,
while ending that terminator before the end of the second "comment".
After that, gawk acted as if each character were a record terminator.

I'm running GNU Awk 3.1.3 under Windows XP. I don't know who
built it, and I don't remember where I got it. I also tried a UNIX/Linux
shell to which I have access. It was running 3.1.1 (or so), and it
behaved the same way as the version on my PC.

I have attached my program (d.awk) and input (d.i). d.log is not
really a log file: I pasted in the program and the input and then
appended the output of "gawk -f d.awk d.i".

function make_printable( s,    p )
{
    p = s ;
    gsub( /\n/, "\\n", p ) ;
    gsub( /\r/, "\\r", p ) ;
    gsub( /\t/, "\\t", p ) ;
    return p ;
} # make_printable

BEGIN {
    re_bcom = "\\{[^{}]*\\}" ;
    RS = "([ \\n]|(" re_bcom "))*" ;
    print "RS = " make_printable( RS ) ;
}

{
    print "RS = " make_printable(RS) \
          " $0 = " make_printable($0) " RT = " make_printable( RT ) ;
}

first
{1st comment}
{2nd comment}
last
GNU Awk 3.1.3
d.awk:
function make_printable( s,    p )
{
    p = s ;
    gsub( /\n/, "\\n", p ) ;
    gsub( /\r/, "\\r", p ) ;
    gsub( /\t/, "\\t", p ) ;
    return p ;
} # make_printable

BEGIN {
    re_bcom = "\\{[^{}]*\\}" ;
    RS = "([ \\n]|(" re_bcom "))*" ;
    print "RS = " make_printable( RS ) ;
}

{
    print "RS = " make_printable(RS) \
          " $0 = " make_printable($0) " RT = " make_printable( RT ) ;
}

d.i:
first
{1st comment}
{2nd comment}
last
gawk -f d.awk d.i
RS = ([ \n]|(\{[^{}]*\}))*
RS = ([ \n]|(\{[^{}]*\}))* $0 = RT = first\n {1st comment}\n\n {2nd comm
RS = ([ \n]|(\{[^{}]*\}))* $0 = RT = e
RS = ([ \n]|(\{[^{}]*\}))* $0 = RT = n
RS = ([ \n]|(\{[^{}]*\}))* $0 = RT = t
RS = ([ \n]|(\{[^{}]*\}))* $0 = RT = }
RS = ([ \n]|(\{[^{}]*\}))* $0 = RT = \n
RS = ([ \n]|(\{[^{}]*\}))* $0 = RT = l
RS = ([ \n]|(\{[^{}]*\}))* $0 = RT = a
RS = ([ \n]|(\{[^{}]*\}))* $0 = RT = s
RS = ([ \n]|(\{[^{}]*\}))* $0 = RT = t
RS = ([ \n]|(\{[^{}]*\}))* $0 = RT = \n
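
One property of the RS value in d.awk may be relevant when reading this output: the whole pattern is wrapped in "*", so it can match the empty string. Whether that is what triggers the behavior shown here is an assumption and has not been confirmed. The sketch below only illustrates the property. It uses Python's re module as a stand-in for gawk's matcher (the two engines are not identical), and RS_PLUS, matches_empty and first_match are hypothetical names introduced for the illustration. The sample text is d.i as displayed in this message.

```python
import re

# The RS value from d.awk, transcribed into Python string syntax.
# Assumption: Python's re engine is used only as a stand-in for gawk's
# regexp matcher, to show whether the pattern can match nothing at all.
RE_BCOM = r"\{[^{}]*\}"                    # a brace-enclosed "comment"
RS_STAR = "([ \\n]|(" + RE_BCOM + "))*"    # as in d.awk: zero or more
RS_PLUS = "([ \\n]|(" + RE_BCOM + "))+"    # hypothetical variant: one or more


def matches_empty(pattern):
    """Return True if the pattern can match the empty string."""
    return re.fullmatch(pattern, "") is not None


def first_match(pattern, text):
    """Return (start, matched_text) for the leftmost match, or None."""
    m = re.search(pattern, text)
    return None if m is None else (m.start(), m.group(0))


if __name__ == "__main__":
    data = "first\n{1st comment}\n{2nd comment}\nlast\n"
    print("star matches empty:", matches_empty(RS_STAR))
    print("plus matches empty:", matches_empty(RS_PLUS))
    print("star first match:  ", first_match(RS_STAR, data))
    print("plus first match:  ", first_match(RS_PLUS, data))
```

In this sketch the "*" form matches an empty string at offset 0, before "first", while the "+" variant first matches at offset 5 and covers both comments and the newlines around them. Whether gawk 3.1.3 behaves differently with the "+" variant is untested.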