bug-gawk

[bug-gawk] regexp RS mangling input


From: Jay Michael
Subject: [bug-gawk] regexp RS mangling input
Date: Sun, 20 May 2012 01:05:52 -0400
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113

I'm using a regular expression as RS to soak up everything I don't want to see while parsing my input. I want the record terminator to include possibly multi-line expanses enclosed in braces.

The first problem I ran into was that gawk seemed to return the same string for several consecutive internal records. When I tried to track down what I was doing wrong, my reduced test case misbehaved in a different way: gawk put what should have been the first record into the first record's terminator, and ended that terminator before the end of the second "comment". After that, gawk acted as if each character were a record terminator.

I'm running GNU Awk 3.1.3 under Windows XP. I don't know who built it, and I don't remember where I got it. I also tried it on a UNIX/Linux shell to which I have access. That system was running 3.1.1 (or so), and it behaved the same way as the version on my PC.

I have attached my program (d.awk) and input (d.i). d.log is not really a log file; I pasted pieces into it and then appended the output of "gawk -f d.awk d.i".
GNU Awk 3.1.3


d.awk:
function make_printable( s,   p )
{
  p = s ;
  gsub( /\n/, "\\n", p ) ;
  gsub( /\r/, "\\r", p ) ;
  gsub( /\t/, "\\t", p ) ;
  return p ;
} # make_printable

BEGIN {
  re_bcom = "\\{[^{}]*\\}" ;
  RS = "([ \\n]|(" re_bcom "))*" ;
  print "RS = " make_printable( RS ) ;
}

{
  print "RS = " make_printable(RS) \
        " $0 = " make_printable($0) " RT = " make_printable( RT ) ;
}


d.i:
first
  {1st comment}

  {2nd comment}
last


gawk -f d.awk d.i
RS = ([ \n]|(\{[^{}]*\}))*
RS = ([ \n]|(\{[^{}]*\}))* $0 =  RT = first\n  {1st comment}\n\n  {2nd comm
RS = ([ \n]|(\{[^{}]*\}))* $0 =  RT = e
RS = ([ \n]|(\{[^{}]*\}))* $0 =  RT = n
RS = ([ \n]|(\{[^{}]*\}))* $0 =  RT = t
RS = ([ \n]|(\{[^{}]*\}))* $0 =  RT = }
RS = ([ \n]|(\{[^{}]*\}))* $0 =  RT = \n
RS = ([ \n]|(\{[^{}]*\}))* $0 =  RT = l
RS = ([ \n]|(\{[^{}]*\}))* $0 =  RT = a
RS = ([ \n]|(\{[^{}]*\}))* $0 =  RT = s
RS = ([ \n]|(\{[^{}]*\}))* $0 =  RT = t
RS = ([ \n]|(\{[^{}]*\}))* $0 =  RT = \n
