bug-gnu-utils

GNU awk ver. 3.1.6 OFS-related bug


From: George Zarkadas
Subject: GNU awk ver. 3.1.6 OFS-related bug
Date: Tue, 08 Mar 2011 01:18:28 +0200

Hello,

I stumbled upon this while doing some speed comparisons between awk and sed
for parsing dpkg flat-database files. This will be a long message, since I
cannot attach files. Report format:

0. This introduction
1. Bug Summary and notes
2. My configuration
3. An annotated log of the activities used to reproduce and verify
   the bug.
4. Source code of the programs used

The sample dpkg file upon which the tests were run is also available 
if you need it.

Regards,
George Zarkadas


1. Bug summary:
=================================================================

Awk does not honor OFS assignments made in the BEGIN block (when FS has also
been set before them), unless a statement inside the matching rule's action
forces record reconstitution.

If no such statement runs, the output behaves as if OFS == FS: records are
printed with their original field separators. Since the documented initial
value of OFS is " ", this means OFS does change, but for some reason it is
either set to an incorrect value or never applied.

Note: I have made many OFS assignments in the past without noticing anything
similar, so it is possible that the described behaviour arises only when
octal/hexadecimal escape strings are used to assign OFS.
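For illustration, a minimal sketch (not one of the original scripts) that shows the same effect without the dpkg database, assuming a POSIX sh and an awk on PATH. The function names plain_print and rebuilt_print are hypothetical. A printable OFS ("-") stands in for "\177" so the separator is visible; swapping it back to "\177" is one way to test the hypothesis in the note above.

```shell
#!/bin/sh
# Minimal reproduction sketch (not one of the original scripts).
# "-" stands in for "\177" so the separator is visible.

# Hypothetical helper: print with no field assignment.
# awk outputs $0 exactly as it was read.
plain_print() {
    awk 'BEGIN { OFS = "-" } { print }'
}

# Hypothetical helper: "$1 = $1" makes awk rebuild $0
# from its fields, joined with OFS.
rebuilt_print() {
    awk 'BEGIN { OFS = "-" } { $1 = $1; print }'
}

printf 'a b c\n' | plain_print     # prints: a b c
printf 'a b c\n' | rebuilt_print   # prints: a-b-c
```

In this sketch OFS only shows up once a field assignment makes awk rebuild $0; the gawk manual's sections on $0 and on changing fields describe the same rule ($0 is kept exactly as read until a field is assigned).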


2. My configuration:
=================================================================

`awk --version` output (truncated):
        GNU Awk 3.1.6
        Copyright (C) 1989, 1991-2007 Free Software Foundation.
gawk package's version:
        1:3.1.6.dfsg-0ubuntu2
Distribution:   
        Ubuntu 9.10 (Karmic)
`uname -a` output:
        Linux laptop 2.6.31-21-core2 #59 SMP PREEMPT Thu Apr 8 23:37:45
        EEST 2010 i686 GNU/Linux


3. Log of activities used to reproduce and verify the bug:
=================================================================

Ran:
        mom-sed
        # "reference" implementation
        diff -U0 /var/lib/dpkg/available ./sed-out.txt
Result:
        diff showed only a missing empty line at the end of the file,
        which is acceptable for our test case
Ran:
        mom-awk
        # initial awk script
        diff -U0 ./sed-med.txt ./awk-med.txt
Result:
        in the records of packages casper, akiranews and cinelerra-swtc
        the fields were still separated by FS ("\n") instead of OFS
        ("\177"); the records themselves were split correctly (RS worked)

        Ran:
                diff -U0 ./awk-med.txt ./awk-out.txt | less
        Result:
                the Description field was made multiline, as expected
                (gsub worked); \177 was not converted to a pair of newlines

        Ran:
                single-line-desc.awk (prints only packages with a
                one-line description)
        Result:
                Program output is:
-----------------------------------
2584    Package: casper
3769    Package: akiradnews
5176    Package: cinelerra-swtc
-----------------------------------

This indicates that the problem lies in records with a single-line
"Description" field. Specifically:
-- the "$i = ..." assignment is never performed, so the record is not
   reconstituted during the loop;
-- gsub does not seem to reconstitute the record either;
-- the record is not reformatted with OFS (only ORS is applied) when
   "print" is used.
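These indications can be checked on a made-up two-record sample shaped like dpkg stanzas. The sketch mirrors the loop of the first mom-awk subprogram but uses "|" for OFS and "+" for the "\177" prefix so the output is readable; the input and the helper names mark_continuations and gsub_only are hypothetical.

```shell
#!/bin/sh
# Sketch mirroring the first mom-awk subprogram on made-up input.
# "|" stands in for OFS = "\177" and "+" for the "\177" prefix.

# Hypothetical helper: prefix continuation lines (leading space)
# with "+"; only records where that assignment runs are rebuilt.
mark_continuations() {
    awk '
    BEGIN { RS = ""; FS = "\n"; ORS = "\n"; OFS = "|" }
    {
        for (i = 1; i <= NF; ++i)
            if (substr($i, 1, 1) == " ")
                $i = "+" $i    # field assignment: $0 rebuilt with OFS
        print
    }'
}

# Hypothetical helper: gsub on $0 alone leaves the separators as read.
gsub_only() {
    awk 'BEGIN { OFS = "-" } { gsub(/b/, "B"); print }'
}

# Two stanzas: the first has a continuation line, the second does not.
printf 'Package: a\nDescription: x\n more\n\nPackage: b\nDescription: y\n' |
    mark_continuations
# prints:
# Package: a|Description: x|+ more
# Package: b
# Description: y

printf 'a b c\n' | gsub_only    # prints: a B c
```

The second stanza comes out with its newlines intact, which matches the pattern seen for casper, akiranews and cinelerra-swtc.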

Ran:
        mom-awk.2
        # uses an auxiliary string to build the record's output before
        # print in the first subprogram; the second remained unchanged
Result:
        the intermediate file was correct; the reconstituted output was not

Since the action of the second subprogram consists of only a gsub and a
print, this indicates that the previous assessment is valid.

Ran:
        mom-awk.3
        # uses an explicit gsub to change the FS character
        # in subprogram 2
        diff -U0 /var/lib/dpkg/available ./awk-3-out.txt
Result:
        Files were identical.

Ran:
        mom-awk.4
        # same as mom-awk, but issues a "$1 = $1" to force 
        # record reconstitution
        diff -U0 /var/lib/dpkg/available ./awk-4-out.txt
Result:
        Files were identical.


4. Source code of the programs used
=================================================================
Note: in case anyone is ever interested :)
All code is licensed under the GNU GPL version 3 or (at your option) any
later version, as published by the Free Software Foundation.

4.1     mom-sed
--------------------------------------
#!/bin/sh

INPUT=/var/lib/dpkg/available
OUT='./sed-out.txt'
MED='./sed-med.txt'

<${INPUT} sed '
# after N we have either (1) <field>\n{\n}, (2) <field>\n<field>
# or (3) {\n}\n<field>
# (1) or (3) signify a record switch; remove \n and (1): print, (3): hold
# in (2) change \n to either FS or INTRA_FS and hold
:read
N
/\n$/   {
        s/\n$//
        p
        d
}
/.\n./  {
        s/\n /\x1E /
        s/\n\([^ ]\)/\x7F\1/
        b read
}
/^\n/   {
        s/^\n//
        b read
}
' >${MED}

<${MED} sed '
# in lines after the first we add a newline at the start
1 s/\x7F/\n/g
2,$ {
        s/^./\n&/
        s/\x7F/\n/g
}
s/\x1E /\n /g
' > ${OUT}
--------------------------------------

4.2     mom-awk
--------------------------------------
#!/bin/sh

INPUT=/var/lib/dpkg/available
OUT='./awk-out.txt'
MED='./awk-med.txt'

<${INPUT} awk '
BEGIN {
        RS=""
        FS="\n"
        ORS="\n"
        OFS="\177"
}
{
        for (i = 1; i <= NF; ++i)
          {
                if (substr($i, 1, 1) == " ")
                        $i="\177" $i
          }
        gsub(/\177\177 /, "\036 ")
        print
}' >${MED}

<${MED} awk '
BEGIN {
        RS="\n"
        FS="\177"
        ORS="\n\n"
        OFS="\n"
}
# we only need to replace \036; setting OFS deals with \177
{
        gsub("\036 ", "\n ")
        print
}' > ${OUT}
--------------------------------------

4.3     mom-awk.2
--------------------------------------
#!/bin/sh

INPUT=/var/lib/dpkg/available
OUT='./awk-2-out.txt' 
MED='./awk-2-med.txt'

<${INPUT} awk '
BEGIN {
        RS=""
        FS="\n"
        ORS="\n"
        OFS="\177"
}
{
# here we use a separate string buffer instead of gsub
        out = $1
        for (i = 2; i <= NF; ++i)
          {
                if (substr($i, 1, 1) == " ")
                        out = out "\036" $i
                else
                        out = out "\177" $i
          }
        print out
}' >${MED}

# we keep the same code as mom-awk here
# neither "\x7F" nor "\\177" for FS work
<${MED} awk '
BEGIN {
        RS="\n"
        FS="\177" 
        ORS="\n\n"
        OFS="\n"
}
# we only need to replace \036; setting OFS deals with \177
{
        gsub("\036 ", "\n ")
        print
}' > ${OUT}
--------------------------------------
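The comment in mom-awk.2 says that FS = "\177" does not work. One way to check the field splitting on its own, separately from how print uses OFS, is a sketch like the following. count_fields and show_fields are hypothetical helpers, and it assumes a printf that expands octal escapes such as \177 in its format string (POSIX printf does).

```shell
#!/bin/sh
# Sketch: check whether FS = "\177" splits records, independently of OFS.
# printf expands the octal escape \177 (DEL) in its format string.

# Hypothetical helper: number of fields per record.
count_fields() {
    awk 'BEGIN { FS = "\177" } { print NF }'
}

# Hypothetical helper: each field on its own line, with its index.
show_fields() {
    awk 'BEGIN { FS = "\177" } { for (i = 1; i <= NF; ++i) print i ": " $i }'
}

printf 'one\177two\177three\n' | count_fields   # prints: 3
printf 'one\177two\177three\n' | show_fields
# prints:
# 1: one
# 2: two
# 3: three
```

If this reports three fields, splitting on "\177" works, and the remaining difference between the second-stage programs would lie in how print treats an unmodified $0.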

4.4     mom-awk.3
--------------------------------------
#!/bin/sh

INPUT=/var/lib/dpkg/available
OUT='./awk-3-out.txt' 
MED='./awk-3-med.txt'

<${INPUT} awk '
BEGIN {
        RS=""
        FS="\n"
        ORS="\n"
        OFS="\177"
}
{
# here we use a separate string buffer instead of gsub
        out = $1
        for (i = 2; i <= NF; ++i)
          {
                if (substr($i, 1, 1) == " ")
                        out = out "\036" $i
                else
                        out = out "\177" $i
          }
        print out
}' >${MED}

# we change the code here
# neither "\177" nor "\x7F" nor "\\177" for FS work
<${MED} awk '
BEGIN {
        RS="\n"
        FS=""
        ORS="\n\n"
        OFS="\n"
}
{
        gsub("\036 ", "\n ")
# next line added when switched to FS=""
        gsub("\177", "\n")
        print
}' > ${OUT}
--------------------------------------

4.5     mom-awk.4
--------------------------------------
#!/bin/sh

INPUT=/var/lib/dpkg/available
OUT='./awk-4-out.txt'
MED='./awk-4-med.txt'

<${INPUT} awk '
BEGIN {
        RS=""
        FS="\n"
        ORS="\n"
        OFS="\177"
}
{
        for (i = 1; i <= NF; ++i)
          {
                if (substr($i, 1, 1) == " ")
                        $i="\177" $i
          }
        gsub(/\177\177 /, "\036 ")
        $1 = $1
        print
}' >${MED}

<${MED} awk '
BEGIN {
        RS="\n"
        FS="\177"
        ORS="\n\n"
        OFS="\n"
}
# we only need to replace \036; setting OFS deals with \177
{
        gsub("\036 ", "\n ")
        $1 = $1
        print
}' > ${OUT}
--------------------------------------

4.6     single-line-desc.awk
--------------------------------------
#!/bin/sh

INPUT=/var/lib/dpkg/available
OUT='./singleline-NR.txt'

<${INPUT} awk '
BEGIN {
        RS=""
        FS="\n"
        ORS="\n"
        OFS=""
}
{
        multi = 0
        for (i = 1; i <= NF; ++i)
          {
                if (substr($i, 1, 1) == " ")
                  {
                        multi = 1
                        break
                  }
          }
        if (!multi)
                print NR "\t" $1
}' >${OUT}
--------------------------------------

Attachment: signature.asc
Description: This part of the message is digitally signed

