GNU awk ver. 3.1.6 OFS-related bug
From: George Zarkadas
Subject: GNU awk ver. 3.1.6 OFS-related bug
Date: Tue, 08 Mar 2011 01:18:28 +0200
Hello,
I stumbled upon this while doing some speed comparisons between awk and sed
in parsing dpkg flat-database files. This will be a long
message since I cannot attach files. Report format:
0. This introduction
1. Bug Summary and notes
2. My configuration
3. An annotated log of the activities used to reproduce and verify
the bug.
4. Source code of the programs used
The sample dpkg file upon which the tests were run is also available
if you need it.
regards
George Zarkadas
1. Bug summary:
=================================================================
Awk does not honor OFS assignments in the BEGIN block (when FS is also
previously set), unless a statement inside the applied rule's action
forces record reconstitution.
If this does not happen, the output behaves as if OFS == FS. Since the
initial value of OFS is " " according to the documentation, the separator
seen in the output is not the default one either; so OFS is either set to
an incorrect value or is not applied at all.
Note: I have done many OFS assignments in the past without noticing
anything similar, so it is possible that the described behaviour arises
only when octal/hexadecimal strings are used for assigning OFS.
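A minimal sketch for checking the octal hypothesis in the note above, outside the dpkg data. The input line and the ":" and "-" separators are illustrative assumptions, not part of the original tests; "\055" is simply "-" written as an octal escape.

```shell
# Illustrative input and separators; not taken from the dpkg tests.
# Case 1: OFS given as a plain character, record never modified.
plain=$(printf 'a:b:c\n' | awk 'BEGIN { FS = ":"; OFS = "-" } { print }')
# Case 2: the same OFS written as an octal escape ("\055" is "-").
octal=$(printf 'a:b:c\n' | awk 'BEGIN { FS = ":"; OFS = "\055" } { print }')
echo "plain: $plain"
echo "octal: $octal"
```

With a POSIX-conforming awk both variables should hold a:b:c, which would suggest that the notation used for OFS is not the trigger and that a bare "print" leaves an unmodified record as it was read.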
2. My configuration:
=================================================================
`awk --version` output (truncated):
GNU Awk 3.1.6
Copyright (C) 1989, 1991-2007 Free Software Foundation.
gawk package's version:
1:3.1.6.dfsg-0ubuntu2
Distribution:
Ubuntu 9.10 (Karmic)
`uname -a` output:
Linux laptop 2.6.31-21-core2 #59 SMP PREEMPT Thu Apr 8 23:37:45
EEST 2010 i686 GNU/Linux
3. Log of activities used to reproduce and verify the bug:
=================================================================
Ran:
mom-sed
# "reference" implementation
diff -U0 /var/lib/dpkg/available ./sed-out.txt
Result:
diff showed only a missing empty line at the end of the file
which is acceptable for our test case
Ran:
mom-awk
# initial awk script
diff -U0 ./sed-med.txt ./awk-med.txt
Result:
packages casper, akiradnews, cinelerra-swtc kept the original FS
between their fields (RS was set correctly)
Ran:
diff -U0 ./awk-med.txt ./awk-out.txt | less
Result:
Description field was made multiline, as expected (gsub
worked); \177 was not converted to a pair of newlines
Ran:
single-line-desc.awk (prints only packages with 1-line
description)
Result:
Program output is:
-----------------------------------
2584 Package: casper
3769 Package: akiradnews
5176 Package: cinelerra-swtc
-----------------------------------
This indicates the problem is in records with a single-line "Description"
field. Specifically:
-- the "$i =..." assignment is never performed for them and thus the
record is not reconstituted during the loop.
-- gsub does not seem to reconstitute the record either
-- the record does not get reformatted with OFS (only ORS) when "print"
is used
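The three points above can be sketched on a toy input. This is only an illustration under assumed data: the two records and the ":" and "-" separators are made up, and the loop mirrors the one in mom-awk, except that the assignment is a plain "$i = $i".

```shell
# Two illustrative records: only the second has a field starting with a space.
out=$(printf 'x:y\nx: y\n' | awk '
BEGIN { FS = ":"; OFS = "-" }
{
    for (i = 1; i <= NF; ++i)
        if (substr($i, 1, 1) == " ")
            $i = $i          # field assignment rebuilds $0 with OFS
    gsub(/y/, "Y")           # edits $0 in place; does not apply OFS
    print
}')
echo "$out"
```

The first record should come out as x:Y (gsub applied, separator untouched) and the second as x- Y (rebuilt with OFS because the assignment ran), which matches the pattern seen with the single-line "Description" packages.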
Ran:
mom-awk.2
# uses an auxiliary string for storing record's output before
# print for the first subprogram; the second remained unchanged
Result:
intermediate file was ok, reconstituted output not
This indicates (since the action consists only of a gsub and a print)
that the previous assessment is valid.
Ran:
mom-awk.3
# uses explicit gsub for performing the change of FS character
# in subprogram 2
diff -U0 /var/lib/dpkg/available ./awk-3-out.txt
Result:
Files were identical.
Ran:
mom-awk.4
# same as mom-awk, but issues a "$1 = $1" to force
# record reconstitution
diff -U0 /var/lib/dpkg/available ./awk-4-out.txt
Result:
Files were identical.
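The two approaches that produced identical files (mom-awk.3 and mom-awk.4) can be reduced to the following sketch. The input line and the ":" and "-" separators are again illustrative assumptions rather than the \177 and newline values used in the real scripts.

```shell
# Approach of mom-awk.3: replace the separator explicitly with gsub.
via_gsub=$(printf 'a:b:c\n' | awk '{ gsub(/:/, "-"); print }')
# Approach of mom-awk.4: a no-op field assignment rebuilds $0 with OFS.
via_assign=$(printf 'a:b:c\n' | awk 'BEGIN { FS = ":"; OFS = "-" } { $1 = $1; print }')
echo "gsub:   $via_gsub"
echo "assign: $via_assign"
```

Both variables should hold a-b-c. The "$1 = $1" form keeps the FS/OFS settings as the single place where separators are defined, while the gsub form does not depend on OFS at all.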
4. Source code of the programs used
=================================================================
Note - if anyone is ever interested :)
All code is licensed under the GPL, version 3 or (at your option) any
later version, as published by the Free Software Foundation.
4.1 mom-sed
--------------------------------------
#!/bin/sh
INPUT=/var/lib/dpkg/available
OUT='./sed-out.txt'
MED='./sed-med.txt'
<${INPUT} sed '
# after N we have either <field>\n{\n} or <field>\n<field> or
# {\n}\n<field>
# (1) or (3) signify a record switch; remove \n and (1): print, (3): hold
# in (2) change \n to either FS or INTRA_FS and hold
:read
N
/\n$/ {
    s/\n$//
    p
    d
}
/.\n./ {
    s/\n /\x1E /
    s/\n\([^ ]\)/\x7F\1/
    b read
}
/^\n/ {
    s/^\n//
    b read
}
' >${MED}
<${MED} sed '
# in lines after the first we add a newline at the start
1 s/\x7F/\n/g
2,$ {
    s/^./\n&/
    s/\x7F/\n/g
}
s/\x1E /\n /g
' > ${OUT}
--------------------------------------
4.2 mom-awk
--------------------------------------
#!/bin/sh
INPUT=/var/lib/dpkg/available
OUT='./awk-out.txt'
MED='./awk-med.txt'
<${INPUT} awk '
BEGIN {
    RS=""
    FS="\n"
    ORS="\n"
    OFS="\177"
}
{
    for (i = 1; i <= NF; ++i)
    {
        if (substr($i, 1, 1) == " ")
            $i="\177" $i
    }
    gsub(/\177\177 /, "\036 ")
    print
}' >${MED}
<${MED} awk '
BEGIN {
    RS="\n"
    FS="\177"
    ORS="\n\n"
    OFS="\n"
}
# we only need to replace \036; setting OFS deals with \177
{
    gsub("\036 ", "\n ")
    print
}' > ${OUT}
--------------------------------------
4.3 mom-awk.2
--------------------------------------
#!/bin/sh
INPUT=/var/lib/dpkg/available
OUT='./awk-2-out.txt'
MED='./awk-2-med.txt'
<${INPUT} awk '
BEGIN {
    RS=""
    FS="\n"
    ORS="\n"
    OFS="\177"
}
{
    # here we use a separate string buffer instead of gsub
    out = $1
    for (i = 2; i <= NF; ++i)
    {
        if (substr($i, 1, 1) == " ")
            out = out "\036" $i
        else
            out = out "\177" $i
    }
    print out
}' >${MED}
# we keep the same code as mom-awk here
# neither "\x7F" nor "\\177" for FS work
<${MED} awk '
BEGIN {
    RS="\n"
    FS="\177"
    ORS="\n\n"
    OFS="\n"
}
# we only need to replace \036; setting OFS deals with \177
{
    gsub("\036 ", "\n ")
    print
}' > ${OUT}
--------------------------------------
4.4 mom-awk.3
--------------------------------------
#!/bin/sh
INPUT=/var/lib/dpkg/available
OUT='./awk-3-out.txt'
MED='./awk-3-med.txt'
<${INPUT} awk '
BEGIN {
    RS=""
    FS="\n"
    ORS="\n"
    OFS="\177"
}
{
    # here we use a separate string buffer instead of gsub
    out = $1
    for (i = 2; i <= NF; ++i)
    {
        if (substr($i, 1, 1) == " ")
            out = out "\036" $i
        else
            out = out "\177" $i
    }
    print out
}' >${MED}
# we change the code here
# neither "\177" nor "\x7F" nor "\\177" for FS work
<${MED} awk '
BEGIN {
    RS="\n"
    FS=""
    ORS="\n\n"
    OFS="\n"
}
{
    gsub("\036 ", "\n ")
    # next line added when switched to FS=""
    gsub("\177", "\n")
    print
}' > ${OUT}
--------------------------------------
4.5 mom-awk.4
--------------------------------------
#!/bin/sh
INPUT=/var/lib/dpkg/available
OUT='./awk-4-out.txt'
MED='./awk-4-med.txt'
<${INPUT} awk '
BEGIN {
    RS=""
    FS="\n"
    ORS="\n"
    OFS="\177"
}
{
    for (i = 1; i <= NF; ++i)
    {
        if (substr($i, 1, 1) == " ")
            $i="\177" $i
    }
    gsub(/\177\177 /, "\036 ")
    $1 = $1
    print
}' >${MED}
<${MED} awk '
BEGIN {
    RS="\n"
    FS="\177"
    ORS="\n\n"
    OFS="\n"
}
# we only need to replace \036; setting OFS deals with \177
{
    gsub("\036 ", "\n ")
    $1 = $1
    print
}' > ${OUT}
--------------------------------------
4.6 single-line-desc.awk
--------------------------------------
#!/bin/sh
INPUT=/var/lib/dpkg/available
OUT='./singleline-NR.txt'
<${INPUT} awk '
BEGIN {
    RS=""
    FS="\n"
    ORS="\n"
    OFS=""
}
{
    multi = 0
    for (i = 1; i <= NF; ++i)
    {
        if (substr($i, 1, 1) == " ")
        {
            multi = 1
            break
        }
    }
    if (!multi)
        print NR "\t" $1
}' >${OUT}
--------------------------------------