Ver. 3.1.4 & 3.1.3 Windows ports: Chopped record count at large files

From: George Zarkadas
Subject: Ver. 3.1.4 & 3.1.3 Windows ports: Chopped record count at large files
Date: Sun, 27 Aug 2006 13:19:08 +0300

gawk reports a record count much smaller than the actual one when
processing a large (~275 MB) text file.


This behavior exists in:

3.1.4 version, xmlgawk windows port

    (downloaded from

3.1.3 version, gnuwin32 windows port

    (downloaded from http://sourceforge.net/projects/gnuwin32/ )


but not in the 3.0.4 version (mingw windows port), which gives the correct
results (as verified by independent checks).
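
For anyone wishing to reproduce such an independent check without gawk, a
minimal sketch follows. It is not necessarily the check used for this report.
It assumes that every record begins with "@", an entry type and an opening
brace or parenthesis at the start of a line; the function name
count_bibtex_records is hypothetical.

```python
import re

# Assumption: a BibTeX record starts with '@', an entry type, and '{' or '('
# at the beginning of a line (optionally preceded by spaces or tabs).
ENTRY_START = re.compile(rb"^[ \t]*@[A-Za-z]+[ \t]*[{(]")


def count_bibtex_records(path):
    """Count the lines that look like the start of a BibTeX entry."""
    count = 0
    # Binary mode: bytes are read exactly as stored, without newline
    # translation, and the file is streamed line by line.
    with open(path, "rb") as handle:
        for line in handle:
            if ENTRY_START.match(line):
                count += 1
    return count
```

Calling count_bibtex_records on the input file gives a number that can be
compared with the record counts printed by the different gawk versions.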


Consistent with the above, gawk also fails to extract a subset of records
from the file that are located near the end of


Attached are:

1. Results (as copied and pasted from the command line) from (a) running the
count scripts and (b) extracting the subset [files: count_results.txt and

2. The awk scripts in question


Additional information

-- The file on which the scripts operated contains bibliographic records
in BibTeX format (converted from the XML file supplied by the DBLP
project, as downloaded from www.vldb.org <http://www.vldb.org/>).

-- The scripts were run on two machines with identical results.

OS: Windows XP SP2 (EL) in both

CPU: Pentium M 1.7 GHz  |  Pentium 4 HT 3.0 GHz

RAM: 1 GB  |  2 GB

HDD: 80 GB  |  400 GB

-- A bug report has also been submitted to the gnuwin32 project (no
contact information was found for the 3.1.4 port). However, I suspect that
this behavior is not specific to the Windows ports; hence this bug report.


Kind Regards

George Zarkadas


PS: The original file on which the scripts operated is not included because
of its size (~55 MB zipped), but I will happily supply it on request.


Attachment: count_results.txt
Description: Text document

Attachment: count_dblp_bib2.awk
Description: Binary data

Attachment: count_dblp_bib.awk
Description: Binary data

Attachment: subset_results.txt
Description: Text document

Attachment: get_vldb_subset.awk
Description: Binary data

Attachment: get_vldb_subset2.awk
Description: Binary data
