
Re: [bug-gawk] File partial overwrite when using join


From: Aharon Robbins
Subject: Re: [bug-gawk] File partial overwrite when using join
Date: Mon, 22 Feb 2016 06:25:30 +0200
User-agent: Heirloom mailx 12.5 6/20/10

Hi David.

I took a closer look at your code.

> Date: Fri, 19 Feb 2016 18:41:15 -0500
> From: David Niklas <address@hidden>
> To: address@hidden
> Subject: Re: [bug-gawk] File partial overwrite when using join
>
> On Fri, 5 Feb 2016 17:51:13 Wolfgang Laun wrote:
>
> > gawk is not to be blamed.
> > 
> > join requires the files to be sorted on the "key" field.

This is indeed true, and requires fixing, but it isn't the main problem.

You are mixing writes to the tmp-master.txt file from both
gawk and join.  In particular, gawk's idea of where the end of the file
is located (its current write position) is different from the actual
end of file, since it doesn't know that join has written to it.  gawk's
next write therefore lands on top of part of what join wrote.
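A minimal sketch of the effect, assuming a POSIX shell, mktemp, and an awk (gawk included) that treats `>` and `>>` on the same file name as one open stream. The temporary file stands in for tmp-master.txt, and the `echo external-line` command is a hypothetical stand-in for join appending to it:

```shell
#!/bin/sh
# Sketch: awk opens the file with ">" (ordinary write offset), another
# process appends to it, and awk's next write overwrites part of that data.
f=$(mktemp)

awk -v f="$f" 'BEGIN {
    print "header" > f                          # truncate and open the stream
    fflush(f)                                   # "header\n" now on disk; awk offset is 7
    system("echo external-line >> \"" f "\"")   # stand-in for join appending
    print "footer" >> f                         # same stream as ">", still at offset 7
    close(f)                                    # flush "footer\n" over "externa"
}'

result=$(cat "$f")
printf '%s\n' "$result"
rm -f "$f"
```

If the awk behaves as assumed, the file should end up as `header`, `footer`, `l-line`: the external line is partly overwritten, which is the kind of damage described in the subject.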

>>>    printf("1 NEW PACKAGES\n") > ARGV[1];
>>>    print "2 REMOVED PACKAGES\n" >> ARGV[1];

These two lines are the crux of your problem. When you open the file
with ">", gawk truncates the file, writes to it, and then keeps the
file open until you close it with a call to close(). The subsequent use
of ">>" does not make gawk reopen the file in append mode, since it's
already open; gawk simply keeps writing through the same stream.
gawk --lint reports:

        gawk: aa-x.awk:37: warning: unnecessary mixing of `>' and `>>' for file `./tmp-master.txt'
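One way to act on the close() point above is sketched below, under the same assumptions (POSIX shell, mktemp, an awk such as gawk; `echo external-line` is a hypothetical stand-in for join). Closing the ">" stream before the other process writes means the later ">>" opens a fresh stream in append mode:

```shell
#!/bin/sh
# Sketch: close the ">" stream before another process writes, so the
# later ">>" opens a new stream in append mode instead of reusing the old one.
f=$(mktemp)

awk -v f="$f" 'BEGIN {
    print "header" > f                          # truncate and write
    close(f)                                    # flush and close the ">" stream
    system("echo external-line >> \"" f "\"")   # stand-in for join appending
    print "footer" >> f                         # new open, append mode
    close(f)
}'

result=$(cat "$f")
printf '%s\n' "$result"
rm -f "$f"
```

Here the external line should survive intact, with `footer` after it.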

If you change the first line to use ">>", then gawk opens the file
in append mode, so every write happens at the current end of the file,
including anything join has already added:

        $ rm tmp-master.txt 
        $ gawk -f aa-x.awk 
        DEBUG: join'ing old ARGV(2) and new (5)
        app/zaz [ masked ]
        DEBUG: join'ing new (5) and old (2)
        app/zaz [ masked ]
        2 REMOVED PACKAGES

        app/bar
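The same fix in a self-contained sketch, again assuming a POSIX shell, mktemp, and an awk such as gawk, with `echo external-line` as a hypothetical stand-in for join. Because ">>" never truncates, the file has to start out empty or absent, which is why the transcript above begins with `rm tmp-master.txt`; here mktemp provides a fresh empty file:

```shell
#!/bin/sh
# Sketch: open the file with ">>" from the start, so every awk write goes
# to the current end of file, after anything another process appended.
f=$(mktemp)                                     # new, empty file

awk -v f="$f" 'BEGIN {
    print "header" >> f                         # append mode from the first write
    fflush(f)                                   # put "header" on disk before the append
    system("echo external-line >> \"" f "\"")   # stand-in for join appending
    print "footer" >> f                         # lands after "external-line"
    close(f)
}'

result=$(cat "$f")
printf '%s\n' "$result"
rm -f "$f"
```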

I believe that this is documented pretty clearly in the manual.

Thanks,

Arnold


