bug-gnu-utils

Re: MacOS X: Redirection performance problem


From: Aharon Robbins
Subject: Re: MacOS X: Redirection performance problem
Date: Mon, 29 Sep 2008 07:15:55 +0000 (UTC)

Hi. You will get a little better performance by setting FS="," just once
in a BEGIN rule.  You can also leave out the call to close() and let
gawk manage the open file descriptors instead of opening and closing
the file for every record printed.
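
As a rough sketch of what those two changes could look like (not a tested
drop-in replacement): the sample data and the temporary directory below
are made up for illustration, and `dir` is assumed to be passed in with
-v, since the original script does not show where it is set.

```shell
#!/bin/sh
# Illustrative only: a throwaway directory and three made-up CSV rows.
dir=$(mktemp -d)
printf '%s\n' 'a,b,c,d,"x",1' 'a,b,c,d,"y",2' 'a,b,c,d,"x",3' > "$dir/input.csv"

awk -v dir="$dir" '
BEGIN { FS = "," }            # set the field separator once, before any input
{
    var = $5                  # field holding the key, as in the original script
    gsub(/"/, "", var)        # strip the double quotes around the key
    path = dir "/" var ".csv"
    print >> path             # the file is opened on first use and then reused
}                             # no close(path): files stay open until awk exits
' "$dir/input.csv"
```

With around 300 distinct keys, this relies on gawk juggling file
descriptors when the per-process limit is reached; other awk
implementations may instead fail with a "too many open files" error, so
use gawk in place of awk for the real run.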

You might also try running with the environment variable LC_ALL=C to
see if that helps lower the cost of the regular expression matching
inside gsub.
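
One way to check this is to time the same program under both settings.
The sketch below generates its own made-up input and uses a cut-down
program; for the real comparison, substitute gawk, the actual script and
the actual input file.

```shell
#!/bin/bash
# Illustrative only: generate 20000 made-up CSV lines with a quoted 5th field.
input=$(mktemp)
awk 'BEGIN { for (i = 0; i < 20000; i++) printf "a,b,c,d,\"k%d\",z\n", i % 300 }' > "$input"

# Same per-record work as the original script, minus the file output:
# strip the quotes from field 5 and count the records processed.
prog='BEGIN { FS = "," } { var = $5; gsub(/"/, "", var); n++ } END { print n }'

echo "current locale:"
time awk "$prog" "$input"

echo "LC_ALL=C:"
time env LC_ALL=C awk "$prog" "$input"
```

If the second run is clearly faster, the time is going into
locale-aware (multibyte) matching rather than into the output
redirection.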

If none of that helps, try compiling gawk for profiling and without
optimization and send me the output of gprof.
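
A typical sequence for that looks roughly like the following. This is a
sketch that assumes gcc and gprof are installed; split.awk and
gprof-output.txt are placeholder names, and the input file is the one
from the timings quoted below.

```shell
# Run from the top of the unpacked gawk source tree.
# -pg adds gprof instrumentation; -O0 turns optimization off.
./configure CFLAGS="-pg -g -O0" LDFLAGS="-pg"
make

# Run the freshly built gawk on the real workload. An instrumented
# binary writes gmon.out into the current directory when it exits.
./gawk -f split.awk /tmp/input.txt

# Turn the raw profile data into a readable report.
gprof ./gawk gmon.out > gprof-output.txt
```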

I have to wonder if you would get better numbers on a BSD Fast Filesystem
volume than on a MacOS HFS volume....

Thanks,

Arnold

In article <address@hidden>,
 <address@hidden> wrote:
>Hello,
>
>I'm facing a performance problem under MacOS X when using gawk's output 
>redirection: it's very slow.
>
>I have to process CSV files (~5G lines each) that must be split into 
>separate files (~300) based on a field value, so performance is 
>critical. For now my old PIII outperforms my MacPro... so something 
>clearly isn't right somewhere under MacOS... Here is what I'm using:
>
>{
>FS=","
>row=$0
>var=$5
>gsub(/\"/,"",var)
>path=dir"/"var".csv"
>print row >> path
>close(path)
>}
>
>Find below some simple test cases that compare the performance of my MacPro 
>to an old IBM server. Any idea how the redirection could be optimized 
>under MacOS? I'm not a programmer but I can run tests if necessary, 
>so please don't hesitate to ask... simply let me know exactly what you 
>want me to do.
>
>Best regards,
>
>Ben.
>
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>-> MacOS 10.5.5 on a MacPro (Xeon 2.8Ghz) with default bin:
>
>$ awk -V
>awk version 20040207
>
>$ time awk '{ print > "/tmp/output.txt" }'  /tmp/input.txt
>real    0m12.071s
>user    0m5.171s
>sys    0m6.171s
>
>$ time awk '{ print }' < /tmp/input.txt  > /tmp/output.txt
>real    0m3.648s
>user    0m2.561s
>sys    0m0.665s
>
>-- MacOS 10.5.5 on a MacPro (Xeon 2.8Ghz) with default bin using /dev/null
>
>$ time awk '{ print > "/dev/null" }'  /tmp/input.txt
>real    0m7.068s
>user    0m4.752s
>sys    0m2.314s
>
>$ time awk '{ print }' < /tmp/input.txt  > /dev/null
>real    0m2.602s
>user    0m2.425s
>sys    0m0.177s
>
>
>$ wc -l /tmp/output.txt
>2000000 /tmp/output.txt
>$ wc -l /tmp/input.txt
>2000000 /tmp/input.txt
>$ ls -lh /tmp/output.txt
>-rw-rw-r-- 1 abc abc 129M Sep 21 00:58 /tmp/output.txt
>
>
>-> MacOS 10.5.5 on a MacPro (Xeon 2.8Ghz) with gawk 3.1.6 (Built: 
>./configure --prefix=/usr/local/gawk-3.1.6) :
>
>$ /usr/local/gawk-3.1.6/bin/awk -W version
>GNU Awk 3.1.6
>
>$ time /usr/local/gawk-3.1.6/bin/awk '{ print > "/tmp/output.txt"}' /tmp/input.txt
>
>real    0m6.657s
>user    0m3.968s
>sys    0m2.107s
>
>$ time /usr/local/gawk-3.1.6/bin/awk '{ print }' /tmp/input.txt > /tmp/output.txt
>
>real    0m6.475s
>user    0m3.757s
>sys    0m2.136s
>
>
>-- MacOS 10.5.5 on a MacPro (Xeon 2.8Ghz) with gawk 3.1.6 using /dev/null
>
>$ time /usr/local/gawk-3.1.6/bin/awk '{ print > "/dev/null"}' /tmp/input.txt
>
>real    0m5.341s
>user    0m3.779s
>sys    0m1.561s
>
>$ time /usr/local/gawk-3.1.6/bin/awk '{ print }' /tmp/input.txt > /dev/null
>
>real    0m5.192s
>user    0m3.620s
>sys    0m1.570s
>
>
>Here is an example with gawk 3.1.6 on an old IBM address@hidden server 
>running CentOS 5:
>
>$ time /usr/src/gawk-3.1.6/gawk '{ print > "/tmp/output.txt" }' < /tmp/input.txt
>
>real    0m3.334s
>user    0m2.184s
>sys    0m1.150s
>
>$ time /usr/src/gawk-3.1.6/gawk '{ print }' < /tmp/input.txt > /tmp/output.txt
>
>real    0m2.969s
>user    0m1.727s
>sys    0m1.243s
>
>-> IBM address@hidden using /dev/null
>
>$ time /usr/src/gawk-3.1.6/gawk '{ print > "/dev/null" }' /tmp/input.txt
>
>real    0m2.614s
>user    0m2.271s
>sys    0m0.343s
>
>$ time /usr/src/gawk-3.1.6/gawk '{ print }' /tmp/input.txt > /dev/null
>
>real    0m2.520s
>user    0m2.144s
>sys    0m0.358s
>
>
>
>


-- 
Aharon (Arnold) Robbins                                 arnold AT skeeve DOT com
P.O. Box 354            Home Phone: +972  8 979-0381
Nof Ayalon              Cell Phone: +972 50  729-7545
D.N. Shimshon 99785     ISRAEL



