Re: MacOS X: Redirection performance problem
From: Aharon Robbins
Subject: Re: MacOS X: Redirection performance problem
Date: Mon, 29 Sep 2008 07:15:55 +0000 (UTC)

Hi. You will get a little better performance by setting FS="," just once
in a BEGIN rule. You can also leave out the call to close() and let
gawk manage the open file descriptors instead of opening and closing
the file for every record printed.
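
As a rough sketch of those two changes (not tested against the original data): the sample input, the temporary directory and the "red"/"blue" key values below are made up for illustration, and plain awk is used so the snippet runs anywhere; gawk can be substituted.

```shell
# Hypothetical sample input; the real CSV layout is not shown in the thread.
dir=$(mktemp -d)
printf '%s\n' '1,a,b,c,"red"' '2,a,b,c,"blue"' '3,a,b,c,"red"' > "$dir/input.txt"

awk -v dir="$dir" '
BEGIN { FS = "," }            # set the field separator once, before any input is read
{
    var = $5
    gsub(/"/, "", var)        # strip the double quotes around the key field
    path = dir "/" var ".csv"
    print $0 >> path          # no close(): the file stays open for later records
}' "$dir/input.txt"
```

Each distinct value of the fifth field ends up in its own file under $dir, and each output file is opened only once instead of once per record.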
You might also try running with the environment variable LC_ALL=C to
see if that helps lower the cost of the regular expression matching
inside gsub.
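
For example, the variable can be set for a single run by prefixing the command. This is only an illustration with a made-up one-line input; whether it helps at all depends on the locale currently in effect.

```shell
# Hypothetical one-line input; LC_ALL=C applies only to this awk invocation.
out=$(printf '%s\n' 'a,b,c,d,"red"' | LC_ALL=C awk -F, '{ v = $5; gsub(/"/, "", v); print v }')
echo "$out"
```

Timing the real job once with and once without the LC_ALL=C prefix should show whether the locale matters here.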
If none of that helps, try compiling gawk for profiling and without
optimization and send me the output of gprof.
I have to wonder if you would get better numbers on a BSD Fast Filesystem
volume than on a MacOS HFS volume....
Thanks,
Arnold
In article <address@hidden>,
<address@hidden> wrote:
>Hello,
>
>I'm facing a performance problem under MacOS X when using gawk's output
>redirection: it's very slow.
>
>I have to process CSV files (~5G lines each) that must be split into
>separate files (~300) based on a field value, so performance is
>critical. For now my old PIII outperforms my MacPro... so something
>clearly isn't right somewhere under MacOS... Here is what I'm using:
>
>{
>FS=","
>row=$0
>var=$5
>gsub(/\"/,"",var)
>path=dir"/"var".csv"
>print row >> path
>close(path)
>}
>
>Find below some simple test cases that compare the performance of my MacPro
>to an old IBM server. Any idea how the redirection could be optimized
>under MacOS? I'm not a programmer but I can run tests if necessary,
>so please don't hesitate to ask... simply let me know exactly what you
>want me to do.
>
>Best regards,
>
>Ben.
>
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>-> MacOS 10.5.5 on a MacPro (Xeon 2.8Ghz) with default bin:
>
>$ awk -V
>awk version 20040207
>
>$ time awk '{ print > "/tmp/output.txt" }' /tmp/input.txt
>real 0m12.071s
>user 0m5.171s
>sys 0m6.171s
>
>$ time awk '{ print }' < /tmp/input.txt > /tmp/output.txt
>real 0m3.648s
>user 0m2.561s
>sys 0m0.665s
>
>-- MacOS 10.5.5 on a MacPro (Xeon 2.8Ghz) with default bin using /dev/null
>
>$ time awk '{ print > "/dev/null" }' /tmp/input.txt
>real 0m7.068s
>user 0m4.752s
>sys 0m2.314s
>
>$ time awk '{ print }' < /tmp/input.txt > /dev/null
>real 0m2.602s
>user 0m2.425s
>sys 0m0.177s
>
>
>$ wc -l /tmp/output.txt
>2000000 /tmp/output.txt
>$ wc -l /tmp/input.txt
>2000000 /tmp/input.txt
>$ ls -lh /tmp/output.txt
>-rw-rw-r-- 1 abc abc 129M Sep 21 00:58 /tmp/output.txt
>
>
>-> MacOS 10.5.5 on a MacPro (Xeon 2.8Ghz) with gawk 3.1.6 (Built:
>./configure --prefix=/usr/local/gawk-3.1.6) :
>
>$ /usr/local/gawk-3.1.6/bin/awk -W version
>GNU Awk 3.1.6
>
>$ time /usr/local/gawk-3.1.6/bin/awk '{ print > "/tmp/output.txt"}' /tmp/input.txt
>
>real 0m6.657s
>user 0m3.968s
>sys 0m2.107s
>
>$ time /usr/local/gawk-3.1.6/bin/awk '{ print }' /tmp/input.txt > /tmp/output.txt
>
>real 0m6.475s
>user 0m3.757s
>sys 0m2.136s
>
>
>-- MacOS 10.5.5 on a MacPro (Xeon 2.8Ghz) with gawk 3.1.6 using /dev/null
>
>$ time /usr/local/gawk-3.1.6/bin/awk '{ print > "/dev/null"}' /tmp/input.txt
>
>real 0m5.341s
>user 0m3.779s
>sys 0m1.561s
>
>$ time /usr/local/gawk-3.1.6/bin/awk '{ print }' /tmp/input.txt > /dev/null
>
>real 0m5.192s
>user 0m3.620s
>sys 0m1.570s
>
>
>Here is an example with gawk 3.1.6 using an old IBM address@hidden server
>running CentOS 5:
>
>$ time /usr/src/gawk-3.1.6/gawk '{ print > "/tmp/output.txt" }' < /tmp/input.txt
>
>real 0m3.334s
>user 0m2.184s
>sys 0m1.150s
>
>$ time /usr/src/gawk-3.1.6/gawk '{ print }' < /tmp/input.txt > /tmp/output.txt
>
>real 0m2.969s
>user 0m1.727s
>sys 0m1.243s
>
>-> IBM address@hidden using /dev/null
>
>$ time /usr/src/gawk-3.1.6/gawk '{ print > "/dev/null" }' /tmp/input.txt
>
>real 0m2.614s
>user 0m2.271s
>sys 0m0.343s
>
>$ time /usr/src/gawk-3.1.6/gawk '{ print }' /tmp/input.txt > /dev/null
>
>real 0m2.520s
>user 0m2.144s
>sys 0m0.358s
>
>
>
>
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL