bug-gawk

RE: [EXTERNAL] Re: Performance issues using GAWK 3.1.6 ->from Win 2008 t


From: Koleti, Haritha
Subject: RE: [EXTERNAL] Re: Performance issues using GAWK 3.1.6 ->from Win 2008 to Win 2016
Date: Tue, 15 Jun 2021 16:58:28 +0000

Ed, these inefficient scripts ran in about 10 minutes on Windows 2008. To 
address the performance on Windows 2016 (more than 90 minutes), do you think we 
have to change all of our more than 100 AWK scripts?
If you can think of any other way, that would be great.

Thanks
Haritha

From: Ed Morton <mortoneccc@comcast.net>
Sent: Tuesday, June 15, 2021 11:21 AM
To: Koleti, Haritha <Haritha.Koleti@pseg.com>; Eli Zaretskii <eliz@gnu.org>; 
arnold@skeeve.com
Cc: wolfgang.laun@gmail.com; bug-gawk@gnu.org; Pereira, Ricardo 
<Ricardo_D.Pereira@pseg.com>; Pirane, Marco <Marco.Pirane@pseg.com>
Subject: Re: [EXTERNAL] Re: Performance issues using GAWK 3.1.6 ->from Win 2008 
to Win 2016

Haritha - good. The REAL root cause of your problems is simply that the script 
was written extremely inefficiently. If you have any other scripts that take on 
the order of minutes to run given input files of the size you reported, then 
those are also written extremely inefficiently. The fix is to correct those 
scripts to run efficiently, not to tune the environment so that they run faster 
but still take an enormous amount of time, like 10 minutes. So I'd recommend 
just fixing whichever scripts you have that are taking minutes to run, if any.

    Ed.
On 6/15/2021 10:13 AM, Koleti, Haritha wrote:
Ed,

This worked like a charm: less than 1 minute. But we have 100s of scripts. It 
would really help if we could find the root cause of why this takes 10 minutes 
versus 90 minutes.

Thanks
Haritha


From: Ed Morton <mortoneccc@comcast.net>
Sent: Tuesday, June 15, 2021 9:05 AM
To: Koleti, Haritha <Haritha.Koleti@pseg.com>; Eli Zaretskii <eliz@gnu.org>; 
arnold@skeeve.com
Cc: wolfgang.laun@gmail.com; bug-gawk@gnu.org; Pereira, Ricardo 
<Ricardo_D.Pereira@pseg.com>; Pirane, Marco <Marco.Pirane@pseg.com>
Subject: Re: [EXTERNAL] Re: Performance issues using GAWK 3.1.6 ->from Win 2008 
to Win 2016

***CAUTION***

This e-mail is from an EXTERNAL address. The actual sender is 
(mortoneccc@comcast.net) which may be different from the display address in the 
From: field. Be cautious of clicking on links or opening attachments. 
Suspicious? Report it via the Report Phishing button. On mobile phones, forward 
message to Cyber Security.



David Kerns spotted a bug in that code (thanks), it should be:

BEGIN {
    FS=","
    while ( (getline<f2) > 0 ) {
        map[$2] = $1
    }
}
{
    sattr = ( $2 in map ? map[$2] : "" )
    printf("%s,%s,%s,%s,%s,%s,\n",$1,$2,$3,$4,$5,sattr);
}
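
Before swapping a script like this into production, it may be worth confirming that the rewrite prints the same lines as the original. The following is only a sketch: the two CSV files are made-up stand-ins (the real ParentChild.csv and Emp_Attr.csv were not posted), and it calls plain `awk` on the assumption that any POSIX awk, including gawk, behaves the same here.

```shell
#!/bin/sh
# Sketch: compare the nested-getline script with the map-based rewrite.
# All sample data below is hypothetical.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Stand-in for ParentChild.csv: five comma-separated fields, key in field 2.
printf '%s\n' 'p1,e1,a,b,c' 'p2,e2,d,e,f' 'p3,e9,g,h,i' > "$work/ParentChild.csv"
# Stand-in for Emp_Attr.csv: attribute in field 1, key in field 2.
printf '%s\n' 'red,e1' 'blue,e2' 'green,e2' > "$work/Emp_Attr.csv"

# Original approach: re-read f2 from the start for every input line.
slow=$(awk -v f2="$work/Emp_Attr.csv" '
BEGIN { FS = "," }
{
    t0 = $1; t1 = $2; t2 = $3; t3 = $4; t4 = $5
    sattr = ""
    while ((getline < f2) > 0)
        if ($2 == t1) sattr = $1
    close(f2)
    printf("%s,%s,%s,%s,%s,%s,\n", t0, t1, t2, t3, t4, sattr)
}' "$work/ParentChild.csv")

# Rewrite: load f2 into an array once, then do one lookup per input line.
fast=$(awk -v f2="$work/Emp_Attr.csv" '
BEGIN {
    FS = ","
    while ((getline < f2) > 0)
        map[$2] = $1
}
{
    sattr = ($2 in map ? map[$2] : "")
    printf("%s,%s,%s,%s,%s,%s,\n", $1, $2, $3, $4, $5, sattr)
}' "$work/ParentChild.csv")

if [ "$slow" = "$fast" ]; then
    echo "outputs match"
else
    echo "outputs differ"
fi
```

Both versions keep the last matching attribute when a key appears more than once in the lookup file. One difference to check for in real data: `$2==t1` compares numerically when both fields look like numbers (so `01` matches `1`), while array subscripts are compared as strings.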
On 6/15/2021 7:49 AM, Ed Morton wrote:
That script is enormously inefficient as it'll read the whole of Emp_attr.csv 
once per line of ParentChild.csv. Try changing it to (untested):

BEGIN {
    FS=","
    while ( (getline<f2) > 0 ) {
        map[$2] = $1
    }
}
{
    sattr = ( $2 in map : map[$2] : "" )
    printf("%s,%s,%s,%s,%s,%s,\n",$1,$2,$3,$4,$5,sattr);
}

and you should see a significant performance improvement (i.e. orders of 
magnitude). The only potential problem would be if Emp_attr.csv was too large 
to fit in memory.

    Ed.
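
A common variant of the same idea, not the one proposed in this thread, passes both files as arguments and uses `NR == FNR` to recognize the first file, which avoids `getline` and the `f2` variable. This is a sketch with made-up data; note that the `NR == FNR` test misbehaves if the first file is empty, so treat it as an assumption that the lookup file always has at least one line.

```shell
#!/bin/sh
# Sketch: two-file lookup using the NR == FNR idiom. Sample data is hypothetical.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

printf '%s\n' 'red,e1' 'blue,e2' > "$work/Emp_Attr.csv"
printf '%s\n' 'p1,e1,a,b,c' 'p2,e7,d,e,f' > "$work/ParentChild.csv"

joined=$(awk -F, '
# While reading the first file, NR (total records) equals FNR (records in
# the current file), so this rule only builds the lookup table.
NR == FNR { map[$2] = $1; next }
# Every later record comes from the second file: one array lookup per line.
{
    sattr = ($2 in map ? map[$2] : "")
    printf("%s,%s,%s,%s,%s,%s,\n", $1, $2, $3, $4, $5, sattr)
}' "$work/Emp_Attr.csv" "$work/ParentChild.csv")

printf '%s\n' "$joined"
```

The memory caveat is the same as for the `getline` version: the whole lookup file is held in the array.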
On 6/15/2021 7:31 AM, Koleti, Haritha via Bug reports and all discussion about 
gawk. wrote:

Two more scripts that are used in the below script.

Emp_att.awk - I am not sending this as it is working fast.

Map_attr.awk  -

BEGIN {
    FS=",";
}
{
    t1=$2;
    t0=$1;
    t2=$3;
    t3=$4;
    t4=$5;
    sattr="";
    while( (getline<f2) > 0)
    {
        if ($2==t1)
        {
            sattr=$1;
        }
    }
    close(f2);
    printf("%s,%s,%s,%s,%s,%s,\n",t0,t1,t2,t3,t4,sattr);
}


-----Original Message-----
From: Koleti, Haritha
Sent: Tuesday, June 15, 2021 7:49 AM
To: 'Eli Zaretskii' <eliz@gnu.org>; arnold@skeeve.com
Cc: wolfgang.laun@gmail.com; bug-gawk@gnu.org; Pereira, Ricardo 
<Ricardo_D.Pereira@pseg.com>; Pirane, Marco <Marco.Pirane@pseg.com>
Subject: RE: [EXTERNAL] Re: Performance issues using GAWK 3.1.6 ->from Win 2008 
to Win 2016


Good Morning Eli,

This is a pretty straightforward script that maps the data between two files. 
I am waiting on permission from our security team so that my team (Ricardo, 
Marco) can send you the details.

But here is the script.

@ECHO ON
SET DRIVENAME=D:
SET ROOTPATH=D:\PCM_SCRIPT\test
%DRIVENAME%
CD %ROOTPATH%
REM This step is fast.
TYPE  ParentChild.csv|gawk -f Emp_Attr.awk>Emp_Attr.csv
REM This is the step that takes the time.
TYPE  ParentChild.csv|gawk -v f2=Emp_Attr.csv -f map_attr.awk>Map_Attr.csv



The complete script finishes in 10 minutes on the old server (Windows 2008, 
Excel 2010). On the new server (Windows 2016, Excel 2016) it now takes 90 
minutes.

There is NO change in the volume of data in the 2 files.

Thanks
Haritha



-----Original Message-----
From: Eli Zaretskii <eliz@gnu.org>
Sent: Tuesday, June 15, 2021 7:30 AM
To: arnold@skeeve.com
Cc: wolfgang.laun@gmail.com; bug-gawk@gnu.org; Koleti, Haritha 
<Haritha.Koleti@pseg.com>
Subject: Re: [EXTERNAL] Re: Performance issues using GAWK 3.1.6 ->from Win 2008 
to Win 2016






> From: arnold@skeeve.com
> Date: Tue, 15 Jun 2021 01:51:06 -0600
> Cc: bug-gawk@gnu.org, Haritha.Koleti@pseg.com
>
> Wolfgang Laun <wolfgang.laun@gmail.com> wrote:
>
> > The durations 10 min and 90 min suggest to me that a lot of i/o is
> > going on. I have experienced performance changes of a similar order
> > of magnitude due to changes in the default i/o buffer size.
> > -W
>
> This is an interesting idea. Eli, what if you supply a binary built
> with the following patch?

How does this theory explain the difference between the two Windows versions?  
They both use the same value of the "optimal" buffer size.



I'd rather see in the script how much I/O it really does, and take it from 
there.  Suppose that it turns out the script invokes other programs a lot, or 
does a lot of computations: then the investigation should go in some other 
direction, right?
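
One rough way to see how much I/O a nested `getline` loop like the one in Map_attr.awk performs is to keep its open/read/close structure and replace the real work with counters. This is a sketch under stated assumptions: the file contents are made up, the counter names `passes` and `reads` are hypothetical, and plain `awk` is assumed to stand in for gawk.

```shell
#!/bin/sh
# Sketch: count how many records a per-line re-read of the lookup file consumes.
# Sample data is hypothetical: 4 input lines, 3 lookup lines.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

printf '%s\n' 'p1,e1,a,b,c' 'p2,e2,d,e,f' 'p3,e3,g,h,i' 'p4,e4,j,k,l' \
    > "$work/ParentChild.csv"
printf '%s\n' 'red,e1' 'blue,e2' 'green,e3' > "$work/Emp_Attr.csv"

stats=$(awk -v f2="$work/Emp_Attr.csv" '
BEGIN { FS = "," }
{
    while ((getline < f2) > 0)
        reads++        # one record read from the lookup file
    close(f2)          # forces the next input line to start f2 over
    passes++           # one full open/read/close cycle over f2
}
END { printf("passes=%d reads=%d\n", passes, reads) }
' "$work/ParentChild.csv")

echo "$stats"
```

With N input lines and M lookup lines, the loop opens the lookup file N times and reads N x M records in total, so the sample above reports 4 passes and 12 reads.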



The information contained in this e-mail, including any attachment(s), is 
intended solely for use by the named addressee(s). If you are not the intended 
recipient, or a person designated as responsible for delivering such messages 
to the intended recipient, you are not authorized to disclose, copy, distribute 
or retain this message, in whole or in part, without written authorization from 
PSEG. This e-mail may contain proprietary, confidential or privileged 
information. If you have received this message in error, please notify the 
sender immediately. This notice is included in all e-mail messages leaving 
PSEG. Thank you for your cooperation.

