
From: Andrew J. Schorr
Subject: Re: [EXTERNAL] Re: Performance issues using GAWK 3.1.6 ->from Win 2008 to Win 2016
Date: Wed, 16 Jun 2021 08:39:23 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Hi Ed,

That sounds right to me. As you point out, map_attr.awk produces precisely
one line of output for each line of input. So the command:

gawk -v f2=Emp_Attr.csv -f map_attr.awk ParentChild.csv>Map_Attr.csv

should produce a Map_Attr.csv file that has exactly the same number
of records as the ParentChild.csv file. There must have been a
cut-and-paste error.

Haritha -- can you please try again, taking care to make sure that the
command is copied exactly as written above?
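
One way to check this is to compare record counts directly. The following
is a minimal sketch for a POSIX shell, not the real setup: the sample data
and the stand-in map_attr.awk are hypothetical (the actual script and files
are not reproduced in this message), and plain awk is used, though gawk
should behave the same here. It also runs the command with the two files
swapped, which is the mix-up Ed describes.

```shell
# Work in a scratch directory so no real files are overwritten.
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical stand-ins for the real data files (contents are made up).
printf 'c1,e1\nc2,e2\nc3,e1\n' > ParentChild.csv
printf 'e1,attrA\ne2,attrB\n' > Emp_Attr.csv

# Hypothetical stand-in for map_attr.awk: it loads the lookup file named
# by -v f2=... and then prints exactly one line per input record.
cat > map_attr.awk <<'EOF'
BEGIN {
    FS = OFS = ","
    while ((getline line < f2) > 0) {
        split(line, parts, FS)
        attr[parts[1]] = parts[2]
    }
    close(f2)
}
{ print $1, $2, attr[$2] }
EOF

# Files in the intended order.
awk -v f2=Emp_Attr.csv -f map_attr.awk ParentChild.csv > Map_Attr.csv

# Files accidentally swapped.
awk -v f2=ParentChild.csv -f map_attr.awk Emp_Attr.csv > Swapped.csv

# Count records with awk itself rather than relying on other tools.
in_count=$(awk 'END { print NR }' ParentChild.csv)
out_count=$(awk 'END { print NR }' Map_Attr.csv)
swapped_count=$(awk 'END { print NR }' Swapped.csv)

echo "input=$in_count output=$out_count swapped=$swapped_count"
```

With the files in the intended order, the output has as many records as
ParentChild.csv; with them swapped, it has as many as Emp_Attr.csv. On
Windows, something like gawk "END { print NR }" Map_Attr.csv should give
the same kind of count (the cmd.exe quoting is an assumption).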

Regards,
Andy

On Wed, Jun 16, 2021 at 07:33:50AM -0500, Ed Morton wrote:
> Given:
> 
>     yes Andy, original command is looking parentchild(195K) records in
>     Emp_attr(5000) and creating MAP_attr.csv(195K) records.
>     versus below command with out pipe is looking for EMP_attr.csv(5000)
>     against Parentchild(195K) and creating MAP_Attr.csv with 5000 records.
> 
> 
> Sounds to me like they ran the command with the input files in the wrong
> order. The posted awk script outputs the same number of lines as are
> present in the input file passed in the args list, so it cannot output
> a number of lines other than the number in ParentChild.csv unless it
> aborts mid-processing. For it then to output exactly the same number of
> lines as are present in Emp_Attr.csv in that scenario seems.... unlikely!
> 
>     Ed.
> 
> On 6/16/2021 7:19 AM, Andrew J. Schorr wrote:
> 
>     Hi,
> 
>     This makes no sense to me. The pure gawk version is simpler and cleaner
>     without the pipe. Are you sure that you copied the commands properly?
>     Do any Windoze folks have an idea of what could be going wrong here?
> 
>     Regards,
>     Andy
> 
>     On Wed, Jun 16, 2021 at 11:27:53AM +0000, Koleti, Haritha wrote:
> 
>         yes Andy, original command is looking parentchild(195K) records in
>         Emp_attr(5000) and creating MAP_attr.csv(195K) records.
>         versus below command with out pipe is looking for EMP_attr.csv(5000)
>         against Parentchild(195K) and creating MAP_Attr.csv with 5000 records.
> 
>         thank you!!
>         Haritha
> 
> 
>         -----Original Message-----
>         From: Andrew J. Schorr <aschorr@telemetry-investments.com>
>         Sent: Tuesday, June 15, 2021 2:14 PM
>         To: Koleti, Haritha <Haritha.Koleti@pseg.com>
>         Cc: Eli Zaretskii <eliz@gnu.org>; mortoneccc@comcast.net;
>         arnold@skeeve.com; wolfgang.laun@gmail.com; bug-gawk@gnu.org;
>         Pereira, Ricardo <Ricardo_D.Pereira@pseg.com>;
>         Pirane, Marco <Marco.Pirane@pseg.com>
>         Subject: Re: [EXTERNAL] Re: Performance issues using GAWK 3.1.6
>         ->from Win 2008 to Win 2016
> 
>         ***CAUTION******CAUTION******CAUTION***This e-mail is from an
>         EXTERNAL address. The actual sender is
>         (aschorr@telemetry-investments.com) which may be different from the
>         display address in the From: field. Be cautious of clicking on links
>         or opening attachments. Suspicious? Report it via the Report Phishing
>         button. On mobile phones, forward message to Cyber Security.
> 
>         Hi,
> 
>         I'm not sure that I understand your message. Are you saying that you
>         are getting different results from:
> 
>         TYPE  ParentChild.csv|gawk -f Emp_Attr.awk>Emp_Attr.csv
>         TYPE  ParentChild.csv|gawk -v f2=Emp_Attr.csv -f map_attr.awk>Map_Attr.csv
> 
>         versus:
> 
>         gawk -f Emp_Attr.awk ParentChild.csv>Emp_Attr.csv
>         gawk -v f2=Emp_Attr.csv -f map_attr.awk ParentChild.csv>Map_Attr.csv
> 
>         ???
> 
>         Is the difference in Emp_Attr.csv or Map_Attr.csv or both?
>         Or am I confused about what you are indicating? These commands should
>         be equivalent, and the latter versions should be faster, I would think.
>         If you additionally use Ed's modified version of map_attr.awk, you
>         should get top speed.
> 
>         Regards,
>         Andy
> 
>         On Tue, Jun 15, 2021 at 04:58:53PM +0000, Koleti, Haritha via Bug
>         reports and all discussion about gawk. wrote:
> 
>             it runs faster but the final file is not as expected it is 192KB
>             where original file should have been 16230KB.
> 
>             we are not getting right output that we require.
> 
> 
> 
>             -----Original Message-----
>             From: Eli Zaretskii <eliz@gnu.org>
>             Sent: Tuesday, June 15, 2021 11:33 AM
>             To: Koleti, Haritha <Haritha.Koleti@pseg.com>
>             Cc: mortoneccc@comcast.net; arnold@skeeve.com;
>             wolfgang.laun@gmail.com; bug-gawk@gnu.org;
>             Pereira, Ricardo <Ricardo_D.Pereira@pseg.com>;
>             Pirane, Marco <Marco.Pirane@pseg.com>
>             Subject: Re: [EXTERNAL] Re: Performance issues using GAWK 3.1.6
>             ->from Win 2008 to Win 2016
> 
> 
>                 From: "Koleti, Haritha" <Haritha.Koleti@pseg.com>
>                 CC: "wolfgang.laun@gmail.com" <wolfgang.laun@gmail.com>,
>                         "bug-gawk@gnu.org" <bug-gawk@gnu.org>,
>                         "Pereira, Ricardo" <Ricardo_D.Pereira@pseg.com>,
>                         "Pirane, Marco" <Marco.Pirane@pseg.com>
>                 Date: Tue, 15 Jun 2021 15:13:14 +0000
> 
>                 This worked like a charm, <1 minute. But we have 100s of
>                 scripts. It would really help if we can find a root cause
>                 why this is 10 minutes versus 90 minutes.
> 
>             Try what Andrew suggested: eliminate the TYPE command and the
>             pipe from the batch file.  Does that speed up the time, and if
>             so, by how much?
> 
>             The information contained in this e-mail, including any
>             attachment(s), is intended solely for use by the named
>             addressee(s). If you are not the intended recipient, or a person
>             designated as responsible for delivering such messages to the
>             intended recipient, you are not authorized to disclose, copy,
>             distribute or retain this message, in whole or in part, without
>             written authorization from PSEG. This e-mail may contain
>             proprietary, confidential or privileged information. If you have
>             received this message in error, please notify the sender
>             immediately. This notice is included in all e-mail messages
>             leaving PSEG. Thank you for your cooperation.
> 
> 

-- 
Andrew Schorr                      e-mail: aschorr@telemetry-investments.com
Telemetry Investments, L.L.C.      phone:  917-305-1748
152 W 36th St, #402                fax:    212-425-5550
New York, NY 10018-8765


