bug-gawk

From: Andrew J. Schorr
Subject: Re: [EXTERNAL] Re: Performance issues using GAWK 3.1.6 ->from Win 2008 to Win 2016
Date: Wed, 16 Jun 2021 09:07:28 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Hi,

Please send the results from these commands then:

wc -l ParentChild.csv
gawk -f Emp_Attr.awk ParentChild.csv>Emp_Attr.csv
wc -l ParentChild.csv
gawk -v f2=Emp_Attr.csv -f map_attr.awk ParentChild.csv>Map_Attr.csv
wc -l ParentChild.csv Map_Attr.csv
TYPE map_attr.awk

I'm assuming that your environment has "wc" available in addition to gawk;
maybe that's a flawed assumption. If wc is not available, then you can 
use gawk instead, depending on the level of quoting insanity in your shell,
like so:

gawk 'END {print FILENAME, FNR}' ParentChild.csv
gawk 'END {print FILENAME, FNR}' Map_Attr.csv
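A variation on the same idea, offered only as a sketch: put the awk program in a file and run it with -f, the way the other scripts here are already run. That avoids command-line quoting entirely. A single run can also count several files, because FNR restarts at 1 for each input file. The sketch assumes a POSIX-style shell for the here-document; on Windows, the awk lines could instead be saved to the file with any editor. The file name count.awk is hypothetical.

```shell
# Write the counting program to a file (count.awk is a hypothetical name)
# so that the gawk command line needs no quoting at all.
cat > count.awk <<'EOF'
# FNR restarts at 1 for each input file, so the last FNR stored for a
# given FILENAME is that file's record count.
{ n[FILENAME] = FNR }
# After all files are read, print "filename count" for each one.
END { for (f in n) print f, n[f] }
EOF
```

It would then be run as:

gawk -f count.awk ParentChild.csv Map_Attr.csv

The order of the output lines from "for (f in n)" is unspecified, and an empty file produces no line at all. The program uses only standard awk features, so gawk 3.1.6 should handle it.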

Regards,
Andy

On Wed, Jun 16, 2021 at 12:54:52PM +0000, Koleti, Haritha wrote:
> Sent too fast. Same result.
>
> From: Koleti, Haritha
> Sent: Wednesday, June 16, 2021 8:47 AM
> To: 'Andrew J. Schorr' <aschorr@telemetry-investments.com>; Ed Morton <mortoneccc@comcast.net>
> Cc: Pirane, Marco <Marco.Pirane@pseg.com>; bug-gawk@gnu.org; Pereira, Ricardo <Ricardo_D.Pereira@pseg.com>
> Subject: RE: [EXTERNAL] Re: Performance issues using GAWK 3.1.6 ->from Win 2008 to Win 2016
>
> [inline image]
>
> -----Original Message-----
> From: Andrew J. Schorr <aschorr@telemetry-investments.com>
> Sent: Wednesday, June 16, 2021 8:39 AM
> To: Ed Morton <mortoneccc@comcast.net>
> Cc: Koleti, Haritha <Haritha.Koleti@pseg.com>; Pirane, Marco <Marco.Pirane@pseg.com>; bug-gawk@gnu.org; Pereira, Ricardo <Ricardo_D.Pereira@pseg.com>
> Subject: Re: [EXTERNAL] Re: Performance issues using GAWK 3.1.6 ->from Win 2008 to Win 2016
>
> ***CAUTION******CAUTION******CAUTION***This e-mail is from an EXTERNAL
> address.  The actual sender is  (aschorr@telemetry-investments.com) which may
> be different from the display address in the From: field. Be cautious of
> clicking on links or opening attachments. Suspicious? Report it via the Report
> Phishing button.  On mobile phones, forward message to Cyber Security.
>
> Hi Ed,
>
> That sounds right to me. As you point out, map_attr.awk produces precisely one
> line of output for each line of input. So the command:
>
> gawk -v f2=Emp_Attr.csv -f map_attr.awk ParentChild.csv>Map_Attr.csv
>
> should produce a Map_Attr.csv file that has exactly the same number of records
> as the ParentChild.csv file. There must have been a cut & paste copy error.
>
> Haritha -- can you please try again, taking care to make sure that the command
> is copied exactly as written above?
>
> Regards,
> Andy
>
> On Wed, Jun 16, 2021 at 07:33:50AM -0500, Ed Morton wrote:
> > Given:
> >
> >     Yes Andy, the original command looks up the parentchild (195K) records in
> >     Emp_attr (5000) and creates MAP_attr.csv (195K records), versus the command
> >     below without the pipe, which looks up EMP_attr.csv (5000) against
> >     Parentchild (195K) and creates MAP_Attr.csv with 5000 records.
> >
> > Sounds to me like they ran the command with the input files in the
> > wrong order. The posted awk script outputs the same number of lines
> > as are present in the input file passed in the args list, so it can't
> > output a number of lines different from the number in ParentChild.csv
> > unless it aborts mid-processing, and even then, for it to output exactly
> > the same number of lines as are present in Emp_Attr.csv seems....
> > unlikely!
> >
> >     Ed.
> >
> > On 6/16/2021 7:19 AM, Andrew J. Schorr wrote:
> >
> >     Hi,
> >
> >     This makes no sense to me. The pure gawk version is simpler and cleaner
> >     without the pipe. Are you sure that you copied the commands properly? Do
> >     any Windoze folks have an idea of what could be going wrong here?
> >
> >     Regards,
> >     Andy
> >
> >     On Wed, Jun 16, 2021 at 11:27:53AM +0000, Koleti, Haritha wrote:
> >
> >         Yes Andy, the original command looks up the parentchild (195K) records
> >         in Emp_attr (5000) and creates MAP_attr.csv (195K records), versus the
> >         command below without the pipe, which looks up EMP_attr.csv (5000)
> >         against Parentchild (195K) and creates MAP_Attr.csv with 5000 records.
> >
> >         Thank you!!
> >         Haritha
> >
> >         -----Original Message-----
> >         From: Andrew J. Schorr <aschorr@telemetry-investments.com>
> >         Sent: Tuesday, June 15, 2021 2:14 PM
> >         To: Koleti, Haritha <Haritha.Koleti@pseg.com>
> >         Cc: Eli Zaretskii <eliz@gnu.org>; mortoneccc@comcast.net; arnold@skeeve.com;
> >         wolfgang.laun@gmail.com; bug-gawk@gnu.org; Pereira, Ricardo
> >         <Ricardo_D.Pereira@pseg.com>; Pirane, Marco <Marco.Pirane@pseg.com>
> >         Subject: Re: [EXTERNAL] Re: Performance issues using GAWK 3.1.6 ->from Win 2008 to Win 2016
> >
> >         Hi,
> >
> >         I'm not sure that I understand your message. Are you saying that you are
> >         getting different results from:
> >
> >         TYPE  ParentChild.csv|gawk -f Emp_Attr.awk>Emp_Attr.csv
> >         TYPE  ParentChild.csv|gawk -v f2=Emp_Attr.csv -f map_attr.awk>Map_Attr.csv
> >
> >         versus:
> >
> >         gawk -f Emp_Attr.awk ParentChild.csv>Emp_Attr.csv
> >         gawk -v f2=Emp_Attr.csv -f map_attr.awk ParentChild.csv>Map_Attr.csv
> >
> >         ???
> >
> >         Is the difference in Emp_Attr.csv or Map_Attr.csv or both?
> >         Or am I confused about what you are indicating? These commands should be
> >         equivalent, and the latter versions should be faster, I would think. If you
> >         additionally use Ed's modified version of map_attr.awk, you should get top
> >         speed.
> >
> >         Regards,
> >         Andy
> >
> >         On Tue, Jun 15, 2021 at 04:58:53PM +0000, Koleti, Haritha via Bug reports and
> >         all discussion about gawk. wrote:
> >
> >             It runs faster, but the final file is not as expected: it is 192KB,
> >             whereas the original file should have been 16230KB.
> >
> >             We are not getting the output that we require.
> >
> >             -----Original Message-----
> >             From: Eli Zaretskii <eliz@gnu.org>
> >             Sent: Tuesday, June 15, 2021 11:33 AM
> >             To: Koleti, Haritha <Haritha.Koleti@pseg.com>
> >             Cc: mortoneccc@comcast.net; arnold@skeeve.com;
> >             wolfgang.laun@gmail.com; bug-gawk@gnu.org; Pereira, Ricardo
> >             <Ricardo_D.Pereira@pseg.com>; Pirane, Marco <Marco.Pirane@pseg.com>
> >             Subject: Re: [EXTERNAL] Re: Performance issues using GAWK 3.1.6 ->from Win 2008 to Win 2016
> >
> >                 From: "Koleti, Haritha" <Haritha.Koleti@pseg.com>
> >                 CC: "wolfgang.laun@gmail.com" <wolfgang.laun@gmail.com>,
> >                         "bug-gawk@gnu.org" <bug-gawk@gnu.org>,
> >                         "Pereira, Ricardo" <Ricardo_D.Pereira@pseg.com>,
> >                         "Pirane, Marco" <Marco.Pirane@pseg.com>
> >                 Date: Tue, 15 Jun 2021 15:13:14 +0000
> >
> >                 This worked like a charm (<1 minute). But we have 100s of scripts, so
> >                 it would really help if we could find the root cause of why this takes
> >                 10 minutes versus 90 minutes.
> >
> >             Try what Andrew suggested: eliminate the TYPE command and the pipe from the
> >             batch file.  Does that reduce the run time, and if so, by how much?
> >
> >             The information contained in this e-mail, including any attachment(s), is
> >             intended solely for use by the named addressee(s). If you are not the intended
> >             recipient, or a person designated as responsible for delivering such messages
> >             to the intended recipient, you are not authorized to disclose, copy, distribute
> >             or retain this message, in whole or in part, without written authorization from
> >             PSEG. This e-mail may contain proprietary, confidential or privileged
> >             information. If you have received this message in error, please notify the
> >             sender immediately. This notice is included in all e-mail messages leaving
> >             PSEG. Thank you for your cooperation.
> >
>
> --
> Andrew Schorr                      e-mail: aschorr@telemetry-investments.com
> Telemetry Investments, L.L.C.      phone:  917-305-1748
> 152 W 36th St, #402                fax:    212-425-5550
> New York, NY 10018-8765
>



