From: Andrew J. Schorr
Subject: Re: [bug-gawk] 4.1.3->4.1.4 = Linux-libre's deblob-check grows huge and takes forever
Date: Thu, 13 Jul 2017 14:20:28 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

His message included this URL:

   http://linux-libre.fsfla.org/pub/linux-libre/releases/4.12-gnu/

I grabbed

   http://linux-libre.fsfla.org/pub/linux-libre/releases/4.12-gnu/deblob-check

and

   http://linux-libre.fsfla.org/pub/linux-libre/releases/4.12-gnu/linux-libre-4.12-gnu.tar.bz2

I started to bisect. In the master branch, I think commit 3a15491 is good.
I have 436 revisions left...
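
Since each bisect iteration is slow, the good/bad decision can be handed to
"git bisect run". What follows is only a minimal sketch under stated
assumptions, not the procedure actually used in this thread. The script name,
the CHECK_DIR variable, and the 600-second limit are hypothetical; the limit
is chosen to sit between the ~7 minute run reported for 4.1.3 and the
13+ minute runs that were interrupted. It also assumes that a plain
configure/make builds gawk in the checkout, that deblob-check finds the
freshly built gawk through PATH, and that any run finishing within the limit
counts as good regardless of deblob-check's own exit status.

```shell
#!/bin/sh
# bisect-gawk.sh: hypothetical helper for `git bisect run`.

# Map the exit status of the timed run to a git-bisect verdict.
#   124  -> 1 (bad: timeout(1) stopped the run at the time limit)
#   else -> 0 (good: the run ended within the limit)
bisect_verdict() {
    case "$1" in
        124) echo 1 ;;
        *)   echo 0 ;;
    esac
}

# The build-and-test part only runs when invoked with --run.
if [ "${1:-}" = "--run" ]; then
    # Assumption: deblob-check and the tarball live in $CHECK_DIR.
    CHECK_DIR=${CHECK_DIR:-$HOME/deblob}
    GAWK_DIR=$PWD

    # Exit 125 tells git bisect to skip revisions that do not build.
    ./configure >/dev/null 2>&1 && make >/dev/null 2>&1 || exit 125

    # Assumption: putting the build directory first on PATH makes
    # deblob-check use the gawk that was just built.
    cd "$CHECK_DIR" || exit 125
    PATH="$GAWK_DIR:$PATH" timeout 600 \
        ./deblob-check --use-awk linux-libre-4.12-gnu.tar.bz2 >/dev/null 2>&1
    status=$?

    exit "$(bisect_verdict "$status")"
fi
```

With master as the bad end and 3a15491 as the good end, it could be driven
like this (the script path is a placeholder):

   git bisect start master 3a15491
   git bisect run /path/to/bisect-gawk.sh --run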

-Andy

On Thu, Jul 13, 2017 at 11:49:51AM -0600, address@hidden wrote:
> Where can I get the files from?
> 
> Thanks,
> 
> Arnold
> 
> "Andrew J. Schorr" <address@hidden> wrote:
> 
> > Hi,
> >
> > I grabbed the file. It's broken both in master and in stable.
> >
> > gawk 4.1.3:
> >
> > bash-4.2$ /bin/time ./deblob-check --use-awk linux-libre-4.12-gnu.tar.bz2 
> > 472.19user 96.71system 6:59.57elapsed 135%CPU (0avgtext+0avgdata 876968maxresident)k
> > 0inputs+0outputs (0major+63559391minor)pagefaults 0swaps
> >
> > Master branch (I ctrl-c'ed after 13 minutes):
> >
> > bash-4.2$ /bin/time ./deblob-check --use-awk linux-libre-4.12-gnu.tar.bz2 
> > ^C813.84user 17.63system 13:23.65elapsed 103%CPU (0avgtext+0avgdata 1122292maxresident)k
> > 0inputs+0outputs (0major+11885984minor)pagefaults 0swaps
> >
> > Stable branch (I ctrl-c'ed after 13 minutes):
> >
> > bash-4.2$ /bin/time ./deblob-check --use-awk linux-libre-4.12-gnu.tar.bz2 
> > ^C828.07user 23.17system 13:35.58elapsed 104%CPU (0avgtext+0avgdata 1590252maxresident)k
> > 456inputs+72outputs (4major+15541814minor)pagefaults 0swaps
> >
> > Kind of a pain to bisect, since each iteration will be so slow. I haven't
> > tried yet.
> >
> > -Andy
> >
> > On Thu, Jul 13, 2017 at 01:42:21AM -0600, address@hidden wrote:
> > > If neither of those is any better, then let's work offline to isolate
> > > when things broke. "git bisect" is quite good at that.  :-) If possible,
> > > I'd prefer to fix the problem instead of leaving things alone.
> > > 
> > > Thanks,
> > > 
> > > Arnold
> > > 
> > > address@hidden wrote:
> > > 
> > > > Hi.
> > > >
> > > > Can you try building from the gawk-4.1-stable branch in the git repo
> > > > and let me know if you still have the problem?
> > > >
> > > > I'm also curious if you build from master in the repo what happens.
> > > >
> > > > Thanks,
> > > >
> > > > Arnold
> > > >
> > > > Alexandre Oliva <address@hidden> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I've upgraded the root in which I create and verify GNU Linux-libre
> > > > > > tarballs from Fedora/Freed-ora 25 to 26, which brought gawk from 4.1.3
> > > > > > to 4.1.4.
> > > > >
> > > > > With 4.1.3, it used about 1GB of RAM and took some 15 minutes to run.
> > > > >
> > > > > > With 4.1.4, I gave up after 2 hours of CPU time, and the process was at
> > > > > > 6GB and growing.
> > > > >
> > > > > > I saw a number of regexp changes in the gawk 4.1.3-to-4.1.4 diff, so I
> > > > > > took the Fedora 25 binary, and it runs on the Fedora 26 root with the
> > > > > > previous memory use.
> > > > >
> > > > > The command I use to perform this check is:
> > > > >
> > > > > deblob-check --use-awk linux-libre-4.12.tar.bz2
> > > > >
> > > > > deblob-check and the tarball can be downloaded from
> > > > > http://linux-libre.fsfla.org/pub/linux-libre/releases/4.12-gnu/
> > > > >
> > > > > The script generates and runs a gawk script with monster regexps that
> > > > > > match known blobs, known false positives, and patterns that catch likely
> > > > > > blobs, and it's running that generated script that's taking up a lot of
> > > > > > RAM and time.
> > > > >
> > > > > > deblob-check can use sed, python or perl instead of gawk, but gawk used
> > > > > > to be the best choice for this final checking, because of its low memory
> > > > > > use compared with sed, and its DFA-based regexp matching, which is not
> > > > > > available in python and perl.  (For deblobbing proper, python turns out
> > > > > > to be better due to the much lower start-up time compiling the monster
> > > > > > regexp.)
> > > > >
> > > > > I haven't checked whether gawk 4.1.4 still beats the memory efficiency
> > > > > of sed, but sed was barely usable for this purpose back then, and gawk
> > > > > 4.1.4 is unfortunately turning out to be unusable too.
> > > > >
> > > > > Any recommendations as to how we could avoid this huge performance
> > > > > > regression in gawk, short of switching to a different regexp processing
> > > > > > engine?
> > > > >
> > > > > Thanks in advance,
> > > > >
> > > > > -- 
> > > > > Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
> > > > > You must be the change you wish to see in the world. -- Gandhi
> > > > > Be Free! -- http://FSFLA.org/   FSF Latin America board member
> > > > > Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
> >
> > -- 
> > Andrew Schorr                      e-mail: address@hidden
> > Telemetry Investments, L.L.C.      phone:  917-305-1748
> > 545 Fifth Ave, Suite 1108          fax:    212-425-5550
> > New York, NY 10017-3630
> 

-- 
Andrew Schorr                      e-mail: address@hidden
Telemetry Investments, L.L.C.      phone:  917-305-1748
545 Fifth Ave, Suite 1108          fax:    212-425-5550
New York, NY 10017-3630
