bug-gawk

Re: [bug-gawk] 4.1.3->4.1.4 = Linux-libre's deblob-check grows huge and takes forever


From: Andrew J. Schorr
Subject: Re: [bug-gawk] 4.1.3->4.1.4 = Linux-libre's deblob-check grows huge and takes forever
Date: Thu, 13 Jul 2017 12:09:41 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Hi,

I grabbed the file. It's broken both in master and in stable.

gawk 4.1.3:

bash-4.2$ /bin/time ./deblob-check --use-awk linux-libre-4.12-gnu.tar.bz2 
472.19user 96.71system 6:59.57elapsed 135%CPU (0avgtext+0avgdata 876968maxresident)k
0inputs+0outputs (0major+63559391minor)pagefaults 0swaps

Master branch (I ctrl-c'ed after 13 minutes):

bash-4.2$ /bin/time ./deblob-check --use-awk linux-libre-4.12-gnu.tar.bz2 
^C813.84user 17.63system 13:23.65elapsed 103%CPU (0avgtext+0avgdata 1122292maxresident)k
0inputs+0outputs (0major+11885984minor)pagefaults 0swaps

Stable branch (I ctrl-c'ed after 13 minutes):

bash-4.2$ /bin/time ./deblob-check --use-awk linux-libre-4.12-gnu.tar.bz2 
^C828.07user 23.17system 13:35.58elapsed 104%CPU (0avgtext+0avgdata 1590252maxresident)k
456inputs+72outputs (4major+15541814minor)pagefaults 0swaps

Kind of a pain to bisect, since each iteration will be so slow. I haven't
tried yet.
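Bisecting can at least be made unattended by letting `git bisect run` drive the search with a time limit. Gawk 4.1.3 finished in about 7 minutes above, while both broken builds were still running at 13 minutes, so a cutoff between the two separates good from bad commits. A minimal sketch, assuming (none of this is from the thread) that the checkout is already configured so a plain `make` builds `./gawk`, that deblob-check picks up `gawk` from `PATH`, and that deblob-check and the tarball live in a hypothetical `$HOME/deblob` directory:

```sh
#!/bin/sh
# Hypothetical "git bisect run" helper for the gawk tree.
# Assumptions: "make" builds ./gawk in an already-configured checkout,
# deblob-check runs whatever "gawk" is first on PATH, and the test
# inputs sit in $WORK. Needs GNU coreutils "timeout" (exit 124 on expiry).

LIMIT=${LIMIT:-900}            # seconds; 4.1.3 needed about 7 minutes
WORK=${WORK:-$HOME/deblob}     # hypothetical location of the test inputs

# Map the exit status of the timed run to a git-bisect-run verdict:
# 0 = good, 1 = bad (125 = skip is handled in main).
verdict() {
    case "$1" in
        124) echo 1 ;;         # ran past LIMIT: treat as the regression
        *)   echo 0 ;;         # finished in time, whatever it reported
    esac
}

main() {
    make >/dev/null 2>&1 || exit 125   # unbuildable commit: skip it
    PATH="$PWD:$PATH" timeout "$LIMIT" \
        "$WORK/deblob-check" --use-awk "$WORK/linux-libre-4.12-gnu.tar.bz2" \
        >/dev/null 2>&1
    status=$?
    exit "$(verdict "$status")"
}

# Only do the expensive work when invoked with the "run" argument.
if [ "${1:-}" = run ]; then
    main
fi
```

Usage, with tag names that are also an assumption: `git bisect start gawk-4.1.4 gawk-4.1.3`, then `git bisect run sh /path/to/bisect-check.sh run`. Treating only the timeout as "bad" ties the test to the slowdown itself rather than to whatever deblob-check reports about the tarball.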

-Andy

On Thu, Jul 13, 2017 at 01:42:21AM -0600, address@hidden wrote:
> If neither of those are any better, then let's work offline to isolate
> when things broke. "git bisect" is quite good at that.  :-) If possible,
> I'd prefer to fix the problem instead of leaving things alone.
> 
> Thanks,
> 
> Arnold
> 
> address@hidden wrote:
> 
> > Hi.
> >
> > Can you try building from the gawk-4.1-stable branch in the git repo
> > and let me know if you still have the problem?
> >
> > I'm also curious if you build from master in the repo what happens.
> >
> > Thanks,
> >
> > Arnold
> >
> > Alexandre Oliva <address@hidden> wrote:
> >
> > > Hi,
> > >
> > > I've upgraded the root in which I create and verify GNU Linux-libre
> > > tarballs from Fedora/Freed-ora 25 to 26, which brought gawk from 4.1.3 to
> > > 4.1.4.
> > >
> > > With 4.1.3, it used about 1GB of RAM and took some 15 minutes to run.
> > >
> > > With 4.1.4, I gave up after 2 hours of CPU time, and the process was at
> > > 6GB and growing.
> > >
> > > I saw a number of regexp changes in the gawk 4.1.3-4.1.4 diff, so I took the
> > > Fedora 25 binary and it's running on the Fedora 26 root with the
> > > previous memory use.
> > >
> > > The command I use to perform this check is:
> > >
> > > deblob-check --use-awk linux-libre-4.12.tar.bz2
> > >
> > > deblob-check and the tarball can be downloaded from
> > > http://linux-libre.fsfla.org/pub/linux-libre/releases/4.12-gnu/
> > >
> > > The script generates and runs a gawk script with monster regexps that
> > > match known blobs, known false positives, and patterns that catch likely
> > > blobs, and it's running that generated script that's taking up a lot of
> > > RAM and time.
> > >
> > > deblob-check can use sed, python or perl instead of gawk, but gawk used
> > > to be the best choice for this final checking, because of its low memory
> > > use compared with sed, and its DFA-based regexp matcher, which python
> > > and perl lack.  (For deblobbing proper, python turns out to be better due
> > > to the much lower start-up time compiling the monster regexp.)
> > >
> > > I haven't checked whether gawk 4.1.4 still beats the memory efficiency
> > > of sed, but sed was barely usable for this purpose back then, and gawk
> > > 4.1.4 is unfortunately turning out to be unusable too.
> > >
> > > Any recommendations as to how we could avoid this huge performance
> > > regression in gawk, short of switching to a different regexp processing
> > > engine?
> > >
> > > Thanks in advance,
> > >
> > > -- 
> > > Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
> > > You must be the change you wish to see in the world. -- Gandhi
> > > Be Free! -- http://FSFLA.org/   FSF Latin America board member
> > > Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

-- 
Andrew Schorr                      e-mail: address@hidden
Telemetry Investments, L.L.C.      phone:  917-305-1748
545 Fifth Ave, Suite 1108          fax:    212-425-5550
New York, NY 10017-3630
