help-flex
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Scanner takes a long time to build.


From: Vern Paxson
Subject: Re: Scanner takes a long time to build.
Date: Fri, 26 Jul 2002 15:59:51 -0700

> Hans> But loc.cit. also says that one can compute the DFA transitions
> Hans> on the fly, caching them. This will combine the NFA space
> Hans> complexity with the DFA time complexity, it says.
> 
> In the case of Flex, which is a batch tool as opposed to Perl which
> aims at being ``interactive'', it doesn't make a lot of sense to
> continue the computation of the automaton at scanner runtime.

It might make sense.

My main current project these days is Bro, a network intrusion detection
system.  It's written in C++.  For its regular expression matching, I took
flex's generator and turned it into a set of classes that can be called
at run-time.  At first, I ran it like flex does - build the entire DFA when
Bro starts up (and I added an on-disk state cache to speed this up for
subsequent runs).  But this ran into problems when I wanted very large
patterns.  So I changed it to build the matcher incrementally (which actually
was an easy change at this point).  The real surprise was that this was
*faster* than running from the precompiled DFA loaded out of the disk
cache!  My best guess is that this is due to better memory locality - and
the increase was slight - but way better than suffering degradation, which
is what I had expected.

A student I'm working with (Robin Sommer) has also added deleting of old
DFA states, so we can control the total memory used by the scanner.  We
now use it for matching hundreds of concurrent .*xxx.* patterns.

Of course I'm *not* volunteering to backport this to flex (and it doesn't
support flex's extra functionality, just the regular expression matching).
But it might be worth checking out.

The most recent Bro snapshot is available from:

        ftp://ftp.ee.lbl.gov/.vp-bro-pub-0.7a110.tar.gz

Look for the files:

        CCL.{h,cc}
        DFA.{h,cc}
        EquivClass.{h,cc}
        NFA.{h,cc}
        RE.{h,cc}
        re-parse.y
        re-scan.l

This doesn't have Robin's state management stuff, but it has the rest.

                Vern



reply via email to

[Prev in Thread] Current Thread [Next in Thread]