help-flex
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Scanner takes a long time to build.


From: Akim Demaille
Subject: Re: Scanner takes a long time to build.
Date: 30 Jul 2002 17:27:44 +0200
User-agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Honest Recruiter)

>>>>> "Vern" == Vern Paxson <address@hidden> writes:

Akim> In the case of Flex, which is a batch tool as opposed to Perl
Akim> which aims at being ``interactive'', it doesn't make a lot of
Akim> sense to continue the computation of the automaton at scanner
Akim> runtime.

Vern> It might make sense.

Vern> My main current project these days is Bro, a network intrusion
Vern> detection system.  It's written in C++.  For its regular
Vern> expression matching, I took flex's generator and turned it into
Vern> a set of classes that can be called at run-time.  

For my education, I'd like to know why you didn't consider dynamic
regex compilers, such as Henry Spencer's, or pcre etc.

Vern> At first, I ran it like flex does - build the entire DFA when
Vern> Bro starts up (and I added an on-disk state cache to speed this
Vern> up for subsequent runs).  But this ran into problems when I
Vern> wanted very large patterns.  So I changed it to build the
Vern> matcher incrementally (which actually was an easy change at this
Vern> point).  The real surprise was that this was *faster* than
Vern> running from the precompiled DFA loaded out of the disk cache!

I'm not sure I'm reading you correctly: are you saying that building
NFA + making it DFA where it matters + running it < running a
precompiled one?

Vern> My best guess is that this is due to better memory locality -
Vern> and the increase was slight - but way better than suffering
Vern> degradation, which is what I had expected.

Yes, locality is certainly the main culprit, and disk access too.
Have a look at valgrind: it is a precious tool to find cache misses
and so forth.  It helps understanding how bad huge tables are.

It also reminds me of some GCC people saying that precompiled headers
are not always a win, because disk access are so slow, that it is
commonly faster to read the headers, parse them, take the proper
actions, rather that loading a file that snapshots the result of the
whole sequence!





reply via email to

[Prev in Thread] Current Thread [Next in Thread]