help-flex

Re: Scanner takes a long time to build.


From: Hans Aberg
Subject: Re: Scanner takes a long time to build.
Date: Tue, 30 Jul 2002 18:43:47 +0200

At 17:27 +0200 2002/07/30, Akim Demaille wrote:
>>>>>> "Vern" == Vern Paxson <address@hidden> writes:
>Vern> My main current project these days is Bro, a network intrusion
>Vern> detection system.  It's written in C++.  For its regular
>Vern> expression matching, I took flex's generator and turned it into
>Vern> a set of classes that can be called at run-time.
...
>Vern> At first, I ran it like flex does - build the entire DFA when
>Vern> Bro starts up (and I added an on-disk state cache to speed this
>Vern> up for subsequent runs).  But this ran into problems when I
>Vern> wanted very large patterns.  So I changed it to build the
>Vern> matcher incrementally (which actually was an easy change at this
>Vern> point).  The real surprise was that this was *faster* than
>Vern> running from the precompiled DFA loaded out of the disk cache!
...
>Vern> My best guess is that this is due to better memory locality -
>Vern> and the increase was slight - but way better than suffering
>Vern> degradation, which is what I had expected.
>
>Yes, locality is certainly the main culprit, and disk access too.
>Have a look at valgrind: it is a precious tool for finding cache misses
>and so forth.  It helps in understanding how bad huge tables are.
>
>It also reminds me of some GCC people saying that precompiled headers
>are not always a win: because disk accesses are so slow, it is
>commonly faster to read the headers, parse them, and take the proper
>actions, rather than loading a file that snapshots the result of the
>whole sequence!

This may be an argument in favor of more dynamic approaches: for example,
compiling into the smaller tables of a non-deterministic
(finite/pushdown) automaton, and then adding code to the parser that
translates them into DFA states on demand.

Incidentally, for Unicode this need not make the code any slower, because
the code set is too large to use with non-compacted tables anyway.

My guess is that this phenomenon has in part to do with the architecture of
today's computers, where the CPU has a large number of registers and runs
several times faster than memory access: under such circumstances, it is
going to be faster to load small segments of active code into the CPU than
to perform several memory accesses into large static arrays.

  Hans Aberg




