help-flex

Re: Scanner takes a long time to build.


From: Hans Aberg
Subject: Re: Scanner takes a long time to build.
Date: Tue, 30 Jul 2002 18:43:47 +0200

At 17:27 +0200 2002/07/30, Akim Demaille wrote:
>>>>>> "Vern" == Vern Paxson <address@hidden> writes:
>Vern> My main current project these days is Bro, a network intrusion
>Vern> detection system.  It's written in C++.  For its regular
>Vern> expression matching, I took flex's generator and turned it into
>Vern> a set of classes that can be called at run-time.
...
>Vern> At first, I ran it like flex does - build the entire DFA when
>Vern> Bro starts up (and I added an on-disk state cache to speed this
>Vern> up for subsequent runs).  But this ran into problems when I
>Vern> wanted very large patterns.  So I changed it to build the
>Vern> matcher incrementally (which actually was an easy change at this
>Vern> point).  The real surprise was that this was *faster* than
>Vern> running from the precompiled DFA loaded out of the disk cache!
...
>Vern> My best guess is that this is due to better memory locality -
>Vern> and the increase was slight - but way better than suffering
>Vern> degradation, which is what I had expected.
>
>Yes, locality is certainly the main culprit, and disk access too.
>Have a look at valgrind: it is a precious tool for finding cache misses
>and so forth.  It helps in understanding how bad huge tables are.
>
>It also reminds me of some GCC people saying that precompiled headers
>are not always a win: because disk accesses are so slow, it is
>commonly faster to read the headers, parse them, and take the proper
>actions, rather than loading a file that snapshots the result of the
>whole sequence!

This may be an argument in favor of more dynamic approaches: for example,
compiling into the smaller tables of a non-deterministic
(finite/pushdown) automaton, and then adding code to the parser that
translates them into DFA states on demand.

Incidentally, for Unicode this need not make the code any slower, because
the code set is too large to use with non-compacted tables anyway.

My guess is that this phenomenon has in part to do with the architecture of
today's computers, where the CPU has a large number of registers and runs
several times faster than memory access: under such circumstances, it is
going to be faster to load small segments of active code into the CPU than
to perform several memory accesses into large static arrays.

  Hans Aberg




