help-flex
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Scanner takes a long time to build.


From: Scott A Crosby
Subject: Re: Scanner takes a long time to build.
Date: Sat, 27 Jul 2002 18:38:24 +0200

On Sat, 27 Jul 2002 13:06:25 +0200, Hans Aberg <address@hidden> writes:

> At 15:59 -0700 2002/07/26, Vern Paxson wrote:
>
> It is interesting though that the other fellow wanting the feature
> was writing on an anti-spam tool. So it could happen that these
> "natural language pattern recognition" tools require grammars that
> may blow up exponentially in a straightforward DFA translation.
>

I created none of those regexp's... I recently started to use
spamassassin, then noted the performance (.5-1.5 seconds/message). I
know that flex can do matching of hundreds of regex's in parallel at
megabytes a second, so I thought I'd test the feasibility and
performance of flex.

To test the feasibility of this, I created a perl script that would
take perl-style regexp's and transform them into flex-style rules. As
it was merely a test, I skipped any regexp that could not be
automatically transformed. That left about half of them (150)
transformed regexp's, many that did cause flex to blow up. (If there's
sufficient interest, I can attach the flex file; approx 4kb
compressed.)

Typically they were of the form: foo.{,10}bar

John Millaway pointed this out and suggested transforming them to
'foo[^b]{,10}bar', which I altered my script to do. This caused flex
to no longer blow up. However, that alters the semantics of what we
want the regexp to match sufficiently that it is undesirable.

The regexp's aren't mine, I was just testing how well flex could
handle them. As flex currently is, it appears not. (Not to say that
I've not used flex for bulk matching and simple text manipulation at
high speed. Thanks for helping to create it. Its a valuable tool even
when not writing a compiler.)


Scott





reply via email to

[Prev in Thread] Current Thread [Next in Thread]