Re: Scanner takes a long time to build.

help-flex

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Scanner takes a long time to build.

From:	Scott A Crosby
Subject:	Re: Scanner takes a long time to build.
Date:	Thu, 25 Jul 2002 17:50:41 -0400 (EDT)

On Thu, 25 Jul 2002, John Millaway wrote:

> > Paragraph\ .a.{0,10}2.{0,10}C.\ of\ S.\ 1618
>
> That last regex is enough to blow up the DFA. It's not the number of regular
> expressions killing your flex file, it's the patterns. Have a look:
>

I suspected it was the non-determinism and backing up of '.', but I'd
forgotten that you could simplify the DFA's immensily by doing
    [^X].{,10}X###
so that it doesn't have to worry about non-determinism of which 'X' in
(say 'XXyyXXyyXX') indicates its the time to go on to try to match the ###
part of the regexp.

Thanks for reminding me of the trick.

> Paragraph\ .a[^2\n]{0,10}2[^C\n]{0,10}C.\ of\ S.\ 1618  ==> [89]
>
> I replaced the first . with [^2\n], and the second . with [^C\n]. This may not
> be what you want, but it shows the drastic reduction in DFA size. You'll have
> to rethink your regex's. Refer to the O'Reilly book , "Mastering Regular
> Expressions". This subject is addressed.

I was prototyping how well flex could run the rules in Spamassassin. So, I
wrote a quick script to rewrite the regexp's in its rulebase into
something flex could handle. (skipping the regexp's that couldn't be
easily mechanically transformed.)

I've updated the my program to now include your suggestion. It works
*MUCH* better now, but this change to the regexp's alters the semantics
too much. (for example:  'copyright[^a]{0,100}all\ rights\ reserved'
means much different than 'copyright.{0,100}all\ rights\ reserved')

Thanks for your help, though I've concluded that its not practical to use
flex to speed up spamassassin.

Scott

[Prev in Thread]

Current Thread

[Next in Thread]

Scanner takes a long time to build., Scott A Crosby, 2002/07/25
- Re: Scanner takes a long time to build., John Millaway, 2002/07/25
  - Re: Scanner takes a long time to build., Scott A Crosby <=
    - Re: Scanner takes a long time to build., Hans Aberg, 2002/07/26
    - Re: Scanner takes a long time to build., Akim Demaille, 2002/07/26
    - Re: Scanner takes a long time to build., Hans Aberg, 2002/07/26
- Re: Scanner takes a long time to build., Vern Paxson, 2002/07/26
  - Re: Scanner takes a long time to build., Hans Aberg, 2002/07/27
    - Message not available
    - Re: Scanner takes a long time to build., Hans Aberg, 2002/07/27
  - Re: Scanner takes a long time to build., Akim Demaille, 2002/07/30
    - Re: Scanner takes a long time to build., Hans Aberg, 2002/07/30
  - Re: Scanner takes a long time to build., Akim Demaille, 2002/07/30
- Re: Scanner takes a long time to build., Scott A Crosby, 2002/07/27

Prev by Date: Re: Scanner takes a long time to build.
Next by Date: Re: Scanner takes a long time to build.
Previous by thread: Re: Scanner takes a long time to build.
Next by thread: Re: Scanner takes a long time to build.
Index(es):
- Date
- Thread