help-flex
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Scanner takes a long time to build.


From: Hans Aberg
Subject: Re: Scanner takes a long time to build.
Date: Sat, 27 Jul 2002 13:06:25 +0200

At 15:59 -0700 2002/07/26, Vern Paxson wrote:
>> Hans> But loc.cit. also says that one can compute the DFA transitions
>> Hans> on the fly, caching them. This will combine the NFA space
>> Hans> complexity with the DFA time complexity, it says.
>>
>Akim> In the case of Flex, which is a batch tool as opposed to Perl which
>Akim> aims at being ``interactive'', it doesn't make a lot of sense to
>Akim> continue the computation of the automaton at scanner runtime.
>
>It might make sense.

Perhaps Akim also misunderstood the nature of the algorithm somewhat:

The way I interpret it, the algorithm builds the DFA states as the parsing
goes along, caching them then, but it does not a priori require the input
grammar to be dynamic: The DFA state translation mechanism needs to be
incorporated, but the translation could be from say some NFA states one has
derived from the grammar.

So, as far as Flex goes, it would be like an options that generates a
different lexer, not anything that alters the input in any essential way.

One could also think of Flex generating NFA/DFA hybrids: I just read in
comp.compilers that one should use a NFA if one needs to keep track of
parsing position relative the grammar, like when the patterns has groupings
that should be extracted for substitution. For example, a pattern
  "begin"[.*]"end"  { yylval.text = \1; return GROUPING; }
where \1 would refer to the matched text between "begin" and "end" then.

So one way to do this more efficiently is to use a DFA in order to extract
a substring, and then apply a NFA onto that substring. In the example
above, one would use a DFA for finding "begin".*"end", and when it has been
found, one applies a NFA to "begin"[.*]"end" in order to find the string
between "begin" and "end". (Incidentally, this may relate to the problems
Akim had when trying to pinpoint locations: perhaps he merely did not apply
the NFA/DFA hybrid technique. :-) )

But it could happen that an integrated NFA/DFA lexer could do much better.

It also strikes me that Bison might want to have a similar feature for
implementing say a LR(1) grammar: Bison would then generate a LR(1)
non-deterministic parser states, and the parser would translate and cache
that to deterministic parser states. It should make the LR(1) footprint of
the parser smaller, as one then selects only the states typically needed.

>My main current project these days is Bro, a network intrusion detection
>system.

So it isn't "Big Bro" then. :-)

It is interesting though that the other fellow wanting the feature was
writing on an anti-spam tool. So it could happen that these "natural
language pattern recognition" tools require grammars that may blow up
exponentially in a straightforward DFA translation.

>  It's written in C++.

So then it will have to be passed to the GNU C++ to C rewriting team. :-)

This is no problem though for Flex, as Flex ain't GNU.

  Hans Aberg





reply via email to

[Prev in Thread] Current Thread [Next in Thread]