help-bison

Re: Non-greedy wildcard possible? (Long)


From: Magnus Lie Hetland
Subject: Re: Non-greedy wildcard possible? (Long)
Date: Thu, 20 May 2004 19:08:41 +0200
User-agent: Mutt/1.4i

Laurence Finston <address@hidden>:
>
[snip]
> I don't like to wait, whether it's linear, logarithmic, or
> exponential time.  Documents can easily have hundreds of thousands
> of characters.  If I have to match strings in the input against 100
> "suspicious" tokens this will take time.  In addition, since there
> are no hardwired criteria for determining the beginning and end of
> an item of input, it seems quite likely that every character in the
> input will have to be tested against the list of "suspicious"
> tokens, whether individually, or as the character in the nth
> position of a string.

I don't understand your point here; if the lexer has built a state
machine for the tokens (as flex does), you only have to look at each
character once -- not once for each "possible" token. This is no
different from ordinary tokenization.
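To illustrate the point, here is a minimal sketch (in Python, not the C
that flex generates, and with made-up token names) of combining all the
"suspicious" tokens into one prefix tree that is followed in a single
left-to-right pass over the input. A failed prefix may rescan a few
characters here; flex's DFA adds backup states so that even that is
avoided:

```python
def build_trie(tokens):
    """Combine all token strings into one prefix tree."""
    trie = {}
    for name, text in tokens.items():
        node = trie
        for ch in text:
            node = node.setdefault(ch, {})
        node["$"] = name  # "$" marks an accepting state
    return trie

def scan(text, trie):
    """One left-to-right pass; at each position follow trie edges
    (maximal munch), emitting a token name or a literal character."""
    out, i = [], 0
    while i < len(text):
        node, j, last = trie, i, None
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if "$" in node:
                last = (node["$"], j)  # remember the longest match
        if last:
            out.append(last[0])
            i = last[1]
        else:
            out.append(text[i])  # not markup: pass the character through
            i += 1
    return out

tokens = {"BOLD": "**", "EM": "*", "DASH": "--"}
print(scan("a**b--c*", build_trie(tokens)))
# ['a', 'BOLD', 'b', 'DASH', 'c', 'EM']
```

Note that the cost of the scan does not grow with the number of tokens,
only with their maximum length -- which is the property the flex-style
state machine gives you.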

This is, in fact, one of the reasons my current implementation is
slow, I think: I'm using the simplistic approach of searching for the
various tokens independently.
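For contrast, independent searching amounts to something like the
following sketch (a reconstruction of the general technique, not the
actual code under discussion): every token is probed separately at each
position, so each input character can be examined once per token rather
than once overall.

```python
def scan_naive(text, tokens):
    """Naive multi-token scan: one startswith() probe per token at
    every position, so cost grows with the number of tokens."""
    out, i = [], 0
    while i < len(text):
        hit = None
        for name, tok in tokens.items():  # one probe per token
            if text.startswith(tok, i):
                if hit is None or len(tok) > len(hit[1]):
                    hit = (name, tok)  # prefer the longest match
        if hit:
            out.append(hit[0])
            i += len(hit[1])
        else:
            out.append(text[i])  # not markup: pass the character through
            i += 1
    return out

tokens = {"BOLD": "**", "EM": "*", "DASH": "--"}
print(scan_naive("a**b--c*", tokens))
# ['a', 'BOLD', 'b', 'DASH', 'c', 'EM']
```

Same output as a combined state machine, but with work proportional to
(input length) x (number of tokens) instead of input length alone.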

> It might be possible to reduce the number of comparisons by sorting
> the "suspicious" tokens into an appropriate data structure, but this
> has associated costs, too, including the necessity of writing code
> to traverse the data structure.

This is what is done in Flex anyway -- and it's lightning fast.
Really.

[snip]
> I don't think all of this work is justified, since in most cases the
> result will be "it's not markup, output it verbatim."  

Well, if you've got a suggestion for another way of doing it, I'm
dying to hear it :]

-- 
Magnus Lie Hetland              "Wake up!"  - Rage Against The Machine
http://hetland.org              "Shut up!"  - Linkin Park



