help-bison

Re: Non-greedy wildcard possible? (Long)


From: Laurence Finston
Subject: Re: Non-greedy wildcard possible? (Long)
Date: Sat, 15 May 2004 23:54:42 +0200
User-agent: IMHO/0.98.3+G (Webmail for Roxen)

If you don't signal "begin plain text" and "end plain text" somehow,
e.g., with balanced single quotes as in your example, you won't be able
to have plain text that contains tokens that are also valid in parser
rules.
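To make this concrete, here is a minimal sketch of a lexer that uses the quotes as that signal. It does not follow Bison's `yylex()` calling convention, and the token kinds `Text`, `Star` and `Quote` are hypothetical. A flag flips at every single quote, and while it is set, `*` is collected as text instead of being returned as an emphasis marker. For the example line `Foo *bar 'baz *fee* fie' foe*.`, the two asterisks inside the quotes end up in a `Text` token, while the two outside become `Star` tokens.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token kinds; a real Bison parser would take its token
// codes from the header Bison generates.
enum class Tok { Text, Star, Quote };

struct Token {
    Tok kind;
    std::string text;
};

// Split `in` into tokens. Outside single quotes, '*' is an emphasis
// marker. Between a pair of single quotes every character, '*' included,
// is collected as plain text. `balanced` is set to false if the input
// ends inside a quoted span.
std::vector<Token> tokenize(const std::string& in, bool& balanced) {
    std::vector<Token> out;
    std::string text;       // plain text not yet emitted
    bool verbatim = false;  // true between an opening and a closing quote

    // Emit the pending plain text, if any, as one Text token.
    auto flush = [&]() {
        if (!text.empty()) {
            out.push_back({Tok::Text, text});
            text.clear();
        }
    };

    for (char c : in) {
        if (c == '\'') {
            flush();
            out.push_back({Tok::Quote, "'"});
            verbatim = !verbatim;
        } else if (c == '*' && !verbatim) {
            flush();
            out.push_back({Tok::Star, "*"});
        } else {
            text += c;
        }
    }
    flush();
    assert(text.empty());  // everything has been emitted
    balanced = !verbatim;
    return out;
}
```

The trade-off is that the quoting convention is hard-wired into the lexer rather than derived from the grammar.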

It would be possible to look up tokens in `yylex()' before returning them to
`yyparse()'.  If they don't appear in a list of valid tokens, you could
output them verbatim until you reach one that does.  I use a similar technique
(for a different purpose) in my parser with a C++ `map'.  You could, of
course, do the same thing in C, with a bit more effort.  On its own, however,
this only works as long as the "plain text" doesn't contain any valid tokens,
unless you signal somehow that they are _not_ to be interpreted as such.
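A minimal sketch of the lookup idea, assuming whitespace-separated words and a hypothetical `valid_tokens` table with made-up token codes (a real parser would use the codes from the header Bison generates). Words not found in the `std::map` are appended to a verbatim buffer, and only recognized words are returned, so the parser never sees the plain text. `VerbatimLexer` and its `next()` member are illustrative names, not an existing API.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical token codes; in a real parser these would come from the
// header Bison generates. 0 is reserved for end of input, as in yylex().
const std::map<std::string, int> valid_tokens = {
    {"begin", 258},
    {"end", 259},
    {"emph", 260},
};

// A yylex()-like scanner over a string. Words that are not in
// valid_tokens are appended to `verbatim` instead of being returned,
// so the parser only ever sees recognized tokens.
class VerbatimLexer {
public:
    explicit VerbatimLexer(const std::string& input) : in_(input) {}

    // Return the next recognized token code, or 0 at end of input.
    int next() {
        std::string word;
        while (in_ >> word) {  // whitespace-separated words
            auto it = valid_tokens.find(word);
            if (it != valid_tokens.end()) {
                assert(it->second != 0 && "0 means end of input");
                return it->second;
            }
            verbatim += word;  // plain text: pass it through unchanged
            verbatim += ' ';   // (original spacing is not preserved)
        }
        return 0;
    }

    std::string verbatim;  // plain text collected so far

private:
    std::istringstream in_;
};
```

Called on the input `hello begin world foo end`, successive calls to `next()` return the codes for `begin` and `end` and then 0, while `verbatim` collects `hello world foo`. The limitation described above shows up directly: a plain-text word that happens to be `end` is returned as a token.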

Is there a particular reason why you've chosen to use Bison for this purpose?
If the proportion of plain text to markup is high, I would consider writing a
macro processor (like TeX) to handle it.



-------------------
> Laurence Finston <address@hidden>:
> >
> > I think it should be possible to solve your problem in a way similar
> > to that described in the Bison manual for handling comments.  Since
> > the plain text presumably simply needs to be output verbatim there's
> > no reason for the parser to parse it. It can be processed in a rule
> > for the "plain text begin" token,
> 
> Sadly, there is no such token.
> 
> > in which you would call `yylex()' repeatedly, or you could have
> > `yylex()' process it, so that the parser never sees the plain text.
> 
> I don't see how this helps... The problem lies in knowing where the
> plain text ends, and for that I need to enlist Bison. Handling the
> text isn't a problem -- detecting the next non-text element is.
> 
> I see two main possibilities:
> 
>   1. For each position where plain text may occur, find out what
>      "non-plain-text" tokens may occur (and terminate the plain text)
>      and then use those to construct a non-ambiguous grammar (where
>      the plain text will be LL(1)).
> 
>   2. Use a plain text rule that will match anything, but let any legal
>      rules override it (and thus terminate it).
> 
> The first solution (which is the least satisfying) would require
> figuring out the following tokens (as I said, this has to be done
> automatically, as the grammar is user-supplied). I can either do this
> myself (hacky/messy) or try to get it from the LR table constructed by
> Bison. I don't see a straightforward way of doing this, but it should
> be possible.
> 
> The second solution is what I'm hoping for, but I don't know how to do
> it. If it is at all possible, it seems like I'll need to combine GLR
> parsing with (dynamic) priority somehow -- but I don't know how.
> 
> Just an example to show why a fixed set of markup tokens (that will
> end the plain text) won't do:
> 
>   Foo *bar 'baz *fee* fie' foe*.
> 
> Let's say the single quote represents verbatim text (code). Then the
> plain text between the two single quotes will contain two asterisks
> that are *not* to be interpreted as markup-tokens, while the asterisks
> outside should be interpreted as markup for emphasis.
> 
> This would be deducible from the grammar, because the code production
> rule wouldn't allow emphasis inside it. The lexer couldn't possibly
> know about this -- it would have to be handled as part of the parsing.
> 
> Thanks for your help, anyway :)
> 
> > Laurence
> 
> -- 
> Magnus Lie Hetland              "Wake up!"  - Rage Against The Machine
> http://hetland.org              "Shut up!"  - Linkin Park
> 
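The first possibility in the quoted message, finding for each place where plain text may occur the tokens that can follow it and so terminate it, corresponds to the classic FOLLOW-set computation. The sketch below works on a hypothetical grammar representation (`Production`, with `$end` as the end-of-input marker), not on Bison's internal tables. FOLLOW sets are computed for the grammar as a whole, so they can be coarser than the per-state lookahead sets in the LALR table Bison builds.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical grammar representation: any symbol that appears as a
// left-hand side is a nonterminal; every other symbol is a terminal.
struct Production {
    std::string lhs;
    std::vector<std::string> rhs;  // an empty rhs is an epsilon production
};

using Sets = std::map<std::string, std::set<std::string>>;

// Compute FOLLOW sets for every nonterminal by fixed-point iteration,
// computing nullable and FIRST sets along the way. "$end" marks end of input.
Sets follow_sets(const std::vector<Production>& g, const std::string& start) {
    std::set<std::string> nts;
    for (const auto& p : g) nts.insert(p.lhs);
    assert(nts.count(start) && "start symbol must have a production");

    std::set<std::string> nullable;
    Sets first, follow;
    follow[start].insert("$end");

    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& p : g) {
            // FIRST(lhs) gains FIRST of each rhs symbol up to and including
            // the first one that cannot derive the empty string.
            bool all_nullable = true;
            for (const auto& s : p.rhs) {
                std::set<std::string> fs =
                    nts.count(s) ? first[s] : std::set<std::string>{s};
                for (const auto& t : fs) changed |= first[p.lhs].insert(t).second;
                if (!(nts.count(s) && nullable.count(s))) {
                    all_nullable = false;
                    break;
                }
            }
            if (all_nullable) changed |= nullable.insert(p.lhs).second;

            // FOLLOW: scan the rhs right to left, carrying the set of
            // terminals that can appear just after the current position.
            std::set<std::string> trailer = follow[p.lhs];
            for (auto it = p.rhs.rbegin(); it != p.rhs.rend(); ++it) {
                const std::string& s = *it;
                if (nts.count(s)) {
                    for (const auto& t : trailer) changed |= follow[s].insert(t).second;
                    if (nullable.count(s)) {
                        trailer.insert(first[s].begin(), first[s].end());
                    } else {
                        trailer = first[s];
                    }
                } else {
                    trailer = std::set<std::string>{s};
                }
            }
        }
    }
    return follow;
}
```

For a toy inline-markup grammar in which plain text is a nonterminal `text` built from `WORD` tokens (`text: WORD | text WORD`), emphasis is `STAR inlines STAR`, and quoted code is `QUOTE code QUOTE` with `code` allowed to contain `WORD` and `STAR`, FOLLOW(text) comes out as {$end, WORD, STAR, QUOTE} and FOLLOW(code) as {QUOTE, WORD, STAR}. WORD appears in FOLLOW(text) because a run of text can be extended by another WORD, so the terminals that actually end plain text there are $end, STAR and QUOTE.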
