help-bison

Re: Is it always possible to make a non-reentrant parser reentrant?


From: Simon Richter
Subject: Re: Is it always possible to make a non-reentrant parser reentrant?
Date: Fri, 8 Feb 2019 15:47:27 +0100

Hi,

On 08.02.19 02:01, Peng Yu wrote:

> It seems to me that the parsing code could be made simpler by making
> the parser reentrant. So there can be a parser parses anything not
> heredoc and another parser just parse heredoc. And there should
> different lexers for non-heredoc and heredoc. Is it so?

The difficulty with lexers is that they keep their own buffer state, so
switching between lexers mid-stream is non-trivial.

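Normally, you'd use lexer states (start conditions, in flex terms) to
activate/deactivate rules. The primitive approach would be

%x HEREDOC

and then prefixing the heredoc rules with <HEREDOC>. INITIAL is
predefined and must not be declared again; since HEREDOC is exclusive,
rules without a prefix are inactive while it is in effect, although they
can still be marked <INITIAL> explicitly.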

The main problem there is that state changes need to be driven by the
lexer code, as the BEGIN macro is only available there, so a change from
the parser would have to be communicated through yyextra, and applied in
the lexer code before matching a token (so YY_USER_ACTION is too late,
as it only runs once a rule has already matched).
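
To make the hand-over concrete, here is a small hand-written model in
plain C. It is not flex output, and every name in it (lexer_extra,
request_state, next_token, the token and state enums) is made up for
illustration. The struct plays the role of yyextra; in a real flex
scanner the marked line would be a BEGIN(...) call, placed for example
in the indented code at the top of the rules section, which runs on
every entry to yylex().

```c
#include <stddef.h>

/* Hypothetical names throughout: a model of the pattern, not flex code. */
enum lex_state { ST_INITIAL, ST_HEREDOC };
enum token { TOK_EOF, TOK_WORD, TOK_NEWLINE, TOK_HEREDOC_LINE };

struct lexer_extra {            /* plays the role of yyextra */
    enum lex_state current;     /* state the lexer is matching in */
    enum lex_state pending;     /* state requested by the parser */
    int has_pending;
    const char *cursor;         /* unread input, NUL-terminated */
};

/* Parser side: only records the request, never switches directly. */
static void request_state(struct lexer_extra *x, enum lex_state s)
{
    x->pending = s;
    x->has_pending = 1;
}

/* Lexer side: *start and *len describe the token text (unset for EOF). */
static enum token next_token(struct lexer_extra *x,
                             const char **start, size_t *len)
{
    /* Apply the request before any matching (BEGIN(...) in flex). */
    if (x->has_pending) {
        x->current = x->pending;
        x->has_pending = 0;
    }

    if (x->current == ST_HEREDOC) {
        /* Heredoc mode: one token per line, newline not included. */
        if (*x->cursor == '\0')
            return TOK_EOF;
        *start = x->cursor;
        while (*x->cursor != '\0' && *x->cursor != '\n')
            x->cursor++;
        *len = (size_t)(x->cursor - *start);
        if (*x->cursor == '\n')
            x->cursor++;
        return TOK_HEREDOC_LINE;
    }

    /* Initial mode: blank-separated words, newline is its own token. */
    while (*x->cursor == ' ' || *x->cursor == '\t')
        x->cursor++;
    if (*x->cursor == '\0')
        return TOK_EOF;
    *start = x->cursor;
    if (*x->cursor == '\n') {
        x->cursor++;
        *len = 1;
        return TOK_NEWLINE;
    }
    while (*x->cursor != '\0' && *x->cursor != ' ' &&
           *x->cursor != '\t' && *x->cursor != '\n')
        x->cursor++;
    *len = (size_t)(x->cursor - *start);
    return TOK_WORD;
}
```

The parser would call request_state() after seeing the newline that
ends the command line; the switch then takes effect at the start of the
next next_token() call, before anything has been matched.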

The other thing is that parsing heredocs with the lexer is rather
pointless, as the only thing we are interested in is the end tag, which
is dynamic anyway, so grabbing the data out of the lexer stream with a
custom function is probably the better approach.

Some people use tar files as heredocs, so a "[a-zA-Z]*" rule can match
really long strings there, which the lexer would have to extend its
buffer for in order to provide yytext/yyleng. We can't limit the match
length either, because then we'd have to jump through a lot of hoops to
match the end tag if it straddles two matches.

If the lexer can identify heredocs reliably, then it's probably best to
let it provide a token HEREDOC to the parser after setting up the state
for heredoc parsing (which may live in yyextra to make it reentrant, but
that's orthogonal), and the parser then calls a special function to
retrieve the heredoc from the lexer's stream. That function would live
in the lexer source file so it can request more characters from the stream.
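
A minimal sketch of such a retrieval function, under some assumptions:
the name read_heredoc and its signature are made up, and it reads from
a plain FILE * with getc() so that it can be compiled on its own. Inside
a flex scanner file the same loop would pull characters with input()
instead, so that it stays in sync with the lexer's buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: reads from `in` up to and including a line that
 * consists solely of `tag`. Returns a malloc'd, NUL-terminated copy of
 * the body (terminator line excluded) and stores its length in *out_len,
 * so bodies containing NUL bytes remain usable. Returns NULL if memory
 * runs out or the input ends before the end tag is seen.
 */
static char *read_heredoc(FILE *in, const char *tag, size_t *out_len)
{
    size_t tag_len = strlen(tag);
    size_t cap = 256, len = 0;
    size_t line_start = 0;          /* offset where the current line began */
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;

    for (;;) {
        c = getc(in);
        if (c == '\n' || c == EOF) {
            /* A line just ended: is it exactly the end tag? */
            if (len - line_start == tag_len &&
                memcmp(buf + line_start, tag, tag_len) == 0) {
                len = line_start;   /* drop the tag line from the body */
                buf[len] = '\0';
                *out_len = len;
                return buf;
            }
            if (c == EOF) {         /* unterminated heredoc */
                free(buf);
                return NULL;
            }
        }
        if (len + 2 > cap) {        /* keep room for this byte and a NUL */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
        if (c == '\n')
            line_start = len;
    }
}
```

The grammar action for HEREDOC would call this with the tag the lexer
stored, and free() the result when done. Note that it holds the whole
heredoc in memory at once.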

Another option I could see would be to have the lexer return fragments
of the heredoc, and just repeat the token as long as there is data.
This would also avoid having to read the entire stream into memory, and
keep the interface between lexer and parser down to yylex().
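
The fragment variant could look roughly like the following sketch. All
names (heredoc_reader, heredoc_next, the HD_* codes) are invented for
illustration, and it again reads from a FILE * rather than from flex's
input(). Bytes at the start of a line that might belong to the end tag
are withheld until the line is decided, so a fragment boundary can
never split the tag.

```c
#include <stdio.h>
#include <string.h>

enum { HD_FRAGMENT = 1, HD_END, HD_UNTERMINATED };

/* Hypothetical reader state; memory use is bounded by the caller's buffer. */
struct heredoc_reader {
    FILE *in;
    const char *tag;
    size_t tag_len;
    int candidate;     /* 1 while the current line may still be the end tag */
    size_t matched;    /* tag bytes matched (and withheld) on this line */
    size_t replay;     /* withheld bytes already re-emitted after a mismatch */
    int has_pending;   /* 1 if `pending` still has to be delivered */
    int pending;       /* byte (or EOF) that followed the withheld prefix */
    int finished;      /* 0, or the final status to report */
};

static void heredoc_reader_init(struct heredoc_reader *r, FILE *in,
                                const char *tag)
{
    r->in = in;
    r->tag = tag;
    r->tag_len = strlen(tag);
    r->candidate = 1;
    r->matched = r->replay = 0;
    r->has_pending = 0;
    r->pending = 0;
    r->finished = 0;
}

/*
 * Stores up to `cap` (> 0) bytes in buf and returns HD_FRAGMENT with
 * *n > 0 while there is data. Afterwards returns HD_END (terminator
 * line consumed) or HD_UNTERMINATED (input ended first), with *n == 0.
 */
static int heredoc_next(struct heredoc_reader *r, char *buf, size_t cap,
                        size_t *n)
{
    *n = 0;
    while (!r->finished) {
        int c;

        if (r->has_pending) {
            /* Deliver the withheld tag prefix, then the byte after it. */
            while (r->replay < r->matched && *n < cap)
                buf[(*n)++] = r->tag[r->replay++];
            if (*n == cap)
                return HD_FRAGMENT;
            r->has_pending = 0;
            r->matched = r->replay = 0;
            if (r->pending == EOF) {
                r->finished = HD_UNTERMINATED;
                break;
            }
            buf[(*n)++] = (char)r->pending;
            r->candidate = (r->pending == '\n');  /* a new line starts */
            if (*n == cap)
                return HD_FRAGMENT;
            continue;
        }

        c = getc(r->in);
        if (r->candidate && r->matched == r->tag_len &&
            (c == '\n' || c == EOF)) {
            r->finished = HD_END;       /* the whole line equals the tag */
        } else if (r->candidate && r->matched < r->tag_len &&
                   c == (unsigned char)r->tag[r->matched]) {
            r->matched++;               /* withhold: may be the end tag */
        } else {
            r->pending = c;             /* ordinary data (or EOF) */
            r->has_pending = 1;
        }
    }
    return *n > 0 ? HD_FRAGMENT : r->finished;
}
```

yylex() would then hand out one fragment token per HD_FRAGMENT result
and switch back to normal scanning once HD_END is reported.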

   Simon
