help-flex

FLEX <<EOF>> with yymore() token


From: Bill Fenlason
Subject: FLEX <<EOF>> with yymore() token
Date: Mon, 4 Jun 2001 11:06:52 -0400

I posted part of this question to comp.compilers, and John Millaway pointed
me here.  Thanks John.  I've read the archives but did not see this topic
discussed.

In FLEX, the current buffer is flushed immediately when EOF is encountered,
even if it contains a token pushed by yymore().  That means that something
like:
    <start_cond><<EOF>> { if (yyleng > 0) return(A_TOKEN); .... }
fails, because yyleng may be non-zero but yytext is null.  The token is
copied to the start of the buffer but is then overwritten by the buffer
flush (via yyrestart).
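
One possible user-side workaround, sketched here under assumptions rather than taken from the flex sources, is to keep a private copy of the pending text each time an action calls yymore(), so that the <<EOF>> rule does not have to rely on yytext surviving the flush.  The names pending_save, pending_take, pending_clear and PENDING_MAX are hypothetical helpers, not part of flex, and the fixed buffer size is an arbitrary choice for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical side buffer for text accumulated via yymore().
 * Nothing here is part of flex; the size limit is arbitrary. */
#define PENDING_MAX 256

static char   pending_text[PENDING_MAX];
static size_t pending_len = 0;

/* Call from an action just before yymore(), passing yytext and yyleng.
 * At that point yytext holds everything accumulated so far, so each
 * call simply overwrites the previous copy.  Text longer than the
 * buffer is silently truncated in this sketch. */
void pending_save(const char *text, size_t len)
{
    if (len >= PENDING_MAX)
        len = PENDING_MAX - 1;
    memcpy(pending_text, text, len);
    pending_text[len] = '\0';
    pending_len = len;
}

/* Call from the <<EOF>> rule.  Returns the saved text and stores its
 * length in *len_out, or returns NULL if nothing is pending.  The
 * pending state is cleared so that a repeated call returns NULL. */
const char *pending_take(size_t *len_out)
{
    if (pending_len == 0)
        return NULL;
    *len_out = pending_len;
    pending_len = 0;
    return pending_text;
}

/* Call from any action that completes a token normally. */
void pending_clear(void)
{
    pending_len = 0;
}
```

In a scanner, an action that calls yymore() would first call pending_save(yytext, (size_t)yyleng), and the <<EOF>> rule would test the result of pending_take() instead of yyleng and yytext.  This avoids modifying the skeleton, at the cost of an extra copy per yymore() call.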

I modified the skeleton to check this out.  If the call to yyrestart is
bypassed (OK in my case), the problem partly goes away.  Is this a bug or an
unintended byproduct?

The core issue relates to <<EOF>> and what actions after <<EOF>> are
allowed.  <<EOF>> is logically a state rather than a token, and the null
return (after yywrap) makes perfect sense to me.  The comment in the code
about a repeated call returning null again also makes sense, but it seems to
me that allowing the return of a residual token (pushed by yymore) would be
appropriate.  I realize the difficulty in trying to allow <<EOF>> as right
context in a pattern, and I had hoped to accomplish the same thing via the
<<EOF>> rules.

Currently at <<EOF>> yyleng is set to 1 plus the yymore length, and I would
propose that it should be set to the yymore length only (usually 0).  The
scan has to rely on the trailing null in the buffer to identify the <<EOF>>
state, but should it be treated as an actual token?  (In the case above I
needed to use --yyleng.)

I understand the need to reset the buffer in case the user has changed yyin.

The man page specifies that repeated calls after EOF are undefined.  Would
defining them such that zero additional characters are matched and that null
is returned be an improvement?  Should the calculation of yyleng at <<EOF>>
be changed?  Should there be a change regarding the buffer flush to allow
the residual token to be returned?

My case involves recognizing identifiers which may contain extralingual
characters defined at runtime.

Thank you.

Bill Fenlason





