help-flex

Re: Locations suggest -- we're stupid


From: Nikos Balkanas
Subject: Re: Locations suggest -- we're stupid
Date: Sun, 6 Jan 2002 20:49:03 +0200

----- Original Message -----
From: Hans-Bernhard Broeker <address@hidden>
To: Hans Aberg <address@hidden>
Cc: <address@hidden>
Sent: Friday, January 04, 2002 2:26 PM
Subject: Re: Locations suggest -- we're stupid


> On Fri, 4 Jan 2002, Hans Aberg wrote:
>
> > I have used both variations, and keeping track of newlines while designing
> > the lexer is a bother, especially when the grammar is not known beforehand.
>
> Actually, it can be almost a nightmare. I've been through this with the
> 'cscope' scanner (parsing C with only a lexer), and it's been quite a
> tricky business to make sure all the linenumbers are right, given that C
> allows a \n just about anywhere a blank would be allowed, too.  I'm far
> from sure there aren't any big loopholes left, after months of fiddling
> with it.

I am sorry to hear that. I have used both variations, too, and I find it a
breeze to track lineno myself, which is what I prefer.
Consider a piece of code for your C parser:

space       [ \t]
sp          {space}+
ws          {space}*

%%

<IF>{
if{sp}\(            {
                        do_if();
                        yy_push_state(PREDICATE);
                    }
if{ws}\n{ws}\(      {
                        lineno++;   /* this pattern consumed exactly one newline */
                        do_if();
                        yy_push_state(PREDICATE);
                    }
}

> > If further Flex is slow in doing that, then I think one should try to
> > figure out why, and then successively remove that.
>
> Most definitely, yes.
>
> So maybe the important question should be: *why* is %option yylineno (as
> flex -p claims) such a performance penalty?

Because:

char *p = yytext;

/* walk the matched text again, counting newlines */
while (*p) if (*p++ == '\n') yylineno++;

The flex yylineno option does something like that (I don't know the exact
code). Basically, it scans the same buffer a second time just to find the
newlines. A streamlined parser scans its input only once, and flex cannot do
any better with a general grammar. Although I consider it poor style, I prefer
to provide my own function, scan_lines(), because I can use it sparingly, only
when I absolutely need it. Most of my states look like this:

<STATE1>{
RIGHT_TEXT      { do_something(); }
WRONG_TEXT      {
                    fprintf(stderr,
                            "Error: Illegal text \"%s\" in <STATE1> (%s: %d)\n",
                            yytext, filename, lineno);
                    exit(1);
                }
{sp}            {}
\n              { lineno++; }
}

For these rules I don't need to rescan the buffer.
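
For the few rules that can match across lines, a scan_lines() helper of the
kind mentioned above might look roughly like the sketch below. This is only an
illustration, not the actual function: the signature is an assumption, with
`text` standing in for flex's yytext and `len` for yyleng.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the newlines inside one matched token.
 * It rescans only the token it is handed, so the cost is paid only
 * by the rules that actually call it.
 */
int scan_lines(const char *text, size_t len)
{
    int newlines = 0;

    assert(text != NULL);
    for (size_t i = 0; i < len; i++) {
        if (text[i] == '\n')
            newlines++;
    }
    return newlines;
}
```

An action for a multi-line pattern could then do something like
`lineno += scan_lines(yytext, yyleng);`, while every other rule keeps counting
lines with a plain `lineno++` on `\n`.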

Nikos




