help-flex
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Locations suggest -- we're stupid


From: Hans Aberg
Subject: Re: Locations suggest -- we're stupid
Date: Mon, 7 Jan 2002 22:08:55 +0100

At 20:49 +0200 2002/01/06, Nikos Balkanas wrote:
>> So maybe the important question should be: *why* is %%yylineno (claimed
>> to be by flex -p) such a performance penalty?
>
>Because:
>
>char *p = yytext;
>
>while (*p) if ((*p)++ == '\n') yylineno++;
>
>The flex yylineno option is doing something like that (I don't know the
>exact code).
>Basically it scans the same buffer once more just to find the newlines. A
>streamlined parser will scan input only once. Flex cannot do anything better
>with a general grammar.

The code segment is:

                YY_DO_BEFORE_ACTION;

                if ( yy_act != YY_END_OF_BUFFER )
                        {
                        int yyl;
                        for ( yyl = 0; yyl < yyleng; ++yyl )
                                if ( yytext[yyl] == '\n' )
                                        ++yylineno;
                        }

This cannot explain a heavy performance loss, as it knocks onto at most a
constant factor to the time complexity, same as of the lexer itself, and
that this factor is probably small, given the fact that the other parsing
stuff with jumps probably take a lot longer time. If computers double in
speed every year, that cannot be overly important in most uses.

It looks as though it is possible to insert the stuff into the parsing loop
at least when the parser is deterministic, to give some optimization.
Perhaps it should be:

yy_match:
  do {
    register YY_CHAR yy_c = yy_ec[YY_SC_TO_UI(*yy_cp)];
    ...
    yy_current_state = yy_nxt[yy_base[yy_current_state] + (unsigned int)yy_c];
    ++yy_cp;
    if (*yy_cp == '\n')
      ++yylineno;
  } while (yy_base[yy_current_state] != 101);

or:

yy_match:
  do {
    char yy_c0 = *yy_cp;
    if (*yy_c0 == '\n')
      ++yylineno;
    YY_CHAR yy_c = yy_ec[YY_SC_TO_UI(*yy_c0)];
    ...

If one does not like the conditional, which may break the CPU's instruction
pipe, one might try assembler if there is an instruction that can give 1 or
0 depending if an integer is non-zero.

The next step would be to introduce some static analysis of the grammar (as
some humans do by hand). But the yy_match loop already contains
conditionals, a subloop, and several other computations. So it would be
nice with some profiling with the
  if ( do_yylineno )
    /* This should really be "maintain_backup_tables = true" */
    reject_really_used = true;
code disabled, and then with Flex recompiled comparing %option yylineno on/off.

  Hans Aberg





reply via email to

[Prev in Thread] Current Thread [Next in Thread]