help-flex

Re: Nested comments


From: Akim Demaille
Subject: Re: Nested comments
Date: Thu, 03 Jul 2003 09:48:07 +0200
User-agent: Gnus/5.1002 (Gnus v5.10.2) Emacs/21.3 (gnu/linux)

 > I implemented nested [* ... *] comments by the code below, but it turns out
 > to be slow with many nestings, and I do not know why. Is there a way to
 > make it faster?

 > %{

 > int comment_level = 0;

 > %}

 > %x comment

 > %%

 > "[*"  { BEGIN(comment); comment_level = 1; }
 > <comment>{
 >   [^*[]+        {} /* Eat all but * [               */
 >   "*"+[^*[\]]*  {} /* Eat all * plus all but * [ ]  */
 >   "["+[^*[\]]*  {} /* Eat all [ plus all but * [ ]  */
 >   "["+"*"  { ++comment_level; }
 >   "*"+"]"  {
 >     --comment_level;
 >     if (comment_level == 0) {
 >       BEGIN(INITIAL);
 >     }
 >   }
 >   <<EOF>> {
 >     std::cerr << "Error: Nested comments not properly closed at end of file.\n";
 >     BEGIN(INITIAL); return YYERRCODE;
 >   }
 > }
 > "*]"  { std::cerr << "Error: Too many comment closings *].\n";
 >         BEGIN(INITIAL); return YYERRCODE; }

I personally do the following:

----------------------------------------

"/*"        comment_level++; BEGIN STATE_COMMENT;

<STATE_COMMENT>{ /* Comments. */
  "*/" { /* End of the comment. */
    if (--comment_level == 0)
      {
        yylloc->step ();
        BEGIN INITIAL;
      }
  }

  "/*"          comment_level++;
  [^*/\n\r]+    ;
  {eol}+        yylloc->lines (yyleng);
  .             /* Stray `*' and `/'. */;

  <<EOF>> {
    std::cerr
      << *yylloc << ": unexpected end of file in a comment" << std::endl;
    exit_set (exit_scan);
    BEGIN INITIAL;
  }
}

----------------------------------------

The difference I can see, besides your better efforts at matching long
non-\n tokens, is that I don't match \n in the main RE, since that
could make huge tokens, which is certainly bad for performance,
especially if you do have long comments (Flex has to allocate a bigger
buffer, you pay cache penalties, and so on).
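
As a rough illustration of the token-size point, here is a minimal C++
sketch (not Flex itself) that measures the longest token a rule of the
form [^...]+ would produce on a comment body.  The names longest_run and
check_token_lengths, and the 1000-line sample body, are made up for this
example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Length of the longest run of characters that are not in `stops`,
// i.e. the longest token a rule of the form [^stops]+ could match.
std::size_t longest_run(const std::string& text, const std::string& stops)
{
  std::size_t best = 0;
  std::size_t current = 0;
  for (char c : text)
    {
      if (stops.find(c) != std::string::npos)
        current = 0;                      // A stop character ends the run.
      else
        best = std::max(best, ++current); // Extend the current run.
    }
  return best;
}

// Hypothetical sample: a comment body of 1000 lines, 60 characters each.
void check_token_lengths()
{
  std::string body;
  for (int i = 0; i < 1000; ++i)
    body += std::string(60, 'x') + '\n';

  // If \n is allowed in the token, the whole body is a single token.
  assert(longest_run(body, "*/") == 61000);
  // If \n and \r are excluded, no token is longer than one line.
  assert(longest_run(body, "*/\n\r") == 60);
}
```

With newlines excluded from the character class, the token length is
bounded by the line length rather than by the comment length.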

I wonder whether Flex could use the start conditions to isolate truly
distinct scanners/FSMs (instead of a single big FSM).  That would make
it possible for sublanguages, such as comments or strings, that have a
truly different nature, to use different FSMs.  I suppose that in such
a case, one would have much smaller FSMs, and therefore probably
better performance (better locality, better for the cache etc.).  Of
course this requires reentrancy, but that's already provided by
current versions of Flex.

This segmentation, this "making smaller functions" is also what makes
Flex scanners so slow compared to hand-written scanners so...
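
To make the "one small scanner per sublanguage" idea concrete, here is a
minimal hand-written C++ sketch, assuming /* ... */ delimiters as in the
rules above.  It is not how Flex works internally; skip_comment and
count_code_chars are hypothetical names, and the main scanner merely
counts the characters found outside comments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Comment sub-scanner.  `pos` must point at an opening "/*".  Returns the
// index just past the matching "*/", or std::string::npos if the input
// ends inside the comment.
std::size_t skip_comment(const std::string& text, std::size_t pos)
{
  assert(text.compare(pos, 2, "/*") == 0);
  int level = 0;
  while (pos < text.size())
    {
      if (text.compare(pos, 2, "/*") == 0)
        {
          ++level;                // Nested opening.
          pos += 2;
        }
      else if (text.compare(pos, 2, "*/") == 0)
        {
          pos += 2;
          if (--level == 0)
            return pos;           // End of the outermost comment.
        }
      else
        ++pos;                    // Comment text, or a stray '*' or '/'.
    }
  return std::string::npos;       // Unexpected end of input.
}

// Main scanner.  Counts the characters outside comments and hands control
// to skip_comment whenever a comment opens.  Returns -1 if a comment is
// left unterminated.
long count_code_chars(const std::string& text)
{
  long count = 0;
  std::size_t pos = 0;
  while (pos < text.size())
    {
      if (text.compare(pos, 2, "/*") == 0)
        {
          pos = skip_comment(text, pos);
          if (pos == std::string::npos)
            return -1;
        }
      else
        {
          ++count;
          ++pos;
        }
    }
  return count;
}
```

Each function only knows about its own sublanguage, so its loop and its
working set stay small.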

Bah, I don't know, just a $0.02 thought.
