Re: What is going in this syntax

From: Tim Van Holder
Subject: Re: What is going in this syntax
Date: Fri, 08 Oct 2004 08:13:28 +0200
User-agent: Mozilla Thunderbird 0.8 (Windows/20040913)

wim delvaux wrote:
Hi all,

Consider this excerpt:

"unsigned"{WSPC}+{IDENT} ...;

{IDENT}                  ...;

where the definitions are
WSPC                [ \t]
IDENT               {VAROF}|{REALIDENT}

When the input presents 'unsigned long', the first rule applies (which is what you would expect), but ONLY with the token value "unsigned" (which is NOT what you would expect).

Next the 'long' token is matched, and it is not the second rule that fires but the first one again.

I have run my syntax in debug mode to find out what was going on.

Why is that?

Look at what the rules expand to:

"unsigned"[ \t]+{VAROF}|{REALIDENT} ...;

{VAROF}|{REALIDENT}                 ...;

As you can see, the REALIDENT pattern is a valid match for both rules:
'|' has the lowest precedence, so the first rule is really the two
alternatives "unsigned"[ \t]+{VAROF} and {REALIDENT}.
Because of this, 'unsigned' and 'long' will both be matched by the first
rule's REALIDENT pattern (the "unsigned"[ \t]+{VAROF} portion produces no
match).

You'll need to add parentheses, either around the IDENT pattern
(preferred) or around the use(s) of {IDENT}, to get the intended
behaviour.
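
Something along these lines shows the preferred fix. This is only a sketch: the VAROF and REALIDENT definitions are placeholders I made up for illustration (the original post does not show them), and the actions just print what was matched.

```lex
%option noyywrap
%{
#include <stdio.h>
/* VAROF and REALIDENT below are hypothetical stand-ins for the
   poster's real definitions. */
%}

WSPC        [ \t]
VAROF       "var"[0-9]+
REALIDENT   [A-Za-z_][A-Za-z0-9_]*
IDENT       ({VAROF}|{REALIDENT})

%%
"unsigned"{WSPC}+{IDENT}    { printf("unsigned type: %s\n", yytext); }
{IDENT}                     { printf("identifier: %s\n", yytext); }
[ \t\n]+                    { /* skip whitespace */ }
.                           { printf("other: %s\n", yytext); }
%%

int main(void)
{
    yylex();
    return 0;
}
```

With the parentheses inside the definition, the '|' can no longer escape into the surrounding rule, so 'unsigned long' should be matched as a whole by the first rule (it is the longest match). Assuming the file is saved as scan.l, it can be built with `flex scan.l` followed by `cc lex.yy.c -o scan`.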

As an aside, it may be better to use states for multi-word matching like
this; the above example won't handle a case where unsigned is on one
line and long is on the next, for example.  Setting a state upon
scanning 'unsigned' would allow you to cope with all sorts of things
and still know to handle 'long' specially.
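
A rough sketch of the state-based approach, using an exclusive start condition. The name AFTER_UNSIGNED and the VAROF/REALIDENT definitions are invented for this example, and what to do with a bare 'unsigned' is an assumption on my part.

```lex
%option noyywrap
%{
#include <stdio.h>
/* AFTER_UNSIGNED, VAROF and REALIDENT are hypothetical names and
   definitions, used only for illustration. */
%}

%x AFTER_UNSIGNED

VAROF       "var"[0-9]+
REALIDENT   [A-Za-z_][A-Za-z0-9_]*
IDENT       ({VAROF}|{REALIDENT})

%%
"unsigned"                  { BEGIN(AFTER_UNSIGNED); }
<AFTER_UNSIGNED>[ \t\n]+    { /* blanks and newlines between the two words */ }
<AFTER_UNSIGNED>{IDENT}     { printf("unsigned type: %s\n", yytext);
                              BEGIN(INITIAL); }
<AFTER_UNSIGNED>.           { /* no type name followed: report it and
                                 rescan this character normally */
                              printf("bare unsigned\n");
                              yyless(0);
                              BEGIN(INITIAL); }
<AFTER_UNSIGNED><<EOF>>     { printf("bare unsigned\n"); yyterminate(); }
{IDENT}                     { printf("identifier: %s\n", yytext); }
[ \t\n]+                    { /* skip whitespace */ }
.                           { printf("other: %s\n", yytext); }
%%

int main(void)
{
    yylex();
    return 0;
}
```

Because the whitespace rule in the AFTER_UNSIGNED state also accepts newlines, 'unsigned' at the end of one line and 'long' at the start of the next are still paired up. The "unsigned" rule is listed before {IDENT} so that it wins when both match the same eight characters.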
As a further aside, I see no real reason to do any of this in the lexer
at all - combining 'unsigned' and the following type name is really a
job for the parser.
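
For the parser-based route, the lexer side can stay very simple. In this sketch the token codes TOK_UNSIGNED and TOK_IDENT are made up (normally they would come from the header your parser generator emits), VAROF and REALIDENT are placeholders again, and main() is only a stand-in that prints the token stream a parser would see.

```lex
%option noyywrap
%{
#include <stdio.h>
/* Hypothetical token codes; a real project would include the header
   generated by the parser generator instead. */
enum { TOK_UNSIGNED = 258, TOK_IDENT };
%}

VAROF       "var"[0-9]+
REALIDENT   [A-Za-z_][A-Za-z0-9_]*
IDENT       ({VAROF}|{REALIDENT})

%%
"unsigned"      { return TOK_UNSIGNED; }
{IDENT}         { return TOK_IDENT; }
[ \t\n]+        { /* whitespace is not significant to the parser */ }
.               { return yytext[0]; }
%%

/* Stand-in for the parser.  A grammar rule along the lines of
       type : TOK_UNSIGNED TOK_IDENT | TOK_IDENT ;
   would do the actual combining. */
int main(void)
{
    int tok;
    while ((tok = yylex()) != 0)
        printf("token %d: %s\n", tok, yytext);
    return 0;
}
```

The lexer no longer cares what separates 'unsigned' from the type name, so line breaks (and comments, if the lexer skips them) between the two words come for free.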
