help-flex

Re: common per-token info


From: Hans Aberg
Subject: Re: common per-token info
Date: Mon, 7 Jan 2002 22:51:22 +0100

At 11:08 -0800 2002/01/07, John W. Millaway wrote:
>Everyone: What are some example parsers/grammars that need per-token
>information not covered by the list [file,line,column, USER_DEFINED_MEMBERS].

The only missing piece is the stream position, i.e. the byte offset of the
token from the start of the stream. The (line, column) pair indicates a
position to a human using an editor, whereas the stream position is what
another program needs when it addresses the stream (usually a file)
directly. The stream position is also safer, as it does not depend on any
interpretation of what a newline is.
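A minimal sketch in C of recording the stream position with each token, assuming whitespace-separated words stand in for real tokens. The names struct tok and scan_words are hypothetical, not part of Flex; in an actual Flex scanner the same counter could be advanced by yyleng from YY_USER_ACTION, which Flex runs before each matched rule's action.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical per-token record: only the byte offset is stored. */
struct tok {
    size_t offset;  /* zero-based byte offset of the token's first byte */
    size_t length;  /* number of bytes in the token */
};

/* Split buf into whitespace-separated words, recording each word's byte
   offset. The offset counts raw bytes, so it needs no notion of what a
   line terminator is. Returns the number of tokens stored in out. */
size_t scan_words(const char *buf, struct tok *out, size_t max)
{
    size_t pos = 0, n = 0;
    while (buf[pos] != '\0' && n < max) {
        if (isspace((unsigned char)buf[pos])) {
            pos++;
            continue;
        }
        out[n].offset = pos;
        while (buf[pos] != '\0' && !isspace((unsigned char)buf[pos]))
            pos++;
        out[n].length = pos - out[n].offset;
        n++;
    }
    return n;
}
```

A program addressing the file directly can then seek to tok.offset without knowing whether lines end in "\n" or "\r\n".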

The column number can be computed from the stream position of a character
together with the stream position of the beginning of its line, but the
stream position cannot be recovered from the line and column numbers
alone. So I think one should store the stream positions and compute the
column number when needed. (The column number is only needed when an
actual error is issued, so the cost of computing it at that time makes no
difference.)
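A sketch of storing only offsets and deriving the column on demand. struct pos, advance and column_at are illustrative names, not an existing API; the text/len arguments play the role of Flex's yytext/yyleng. This computes a byte column, so tabs and multi-byte characters are not expanded.

```c
#include <stddef.h>

/* Hypothetical scanner state: only byte offsets are maintained. */
struct pos {
    size_t offset;      /* zero-based byte offset of the next character */
    size_t line_start;  /* offset of the first byte of the current line */
    size_t line;        /* zero-based line number */
};

/* Advance over len bytes of matched text, noting where each new line
   begins. This is the only bookkeeping done during scanning. */
void advance(struct pos *p, const char *text, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (text[i] == '\n') {
            p->line++;
            p->line_start = p->offset + i + 1;
        }
    }
    p->offset += len;
}

/* Zero-based byte column of the byte at `offset` on the current line,
   computed only when it is actually needed, e.g. for an error message. */
size_t column_at(const struct pos *p, size_t offset)
{
    return offset - p->line_start;
}
```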

Also note that the file name is sometimes required, and that it cannot be
extracted from the FILE or istream (ifstream) structures. Underneath these
structures, operating systems use file descriptors, from which this
information can be obtained, and one may instead want the Flex scanner to
keep track of those.
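The sketch below takes a simpler route than querying the operating system through a file descriptor: it records the name when the file is opened, since a FILE * does not remember it. struct named_input, open_named and close_named are hypothetical helpers; with Flex one would then set yyin to in.fp.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper pairing an open stream with the name it came from. */
struct named_input {
    FILE *fp;
    char *name;  /* private copy of the path given to open_named() */
};

/* Returns 0 on success, -1 if the file cannot be opened or memory runs out. */
int open_named(struct named_input *in, const char *path)
{
    in->name = NULL;
    in->fp = fopen(path, "r");
    if (in->fp == NULL)
        return -1;
    in->name = malloc(strlen(path) + 1);
    if (in->name == NULL) {
        fclose(in->fp);
        in->fp = NULL;
        return -1;
    }
    strcpy(in->name, path);
    return 0;
}

/* Close the stream and release the stored name. */
void close_named(struct named_input *in)
{
    if (in->fp != NULL)
        fclose(in->fp);
    free(in->name);
    in->fp = NULL;
    in->name = NULL;
}
```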

>Let's assume that we can implement optional file, line, and column tracking
>mechanisms that are reasonably configurable (zero-based or one-based, ...

If one does not use these numbers during the actual parsing, but only when
reporting an error, they need not be configurable: the conversion can be
done in the error-reporting code.
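For illustration, a hypothetical formatting helper that keeps positions zero-based internally and applies the chosen convention only when the message is produced, in the common file:line:column form. format_location and its base parameter are assumptions, not part of Flex.

```c
#include <stddef.h>
#include <stdio.h>

/* Positions are stored zero-based; the numbering convention is applied
   only here. base is 0 for zero-based output, 1 for one-based
   (editor-style) output. Returns snprintf's result. */
int format_location(char *buf, size_t size, const char *file,
                    size_t line, size_t column, unsigned base)
{
    return snprintf(buf, size, "%s:%zu:%zu",
                    file, line + base, column + base);
}
```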

>What will
>we miss?

Possibly a pointer to the beginning of the current line, if the buffer
question can be solved. It would be used when issuing errors about
whitespace/tab scanning, when giving context information, etc.
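A minimal sketch of how such a line-start pointer could be used for context: print the offending line and a caret under the error column. print_context is a hypothetical name, and it assumes the line is still present in the buffer and ends at a newline or NUL, which is exactly the buffer question raised above.

```c
#include <stdio.h>
#include <string.h>

/* Print the line beginning at line_start, then a caret under the byte at
   zero-based column col. Tabs are copied into the caret line so that the
   caret stays aligned when the output is displayed. */
void print_context(FILE *out, const char *line_start, size_t col)
{
    size_t len = strcspn(line_start, "\n");  /* stop at newline or NUL */
    fprintf(out, "%.*s\n", (int)len, line_start);
    for (size_t i = 0; i < col && i < len; i++)
        fputc(line_start[i] == '\t' ? '\t' : ' ', out);
    fputs("^\n", out);
}
```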

> Is this too much information per-token?

If there is a performance hit, one should be able to deselect the
components that are not needed.
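One way to make the components deselectable at no run-time cost is conditional compilation. The macro names TOKEN_TRACK_OFFSET and TOKEN_TRACK_LINE are hypothetical, not existing Flex options; the offset is enabled in the code here for the example, while in a real build such switches might be supplied on the compiler command line instead.

```c
#include <stddef.h>

/* Hypothetical feature switches selecting which per-token fields exist. */
#define TOKEN_TRACK_OFFSET 1
/* #define TOKEN_TRACK_LINE 1 */

struct token_info {
#ifdef TOKEN_TRACK_OFFSET
    size_t offset;  /* present only when offset tracking is selected */
#endif
#ifdef TOKEN_TRACK_LINE
    size_t line;    /* present only when line tracking is selected */
#endif
    int kind;       /* always present: the token code for the parser */
};

/* Store only the fields that were selected at compile time; arguments
   for deselected fields are ignored. */
void token_info_set(struct token_info *t, int kind, size_t offset, size_t line)
{
    t->kind = kind;
#ifdef TOKEN_TRACK_OFFSET
    t->offset = offset;
#else
    (void)offset;
#endif
#ifdef TOKEN_TRACK_LINE
    t->line = line;
#else
    (void)line;
#endif
}
```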

> Do most parsers unwrap tokens, then rewrap them in user-defined objects
>at the parser level anyway (discarding the information gathered by the
>scanner)?

If there is backtracking, one can keep track of the potential branching
points and, at each of them, record the line number and the char* to the
beginning of that line, if either of those is requested.
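A sketch of such checkpoints: at each potential branching point the scanner pushes the line number and the char * to the beginning of that line, and restores them when it backtracks. All names are hypothetical, and the saved scan cursor is an addition beyond the text, included so that the example restores a complete position.

```c
#include <stddef.h>

/* Hypothetical checkpoint saved at a potential branching point. */
struct checkpoint {
    size_t      line;        /* line number at the branch point */
    const char *line_start;  /* beginning of that line in the buffer */
    const char *cursor;      /* scan position to resume from */
};

#define MAX_CHECKPOINTS 64

struct backtrack_state {
    struct checkpoint stack[MAX_CHECKPOINTS];
    size_t depth;
};

/* Record the current position; returns 0, or -1 if the stack is full. */
int cp_push(struct backtrack_state *s, size_t line,
            const char *line_start, const char *cursor)
{
    if (s->depth == MAX_CHECKPOINTS)
        return -1;
    s->stack[s->depth].line = line;
    s->stack[s->depth].line_start = line_start;
    s->stack[s->depth].cursor = cursor;
    s->depth++;
    return 0;
}

/* Backtrack to the most recent checkpoint; returns 0, or -1 if none. */
int cp_pop(struct backtrack_state *s, struct checkpoint *out)
{
    if (s->depth == 0)
        return -1;
    *out = s->stack[--s->depth];
    return 0;
}
```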

But it appears that adding backtracking to the Flex scanner imposes such a
performance hit that it then does not matter how the line number is
computed.

  Hans Aberg




