help-flex

Re: Locations suggest


From: Hans Aberg
Subject: Re: Locations suggest
Date: Mon, 7 Jan 2002 13:02:52 +0100

At 11:54 +0100 2002/01/07, Akim Demaille wrote:
>I really don't think so, there is way too much dependency wrt the
>underlying system, and the choices of the user.
>
>For instance: does col start at 0 or 1?  I'm not kidding, Emacs has
>column 0 but line 1.

This is really a non-issue, as the column and other numbers are only used
at the time of the error message, at which time it is easy to recompute.

>When you read `foo\n\r' what is the location of \r?  On the next line,
>or it is part of the current?

As C only has \n as newline, the \r is the first character of the next line.

If Flex adds a feature that (as in Java) any of \n, \r, \r\n can be read as
a newline, the situation becomes different. Then the input must be read
as binary.

One needs only to know what a newline is, and what comes immediately after
it is then a part of the next line.
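
A minimal sketch in C of how a position could be computed under that rule,
assuming the input was read as binary and that any of \n, \r, \r\n counts
as a single newline. The names text_pos and locate are hypothetical and not
part of Flex; the 1-based line and 0-based column follow the Emacs
convention mentioned above and are only one possible choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical position record; not part of Flex. */
struct text_pos {
    int line;    /* 1-based line number */
    int column;  /* 0-based column: bytes since the last newline */
};

/* Position of buf[offset], treating "\n", "\r" and "\r\n" as one
   newline each. Assumes buf holds untranslated (binary mode) input. */
static struct text_pos locate(const char *buf, size_t len, size_t offset)
{
    struct text_pos p = { 1, 0 };
    size_t i = 0;

    assert(offset <= len);
    while (i < offset) {
        char c = buf[i++];
        if (c == '\r') {
            if (i < len && buf[i] == '\n') {
                if (i == offset) {
                    /* The target is the LF of a CR LF pair:
                       it still belongs to the current line. */
                    p.column++;
                    break;
                }
                i++;  /* consume the LF together with the CR */
            }
            p.line++;
            p.column = 0;
        } else if (c == '\n') {
            p.line++;
            p.column = 0;
        } else {
            p.column++;
        }
    }
    return p;
}
```

With the input `foo\n\r` from the question above, this rule puts the \r at
the start of line 2, and whatever follows it at the start of line 3.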

>What do you call a space?  This is critical: the scanner skips the
>spaces, of course, which also means that their locations are skipped.
>So you must know what the user considers a space.  Likewise with
>comments, which can even be implemented as start conditions!

As for the computation of the column number on the line, the only thing
that matters is what a newline is.

>Some people will tell you they need (c0, l0) + width, not (c0, l0) +
>(c1, l1).  And you cannot convert from one to the other without having
>the text handy.

I am not sure what you mean here: The width is just the difference between
the position numbers; see the thread "Flex/Bison standardized diagnostics".

The suggestion was that a pointer to the beginning of the line should also
be provided (unless the segment is too big to fit into the Flex buffer), so
any re-computations would be easy. (But for tabs, also see below.)
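
As an illustration only, here is a C sketch of such a location record and
the two re-computations. The struct token_loc and the functions loc_width
and loc_column are hypothetical names, not an existing Flex interface, and
the sketch assumes the whole line is still in the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token location; names are illustrative, not a Flex API. */
struct token_loc {
    const char *line_start; /* first character of the token's line */
    const char *begin;      /* first character of the token */
    const char *end;        /* one past the last character of the token */
};

/* Width in characters: the difference between the position numbers. */
static size_t loc_width(const struct token_loc *loc)
{
    assert(loc->begin <= loc->end);
    return (size_t)(loc->end - loc->begin);
}

/* Column of the token start. Pass origin 0 or 1 depending on whether
   columns are counted from 0 or from 1. */
static size_t loc_column(const struct token_loc *loc, size_t origin)
{
    assert(loc->line_start <= loc->begin);
    return (size_t)(loc->begin - loc->line_start) + origin;
}
```

The column origin is an argument here because it is only needed when the
error message is produced, so either convention can be chosen at that time.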

>Implement it if you wish.  But it will not be used, because it is
>- dependent upon the system
>  (yet I am surprised that Hans did not complain about yylineno which
>   stupidly thinks eol is `\n' while everybody knows modern OSes use
>   `\r'...)

One reads the streams as text and not as binary, in which case \r is
translated into \n according to the C/C++ standards. The same thing happens
under DOS/MSOS. There is only a problem when moving text files between
UNIX/MacOS/MSOS without translating the newlines.

My C/C++ text lib also translates \n into \r, so I wrote a patch overriding
it, sending \n to \n in text mode. It means that \r\n will be read as two
newlines, but that does not bother me much. I do not translate text files
to Mac format anymore, as my editors and compilers can handle all types of
newlines.

I think that it would be interesting to have an option that allows any of
\n, \r, \r\n to be read as a newline, so that streams can be read as binary.

Unicode has a special line separator symbol, so then the issue may go away.

>- dependent upon the lexical structure of the input (tell me how you
>  expect to handle tabs: single char?  Walks by 4 spaces? by 8?)

Tabs are a non-issue in the proposed set-up; that is, programmers will have
to implement tab handling on their own.

But one might give the implementation of a tab = whitespace equivalence a
thought, as that is used in some programs (for example, in Haskell). Then
one must have a variable telling how many spaces a tab is worth. One
extra problem is that C/C++ whitespace [[:space:]]+ has several different
tabs in it. Unicode, I think, has several fixed width tabs, so the issue
may pop up again in that context.

But my own experience is that one should avoid giving such an equivalence
in parsers, as the number may vary in usage.
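
For a programmer who does want such an equivalence, a minimal C sketch
follows. It assumes the tab-stop model, where a tab advances to the next
multiple of the tab width, which is only one possible reading of the
equivalence. The function name visual_column is hypothetical, and the tab
width is left as a parameter because the number varies in usage.

```c
#include <assert.h>
#include <stddef.h>

/* Visual column (0-based) of line[offset], where each '\t' advances to
   the next multiple of tab_width and every other byte counts as one
   column. The caller must ensure offset lies within the line. */
static size_t visual_column(const char *line, size_t offset,
                            size_t tab_width)
{
    size_t col = 0;

    assert(tab_width > 0);
    for (size_t i = 0; i < offset; i++) {
        if (line[i] == '\t')
            col += tab_width - col % tab_width;  /* jump to next stop */
        else
            col++;
    }
    return col;
}
```

Given a pointer to the beginning of the line, this can be evaluated at the
time of the error message with whatever tab width the user prefers.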

>- dependent upon the use
>
>- dependent upon the taste of the user.

There are no such dependencies: The proposal provides the information
needed for any programmer to customize.

>All we need is a means to implement what we need. We already have it.

It does not seem so, not even for you: You posted an example with extremely
complicated code for something which should be fairly transparent to the
programmer.

>All the rest is pushing an additional burden onto you, for extremely
>limited added value.

Isn't that for the Flex developers to judge? If one is curious about a
problem, attempting an implementation will show exactly what is required
to implement it.

>In addition, it is more expensive to have Flex do it for us: it
>duplicates the walks through yytext.

What do you have in mind here? No such backtracking is needed in any
of the discussion that we have had here.

Clearly, if Flex implements it properly, that will be the most efficient
variation, as Flex has access to things that the user does not.

The problem appears to be that Flex has some quick fixes (such as the
REJECT code implementation) which slow it down.

>John> struct location { int line, column; /* optionally tracked by
>John> flex */ YY_LOCATION_MEMBERS /* user-defined */ };
>
>This is wrong for some applications where it is the width (number of
>chars, including the possible other eol) that matters.  I'm sure Hans
>can make long comments about that.

Not unless you tell me what to comment about: I do not know immediately how
to elaborate on the difference of the position numbers.


  Hans Aberg




