Re: RFC: "count_chars" function for locations

bug-bison
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: RFC: "count_chars" function for locations

From:	Akim Demaille
Subject:	Re: RFC: "count_chars" function for locations
Date:	Sat, 15 Jan 2022 10:41:41 +0100
Hi Frank!

> Le 11 janv. 2022 à 01:51, Frank Heckenbach <f.heckenbach@fh-soft.de> a écrit :
> 
> Akim Demaille wrote:
> 
>>>   void count_chars (const char *s, size_t n)
>> 
>> I understand the need, and agree with the proposal.  However I'm
>> concerned by the fact that this gives a meaning to 'columns'.  Currently,
>> it is unspecified: it could be just bytes, or actual characters,
> 
> ... or character width (as Bison uses now in its own scanner).

Indeed.

> Then again, the simple form as given in calc++/scanner.ll (also in
> the manual) has the same issues:
> 
>  # define YY_USER_ACTION  loc.columns (yyleng);
> 
> Actually, I used something like this in my own code until I had
> multi-line tokens, when I extended it to support newlines, unaware
> of these other issues.
> 
> You might want to add at least a comment about these issues in this
> and other examples, in the manual, and in location.cc.

I'm not sure what and where to write in location.cc.  But how about
the appended patch?


> BTW, when I added tab handling to my function, I noticed a possible
> issue with the position interface. I had to do something like this:
> 
>  case '\t':
>    end.column += tab_width - end.column % tab_width;
>    break;
> 
> since the columns() function can only add columns, not query the
> current column as needed to compute tabs. It's not a real problem
> since column is a public field, but it feels a bit like breaking the
> abstraction. You may or may not care about this (just please don't
> make column non-public in the future :).

Nah, it's public on purpose.  I don't plan in over-engineering
this class :)


>> Maybe something like %code location and %code position.  WDYT?  One
>> issue would then be that if someone introduced something new in position
>> or location, and we later add the same name, we would break the
>> user's code :(
> 
> If I understand you right, this would only enable one to put one's
> own implementation of such a function in the parser, rather than the
> scanner file. Of course, one can't add a function to yy::location in
> the scanner file, but one can use a free function and pass the
> location object to it, or subclass yy::location with this function
> added easily. And the scanner file is where this code belongs
> logically more than the parser file (if not the location class in
> the first place). So I don't think it's worth the effort to add this
> to Bison.

Agreed.  The user can easily plug her own definition of locations,
that's good enough as it is.

Cheers!



commit 5e2e1f03fa4e6bae59ad72a8ee10b4344457b8d8
Author: Akim Demaille <akim.demaille@gmail.com>
Date:   Sat Jan 15 10:28:16 2022 +0100

    doc: explain why location's "column" are defined vaguely
    
    Suuggested by Frank Heckenbach.
    <https://lists.gnu.org/r/bug-bison/2022-01/msg00000.html>
    
    * doc/bison.texi (Location Type): Explain why location's "column" are
    defined vaguely.
    Show tab handling in ltcalc and calc++.
    * examples/c/bistromathic/parse.y: Show tab handling.
    * examples/c++/calc++/calc++.test,
    * examples/c/bistromathic/bistromathic.test:
    Check tab handling.

diff --git a/doc/bison.texi b/doc/bison.texi
index 2dab6c49..de6021d8 100644
--- a/doc/bison.texi
+++ b/doc/bison.texi
@@ -2365,6 +2365,8 @@ @node Location Tracking Calc
 * Ltcalc Lexer::           The lexical analyzer.
 @end menu
 
+See @ref{Tracking Locations} for details about locations.
+
 @node Ltcalc Declarations
 @subsection Declarations for @code{ltcalc}
 
@@ -2488,7 +2490,7 @@ @node Ltcalc Lexer
 @group
   /* Skip white space. */
   while ((c = getchar ()) == ' ' || c == '\t')
-    ++yylloc.last_column;
+    yylloc.last_column += c == '\t' ? 8 - ((yylloc.last_column - 1) & 7) : 1;
 @end group
 
 @group
@@ -4751,6 +4753,33 @@ @node Location Type
 initialization), use the @code{%initial-action} directive.  @xref{Initial
 Action Decl}.
 
+@sp 1
+
+@cindex column
+The meaning of ``column'' is deliberately left vague since there are several
+options, depending on the use cases.
+
+With multibyte input (say UTF-8), simply counting the number of bytes does
+not match character positions on the screen.  One needs advanced functions
+mapping multibyte characters to their visual width (see for instance
+Gnulib's @code{mbswidth} and @code{mbsnwidth} functions).  Tabulation
+characters probably need a dedicated implementation, to match the ``go to
+next multiple of 8'' behavior.
+
+However to quote input in error messages, as @command{bison} does:
+
+@example
+@group
+1.10-12: @derror{error}: invalid identifier: ‘3.8’
+    1 | %require @derror{3.8}
+      |          @derror{^~~}
+@end group
+@end example
+
+@noindent
+then byte positions are more handy.  So in some cases, tracking both visual
+character position @emph{and} byte position is the best option.  This is
+what @command{bison} does.
 
 @node Actions and Locations
 @subsection Actions and Locations
@@ -13745,8 +13774,14 @@ @node Calc++ Scanner
 @example
 @group
 %@{
+  // Take 8-space tabulations into account.
+  void add_columns (yy::location& loc, const char *buf, int bufsize)
+  @{
+    for (int i = 0; i < bufsize; ++i)
+      loc.columns (buf[i] == '\t' ? 8 - ((loc.end.column - 1) & 7) : 1);
+  @}
   // Code run each time a pattern is matched.
-  # define YY_USER_ACTION  loc.columns (yyleng);
+  #define YY_USER_ACTION  add_columns (loc, yytext, yyleng);
 %@}
 @end group
 %%
diff --git a/examples/c++/calc++/calc++.test b/examples/c++/calc++/calc++.test
index 290e6c24..a3ab5acb 100755
--- a/examples/c++/calc++/calc++.test
+++ b/examples/c++/calc++/calc++.test
@@ -50,6 +50,21 @@ EOF
 run 1 'err: -:2.1: syntax error, unexpected end of file, expecting ( or 
identifier or number'
 
 
+# Check handling of tabs.
+cat >input <<EOF
+        *1
+EOF
+run 1 'err: -:1.9: syntax error, unexpected *, expecting ( or identifier or 
number'
+cat >input <<EOF
+       *2
+EOF
+run 1 'err: -:1.9: syntax error, unexpected *, expecting ( or identifier or 
number'
+cat >input <<EOF
+               *3
+EOF
+run 1 'err: -:1.9: syntax error, unexpected *, expecting ( or identifier or 
number'
+
+
 # LAC finds many more tokens.
 cat >input <<EOF
 a := 1
diff --git a/examples/c/bistromathic/bistromathic.test 
b/examples/c/bistromathic/bistromathic.test
index 2db5dcfe..921e17ab 100755
--- a/examples/c/bistromathic/bistromathic.test
+++ b/examples/c/bistromathic/bistromathic.test
@@ -366,3 +366,26 @@ err: 1.15: syntax error: expected - or ( or number or 
function or variable befor
 err:     1 | (1++2) + 3 +  ''
 err:       |               ^
 '
+
+# Check handling of literal tabs.  "Escape" them with a C-v, so that
+# they are not processed as completion requests.
+cat >input<<EOF
+        *1
+      *2
+      *3
+EOF
+# readline processes the tabs itself, and replaces then with spaces.
+run -n 0 '>         *1
+>       *2
+>       *3
+> ''
+err: 1.9: syntax error: expected end of file or - or ( or exit or number or 
function etc., before *
+err:     1 |         *1
+err:       |         ^
+err: 2.9: syntax error: expected end of file or - or ( or exit or number or 
function etc., before *
+err:     2 |   *2
+err:       |         ^
+err: 3.9: syntax error: expected end of file or - or ( or exit or number or 
function etc., before *
+err:     3 |           *3
+err:       |         ^
+'
diff --git a/examples/c/bistromathic/parse.y b/examples/c/bistromathic/parse.y
index 5c1e6055..8094cb9f 100644
--- a/examples/c/bistromathic/parse.y
+++ b/examples/c/bistromathic/parse.y
@@ -308,14 +308,15 @@ yylex (const char **line, YYSTYPE *yylval, YYLTYPE 
*yylloc,
 {
   int c;
 
-  // Ignore white space, get first nonwhite character.
+  // Get next character, ignore white spaces.
   do {
     // Move the first position onto the last.
     yylloc->first_line = yylloc->last_line;
     yylloc->first_column = yylloc->last_column;
-
-    yylloc->last_column += 1;
     c = *((*line)++);
+    // Tab characters go to the next column multiple of 8.
+    yylloc->last_column +=
+      c == '\t' ? 8 - ((yylloc->last_column - 1) & 7) : 1;
   } while (c == ' ' || c == '\t');
 
   switch (c)
[Prev in Thread]
Current Thread
[Next in Thread]
Re: RFC: "count_chars" function for locations, Frank Heckenbach, 2022/01/10
- Re: RFC: "count_chars" function for locations, Akim Demaille <=
Prev by Date: Re: Bug in glr2.cc
Next by Date: Re: Bison good on Ubuntu 8.04
Previous by thread: Re: RFC: "count_chars" function for locations
Next by thread: Bison good on Ubuntu 8.04
Index(es):
- Date
- Thread