Handling of \r

From: Akim Demaille
Subject: Handling of \r
Date: Mon, 9 Sep 2019 18:46:18 +0200

Hi Paul,

In d8d3f94a993ce890baae68bf9da7ded29f9f8d76 (2002 :-), you introduced 
no_cr_read in the grammar scanner: any lone \r is treated as a \n.
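
To make that behavior concrete, here is a minimal sketch of this kind of normalization. It is not the actual no_cr_read code: the name normalize_cr and its in-place buffer interface are hypothetical, chosen only for illustration. Both "\r\n" and a lone \r come out as a single \n.

```c
#include <stddef.h>

/* Hypothetical helper, NOT Bison's actual no_cr_read: rewrite BUF in
   place so that "\r\n" and a lone '\r' both become a single '\n'.
   Return the new length.  */
size_t
normalize_cr (char *buf, size_t len)
{
  size_t out = 0;
  for (size_t in = 0; in < len; ++in)
    {
      if (buf[in] == '\r')
        {
          /* Skip the '\n' of a "\r\n" pair so the pair counts once.  */
          if (in + 1 < len && buf[in + 1] == '\n')
            ++in;
          buf[out++] = '\n';
        }
      else
        buf[out++] = buf[in];
    }
  return out;
}
```

Once the input has gone through something like this, the scanner counts one more line per lone \r than a tool that only looks for \n.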

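Today, because the diagnostics treat only \n as "end of line" while the scanner 
also counts a lone \r, the quoted lines are offset: the location below says line 
3 (the second FOO, as counted by the scanner), but the line quoted is the third 
\n-delimited one.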

$ cat -vn /tmp/f.y
     1  %token FOO^M ""
     2   FOO
     3  %%
     4  exp: FOO
$ LC_ALL=C bison /tmp/f.y
/tmp/f.y:3.2-4: warning: symbol FOO redeclared [-Wother]
    3 | %%
      |  ^~~
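
The mismatch can be reproduced with a small sketch that computes the line number of a byte offset under both conventions. The function line_of_offset and its cr_is_eol flag are hypothetical names used only for illustration, not anything in Bison.

```c
#include <stddef.h>

/* Hypothetical helper: 1-based line number of byte OFFSET in BUF, which
   holds LEN bytes.  When CR_IS_EOL is nonzero, a lone '\r' also ends a
   line; a "\r\n" pair still counts once, via its '\n'.  */
int
line_of_offset (const char *buf, size_t len, size_t offset, int cr_is_eol)
{
  int line = 1;
  for (size_t i = 0; i < offset && i < len; ++i)
    {
      if (buf[i] == '\n')
        ++line;
      else if (cr_is_eol && buf[i] == '\r'
               && (i + 1 >= len || buf[i + 1] != '\n'))
        ++line;
    }
  return line;
}
```

On the contents of /tmp/f.y above, the second FOO starts at byte offset 16. With cr_is_eol set it is on line 3, as in the reported location; with only \n counted it is on line 2, so quoting "line 3" by counting \n shows %% instead.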

Worse yet, because I was not cautious enough, we sometimes end up in an endless 
loop calling getc, waiting for a \n that never comes because we have already 
reached EOF.
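
For the record, the shape of the fix for the loop is the usual one: check for EOF as well as for \n. A minimal sketch follows, with a hypothetical helper name (skip_to_eol); it is not the actual code in the diagnostics.

```c
#include <stdio.h>

/* Hypothetical helper: consume characters from IN up to and including
   the next '\n'.  Return 0 if a newline was found, EOF if the input
   ended first.  */
int
skip_to_eol (FILE *in)
{
  int c;  /* int rather than char, so that EOF stays distinguishable.  */
  while ((c = getc (in)) != EOF)
    if (c == '\n')
      return 0;
  return EOF;
}
```

A caller that loops on this must stop when it returns EOF, instead of waiting for a line that will never come.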

Both issues are easy to fix.

Yet I'm not so happy with a lone \r being treated as an end-of-line: that's not 
what Emacs does (by default, I guess; in my case, I see ^M), nor what GNU 
Coreutils do (e.g., with cat -n, wc -l), nor GNU Sed (with sed -n 2p for 
instance).

Unfortunately, that's what GCC and Clang both do though: on something like

$ cat -An /tmp/foo.c
     1  const char *foo = ""^M;$
     2  intt i;$

they report an error in line 3, not 2:

$ clang-mp-7.0 /tmp/foo.c
/tmp/foo.c:3:1: error: unknown type name 'intt'; did you mean 'int'?
intt i;
1 error generated.
$ gcc-mp-9 /tmp/foo.c
/tmp/foo.c:3:1: error: unknown type name 'intt'; did you mean 'int'?

AFAICT, the GCS (GNU Coding Standards) don't specify the required behavior.

I personally prefer treating a lone \r as a regular character, as it's more 
consistent with what my tools show me.  And I think it's a problem that GCC and 
Emacs disagree, so maybe the GCS should decide.

But in the case of Bison, WDYT today?
