Handling of \r
From: Akim Demaille
Subject: Handling of \r
Date: Mon, 9 Sep 2019 18:46:18 +0200
Hi Paul,
In d8d3f94a993ce890baae68bf9da7ded29f9f8d76 (2002 :-), you introduced
no_cr_read in the grammar scanner: any lone \r is treated as a \n.
Today, because the diagnostics read only \n as "end of line", there's an offset
in the quoted lines.
$ cat -vn /tmp/f.y
1 %token FOO^M ""
2 FOO
3 %%
4 exp: FOO
$ LC_ALL=C bison /tmp/f.y
/tmp/f.y:3.2-4: warning: symbol FOO redeclared [-Wother]
3 | %%
| ^~~
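To make the offset concrete, here is a minimal sketch (not Bison's actual code; both function names are hypothetical) of the two line-counting conventions at play. The first counts only \n, as the diagnostics do when quoting a line; the second also counts a lone \r, as the scanner does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 1-based line number of byte offset POS in the
   NUL-terminated buffer BUF, counting only '\n' as a line terminator
   (the convention used when quoting source lines).  */
int
line_of_lf_only (const char *buf, size_t pos)
{
  assert (buf != NULL);
  int line = 1;
  for (size_t i = 0; i < pos; ++i)
    if (buf[i] == '\n')
      ++line;
  return line;
}

/* Hypothetical helper: same, but "\n", "\r\n" and a lone "\r" each
   end a line (the scanner's convention).  POS must not exceed the
   length of BUF, so reading buf[i + 1] stays within the string.  */
int
line_of_cr_aware (const char *buf, size_t pos)
{
  assert (buf != NULL);
  int line = 1;
  for (size_t i = 0; i < pos; ++i)
    if (buf[i] == '\n' || (buf[i] == '\r' && buf[i + 1] != '\n'))
      ++line;
  return line;
}
```

On a buffer shaped like /tmp/f.y above, the second FOO is on line 2 for the first function and on line 3 for the second, which is why the location says 3.2-4 while the quoted text is the file's third \n-terminated line.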
Worse yet, because I was not cautious enough, we sometimes get stuck in a
never-ending loop calling getc, waiting for a \n that never comes because we
keep getting EOF.
Both issues are easy to fix.
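For the endless loop, the fix amounts to treating EOF as a terminator too. A rough sketch, assuming a helper that skips to the end of the current line (the name skip_to_eol is made up for illustration; it is not the function in Bison):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: consume the rest of the current line from IN.
   Stops after '\n' or at EOF, whichever comes first, and returns the
   number of bytes consumed (including the '\n', if any).  */
size_t
skip_to_eol (FILE *in)
{
  assert (in != NULL);
  size_t n = 0;
  int c; /* int, not char, so that EOF is distinguishable.  */
  while ((c = getc (in)) != EOF)
    {
      ++n;
      if (c == '\n')
        break;
    }
  return n;
}
```

A loop written as "while (getc (in) != '\n')" never terminates once getc starts returning EOF, which is what happens when the last "line" ends with a lone \r or with no terminator at all.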
Yet I'm not so happy with a lone \r being treated as an end-of-line: that's not
what Emacs does (by default, I guess; in my case, I see ^M), nor what GNU
Coreutils does (e.g., cat -n, wc -l), nor GNU Sed (sed -n 2p, for instance).
Unfortunately, that's what GCC and Clang both do though: on something like
$ cat -An /tmp/foo.c
1 const char *foo = ""^M;$
2 intt i;$
they report an error in line 3, not 2:
$ clang-mp-7.0 /tmp/foo.c
/tmp/foo.c:3:1: error: unknown type name 'intt'; did you mean 'int'?
intt i;
^~~~
int
1 error generated.
$ gcc-mp-9 /tmp/foo.c
/tmp/foo.c:3:1: error: unknown type name 'intt'; did you mean 'int'?
AFAICT, the GCS don't specify the required behavior.
I personally prefer treating a lone \r as a regular character, as it's more
consistent with what my tools show me. And I think it's a problem that GCC and
Emacs disagree, so maybe the GCS should decide.
But in the case of Bison, WDYT today?