
Re: [bug-gawk] mixed LF/CRLF scripts: incorrect line numbers


From: arnold
Subject: Re: [bug-gawk] mixed LF/CRLF scripts: incorrect line numbers
Date: Tue, 08 May 2018 09:35:25 -0600
User-agent: Heirloom mailx 12.4 7/29/08

Hi.

Thank you for taking the time to submit a bug report.

I finally had a minute to examine this.  The problem is in your script
and in your assumption that a lone CR ends a source code line.

Gawk comes from a Unix heritage where LF ends lines, and this is backed
up by POSIX.  In order to make life easier (if not totally easy) for people
on Windows systems, gawk simply treats CR as whitespace.  Thus, a line
like:

        foo = bar # comment 1 ^M bar++ # comment 2

isn't two source code lines; it's only one line with an unusual whitespace
character in the middle. In fact, all 3 open source awks act the same way:

        $ cat > x.awk
        BEGIN {
                foo = 1 # comment ^M foo++ # another comment
                print foo
        }
        ^D
        $ gawk -f x.awk 
        1
        $ nawk -f x.awk 
        1
        $ mawk -f x.awk 
        1

The ^M is a literal CR character.

Making gawk treat a lone CR like LF would be a lot of work for very
little gain; you should let your text editor help you ensure that the
line endings in your source program are consistent.
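
If an editor is not at hand, the same check can be roughed out in the shell. What follows is only a sketch, not anything gawk ships: the function names check_eol and normalize_eol are made up for illustration, and it assumes a Unix-like awk that splits records on LF only and leaves CR bytes in the record.

```shell
# check_eol FILE (hypothetical helper): count line-ending styles and
# flag lines that carry a CR somewhere other than right before the LF.
check_eol() {
    LC_ALL=C awk '
        {
            # A CR immediately before the LF marks a CRLF line.
            if (sub(/\r$/, "")) crlf++; else lf++
            # Any CR still left sits in the middle of the line.
            if (index($0, "\r")) {
                embedded++
                printf "%s:%d: embedded CR\n", FILENAME, FNR
            }
        }
        END {
            printf "LF=%d CRLF=%d embedded-CR-lines=%d\n", lf, crlf, embedded
        }
    ' "$1"
}

# normalize_eol FILE (hypothetical helper): write FILE to stdout with
# CRLF turned into LF and every remaining lone CR turned into LF.
# Note that this can change the program: text that followed a lone CR
# inside a comment becomes a source line of its own.
normalize_eol() {
    LC_ALL=C awk '{ sub(/\r$/, ""); gsub(/\r/, "\n"); print }' "$1"
}
```

Running check_eol x.awk prints one line per embedded CR plus a summary; a line flagged there is where awk's line numbers start to differ from what an editor that breaks lines at a lone CR would show. normalize_eol x.awk > x-lf.awk writes an LF-only copy, which should be reviewed before use for the reason given in the comment.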

Thanks,

Arnold

"Jannick" <address@hidden> wrote:

> Hi All,
>
> I have come across something weird from the EOL hell on my Windows box
> using Cygwin's gawk in -D mode (debug). Gawk did not show the correct
> line numbers which caused confusion in break statements.
> 
> Drilling down, I found that this happens because the script file has
> mixed EOL types (LF and CRLF).  The culprit lines are those with LF only
> (at least on my CRLF Windows machine using Cygwin's LF gawk version).
> 
> In its syntax error report gawk prints the line numbers it has assigned
> to the script; presumably it uses the same numbers throughout the program
> (syntax errors, debugger, etc.). To show this I put together a small
> example in the attached file.
>
> The sample file is CRLF with one single CR line causing the issue. Running
> it produces
>
> gawk: ./gawk-line.awk:2:        line2 .= 2 # shown with correct line
> gawk: ./gawk-line.awk:2:              ^ syntax error
> gawk: ./line4 .= 4 # shown with line 3!= 3 # not shown, here the comment-tag
> syntactically meant as EOL does not help
> gawk: ./gawk-line.awk:3:              ^ syntax error
> gawk: ./gawk-line.awk:4:        line5 .= 5 # shown with line 4!
> gawk: ./gawk-line.awk:4:              ^ syntax error
>
> The problem starts in line 3. Up to that point gawk and I agree.
> - line 3 is ignored, the CR causes the overlay in the syntax error
> of the next line, the comment tag does not stop gawk from parsing
> beyond the CR, I think
> - line 4 gawk thinks it is line 3, but we think it is line 4
> ... and so on.
>
> I included the syntax error report in the attached file.
>
> I think this is a bug, albeit it lives in the CR/CRLF realm where the
> best solution would not be to point to a private compilation of gawk on
> my Windows notebook to avoid cygwin's executable and CR confusion.
>
> Please give me a shout if you need any more information or if I should
> run any tests here on my machine.
>
> Best - and sorry for being a bit lengthy in this posting,
> J.


