bug-gawk

[bug-gawk] Function argument corruption in 4.2.0


From: Eric Pruitt
Subject: [bug-gawk] Function argument corruption in 4.2.0
Date: Sun, 12 Nov 2017 11:55:11 -0800
User-agent: NeoMutt/20170113 (1.7.2)

I've run into a problem with GAWK 4.2.0 where a function argument gets
corrupted. Here's an example showing the output in 4.2.0 compared to
4.1.4:

    mdlint$ gawk --version | head -n1 && gawk -We mdlint test.in -v -r label_exists_for_destination
    GNU Awk 4.2.0 (GNU MPFR 3.1.5, GNU MP 6.1.2)
    52: the URI "آ|address@hidden@address@hidden@address@hidden|address@hidden@address@hidden@address@hidden" points to the same place as the link reference labeled "label_exists_for_destination"
    (1)
    mdlint$ /usr/bin/gawk --version | head -n1 && /usr/bin/gawk -We mdlint test.in -v -r label_exists_for_destination
    GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)
    52: the URI "//#label_exists_for_destination" points to the same place as the link reference labeled "label_exists_for_destination"
    (1)
    mdlint$

In addition to 4.1.4, the script also works correctly on mawk 1.3.3,
original-awk (https://packages.debian.org/stretch/original-awk) and
BusyBox AWK v1.22.1. I've observed the corruption when building against
both glibc and musl libc. Unfortunately, I haven't been able to create a
simplified test case or to find the commit that introduced the issue
using "git bisect run", and the mdlint script is ~800 SLOC without the
function documentation comments and ~1500 SLOC with them. I am happy to
provide a copy of the mdlint script and the test case data if someone is
willing to dig into the code. It depends on the cmark binary
(https://github.com/commonmark/cmark), but that could be mocked out
easily enough with something like "cat $OUTPUT_OF_CMARK". For now, here
are two snippets of code surrounding this issue:

        1170          md_link_definitions[label] = n
        1171          $0 = substr(line, RSTART + RLENGTH + 1)
        1172
        1173          if ($1 in uris) {
        1174              link_destination_duplicate(n, label, $1, uris[$1])
        1175          } else if (length($1)) {
        1176              uris[$1] = label
        1177          }
        1178
        1179          if ($1 in md_link_uris) {
    --> 1180              label_exists_for_destination(md_link_uris[$1], $1, label)
        1181          }

         287  # A link reference definition for a URI exists.
         288  #
         289  # Arguments:
         290  # - linenos: Numbers of the lines with the problem.
         291  # - destination: Destination of the link.
         292  # - label: Name of the label that refers to the destination.
         293  #
         294  function label_exists_for_destination(linenos, destination, label,    n, seen)
         295  {
         296      # This kludge resolves a data corruption issue in GNU Awk 4.2.0; TODO: root
         297      # cause the problem and report it upstream.
         298      destination = destination ""
         299
         300      $0 = linenos
         301
         302      for (n = 1; n <= NF; n++) {
         303          if ($n in seen) {
         304              continue
         305          }
         306
         307          seen[$n] = 1
         308          report("label_exists_for_destination", 0 + $n,
    -->  309              sprintf("the URI \"%s\" points to the same place as the link" \
         310                      " reference labeled \"%s\"",
         311                  destination, label \
         312              ) \
         313          )
         314      }
         315  }

I have omitted the report function. It ultimately just shoves the output
from sprintf into a queue. If the "report" function is replaced with a
"print" statement, the displayed data is still corrupted. This is the
code from my most recent, failed attempt to reproduce the issue in an
isolated setting:

    function A(x, y,    n)
    {
        $0 = x

        for (n = 1; n <= NF; n++) {
            print sprintf("y = \"%s\"; n = \"%s\"", y, n)
        }
    }

    function B()
    {
        $0 = "??**??%% //#label_exists_for_destination BBBBBBBBBBBBBBBBBBBBB CCCCCCCCCCCCCCCCCCCCCCC"
        if ($2 in array) {
            A(array[$2], $2)
        }
    }

    BEGIN {
        split("", array)
        array["//#label_exists_for_destination"] = "XXXXXXXXXXXXXXXXXXX YYYYYYYYYYYYYYYY ZZZZZZZZZZZZZZZZZZZZZZ"
        B()
        exit
    }
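One way to compare interpreters on a reduced test case like the one above
is a small shell harness that runs the same program file under every awk
it can find. This is a sketch, not part of the original report: the
function name run_all is hypothetical, and the interpreter list simply
mirrors the implementations mentioned in this message plus whatever "awk"
is first on PATH.

```sh
#!/bin/sh
# Hypothetical harness (not from the original report): run one awk
# program file under each interpreter found on PATH, printing a header
# before each result so that diverging output is easy to spot.
run_all() {
    prog=$1
    for interp in awk gawk mawk original-awk "busybox awk"; do
        # Split "busybox awk" into command + applet; skip missing commands.
        set -- $interp
        command -v "$1" >/dev/null 2>&1 || continue
        printf '== %s\n' "$interp"
        "$@" -f "$prog"
    done
}
```

For example, with the program above saved as repro.awk (a hypothetical
file name), "run_all repro.awk" prints one block per interpreter; output
from gawk 4.2.0 that differs from the others would point at the bug.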

Since appending a null string to the "destination" variable mitigates the
corruption, my vague guess is that the field values (or pointers to them)
created when "$0" is assigned and re-split are being modified when they
shouldn't be. Any ideas? Is there any other information I could / should
provide?
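The suspected mechanism can be probed in isolation: a field is passed as a
scalar argument, and the callee then assigns "$0". Because awk passes
scalars by value, a correct implementation should return the field's
original value with or without the null-string kludge. This is a
hypothetical sketch; the names probe_plain and probe_kludge and the AWK
variable are illustrative, and it is not known whether this reduction
actually trips the 4.2.0 bug.

```sh
#!/bin/sh
# Hypothetical probe: does a field passed as a function argument survive
# reassignment of $0 inside the callee? AWK selects the interpreter
# (assumption: defaults to whatever "awk" is on PATH).
AWK=${AWK:-awk}

# Returns the argument after the callee has overwritten $0.
# A correct awk prints "bar", since scalars are passed by value.
probe_plain() {
    "$AWK" 'function f(x) { $0 = "a b c"; return x }
            BEGIN { $0 = "foo bar"; print f($2) }'
}

# Same probe with the kludge from the report: appending "" builds a new
# string value before $0 is reassigned.
probe_kludge() {
    "$AWK" 'function f(x) { x = x ""; $0 = "a b c"; return x }
            BEGIN { $0 = "foo bar"; print f($2) }'
}

printf 'plain:  %s\n' "$(probe_plain)"
printf 'kludge: %s\n' "$(probe_kludge)"
```

Running it with AWK=gawk under 4.2.0 and 4.1.4 would show whether the
plain variant diverges while the kludge variant does not.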

Eric



