bug-gawk
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Issues caused by array elements passed to a function: crash dumps, reads


From: mekanofox
Subject: Issues caused by array elements passed to a function: crash dumps, reads-after-free, internal errors, others
Date: Sun, 05 Jan 2025 14:00:49 +0000
User-agent: Purely Mail via Roundcube/1.6.8

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g3 -gdwarf-5  -DNDEBUG
uname output: Linux linbox 6.12.6-100.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Dec 19 23:18:14 UTC 2024 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Gawk Version: 5.3.60

Attestation 1:
I have read https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
        Yes

Attestation 2:
        I have not modified the sources before building gawk.
        True

Description:
These are cases that all start by passing an array element to a function; the array element is brand new (an untyped/Node_elem_new). The test scripts used to reproduce the issues can be split in roughly two categories:

1. the function makes the passed argument an array; later on, the scripts attempt to access the newly created subarray in a scalar context.

Yes, attempting to use an array as a scalar is a fatal script error and gawk will "shortly exit anyway"; however, on its way out it should print an informative message about the location, the array involved (plus the parameter name if inside a function). Sometimes though is doing a segfault/core dump, which is a tad excessive. (example below)

2. the function deletes the original/global array, then accesses in various ways the argument (which is the former element of the now deleted array).

In this group of tests the now "orphan" argument is left in an improper state and sometimes using it later can cause an internal error due to an unexpected node type; in some cases a NODE structure is prematurely freed (DEREFed); later on, the code will read from the freed memory; the effects range from internal errors to incorrect results.

For example, from the first category, a core dump can be triggered with this very short script (it is actually the first test case, ar2fn_elnew_sc, a greatly simplified version of a real life script):

        function f(y) {
                y[3] = 14
                y  # BOOM
        }
        BEGIN { f(a[10]) }

The crash is caused by a strlen stepping on a NULL vname pointer; the "vname" should be the name/string form of the index in the array/expression wrongly used as a scalar; it should be used in the error message saying something like "attempt to use array `y (from a["10"])' in a scalar context". That in turn requires correct "parent_array" and/or "orig_array" pointers. (see also the 2nd test case).

Repeat-By:
See the test/ar2fn_*.awk test cases and their associated .ok files in the attached patch.

To add ONLY the test cases and see the bugs manifest before the actual fix is in... Do something like this (maybe in a separate branch to keep things tidy): git apply --whitespace=nowarn --include='test/*' path/to/the/patch/file/ar2fn.diff ... then do a 'make check'. (well, maybe build gawk beforehand if it's not already built). Or just slice and dice the diff to keep only the tests and their makefiles.

        The incorrect outcomes for each test are as follows:

        ar2fn_elnew_sc:         causes a core dump

ar2fn_elnew_sc2: malformed error message: "fatal: attempt to use array `(null)' in a scalar context"

ar2fn_fmod: use-after-free, accessing a Node_var and its "var_value" *after* it has been freed, so all bets are off: depending on how gawk was built (debug/release) and/or Valgrind options, it can crash, other times it gives a wrong type in result.

ar2fn_unxptyp_aref: "fatal: internal error: file interpret.h, line 265: unexpected parameter type Node_array_ref"

ar2fn_unxptyp_val: NODE left in improper state; "fatal: internal error: file interpret.h, line 265: unexpected parameter type Node_val"

Note: no fuzzers were stressed during the making of these scripts. Also, no synthesizers. ;)

Fix:
See attached ar2fn.diff for a fix. Some details, with a bit of background first:

A newly born Node_elem_new could become a scalar or a (sub)array. But until it knows exactly what it will be when it "grows up", it should be or pretend to be (potentially) both: it should carry around -- or have a way to access -- all the information needed for both outcomes.

However, the way it is treated in its childhood does not allow it to maintain the memory of its parent (this starts to sound like a soap opera or a bildungsroman...); in some cases, by the time it ends up in a situation to turn into a subarray it will miss key data that a subarray would need: a pointer to its parent ("parent_array") and the "vname", a string representation of its index in that parent array.

The fix saves this information in certain fields of the NODE structure and preserves them while the new element is passed around. If the NODE will turn into a scalar, that information will be freed/discarded. But if it will become a subarray, the saved information will be moved into the proper fields so that it can be used in case is needed (mainly by the "array_vname" function).

I tried to leave existing code as much as possible the way it was and only add/augment around it. Also, in two places there are conditions that do not seem to be triggered, or at least I was not able to come up with a test to do it; thus arguably they could be turned into asserts, just in case. But out of an abundance of caution I've left them there; at worst they will be very small pieces of dead code. One such place IIRC is the prev_array->valref test in "adjust_param_node" (array.c). The 2nd is in gawkapi.c, the additional test towards the end of "api_sym_update" to reset a Node_elem_new; I feel a new element node cannot reach that point, but I could be wrong; I can dig deeper in that area when I'll have more tuits -- the round kind.

The fix passes the test suite, without leaks or other memory issues reported from Valgrind ("make valgrind" shows 3 "broken pipe"-style errors, but that's unrelated, the same thing is happening with an official master branch build).

Attachment: ar2fn.diff
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]