bug-gawk
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Issues caused by array elements passed to a function: crash dumps, r


From: Aharon Robbins
Subject: Re: Issues caused by array elements passed to a function: crash dumps, reads-after-free, internal errors, others
Date: Wed, 05 Feb 2025 14:02:21 +0200
User-agent: Heirloom mailx 12.5 6/20/10

Hi Christian,

Thank you VERY much for this contribution.  Now that your paperwork
has come through, I have integrated this into the code.

I have made some additional stylistic changes, but I'm doing that
in a separate Git commit.  I will push everything to the git repo
shortly.

Your changes were nicely done.  I hope you'll feel like continuing
to contribute!

Thanks,

Arnold

> Date: Sun, 05 Jan 2025 14:00:49 +0000
> To: bug-gawk@gnu.org
> Subject: Issues caused by array elements passed to a function: crash dumps,
>  reads-after-free, internal errors, others
>
> Configuration Information [Automatically generated, do not change]:
> Machine: x86_64
> OS: linux-gnu
> Compiler: gcc
> Compilation CFLAGS: -g3 -gdwarf-5  -DNDEBUG
> uname output: Linux linbox 6.12.6-100.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC 
> Thu Dec 19 23:18:14 UTC 2024 x86_64 GNU/Linux
> Machine Type: x86_64-pc-linux-gnu
>
> Gawk Version: 5.3.60
>
> Attestation 1:
>       I have read 
> https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
>       Yes
>
> Attestation 2:
>       I have not modified the sources before building gawk.
>       True
>
> Description:
>       These are cases that all start by passing an array element to a 
> function; the array element is brand new (an untyped/Node_elem_new). The 
> test scripts used to reproduce the issues can be split in roughly two 
> categories:
>
>       1. the function makes the passed argument an array; later on, the 
> scripts attempt to access the newly created subarray in a scalar 
> context.
>
>               Yes, attempting to use an array as a scalar is a fatal script 
> error 
> and gawk will "shortly exit anyway"; however, on its way out it should 
> print an informative message about the location, the array involved 
> (plus the parameter name if inside a function). Sometimes though is 
> doing a segfault/core dump, which is a tad excessive. (example below)
>
>       2. the function deletes the original/global array, then accesses in 
> various ways the argument (which is the former element of the now 
> deleted array).
>
>               In this group of tests the now "orphan" argument is left in an 
> improper state and sometimes using it later can cause an internal error 
> due to an unexpected node type; in some cases a NODE structure is 
> prematurely freed (DEREFed); later on, the code will read from the freed 
> memory; the effects range from internal errors to incorrect results.
>
>       For example, from the first category, a core dump can be triggered with 
> this very short script (it is actually the first test case, 
> ar2fn_elnew_sc, a greatly simplified version of a real life script):
>
>       function f(y) {
>               y[3] = 14
>               y  # BOOM
>       }
>       BEGIN { f(a[10]) }
>
>       The crash is caused by a strlen stepping on a NULL vname pointer; the 
> "vname" should be the name/string form of the index in the 
> array/expression wrongly used as a scalar; it should be used in the 
> error message saying something like "attempt to use array `y (from 
> a["10"])' in a scalar context". That in turn requires correct 
> "parent_array" and/or "orig_array" pointers. (see also the 2nd test 
> case).
>
> Repeat-By:
>       See the test/ar2fn_*.awk test cases and their associated .ok files in 
> the attached patch.
>
>       To add ONLY the test cases and see the bugs manifest before the actual 
> fix is in... Do something like this (maybe in a separate branch to keep 
> things tidy):
>                       git apply --whitespace=nowarn --include='test/*' 
> path/to/the/patch/file/ar2fn.diff
>               ... then do a 'make check'. (well, maybe build gawk beforehand 
> if it's 
> not already built).
>       Or just slice and dice the diff to keep only the tests and their 
> makefiles.
>
>       The incorrect outcomes for each test are as follows:
>
>       ar2fn_elnew_sc:         causes a core dump
>
>       ar2fn_elnew_sc2:        malformed error message: "fatal: attempt to use 
> array 
> `(null)' in a scalar context"
>
>       ar2fn_fmod:             use-after-free, accessing a Node_var and its 
> "var_value" 
> *after* it has been freed, so all bets are off: depending on how gawk 
> was built (debug/release) and/or Valgrind options, it can crash, other 
> times it gives a wrong type in result.
>
>       ar2fn_unxptyp_aref:     "fatal: internal error: file interpret.h, line 
> 265: 
> unexpected parameter type Node_array_ref"
>
>       ar2fn_unxptyp_val:      NODE left in improper state; "fatal: internal 
> error: 
> file interpret.h, line 265: unexpected parameter type Node_val"
>
>       Note: no fuzzers were stressed during the making of these scripts. 
> Also, no synthesizers. ;)
>
> Fix:
>       See attached ar2fn.diff for a fix. Some details, with a bit of 
> background first:
>
>       A newly born Node_elem_new could become a scalar or a (sub)array. But 
> until it knows exactly what it will be when it "grows up", it should be 
> or pretend to be (potentially) both: it should carry around -- or have a 
> way to access -- all the information needed for both outcomes.
>
>       However, the way it is treated in its childhood does not allow it to 
> maintain the memory of its parent (this starts to sound like a soap 
> opera or a bildungsroman...); in some cases, by the time it ends up in a 
> situation to turn into a subarray it will miss key data that a subarray 
> would need: a pointer to its parent ("parent_array") and the "vname", a 
> string representation of its index in that parent array.
>
>       The fix saves this information in certain fields of the NODE structure 
> and preserves them while the new element is passed around. If the NODE 
> will turn into a scalar, that information will be freed/discarded. But 
> if it will become a subarray, the saved information will be moved into 
> the proper fields so that it can be used in case is needed (mainly by 
> the "array_vname" function).
>
>       I tried to leave existing code as much as possible the way it was and 
> only add/augment around it.
>       Also, in two places there are conditions that do not seem to be 
> triggered, or at least I was not able to come up with a test to do it; 
> thus arguably they could be turned into asserts, just in case.
>       But out of an abundance of caution I've left them there; at worst they 
> will be very small pieces of dead code.
>       One such place IIRC is the prev_array->valref test in 
> "adjust_param_node" (array.c).
>       The 2nd is in gawkapi.c, the additional test towards the end of 
> "api_sym_update" to reset a Node_elem_new; I feel a new element node 
> cannot reach that point, but I could be wrong; I can dig deeper in that 
> area when I'll have more tuits -- the round kind.
>
>       The fix passes the test suite, without leaks or other memory issues 
> reported from Valgrind ("make valgrind" shows 3 "broken pipe"-style 
> errors, but that's unrelated, the same thing is happening with an 
> official master branch build).



reply via email to

[Prev in Thread] Current Thread [Next in Thread]