
Records longer than INT_MAX mishandled


From: Miguel Pineiro Jr.
Subject: Records longer than INT_MAX mishandled
Date: Mon, 03 May 2021 21:22:27 -0400
User-agent: Cyrus-JMAP/3.5.0-alpha0-403-gbc3c488b23-fm-20210419.005-gbc3c488b

Hello, gawk devs.

gawk mishandles records longer than INT_MAX because get_a_record stuffs their 
size_t length into an int (io.c:4081: `retval = recm.len`).

All of the following examples are paired, first a success using a record of 
length INT_MAX, then a failure using INT_MAX + 1.


In the main i/o loop, records vanish when their corrupted length is negative, 
since inrec doesn't consider a negative value a valid record.

$ gawk 'BEGIN {printf("%2147483647s\n", "a")}' | gawk 'END {print NR}'
1
$ gawk 'BEGIN {printf("%2147483648s\n", "a")}' | gawk 'END {print NR}'
0


In getline (do_getline/do_getline_redir), if the corrupted length happens to 
equal EOF, it triggers a silent bypass of the rest of the file. More likely, 
some other value will mislead the buffer memory management routines and crash 
gawk.

This bare getline fails fatally in set_record's buffer-resizing loop, when it 
gives up trying to accommodate what it thinks is a humongous record 
(field.c:284: `cnt >= databuf_size` promotes a negative int cnt to unsigned 
long).

$ gawk 'BEGIN {printf("\n%2147483647s\n", "a")}' | gawk '{getline} END {print NR}'
2
$ gawk 'BEGIN {printf("\n%2147483648s\n", "a")}' | gawk '{getline} END {print NR}'
gawk: cmd. line:1: (FILENAME=- FNR=2) fatal: input record too large


This getline var dies in make_string (make_str_node) from a corrupted 
allocation request:

$ gawk 'BEGIN {printf("\n%2147483647s\n", "a")}' | gawk '{getline var} END {print NR}'
2
$ gawk 'BEGIN {printf("\n%2147483648s\n", "a")}' | gawk '{getline var} END {print NR}'
gawk: cmd. line:1: (FILENAME=- FNR=2) fatal: node.c:415:make_str_node: r->stptr: cannot allocate -2147483647 bytes of memory: Cannot allocate memory


If INT_MAX is deemed a sufficient record-length limit, despite the use of 
capacious size_t i/o buffers, here's a diff.

diff --git a/io.c b/io.c
index 91c94d9b..4e777d75 100644
--- a/io.c
+++ b/io.c
@@ -4026,6 +4026,9 @@ get_a_record(char **out,        /* pointer to pointer to data */
                        iop->dataend += iop->count;
        }
 
+       if (recm.len > INT_MAX)
+               fatal(_("input record length too large to return"));
+
        /* set record, RT, return right value */
 
        /*


