bug#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences
From: Jim Meyering
Subject: bug#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences
Date: Mon, 3 Feb 2014 13:34:14 -0800
On Wed, Jan 29, 2014 at 1:43 AM, Santiago <address@hidden> wrote:
> Package: grep
> Version: 2.16
> Severity: important
>
> Hi there,
>
> I'm forwarding this bug from Debian's BTS. The latest changes to -P brought
> another problem. I've confirmed this behavior with the latest Debian package:
>
> ----- Forwarded message from Vincent Lefevre <address@hidden> -----
>
> [snip]
>
>
> grep -P loops on some files with invalid UTF-8 sequences, e.g.
>
> $ /usr/bin/printf "\xe9\x65\n\xab\n" | grep -P '.e|.?z' | head
> �e
> �e
> �e
> �e
> �e
> �e
> �e
> �e
> �e
> �e
>
> (the infinite loop is interrupted here by a broken pipe due to
> the "head").
>
> It seems that the fix of
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=730472
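For reference, this is why the reproducer's input is invalid UTF-8: 0xe9 is the lead byte of a three-byte sequence but is followed by 'e' (0x65) instead of a continuation byte, and 0xab is a continuation byte (bit pattern 10xxxxxx) with no lead byte before it. The C sketch below checks exactly that. `utf8_valid` is a hypothetical helper written for illustration, not part of grep or PCRE, and it is deliberately simplified: it checks only lead/continuation structure and does not reject overlong forms, surrogates, or code points above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified UTF-8 structure check (illustration only).
 * Returns 1 if every lead byte is followed by the right number of
 * continuation bytes (10xxxxxx), else 0.  Does NOT reject overlong
 * encodings, surrogates, or code points above U+10FFFF. */
static int utf8_valid(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t need;                  /* continuation bytes expected */
        if (c < 0x80)
            need = 0;                 /* ASCII */
        else if ((c & 0xE0) == 0xC0)
            need = 1;                 /* 110xxxxx */
        else if ((c & 0xF0) == 0xE0)
            need = 2;                 /* 1110xxxx, e.g. 0xE9 */
        else if ((c & 0xF8) == 0xF0)
            need = 3;                 /* 11110xxx */
        else
            return 0;                 /* stray continuation (e.g. 0xAB) or 0xF8..0xFF */
        if (need > n - i - 1)
            return 0;                 /* sequence cut off by end of buffer */
        for (size_t k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;             /* lead byte not followed by 10xxxxxx */
        i += need + 1;
    }
    return 1;
}
```

Called on the reproducer's first line ("\xe9\x65\n") or second line ("\xab\n"), it returns 0; on a correctly encoded "é" ("\xc3\xa9") it returns 1.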
Thanks for the heads-up. That appears to be a problem with pcre.
I've just built grep (git head) against pcre (git head) with gcc's
AddressSanitizer enabled, and adjusted your example slightly.
Now libpcre gets an internal segfault:
$ printf "\xe9\n\xab\n" > k; src/grep -P 'e|.?z' k
ASAN:SIGSEGV
=================================================================
==11821==ERROR: AddressSanitizer: SEGV on unknown address 0x62cfffffffff (pc 0x0000004f0743 sp 0x7fff6b32f4a0 bp 0x7fff6b32f760 T0)
#0 0x4f0742 in match /w/co/pcre/pcre_exec.c:5943
#1 0x4f26d5 in pcre_exec /w/co/pcre/pcre_exec.c:6941
#2 0x46f421 in Pexecute /w/co/grep/src/pcresearch.c:178
#3 0x4717a3 in do_execute /w/co/grep/src/main.c:1075
#4 0x4717a3 in grepbuf /w/co/grep/src/main.c:1111
#5 0x472249 in grep /w/co/grep/src/main.c:1222
#6 0x472249 in grepdesc /w/co/grep/src/main.c:1476
#7 0x4073ca in main /w/co/grep/src/main.c:2396
#8 0x7f6f21a53cdc in __libc_start_main (/lib64/libc.so.6+0x1ecdc)
#9 0x408a54 (/w/u/w/co/grep/src/grep+0x408a54)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /w/co/pcre/pcre_exec.c:5943 match
==11821==ABORTING
Sorry, but I don't have time to debug further. A quick glance suggests
that it is backing up too far:
(gdb) b __asan_report_error
Breakpoint 1 at 0x448c40: file ../../.././libsanitizer/asan/asan_report.cc, line 711.
(gdb) r
Starting program: /w/u/w/co/grep/src/grep -P e\|.\?z k
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00000000004f0743 in match (eptr=0x62cfffffffff "",
ecode=0x60700000df8a "\035zx",
mstart=0x62d00000b002 "\253\n", '\276' <repeats 198 times>...,
offset_top=2, md=0x7fffffffce30, eptrb=0x0, rdepth=0)
at pcre_exec.c:5943
5943 BACKCHAR(eptr);
(gdb) l
5938 {
5939 if (eptr == pp) goto TAIL_RECURSE;
5940 RMATCH(eptr, ecode, offset_top, md, eptrb, RM46);
5941 if (rrc != MATCH_NOMATCH) RRETURN(rrc);
5942 eptr--;
5943 BACKCHAR(eptr);
5944 if (ctype == OP_ANYNL && eptr > pp && UCHAR21(eptr) == CHAR_NL &&
5945 UCHAR21(eptr - 1) == CHAR_CR) eptr--;
5946 }
5947 }
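In the gdb frame, eptr (0x62cfffffffff) lies well below mstart (0x62d00000b002), whose first byte is \253 (0xab), a continuation byte with no lead byte. A minimal sketch of how such a backward walk can escape the buffer, assuming (not verified here) that in UTF-8 mode BACKCHAR simply steps back while the current byte has the form 10xxxxxx, with no check against the start of the subject. The helper names are hypothetical; this illustrates the failure mode and a bounded alternative, and is not PCRE's code or its actual fix.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical bounded "back up to the lead byte" step.
 * An unbounded loop of the form
 *     while (is_continuation(*p)) p--;
 * never compares p with the buffer start, so a lone continuation byte
 * at the start of the buffer lets p run below it.  This version stops
 * at start instead. */
static const unsigned char *backchar_bounded(const unsigned char *start,
                                             const unsigned char *p)
{
    assert(start <= p);
    while (p > start && is_continuation(*p))
        p--;
    return p;
}
```

With the buffer "\xab\n", backchar_bounded(buf, buf) returns buf, whereas the unbounded loop would go on to read buf[-1] and whatever precedes it.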