bug#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences


From: Jim Meyering
Subject: bug#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences
Date: Mon, 3 Feb 2014 13:34:14 -0800

On Wed, Jan 29, 2014 at 1:43 AM, Santiago <address@hidden> wrote:
> Package: grep
> Version: 2.16
> Severity: important
>
> Hi there,
>
> I'm forwarding this bug from Debian's BTS. The latest changes to -P
> introduced another problem. I've confirmed this behavior with the latest
> Debian package:
>
> ----- Forwarded message from Vincent Lefevre <address@hidden> -----
>
> [snip]
>
>
> grep -P loops on some files with invalid UTF-8 sequences, e.g.
>
> $ /usr/bin/printf "\xe9\x65\n\xab\n" | grep -P '.e|.?z' | head
> �e
> �e
> �e
> �e
> �e
> �e
> �e
> �e
> �e
> �e
>
> (the infinite loop is interrupted here by a broken pipe due to
> the "head").
>
> It seems that the fix of
>
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=730472

Thanks for the heads-up.  That appears to be a problem with pcre.
I've just built grep (git head) against pcre (git head) using gcc's
address sanitizer mode, and adjusted your example slightly.
Now, libpcre segfaults internally:

$ printf "\xe9\n\xab\n" > k; src/grep -P 'e|.?z' k
ASAN:SIGSEGV
=================================================================
==11821==ERROR: AddressSanitizer: SEGV on unknown address 0x62cfffffffff (pc 0x0000004f0743 sp 0x7fff6b32f4a0 bp 0x7fff6b32f760 T0)
    #0 0x4f0742 in match /w/co/pcre/pcre_exec.c:5943
    #1 0x4f26d5 in pcre_exec /w/co/pcre/pcre_exec.c:6941
    #2 0x46f421 in Pexecute /w/co/grep/src/pcresearch.c:178
    #3 0x4717a3 in do_execute /w/co/grep/src/main.c:1075
    #4 0x4717a3 in grepbuf /w/co/grep/src/main.c:1111
    #5 0x472249 in grep /w/co/grep/src/main.c:1222
    #6 0x472249 in grepdesc /w/co/grep/src/main.c:1476
    #7 0x4073ca in main /w/co/grep/src/main.c:2396
    #8 0x7f6f21a53cdc in __libc_start_main (/lib64/libc.so.6+0x1ecdc)
    #9 0x408a54 (/w/u/w/co/grep/src/grep+0x408a54)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /w/co/pcre/pcre_exec.c:5943 match
==11821==ABORTING

Sorry, but I don't have time to debug further.  A quick glance suggests
that it is backing up too far:

(gdb) b __asan_report_error
Breakpoint 1 at 0x448c40: file ../../.././libsanitizer/asan/asan_report.cc, line 711.
(gdb) r
Starting program: /w/u/w/co/grep/src/grep -P e\|.\?z k
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00000000004f0743 in match (eptr=0x62cfffffffff "", ecode=0x60700000df8a "\035zx",
    mstart=0x62d00000b002 "\253\n", '\276' <repeats 198 times>..., offset_top=2, md=0x7fffffffce30, eptrb=0x0, rdepth=0)
    at pcre_exec.c:5943
5943              BACKCHAR(eptr);
(gdb) l
5938              {
5939              if (eptr == pp) goto TAIL_RECURSE;
5940              RMATCH(eptr, ecode, offset_top, md, eptrb, RM46);
5941              if (rrc != MATCH_NOMATCH) RRETURN(rrc);
5942              eptr--;
5943              BACKCHAR(eptr);
5944              if (ctype == OP_ANYNL && eptr > pp  && UCHAR21(eptr) == CHAR_NL &&
5945                  UCHAR21(eptr - 1) == CHAR_CR) eptr--;
5946              }
5947            }




