bug#9321: repeated segfaults sorting large files in 8.12

From: Pádraig Brady
Subject: bug#9321: repeated segfaults sorting large files in 8.12
Date: Sun, 21 Aug 2011 00:28:43 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:5.0) Gecko/20110707 Thunderbird/5.0

On 08/20/2011 09:58 PM, Andras Salamon wrote:
> On Fri, Aug 19, 2011 at 11:54:46PM +0100, Pádraig Brady wrote:
>> On 08/18/2011 03:30 PM, Andras Salamon wrote:
>>> I am seeing repeated (but not reliably repeatable) segmentation faults
>>> sorting datasets in the 100MB-100GB range on a 64-bit Debian system
>>> using GNU sort 8.12 (and also 8.9).  Stack traces seem to indicate
>>> problems during the merge phase, usually when the temporary files
>>> are being combined.
>> Andras, could you give the exact command line you're having issues with,
>> and perhaps make sort inputs available too?
> The sort inputs are several-gigabyte-range files containing strings,
> each typically 60 to 140 bytes long, one per line.  There are
> many duplicates, and the first reason to sort is to establish the
> distribution of duplicates.  I would be happy to make available data
> if I could find a reasonably sized file that causes a reproducible
> segfault.  The problem seems easier to reproduce with larger files,
> unfortunately.
>> Do the --batch-size=NMERGE or --compress-program=PROG options change 
>> anything?
> Thanks for the suggestion, I will try forcing smaller batches.
> Compressing batches was something I tried early on with no apparent
> change in likelihood of failure, but it led to much slower runtimes.
>> Also there were temp file handling changes made in 7.2 so could you try:
>> ftp://ftp.gnu.org/gnu/coreutils/coreutils-7.1.tar.gz
> Here are some of the relevant-seeming parts of a gdb session for
> coreutils-7.1.

If this also happens with a 2.5-year-old sort, I'd be leaning
towards a local issue.

> (gdb) bt
> #0  0x000000000040e6bc in memcoll (
>     s1=0x7800000005824d58 <Address 0x7800000005824d58 out of bounds>,
>     s1len=15564440312192434243,
>     s2=0x2b2a1a0 "<http://scinets.org/item/cria214s2ria214u225719i~files~P58066.fasta>\n<http://scinets.org/item/cria214s2ria214u225719i~files~P58066.fasta>\n<http://scinets.org/item/cria214s2ria214u225719i~files~P58066.";...,
>     s2len=68)
>     at memcoll.c:50
> #1  0x000000000040af4c in xmemcoll (
>     s1=0x7800000005824d58 <Address 0x7800000005824d58 out of bounds>,
>     s1len=15564440312192434243,
>     s2=0x2b2a1a0 "<http://scinets.org/item/cria214s2ria214u225719i~files~P58066.fasta>\n<http://scinets.org/item/cria214s2ria214u225719i~files~P58066.fasta>\n<http://scinets.org/item/cria214s2ria214u225719i~files~P58066.";...,
>     s2len=68)
>     at xmemcoll.c:43
> #2  0x00000000004059ee in compare (a=0x5b4a7f0, b=0x301dfc0) at sort.c:2059
> #3  0x0000000000406815 in mergefps (files=0x24063e0, ntemps=15, nfiles=15,
>     ofp=0x23ff8e0, output_file=0x24062ec "/home/a/tmp/sortcOqzkh")
>     at sort.c:2326
> #4  0x000000000040708f in merge (files=0x24063e0, ntemps=16, nfiles=32,
>     output_file=0x0) at sort.c:2567
> #5  0x000000000040766a in sort (files=0x61c660, nfiles=0, output_file=0x0)
>     at sort.c:2699
> #6  0x000000000040908c in main (argc=5, argv=0x7fff149247a8) at sort.c:3425

So the 'a' line struct is corrupted:
a->text   = 0x7800000005824D58
a->length = 0xD800000000000043
(that length is the s1len=15564440312192434243 above, in hex).

Notice the top bytes, 0x78 and 0xD8.
They should be 0x00.
So is this software or hardware?
It looks like hardware TBH, as there are 4
bits incorrectly set in each of those bytes
(which ECC couldn't correct, if you have that),
and each incorrect bit is beside another.
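
To make the bit pattern easier to check, here is a rough Python sketch
that decodes the two values from the backtrace. The helper names
(top_byte, clear_top_byte, each_set_bit_has_neighbour) are made up for
this illustration and are not part of sort or gnulib, and it assumes
that only the top byte of each field is damaged.

```python
# Values copied from the gdb backtrace (s1 and s1len in frame #0).
A_TEXT = 0x7800000005824D58       # a->text
A_LENGTH = 15564440312192434243   # a->length, decimal as printed by gdb


def top_byte(value: int) -> int:
    """Return the most significant byte of a 64-bit value."""
    return (value >> 56) & 0xFF


def clear_top_byte(value: int) -> int:
    """Return the value with its top byte zeroed (assumed intended value)."""
    return value & ((1 << 56) - 1)


def each_set_bit_has_neighbour(byte: int) -> bool:
    """True if every set bit has another set bit directly beside it."""
    neighbours = (byte << 1) | (byte >> 1)
    return (byte & ~neighbours & 0xFF) == 0


def describe(name: str, value: int) -> str:
    """Summarise the top byte of one corrupted field."""
    byte = top_byte(value)
    return (
        f"{name}: 0x{value:016X}  top byte 0x{byte:02X} = {byte:08b}  "
        f"({bin(byte).count('1')} bits set, "
        f"all adjacent to another: {each_set_bit_has_neighbour(byte)})  "
        f"with top byte cleared: 0x{clear_top_byte(value):X}"
    )


if __name__ == "__main__":
    print(describe("a->text", A_TEXT))
    print(describe("a->length", A_LENGTH))
```

With the top byte cleared, a->length comes out as 0x43 (67), which is
at least in the same range as the s2len=68 of the other line, so the
rest of the field looks intact.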

