RE: Multithreaded sort hangs on Solaris
From: McFarland, Jeffrey
Subject: RE: Multithreaded sort hangs on Solaris
Date: Wed, 13 Mar 2013 14:18:24 +0000
Here are the values from another sort that has now been running for over 12
hours. This time the second argument to sortlines() (the number of threads)
looks fine in all three frames, and there are no zombie threads.
>: pstack 20632
20632: /usr/local/abacus/etsort/sort -tn -S 295063 --batch-size=100 -T /disk/
----------------- lwp# 1 / thread# 1 --------------------
ffffffff7eadc810 lwp_wait (f2, ffffffff7fffea9c)
ffffffff7ead4d74 _thrp_join (f2, 0, 0, 1, ffffffff7fffeca0, ffffffff7fffea9c) + 38
000000010000f2f4 sortlines (110137e90, 8, 7194a, 11015bfe0, ffffffff7fffeca0, 100136240) + 174
0000000100010144 sort (100137cd0, 1, ffffffff7ffff660, 8, ffffffff7fffeeac, ffffffff7ed00200) + 2f0
0000000100012bf4 main (13, ffffffff7ffff1f8, ffffffff7ffff298, 100136ca8, 100000000, ffffffff7ed00200) + 21cc
0000000100004ca4 _start (0, 0, 0, 0, 0, 0) + 7c
----------------- lwp# 242 / thread# 242 --------------------
ffffffff7eadc810 lwp_wait (f4, ffffffff7e1fbd2c)
ffffffff7ead4d74 _thrp_join (f4, 0, 0, 1, ffffffff7fffeca0, ffffffff7e1fbd2c) + 38
000000010000f2f4 sortlines (110137e90, 4, 7194a, 11015c050, ffffffff7fffeca0, 100136240) + 174
000000010000f168 sortlines_thread (ffffffff7fffeb60, 1fc000, 0, 0, 10000f104, 0) + 64
ffffffff7ead8778 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 244 / thread# 244 --------------------
ffffffff7ead8818 lwp_park (0, 0, 0)
000000010000e710 lock_node (11015c360, 10f691fb0, ffffffff7ec4a300, ffffffff7fffecac, ffffffff7ed00a00, 0) + 14
000000010000efbc queue_check_insert_parent (ffffffff7fffeca0, 11015c3d0, 100136240, 1101597dd, ffffffff7ed00a00, 1c00) + 2c
000000010000f0e8 merge_loop (ffffffff7fffeca0, 7194a, 100136240, 1101597dd, ffffffff7eacff0c, 3) + 90
000000010000f43c sortlines (110137e90, 2, 7194a, 11015c0c0, ffffffff7fffeca0, 100136240) + 2bc
000000010000f168 sortlines_thread (ffffffff7e1fbdf0, 1fc000, 0, 0, 10000f104, 0) + 64
ffffffff7ead8778 _lwp_start (0, 0, 0, 0, 0, 0)
>: truss -rall -wall -f -p 20632
20632/1: lwp_wait(242, 0xFFFFFFFF7FFFEA9C) (sleeping...)
20632/244: lwp_park(0x00000000, 0) (sleeping...)
20632/242: lwp_wait(244, 0xFFFFFFFF7E1FBD2C) (sleeping...)
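Read together, the truss lines describe a wait chain: LWP 1 is joining on LWP 242, LWP 242 is joining on LWP 244, and LWP 244 is parked (the pstack output puts it in lock_node). The pstack arguments f2 and f4 are the same thread ids in hex. A minimal Python sketch that derives this chain from the truss text follows; the regular expression and the helper names parse and wait_chain are illustrative assumptions, not part of truss or of sort.

```python
import re

# The three truss lines from this message, copied verbatim.
TRUSS = """\
20632/1: lwp_wait(242, 0xFFFFFFFF7FFFEA9C) (sleeping...)
20632/244: lwp_park(0x00000000, 0) (sleeping...)
20632/242: lwp_wait(244, 0xFFFFFFFF7E1FBD2C) (sleeping...)
"""

# Assumed line shape: "<pid>/<lwp>: <syscall>(<first argument>, ..."
LINE = re.compile(r"^\d+/(\d+):\s+(\w+)\((\w+)")


def parse(text):
    """Map each LWP id to (syscall name, first argument as text)."""
    state = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            state[int(m.group(1))] = (m.group(2), m.group(3))
    return state


def wait_chain(state, start=1):
    """Follow lwp_wait targets from `start` until a thread is not joining."""
    chain = [start]
    seen = {start}
    while True:
        call, arg = state.get(chain[-1], (None, None))
        if call != "lwp_wait":
            break
        target = int(arg)  # truss prints the awaited LWP id in decimal
        if target in seen:  # guard against a circular join
            break
        chain.append(target)
        seen.add(target)
    return chain, call


chain, last_call = wait_chain(parse(TRUSS))
print(" -> ".join(str(lwp) for lwp in chain), "blocked in", last_call)
```

For the three lines above this prints `1 -> 242 -> 244 blocked in lwp_park`, so the thread worth inspecting is LWP 244.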
-----Original Message-----
From: Bernhard Voelker [mailto:address@hidden]
Sent: Tuesday, March 12, 2013 12:27 PM
To: McFarland, Jeffrey
Cc: address@hidden
Subject: Re: Multithreaded sort hangs on Solaris
On 03/11/2013 04:47 PM, McFarland, Jeffrey wrote:
>>: sudo pstack 16328
> 16328: /usr/local/abacus/etsort/sort -tn -S 295063 --batch-size=100 -T /disk/
> ----------------- lwp# 1 / thread# 1 --------------------
> ffffffff7d4d8818 lwp_park (0, 0, 0)
> 0000000100009c74 sortlines (111b56580, 111c56080, ffffffff7fffeab0, 10012a321, ffffffff7fffead0, 10012a328) + 514
> 000000010000a5cc sortlines (111558380, 2, ffffffff7fffeab0, 1121765e0, 0, ffffffff7fffeab0) + e6c
> 000000010000a5cc sortlines (111956f80, 4, ffffffff7fffeab0, 112176420, 0, ffffffff7fffeab0) + e6c
> 000000010000a5cc sortlines (112154760, 8, ffffffff7fffeab0, 1121760a0, 1, ffffffff7fffeab0) + e6c
> 000000010000c070 sort (10012a740, 0, ffffffff7fffead0, 23, 10012cddd, 112154760) + 350
> 000000010000e6e8 main (13, ffffffff7ffff148, 0, 10012c220, fffd, 10012b1e0) + 1ee8
> 00000001000041bc _start (0, 0, 0, 0, 0, 0) + 7c
Hi Jeffrey,
The value of the second argument of the topmost sortlines() invocation looks
strange (if pstack shows it correctly).
Can you attach with GDB and give us the values of the function arguments?
Have a nice day,
Berny