parallel


From: Alex Muir
Subject: Would like your thoughts on remote java processes not timing out but being replaced with new processes
Date: Thu, 20 Sep 2012 10:52:58 -0400

Hi,

I'm running some Java processes on a remote 16-core server with a
timeout of 1500 seconds and -j+1.

I'm seeing 21 java processes, 17 of which are still within the
25-minute (1500-second) timeout:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
28211 ec2-user  20   0 4355m 170m 9948 S 145.4  0.3   0:07.36 java
28075 ec2-user  20   0 4355m 652m 9.8m S 138.5  1.1   0:27.80 java
27610 ec2-user  20   0 4503m 1.1g 9.9m S 130.3  1.9   1:03.86 java
27547 ec2-user  20   0 4503m 1.1g 9.9m S 127.4  1.8   1:09.28 java
27490 ec2-user  20   0 4503m 1.1g 9.9m S 100.5  1.8   1:09.56 java
27124 ec2-user  20   0 4503m 1.3g   9m S 75.5  2.2   2:14.63 java
27779 ec2-user  20   0 4431m 1.2g 9.9m S 68.6  2.0   0:55.03 java
27051 ec2-user  20   0 4503m 1.3g   9m S 65.0  2.2   2:20.66 java
27767 ec2-user  20   0 4431m 1.2g 9.9m S 64.3  1.9   0:55.12 java
27922 ec2-user  20   0 4431m 1.1g 9.9m S 61.1  1.9   0:42.67 java
27849 ec2-user  20   0 4431m 1.1g 9.9m S 60.7  1.9   0:49.17 java
27958 ec2-user  20   0 4431m 1.1g 9.9m S 56.8  1.8   0:42.79 java
28016 ec2-user  20   0 4431m 1.0g 9.9m S 55.8  1.7   0:42.86 java
27280 ec2-user  20   0 4503m 1.2g   9m S 53.2  2.0   1:57.32 java
27343 ec2-user  20   0 4503m 1.3g   9m S 52.9  2.1   1:59.01 java
27683 ec2-user  20   0 4503m 1.1g 9.9m S 50.6  1.8   0:59.54 java
 6841 ec2-user  20   0 4355m 2.3g 9.8m S 58.1  3.8  22:29.07 java

and 4 that are past the timeout:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 4106 ec2-user  20   0 4355m 371m 9.8m S 56.1  0.6  60:59.26 java
 8143 ec2-user  20   0 4355m 1.5g 9.9m S 59.4  2.5 287:53.77 java
 8035 ec2-user  20   0 4355m 2.2g 9.8m S 57.1  3.7 288:21.62 java
21306 ec2-user  20   0 4355m 435m 9.8m S 51.2  0.7 188:07.36 java
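A note on reading these listings: the TIME+ column in top is cumulative CPU time, not wall-clock time, so a multi-threaded JVM can accumulate CPU time faster than real time. A minimal sketch for checking how long each java process has actually been running, assuming the procps `ps` shipped with most Linux distributions (`elapsed_report` is a hypothetical helper name, not part of any tool):

```shell
#!/bin/sh
# Hypothetical helper: for every process whose command name is exactly "$1",
# print PID, parent PID, wall-clock elapsed time (ELAPSED, [[dd-]hh:]mm:ss),
# cumulative CPU time (TIME, the quantity top shows as TIME+) and the
# full command line. Assumes procps ps, which supports -C and these keywords.
elapsed_report() {
    ps -C "$1" -o pid,ppid,etime,time,args
}

# ps exits non-zero when nothing matches, so say so explicitly.
elapsed_report java || echo "no java processes found"
```

Comparing the ELAPSED column with the 1500-second limit shows whether a job has really outlived the timeout, independently of how much CPU it has used.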


So I assume that parallel tried to kill these processes, was not able
to, and started new ones to compensate.
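That assumption is consistent with a general Unix property: terminating a process does not by itself terminate the processes it started. If `--timeout` only kills the local ssh client, nothing guarantees that the remote shell and its java child receive a signal; that this is what happened here is an assumption, not something the listings confirm. A local illustration of the general property (not a reproduction of the remote setup; `spawn_orphan` is a hypothetical name):

```shell
#!/bin/sh
# Start "sleep 30" in the background from a short-lived parent shell and
# print the child's PID. The parent shell exits straight away; the child's
# output goes to /dev/null so the command substitution does not wait for it.
spawn_orphan() {
    sh -c 'sleep 30 >/dev/null 2>&1 & echo $!'
}

child=$(spawn_orphan)

# The parent has already exited, yet the child is still alive.
if kill -0 "$child" 2>/dev/null; then
    echo "child $child outlived its parent"
fi

# Clean up explicitly rather than waiting for the 30 seconds to pass.
kill "$child"
```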

I have tested that the timeout works locally, but this is the first
time I have seen remote processes not time out.

I'm launching the jobs as follows:

ls $sourceDir*.zip | parallel -j+$NumberExtraJobsPerServer --eta --progress \
    --sshlogin $servers --timeout $timeout --transfer --joblog $jobLog \
    "sh /mnt/xslt_volume/i4EnrichV7/src/enrich/10k/scripts/runCalabash.sh \
        /mnt/xslt_volume/i4EnrichV7/src/enrich/10k/xpl/i4Enrich.xpl \
        $documentSpecficLogs{/.}Log.txt {}" \
    $outputDir $svnRepositoryRoot $svnRevision $logging $debug $saveHTML
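For reference, GNU parallel's `{/.}` replacement string is the input's basename with its extension removed, so each job gets its own log file. A small sketch that reproduces that expansion with plain shell parameter expansion (`log_path_for` and the example input name are hypothetical, for illustration only):

```shell
#!/bin/sh
# Mimic what parallel substitutes for "$documentSpecficLogs{/.}Log.txt":
# strip the directory (like {/}) and then the last extension (like {/.}).
log_path_for() {
    input=$1
    base=${input##*/}   # basename, like {/}
    base=${base%.*}     # drop the last extension, like {/.}
    printf '%s%sLog.txt\n' "$documentSpecficLogs" "$base"
}

documentSpecficLogs="/mnt/xslt_volume/i4ContentOutput/SEC/10k-GHU/2009/logs/documentSpecfic/"

# Hypothetical input file name.
log_path_for "/mnt/xslt_volume/i4ContentSource/SEC/10k-GHU/2009/example.zip"
```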

with these parameters:

servers="xx.xx.xxx.xxx"
saveHTML="true"
debug="false"
timeout="1500"
logging="false"
NumberExtraJobsPerServer="1"
sourceDir="/mnt/xslt_volume/i4ContentSource/SEC/10k-GHU/2009/"
outputDir="/mnt/xslt_volume/i4ContentOutput/SEC/10k-GHU/2009/"
logDir="${outputDir}logs/"
documentSpecficLogs="${logDir}documentSpecfic/"
jobLog="${logDir}parallelJobLog.txt"
metricsLog="${logDir}metrics.txt"
svnRepositoryRoot=$(svn info | grep 'Repository Root' | sed 's/Repository Root: //g')
svnRevision=$(svn info | grep Revision | sed 's/Revision: //g')
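A slightly tighter way to pull single fields out of `svn info`, sketched as a hypothetical helper: one anchored `sed -n` expression replaces each `grep | sed` pair and only matches the line that starts with the field name.

```shell
#!/bin/sh
# Hypothetical helper: read "Key: value" lines on stdin and print the value
# of the line whose key is exactly "$1".
info_field() {
    sed -n "s/^$1: //p"
}

# Equivalent assignments (need an svn working copy, so shown commented out):
#   svnRepositoryRoot=$(svn info | info_field 'Repository Root')
#   svnRevision=$(svn info | info_field 'Revision')

# Demonstration on sample text instead of real svn output.
printf 'Revision: 1234\nLast Changed Rev: 1200\n' | info_field Revision
```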


Given what the pipeline does, it's highly likely that a regular
expression is hanging on some particular combination of text in a few
of the 10,000 input files. I'll have to debug that.
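Since the command already passes `--joblog`, one way to narrow down the input files that hang may be to filter the job log for entries whose runtime reached the timeout. A sketch, assuming the tab-separated job log layout with the runtime in the fourth column and the command in the ninth (the exact header names may differ between parallel versions, so check the first line of your own log); `slow_jobs` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print the command of every job-log entry whose
# runtime (4th tab-separated field, in seconds) is at least $2.
# Assumed column order:
#   Seq Host Starttime <runtime> Send Receive Exitval Signal Command
slow_jobs() {
    awk -F '\t' -v limit="$2" 'NR > 1 && $4 + 0 >= limit + 0 { print $9 }' "$1"
}

# Usage with the variables defined above:
#   slow_jobs "$jobLog" "$timeout"
```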

Is there anything I could do to ensure that parallel is able to kill
processes remotely?
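One possible workaround, sketched under the assumption that GNU coreutils `timeout` is installed on the remote server: enforce the limit on the remote side too, so it no longer depends on a signal making its way back through ssh. `run_with_limit` is a hypothetical wrapper; `-k 10` makes `timeout` send SIGKILL if the command is still running 10 seconds after the initial SIGTERM.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command under a wall-clock limit (seconds).
# timeout sends SIGTERM when the limit expires, then SIGKILL 10 seconds
# later if the command is still alive. Exit status 124 means it timed out.
run_with_limit() {
    limit=$1
    shift
    timeout -k 10 "$limit" "$@"
}

# A command that finishes in time keeps its own exit status.
run_with_limit 5 true && echo "finished in time"
```

Applied to the command above, the same idea would mean prefixing the quoted command string with `timeout -k 10 <seconds>`, perhaps with a value slightly below `$timeout` so the remote limit fires first while `--timeout` stays as a local backstop. Unless `--foreground` is given, GNU `timeout` also signals the children of the command, so the JVM started by `runCalabash.sh` should be covered as well.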

Regards

-- 

Alex G. Muir
Software Engineering Consultant
LinkedIn Profile: http://ca.linkedin.com/pub/alex-muir/36/ab7/125


