Re: How to call GNU parallel inside a perl script

From: João Garcia
Subject: Re: How to call GNU parallel inside a perl script
Date: Tue, 29 Jan 2013 23:52:45 -0200


How exactly are you doing this?

If you have a text file with a list of files to download, I would suggest using aria2c, which has this functionality built in:

$ aria2c -i urls.txt -j 100
$ aria2c --help
 -i, --input-file=FILE        Downloads URIs found in FILE. You can specify
                              multiple URIs for a single entity: separate
                              URIs on a single line using the TAB character.
                              Reads input from stdin when '-' is specified.
                              Additionally, options can be specified after each
                              line of URI. This optional line must start with
                              one or more white spaces and have one option per
                              single line. See INPUT FILE section of man page
                              for details. See also --deferred-input option.

                              Possible Values: /path/to/file, -
                              Tags: #basic

 -j, --max-concurrent-downloads=N Set maximum number of parallel downloads for
                              every static (HTTP/FTP) URL, torrent and metalink.
                              See also --split option.

                              Possible Values: 1-*
                              Default: 5
                              Tags: #basic


On Tue, Jan 29, 2013 at 9:38 PM, yacob sen <address@hidden> wrote:

Dear All,

I have a Perl script that starts by fetching hundreds of files from a server. I know that GNU parallel is well suited to fetching files from an FTP site by running wget commands in parallel. I would like to take advantage of this awesome GNU parallel tool by calling it from inside my Perl script.

Is that possible? If so, an example would be very much appreciated.
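One possible approach, sketched here under assumptions, is to put the GNU parallel invocation in a small shell function or script and have the Perl script run it as an external command, for example through Perl's built-in system function. The function name fetch_all, the default of 10 jobs, and the idea of a file holding one URL per line are all hypothetical choices for this illustration.

```shell
#!/bin/sh
# fetch_all LIST [JOBS]: download every URL listed in LIST (one per line),
# running JOBS wget processes at a time (default 10, an arbitrary choice).
fetch_all() {
    list=$1
    jobs=${2:-10}
    # GNU parallel reads one URL per line from stdin and substitutes it for {}.
    parallel -j "$jobs" wget -q {} < "$list"
}
```

With the URLs in a file such as urls.txt, the call would be: fetch_all urls.txt 20. The same parallel command line could also be passed directly to system from Perl instead of going through a wrapper.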



--- On Sun, 20/1/13, Nanditha Rao <address@hidden> wrote:

From: Nanditha Rao <address@hidden>
Subject: Re: Multiple jobs on a multicore machine or cluster
To: address@hidden
Date: Sunday, 20 January, 2013, 9:45

Never mind. I figured out that I had to transfer the files using --transfer before running them, and that they get copied to the home directory of the destination machine by default.

On Sat, Jan 19, 2013 at 12:58 AM, Nanditha Rao <address@hidden> wrote:

On Fri, Jan 18, 2013 at 12:53 AM, Ole Tange <address@hidden> wrote:
On Thu, Jan 17, 2013 at 12:58 PM, Nanditha Rao <address@hidden> wrote:

> 1. I need to run multiple jobs on a multicore (and multithreaded) machine. I
> am using the GNU Parallel utility to distribute jobs across the cores to
> speed up the task. The commands to be executed are available in a file
> called 'commands'. I use the following command to run the GNU Parallel.
> cat commands | parallel -j +0
> As per the guidance at this location- gnu parallel, this command is supposed
> to use all the cores to run this task. My machine has 2 cores and 2 threads
> per core.

I take it that you have a CPU with hyperthreading.
[Nanditha: I guess so. I am using an Intel Core i3 laptop to test this tool out.]

> The system monitor however shows 4 CPUs (CPU1 and CPU2 belong to
> core1, CPU3 and CPU4 belong to core2). Each job (simulation) takes about 20
> seconds to run on a single core. I ran 2 jobs in parallel using this GNU
> parallel utility with the command above. I observe in the system monitor

What system monitor are you using?
[Nanditha: gnome-system-monitor on ubuntu] 

> that, if the 2 jobs are assigned to cpu1 and cpu2 (that is the same core),
> there is obviously no speed-up.

Why obviously? Normally I measure a speedup of 30-70% when using hyperthreading.
[Nanditha: Somehow I don't see a speedup. Running a single job on a single thread of a single core takes the same time as running two threads on the same core: about 20 seconds.]

> They take about 40 seconds to finish, which
> is about the time they would take if run sequentially. However, sometimes
> the tool distributes the 2 jobs to CPU1 and CPU3 or CPU4 (which means, 2
> jobs are assigned to 2 different cores). In this case, both jobs finish
> in parallel in 20 seconds.

GNU Parallel does not do the distributing; it simply spawns jobs. The
distribution is done by your operating system.

> Now, I want to know if there is a way in which I can force the tool to run
> on different "cores" and not on different "threads" on the same core, so
> that there is appreciable speed-up. Any help is appreciated. Thanks!

If you are using GNU/Linux you can use taskset which can set a mask on
which cores a task can be scheduled on. If you want every other:
1010(bin) = 0xA. For a 128 core machine you could run:

cat commands | taskset 0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa parallel -j +0
[Nanditha: Tried this, thanks. But it seems it doesn't speed up the jobs the way I had assumed earlier.]
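The mask arithmetic above (1010 in binary is 0xA) can be sketched as a small helper that builds a taskset mask from a list of logical CPU numbers. The function name cpu_mask is hypothetical, and the sketch relies on shell integer arithmetic, so it only covers CPU numbers that fit in the shell's integer width (typically fewer than 63).

```shell
#!/bin/sh
# cpu_mask CPU...: print the hexadecimal affinity mask that selects the
# given logical CPU numbers. Bit N of the mask stands for CPU N, counted
# from 0. (Hypothetical helper for illustration.)
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        # Set the bit that corresponds to this CPU number.
        mask=$(( mask | (1 << cpu) ))
    done
    printf '0x%x\n' "$mask"
}
```

For example, cpu_mask 1 3 gives 0xa, the "every other" pattern described above. Assuming logical CPUs 0 and 2 sit on different physical cores (which varies by machine and should be checked, for instance with lscpu), the jobs could be started as: cat commands | taskset "$(cpu_mask 0 2)" parallel -j 2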

> 2. Also, I want to know if there is a way to run this utility over a cluster
> of machines.. say, there are four 12-core machines in a cluster (making it a
> 48-core cluster).

cat commands | parallel -j +0 -S server1,server2,server3,server4
[Nanditha: I tried this option: cat commands | parallel -j +0 --sshlogin address@hidden
However, I get an error that the files listed in the 'commands' file cannot be found. Basically, I am running a simulation and invoking the commands through the file called 'commands'. Is there a path I need to specify for where the files should be copied on the destination server? Or where do they get copied by default, and where do I go to see my results file? This is the error I get (each file is part of a command that I specify in 'commands'):
decoder_node_1_line0_sim_4.sp: No such file or directory
decoder_node_1_line0_sim_3.sp: No such file or directory
decoder_node_1_line0_sim_1.sp: No such file or directory
decoder_node_1_line0_sim_2.sp: No such file or directory

My commands file contains:
ngspice decoder_node_1_line0_sim_1.sp 
ngspice decoder_node_1_line0_sim_2.sp 
ngspice decoder_node_1_line0_sim_3.sp 
ngspice decoder_node_1_line0_sim_4.sp 

and parallel is being invoked from the directory in which these files are present. So I expected the tool to pick these files up from the current directory, distribute them to the server, and run them there. It runs locally on my machine, but the -S option gives me the above error. Could you please suggest a fix?]
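The --transfer option mentioned earlier in the thread copies the file named by each input line to the remote machine, so the input to parallel has to be the file names rather than whole command lines. A rough sketch of that rearrangement follows. The function names list_inputs and run_remote are hypothetical, the sketch assumes every line of 'commands' has the form "ngspice <file>", and it adds --cleanup to remove the transferred copies afterwards.

```shell
#!/bin/sh
# list_inputs FILE: print the second word of each line in FILE, i.e. the
# .sp file that each "ngspice <file>" command refers to.
list_inputs() {
    awk 'NF >= 2 { print $2 }' "$1"
}

# run_remote FILE SERVER: run ngspice on SERVER for each input file.
# --transfer copies the file to the remote machine before the job runs;
# --cleanup deletes the transferred copy once the job has finished.
run_remote() {
    list_inputs "$1" | parallel -j +0 -S "$2" --transfer --cleanup ngspice {}
}
```

It would be called as: run_remote commands server1. If the simulation writes result files on the remote side, they would additionally have to be fetched back, for which parallel provides the --return option.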
