
Re: EC2 integration for ssh hosts?


From: Matt Oates (Home)
Subject: Re: EC2 integration for ssh hosts?
Date: Tue, 8 Mar 2011 10:53:59 +0000

On 7 March 2011 23:27, Ole Tange <tange@gnu.org> wrote:
> On Sat, Mar 5, 2011 at 12:28 AM, Ole Tange <tange@gnu.org> wrote:
>> On Thu, Mar 3, 2011 at 10:52 AM, Matt Oates (Home) <mattoates@gmail.com> 
>> wrote:
>>>
>>> However, we're reaching the limits of most hardware lying around, so my
>>> research group are moving to the Amazon Elastic Cloud. I was wondering
>>> if anyone can gauge how much heartache there would be in hacking on
>>> GNU parallel using the Net::Amazon::EC2 module to bring up a bunch of
>>> cloud instances, and then pass this off as the list of ssh remote
>>> hosts? I could just wrap something around parallel in the shell but it
>>> feels like a nice optional feature for others..?
>>
>> While I am a bioinformatician, too, and while we too are looking at
>> EC2, I do not see a Net::Amazon::EC2 interface becoming a part of GNU
>> Parallel: One of the primary goals is to keep the additional
>> requirements to install GNU Parallel very low. Requiring
>> Net::Amazon::EC2 installed to run will go very hard against that goal.
>>
>> I am not sure what the best way is to interface GNU Parallel with
>> Net::Amazon::EC2, but my suggestion would be for you to wrap something
>> around GNU Parallel and post your wrapping here. If you write your
>> wrapper in Perl, it might be easier to see if we can find a good way
>> to make a general solution.
>
> I was just at a conference where EC2 was the topic for a couple of the
> talks. I believe what you would do is:
>
> * Put '-S /home/user/.parallel/ec2-hosts' in ~/.parallel/ec2-profile
> * Order N machines using some command line tool.
> * Put the names of the machines into ~/.parallel/ec2-hosts
> * Run: parallel -J ec2-profile your command

We already do that, but thanks for the tip; it's certainly the method
for doing it by hand or without any modification to parallel. Your
ec2-hosts file would need to change with each job (several might be
started by the same user at the same time) as host names are not at
all static and you want to bring up and pull down hosts to save money.
Instead of polluting with files it would be nicer to just use the EC2
libraries to talk directly to the Amazon web-services for a list of
hosts that have just been created. Plus we want to create the N
instances based on the input to parallel to meet demand. At the moment
we're just scripting this around the outside of parallel, since it
looks like you want to keep things as cross-compatible as possible in
there, whereas we just want the quick and sloppy solution :)
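
Sizing the number of instances from the input could be sketched roughly
as below. This is only an illustration: the function name, the
jobs-per-host figure and the cap on hosts are all assumptions, not
anything GNU Parallel or the EC2 tools provide.

```shell
#!/bin/sh
# Hypothetical helper: how many instances to start for a given number
# of input lines, assuming a fixed number of jobs per host and a cap
# on the number of hosts we are willing to pay for.
# Usage: instances_needed INPUTS JOBS_PER_HOST MAX_HOSTS
instances_needed() {
    inputs=$1
    per_host=$2
    max_hosts=$3
    # Integer ceiling of inputs / per_host.
    n=$(( (inputs + per_host - 1) / per_host ))
    # Never ask for more than the cap.
    if [ "$n" -gt "$max_hosts" ]; then
        n=$max_hosts
    fi
    echo "$n"
}
```

For example, instances_needed "$(wc -l < input.txt)" 4 8 would print the
number of hosts to request for input.txt, with 4 jobs per host and at
most 8 hosts.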

Something like:

    parallel -S <(create-and-echo-list-of-hosts-here.sh)

might be a good stopgap.
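
A minimal sketch of that stopgap, using a private host file per job so
that concurrent jobs do not share one ec2-hosts file. Assumptions:
list_new_hosts is a hypothetical placeholder for whatever actually
starts the instances and prints their names (here it only echoes its
arguments), and the host list is passed with --sshloginfile, GNU
Parallel's option for reading sshlogins from a file, rather than -S.

```shell
#!/usr/bin/env bash
# Hypothetical placeholder: print the names of the freshly started
# hosts, one per line. Replace the body with a real query against EC2.
list_new_hosts() {
    printf '%s\n' "$@"
}

# Write the host list to a private temporary file and print its path,
# so each job gets its own list instead of a shared ec2-hosts file.
make_hosts_file() {
    local hosts_file
    hosts_file=$(mktemp "${TMPDIR:-/tmp}/ec2-hosts.XXXXXX") || return 1
    list_new_hosts "$@" > "$hosts_file"
    printf '%s\n' "$hosts_file"
}

# Run a command through GNU Parallel on the hosts listed in the given
# file, then remove the file. Returns GNU Parallel's exit status.
# Usage: run_on_new_hosts HOSTS_FILE COMMAND [ARGS...]
run_on_new_hosts() {
    local hosts_file=$1
    shift
    parallel --sshloginfile "$hosts_file" "$@"
    local rc=$?
    rm -f "$hosts_file"
    return "$rc"
}
```

A job would then call make_hosts_file once, keep the printed path, and
hand it to run_on_new_hosts together with the command to run. Shutting
the instances down afterwards is left out of the sketch.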

Best Wishes,
Matt


