transfer and NFS homes

From: Thomas Sattler
Subject: transfer and NFS homes
Date: Thu, 15 Mar 2012 22:47:14 +0100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:11.0) Gecko/20120312 Thunderbird/11.0

OK, here is how I (nearly) killed my cluster:

-- Story ---------------------------------------------------

Trying to see GNU parallel in action, I decided to repack collectl's
logfiles. On my system they grow to about 700-900MB (raw) per day,
which becomes about 150MB (gzipped).

First I put them into a scratch dir and unpacked them. I know that
it would have been possible to unpack/repack them in a single step;
I just wanted the machines to also have some big (data) files to
transfer. :-)

Then I started GNU parallel to use five 32-core machines to pack
these raw files. As they were in a local scratch directory, I
"had" to transfer them to the compute nodes. And there was my
mistake: I used a relative path to the files.

  (OK, I'd need to say that the five compute nodes all
   have local scratch dirs but also share homes via NFS.)

And there we are: The uncompressed logfiles were transferred to the
compute nodes and placed in the NFS home dir. In other words: The
files were in fact sent back to the head node.
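
A minimal sketch of the path mix-up, not GNU parallel's actual code. It
assumes that a relative argument is resolved against the remote working
directory (here the NFS home) while an absolute argument keeps its
location. The function name and all paths are hypothetical.

```python
import posixpath


def remote_destination(arg, remote_workdir):
    """Return where a transferred file would land on the remote side.

    Assumption for illustration only: relative paths are joined onto the
    remote working directory, absolute paths are kept unchanged.
    """
    if posixpath.isabs(arg):
        return posixpath.normpath(arg)
    return posixpath.normpath(posixpath.join(remote_workdir, arg))


if __name__ == "__main__":
    # Relative path: ends up below the (NFS-mounted) home directory.
    print(remote_destination("logs/collectl.raw", "/home/user"))
    # Absolute path: stays on the node-local scratch directory.
    print(remote_destination("/scratch/logs/collectl.raw", "/home/user"))
```

With the relative form, every byte sent to a compute node is written
straight back over NFS to the head node.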

All six machines (headnode and compute nodes) became unusable
quite soon. I guess the nodes cached the data for a while, so
all five machines had huge buffers to feed NFS. :-)

To bring a long story to an end: Killing parallel and rsync did
not help, the head node's nfsd processes were still very busy. I
waited several minutes, the head node's load was still increasing
and the nodes were unusable, too.

I had to hard reset the nodes to get the headnode back.

-- Question ------------------------------------------------

As I asked before in "issues with --load": Shouldn't we take
more care that (not-so-experienced) users do not overload
their machines by accident?

In this case: Shouldn't GNU parallel detect a situation like
this ("transfer to NFS homes") and exit with an error?
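
To make the idea concrete, here is a rough sketch of how such a check
could look. It is not an existing GNU parallel feature. It assumes a
Linux-style mount table (the format of /proc/mounts) on the remote
side, ignores escaped characters in mount points, and uses hypothetical
names and paths throughout.

```python
import posixpath

# Filesystem types treated as NFS in this sketch (an assumption).
NFS_TYPES = {"nfs", "nfs4"}


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, mounts_text):
    """Return the fs type of the longest mount point containing path."""
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for mount_point, fs_type in parse_mounts(mounts_text):
        mp = mount_point.rstrip("/") or "/"
        inside = mp == "/" or path == mp or path.startswith(mp + "/")
        # ">=" lets a later mount on the same mount point win.
        if inside and len(mp) >= len(best_mount):
            best_mount, best_type = mp, fs_type
    return best_type


def is_on_nfs(path, mounts_text):
    """True if path lies on a filesystem listed in NFS_TYPES."""
    return fs_type_for(path, mounts_text) in NFS_TYPES


if __name__ == "__main__":
    with open("/proc/mounts") as handle:
        table = handle.read()
    print(is_on_nfs("/home/user/logs/collectl.raw", table))
```

A tool could run a check like this on the transfer destination and
refuse (or at least warn) when the answer is true.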

