transfer and NFS homes
From: Thomas Sattler
Subject: transfer and NFS homes
Date: Thu, 15 Mar 2012 22:47:14 +0100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:11.0) Gecko/20120312 Thunderbird/11.0
OK, here is how I (nearly) killed my cluster:
-- Story ---------------------------------------------------
Trying to see GNU parallel in action, I decided to repack collectl's
logfiles. On my system they grow to about 700-900 MB (raw) per day,
which shrinks to about 150 MB (gzipped).
First I put them into a scratch dir and unpacked them. I know that
it would have been possible to unpack/repack them in a single step,
but I also wanted the machine to have some big (data) files to
transfer. :-)
Then I started GNU parallel to use five 32-core machines to pack
these raw files. As they were in a local scratch directory, I
"had" to transfer them to the compute nodes. And there was my
mistake: I used a relative path to the files.
(OK, I'd need to say that the five compute nodes all
have local scratch dirs but also share homes via NFS.)
And there we are: The uncompressed logfiles were transferred to the
compute nodes and placed in the NFS home dir. In other words: The
files were in fact sent back to the head node.
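
A minimal sketch of why the relative path mattered, in Python purely
for illustration. It assumes, as described above, that a relative path
is resolved against the remote working directory (the NFS home dir
here), while an absolute path keeps its location. The function name and
all paths are hypothetical; this is a model of the behaviour, not GNU
parallel's code.

```python
import posixpath


def remote_destination(src_path, remote_workdir):
    """Model where a transferred file would land on the remote machine.

    Assumption: relative paths are resolved against the remote working
    directory; absolute paths are kept unchanged.
    """
    if posixpath.isabs(src_path):
        return posixpath.normpath(src_path)
    return posixpath.normpath(posixpath.join(remote_workdir, src_path))


# Hypothetical paths: the home dir is the NFS share, /scratch is local.
print(remote_destination("logs/collectl.raw", "/home/thomas"))
# -> /home/thomas/logs/collectl.raw  (on NFS, i.e. back on the head node)
print(remote_destination("/scratch/logs/collectl.raw", "/home/thomas"))
# -> /scratch/logs/collectl.raw      (local scratch on the compute node)
```

In this model, only the absolute path keeps the data on the compute
node's local disk.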
All six machines (head node and compute nodes) became unusable
quite soon. I guess the nodes cached the data for a while, so
all five compute nodes had huge buffers to feed NFS. :-)
To cut a long story short: Killing parallel and rsync did not
help, the head node's nfsd processes were still very busy. I waited
several minutes, but the head node's load was still increasing and
the compute nodes were unusable, too.
I had to hard reset the compute nodes to get the head node back.
-- Question ------------------------------------------------
As I asked before in "issues with --load": Shouldn't we take
more care that (not-so-experienced) users do not overload
their machines by accident?
In this case: Shouldn't GNU parallel detect a situation like
this ("transfer to NFS homes") and exit with an error?
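
As a rough illustration of what such a check could look like, here is
a sketch in Python. It is not how GNU parallel works; it only shows one
possible approach: look up the mount point that contains the
destination directory in a Linux-style mount table and test whether its
filesystem type is NFS. All names and the sample mount table are
hypothetical.

```python
import os

# Filesystem types treated as NFS in this sketch.
NETWORK_FS = {"nfs", "nfs4"}


def fs_type_for(path, mounts_text):
    """Return the fs type of the longest mount point containing path."""
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if len(mount_point) > len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def is_on_nfs(path, mounts_text):
    """True if path lies below an NFS mount according to mounts_text."""
    return fs_type_for(os.path.abspath(path), mounts_text) in NETWORK_FS


# Hypothetical mount table in /proc/mounts format.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw 0 0
/dev/sdb1 /scratch ext4 rw 0 0
headnode:/home /home nfs rw 0 0
"""

print(is_on_nfs("/home/thomas/logs", SAMPLE_MOUNTS))  # True
print(is_on_nfs("/scratch/logs", SAMPLE_MOUNTS))      # False
```

On a Linux compute node the mount table could be read from
/proc/mounts. The sketch ignores mount points containing escaped
characters (such as spaces) and only recognizes the two type names
listed in NETWORK_FS.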
Thomas