bug-parallel


From: Ole Tange
Subject: Re: [ext] Re: Cannot read $HOME/.parallel/tmp/sshlogin/hpc-cpu-144/linelen
Date: Sun, 15 Jan 2023 14:31:34 +0100

Ahh... Yes, that is probably an exception GNU Parallel currently does
not handle.

Thanks.

/Ole



On Sun, Jan 15, 2023 at 10:35 AM Holtgrewe, Manuel
<manuel.holtgrewe@bih-charite.de> wrote:
>
> Dear Ole,
>
>
> Thank you for the explanation.
>
>
> I have double-checked and it turns out the file was empty for some reason. 
> After removing it, it was created correctly on the next parallel call.
>
>
> Also, running it once before or copy-pasting good files for new hosts would 
> resolve the issue as well.
>
>
> Best wishes,
>
> Manuel
>
>
> --
> Dr. Manuel Holtgrewe, Dipl.-Inform.
> Bioinformatician
> Core Unit Bioinformatics – CUBI
> Berlin Institute of Health / Max Delbrück Center for Molecular Medicine in 
> the Helmholtz Association / Charité – Universitätsmedizin Berlin
>
> Visiting Address: Invalidenstr. 80, 3rd Floor, Room 03 028, 10117 Berlin
> Postal Address: Chariteplatz 1, 10117 Berlin
>
> E-Mail: manuel.holtgrewe@bihealth.de
> Phone: +49 30 450 543 607
> Fax: +49 30 450 7 543 901
> Web: cubi.bihealth.org  www.bihealth.org  www.mdc-berlin.de  www.charite.de
> ________________________________
> From: Ole Tange <ole@tange.dk>
> Sent: Saturday, January 14, 2023 6:27:17 PM
> To: Holtgrewe, Manuel
> Subject: [ext] Re: Cannot read 
> $HOME/.parallel/tmp/sshlogin/hpc-cpu-144/linelen
>
> On Wed, Jan 11, 2023 at 3:20 PM Holtgrewe, Manuel
> <manuel.holtgrewe@bih-charite.de> wrote:
> >
> > I'm running GNU parallel on an HPC system inside jobs scheduled by SLURM. 
> > Multiple GNU parallel calls may occur at the same time on the same node and 
> > I get the error message from below.
> >
> > My interpretation is that multiple parallel jobs try to access 
> > `/data/gpfs-1/users/holtgrem_c/.parallel/tmp/sshlogin/hpc-cpu-144/linelen` 
> > which causes issues.
> >
> > Is there a way around this?
>
> GNU Parallel computes the maximal allowed command line length and
> caches it in .parallel/tmp/sshlogin/hpc-cpu-144/linelen, so later runs
> can simply read that file.
>
> We are talking about write-once-read-many.
>
> Given the path contains 'gpfs-1' my guess is that they use GPFS as
> their filesystem, and my experience with GPFS is quite limited. But
> that might be the cause of this. If GPFS does not allow a file to be
> read by multiple processes in parallel, that would explain the
> situation.
>
> If the file is not there, maybe GPFS does not allow multiple processes
> to write to the file simultaneously.
>
> I have tested on NFS, and there you will at most get an error from `rm`
> when removing the file. You will get no error from GNU Parallel; it
> will instead simply do the right thing.
>
> To really be sure what is going on here you have to follow
> https://www.gnu.org/software/parallel/man.html#bug-dependent-on-environment
>
> As a workaround there are several things you can try:
>
> * Run GNU Parallel a single time before starting many in parallel.
> This will create the linelen file and hopefully GPFS allows others to
> simply read it in parallel.
>
> * Set $PARALLEL_HOME to a dir not on GPFS. GNU Parallel normally puts
> its files in ~/.parallel, but you can force it to use another dir by
> setting $PARALLEL_HOME.
>
> /Ole
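
The workarounds discussed in this thread (pointing $PARALLEL_HOME away from GPFS, removing an empty linelen file, and running GNU Parallel once before starting many in parallel) could be combined in a job prologue roughly as follows. This is only a sketch under assumptions that do not come from the thread: that $TMPDIR (or /tmp) is node-local storage rather than GPFS, and that a per-user directory named parallel-home-<user> is acceptable. Both are hypothetical choices; adjust them to your site.

```shell
#!/bin/sh
# Assumption: ${TMPDIR:-/tmp} is node-local (not on GPFS).
# The directory name "parallel-home-<user>" is a hypothetical choice.
PARALLEL_HOME="${TMPDIR:-/tmp}/parallel-home-$(id -un)"
export PARALLEL_HOME
mkdir -p "$PARALLEL_HOME"

# Remove any empty cached linelen file so that the next run recreates it
# (an empty file turned out to be the cause in this thread).
if [ -d "$PARALLEL_HOME/tmp/sshlogin" ]; then
    find "$PARALLEL_HOME/tmp/sshlogin" -type f -name linelen -size 0 \
        -exec rm -f {} +
fi

# Warm-up: a single trivial run, which should let GNU Parallel compute and
# cache the line length before many instances start at the same time.
if command -v parallel >/dev/null 2>&1; then
    parallel true ::: warmup
fi
```

The real jobs need to run with the same $PARALLEL_HOME exported (for example by sourcing this at the top of the job script), otherwise they will still use ~/.parallel.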


