Re: Parallel Digest, Vol 63, Issue 5

On Thu, Jul 30, 2015 at 12:00 PM, <parallel-request@gnu.org> wrote:

Today's Topics:

1. SQL save mode for GNU Parallel? (Ole Tange)

----------------------------------------------------------------------

Message: 1
Date: Thu, 30 Jul 2015 14:26:34 +0200
From: Ole Tange <tange@gnu.org>
To: "parallel@gnu.org" <parallel@gnu.org>
Cc: Stephen Fralich <sjf4@uw.edu>
Subject: SQL save mode for GNU Parallel?
Message-ID:
<CA+4vN7xMnunACOgrCMLWXNR_hn1OwWi20=OPcr+wMZXy_KeLgg@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

I just discovered a fork of GNU Parallel:
https://github.com/stephen-fralich/parallel-sql/

It saves into PostgreSQL.

If GNU Parallel should have an --sql option, it should be more general
than that. It would be obvious to use a DBURL to specify which driver,
username, password, and database to use.
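
To make the DBURL idea concrete, here is a minimal sketch of how such a
URL could be split into its parts. It is written in Python purely for
illustration (GNU Parallel itself is Perl), and parse_dburl is a
hypothetical helper, not an existing function. The driver://user:pass@host/db/table
layout is assumed from the example in this message.

```python
from urllib.parse import urlsplit

def parse_dburl(dburl):
    """Split a DBURL of the assumed form driver://user:pass@host/db/table."""
    parts = urlsplit(dburl)
    # The path looks like "/db/table"; treat the table part as optional.
    path = [p for p in parts.path.split("/") if p]
    return {
        "driver": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "database": path[0] if path else None,
        "table": path[1] if len(path) > 1 else None,
    }

info = parse_dburl("mysql://user:pass@host/db/table")
print(info["driver"], info["database"], info["table"])
```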

The most obvious would be having a table containing the columns from
--joblog and the arguments. For some uses it would also make sense to
have the stderr+stdout.

So I am thinking of:

--sql mysql://user:pass@host/db/table

If the table does not exist: Create it.
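
A rough sketch of what "create it if it does not exist" could look
like. The first nine columns are the --joblog header; the argument
columns V1..Vn, the Stdout/Stderr columns, the column types, and the
create_job_table helper are all assumptions for illustration. SQLite
stands in for whichever driver the DBURL names.

```python
import sqlite3

# Column names follow the --joblog header; the SQL types are a guess.
JOBLOG_COLUMNS = [
    ("Seq", "INTEGER"), ("Host", "TEXT"), ("Starttime", "REAL"),
    ("JobRuntime", "REAL"), ("Send", "INTEGER"), ("Receive", "INTEGER"),
    ("Exitval", "INTEGER"), ("Signal", "INTEGER"), ("Command", "TEXT"),
]

def create_job_table(conn, table, n_args, store_output=True):
    """Create the job table if it is missing and return its column names."""
    cols = list(JOBLOG_COLUMNS)
    # One column per argument (hypothetical names V1, V2, ...).
    cols += [(f"V{i}", "TEXT") for i in range(1, n_args + 1)]
    if store_output:
        cols += [("Stdout", "TEXT"), ("Stderr", "TEXT")]
    col_sql = ", ".join(f'"{name}" {typ}' for name, typ in cols)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')
    return [name for name, _ in cols]

conn = sqlite3.connect(":memory:")
columns = create_job_table(conn, "jobs", n_args=2)
print(columns)
```

Passing store_output=False leaves out the two output columns, which is
one way the "do not store stderr+stdout" option could be expressed.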

But should there be an option to not store stderr+stdout? And if so:
Should that be default? If saving is forced, then you can always just
>/dev/null the output from the job.

I can definitely see uses of being able to run 1000000 simulations
with 10 different variables and then be able to easily get the output
of the jobs where variable A is odd and > variable B (or similar).
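
As an illustration of that kind of lookup, the sketch below fills a toy
table and selects the output of jobs where A is odd and greater than B.
The table layout (columns Seq, A, B, Stdout) and the sample rows are
made up for the example, and SQLite again stands in for the real
database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (Seq INTEGER, A INTEGER, B INTEGER, Stdout TEXT)")
# Made-up sample results: (Seq, A, B, Stdout)
rows = [(1, 3, 1, "out-1"), (2, 4, 1, "out-2"), (3, 5, 9, "out-3"), (4, 7, 2, "out-4")]
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?)", rows)

# Output of the jobs where variable A is odd and greater than variable B.
selected = conn.execute(
    "SELECT Seq, Stdout FROM jobs WHERE A % 2 = 1 AND A > B ORDER BY Seq"
).fetchall()
print(selected)
```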

What should happen if the user uses variable names that are the same
as the header of --joblog (e.g. Seq or stdout)?

It would also be handy if you could change the status of a job to
'not-run' (which could be represented with exit status -2), so you
could change this while GNU Parallel was running or add new jobs.

You could then have workers that took jobs out of a database table:

forever parallel --sql mysql://user:pass@host/db/table

And a master node that submitted jobs to the table:

parallel --dry-run --sql mysql://user:pass@host/db/table the_job ::: the args

--dry-run with --sql should set the status to 'not-run'.

But that would also require some sort of timeout handling: if worker-2
started job-seq-4 3 seconds ago, the job should not be considered
timed out, and so no other worker should take it.
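
One possible shape for the claim-with-timeout logic is sketched below.
The -2 'not-run' status comes from this message; the -1 'running'
marker, the Worker and ClaimedAt columns, and the claim_job helper are
assumptions for illustration. The sketch uses SQLite; a multi-host
setup on MySQL or PostgreSQL would need that database's own row
locking.

```python
import sqlite3

NOT_RUN = -2   # status proposed above
RUNNING = -1   # hypothetical marker for a claimed job

def claim_job(conn, worker, now, timeout):
    """Claim one job that is not-run, or whose claim is older than timeout."""
    stale = now - timeout
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT Seq FROM jobs WHERE Exitval = ? "
            "OR (Exitval = ? AND ClaimedAt < ?) ORDER BY Seq LIMIT 1",
            (NOT_RUN, RUNNING, stale),
        ).fetchone()
        if row is None:
            return None
        # Re-check the condition in the UPDATE so a lost race claims nothing.
        cur = conn.execute(
            "UPDATE jobs SET Exitval = ?, Worker = ?, ClaimedAt = ? "
            "WHERE Seq = ? AND (Exitval = ? OR (Exitval = ? AND ClaimedAt < ?))",
            (RUNNING, worker, now, row[0], NOT_RUN, RUNNING, stale),
        )
        return row[0] if cur.rowcount == 1 else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (Seq INTEGER, Exitval INTEGER, Worker TEXT, ClaimedAt REAL)")
# job 4 was claimed by worker-2 three seconds before now=100; job 5 is not-run.
conn.execute("INSERT INTO jobs VALUES (4, ?, 'worker-2', 97)", (RUNNING,))
conn.execute("INSERT INTO jobs VALUES (5, ?, NULL, NULL)", (NOT_RUN,))

first = claim_job(conn, "worker-1", now=100, timeout=60)   # job 4 is still fresh
second = claim_job(conn, "worker-1", now=100, timeout=60)  # nothing left to claim
later = claim_job(conn, "worker-3", now=200, timeout=60)   # job 4 has timed out
print(first, second, later)
```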

GNU Parallel will not depend on the DBD packages being installed, but
will only use them when the user asks for the driver. So in package
speak it should probably 'suggest' the DBD packages.

Ideas? Suggestions? Observations?

/Ole


Rick Leir
Developer, Canadiana.ca

From:	Rick Leir
Subject:	Re: Parallel Digest, Vol 63, Issue 5
Date:	Thu, 30 Jul 2015 12:46:47 -0400