Re: sqlmaster "nowait" and "append" functionality?
From: Ole Tange
Subject: Re: sqlmaster "nowait" and "append" functionality?
Date: Tue, 6 Dec 2016 23:53:57 +0100
On Tue, Dec 6, 2016 at 6:10 PM, Andy Loftus <aloftus@gmail.com> wrote:
> Currently, sqlmaster appears to populate a table, first dropping any
> existing table, and waits for jobs to complete. Wondering if there is a way
> to:
>
> 1. populate the database then exit immediately without waiting?
> My particular use case is parallelizing backup tasks that are expected to
> run a long time (several hours on average).
Hmmm... maybe we should change this, so that you need to add '--wait' if you
want '--sqlmaster' to wait. Seems like a reasonable change.
> On a related note, why does the sqlworker command require the exact same
> input as the sqlmaster? Shouldn't it be sufficient that all necessary
> information is stored in the database and then the sqlworker can just pull
> tasks from the database?
--sqlmaster only inserts the values - not the command. The problem
comes with replacement strings like {%}. You can never know in
advance which job slot a job will run in:
parallel echo {%} ::: {a..j}
So --sqlmaster cannot store the actual command to run.
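To make the job slot point concrete, here is a small sketch. It assumes GNU Parallel is installed; the helper name show_slots is made up for illustration. It runs jobs of different lengths and prints which slot each one ended up in. Which slot a later job gets depends on which earlier job happens to finish first, so it is only known at run time.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GNU Parallel): print "argument<TAB>slot"
# for each job, so the run-time slot assignment becomes visible.
show_slots() {
  # $1 = number of job slots; remaining args = sleep durations used as job arguments
  local slots=$1
  shift
  # {} is the argument, {%} is the job slot number (1..slots).
  # -k keeps the output in input order, which makes runs easier to compare.
  parallel -j "$slots" -k 'sleep {}; printf "%s\t%s\n" {} {%}' ::: "$@"
}

# Example call: one long job and three short ones sharing two slots.
# show_slots 2 0.3 0.1 0.1 0.1
```

The short jobs will typically reuse whichever slot becomes free first, and that is exactly the information --sqlmaster does not have at insert time.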
It _could_ be changed so that --sqlmaster stores the template command
in the command column, and --sqlworker fetches it from there and
replaces the command column with the actual command run when done.
It would, however, be a fairly big change to GNU Parallel: The
assumption has always been that the template command remains the same
for the whole run, and quite a bit of optimization depends on this.
On a similar note: What would you expect the table to look like
when you run:
# These do not work - but what would you expect them to do to the table?
parallel --sqlandworker $DBURL -X echo {%}: {} ::: {1..10}
parallel --sqlandworker $DBURL -N3 echo {%}: {} ::: {1..10}
parallel --sqlandworker $DBURL echo {%} '{= $_=total_jobs() =}' ::: {1..10}
> 2. append new tasks to an existing database?
> I think this is more likely a feature request since the man page
> specifically says table will be clobbered. As I understand, the reason is
> that the table schema must/should match, especially the V* columns. But,
> really, isn't that ultimately a burden for the user (as opposed to the
> developer)?
I struggled with this decision, too. My reasoning was that if you run:
parallel --sqlmaster $DBURL echo ::: {1..3}
but really meant:
parallel --sqlmaster $DBURL echo ::: {1..3} ::: {4..6}
then you would have to find a way to clean the database first.
> Perhaps a specific flag allowing "append" operation so user can
> be duly warned and could still check that the number of V* columns matches.
Like having the DBURL start with '+': +pg://tange:mypass/tange/TBL8007
I believe this part is relatively simple to do - especially if we
allow it to die horribly if the columns do not match.
It should:
* Not drop table
* Find the max seq-number, and continue from there
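The two points above can be sketched without a real database. This is only an illustration of the proposed behaviour, not how GNU Parallel implements anything: the function name append_jobs is made up, and the "table" is a hypothetical tab-separated file with just two columns (Seq and V1) standing in for the SQL table.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the proposed '+DBURL' append mode.
# The "table" is a tab-separated file with columns: Seq, V1.
append_jobs() {
  # $1 = queue file; remaining args = new V1 values to append
  local queue=$1
  shift
  local max arg

  # Do not drop the table: keep existing rows, create the file only if absent.
  touch "$queue"

  # Die if the existing rows do not have the expected number of columns.
  if ! awk -F'\t' 'NF != 2 { bad = 1 } END { exit bad + 0 }' "$queue"; then
    echo "append_jobs: column count mismatch in $queue" >&2
    return 1
  fi

  # Find the max seq-number (0 for an empty table) and continue from there.
  max=$(awk -F'\t' '$1 + 0 > m { m = $1 + 0 } END { print m + 0 }' "$queue")
  for arg in "$@"; do
    max=$((max + 1))
    printf '%d\t%s\n' "$max" "$arg" >> "$queue"
  done
}

# Example: two appends to the same queue continue the numbering.
# append_jobs queue.tsv backup-script1 backup-script2
# append_jobs queue.tsv backup-script3
```

Refusing to append on a column mismatch is the "die horribly" option: it keeps the sketch simple and leaves the cleanup to the user.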
> A sample use case is executing a bunch of generated bash scripts (per above,
> this is how the parallel backups are handled), so the V1 column is the
> absolute path to the script and the command is simply "bash".
I can definitely see it being useful, and I believe the easy changes
would accommodate this:
# Append backup-script* to the queue in $DBURL. Exit immediately (no --wait).
parallel --sqlmaster +$DBURL bash ::: backup-script*
# or even:
chmod 755 backup-script*
parallel --sqlmaster +$DBURL ::: backup-script*
# Do the work by grabbing arguments from $DBURL
parallel --sqlworker $DBURL bash
# or even:
PATH=.:$PATH
parallel --sqlworker $DBURL
/Ole