[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v4 1/3] job: take each job's lock individually in job_txn_app
From: |
Max Reitz |
Subject: |
Re: [PATCH v4 1/3] job: take each job's lock individually in job_txn_apply |
Date: |
Thu, 2 Apr 2020 14:33:03 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0 |
On 01.04.20 10:15, Stefan Reiter wrote:
> All callers of job_txn_apply hold a single job's lock, but different
> jobs within a transaction can have different contexts, thus we need to
> lock each one individually before applying the callback function.
>
> Similar to job_completed_txn_abort this also requires releasing the
> caller's context before and reacquiring it after to avoid recursive
> locks which might break AIO_WAIT_WHILE in the callback.
>
> This also brings to light a different issue: When a callback function in
> job_txn_apply moves it's job to a different AIO context, job_exit will
> try to release the wrong lock (now that we re-acquire the lock
> correctly, previously it would just continue with the old lock, leaving
> the job unlocked for the rest of the codepath back to job_exit). Fix
> this by not caching the job's context in job_exit and add a comment
> about why this is done.
>
> One test needed adapting, since it calls job_finalize directly, so it
> manually needs to acquire the correct context.
>
> Signed-off-by: Stefan Reiter <address@hidden>
> ---
> job.c | 48 ++++++++++++++++++++++++++++++++++---------
> tests/test-blockjob.c | 2 ++
> 2 files changed, 40 insertions(+), 10 deletions(-)
>
> diff --git a/job.c b/job.c
> index 134a07b92e..5fbaaabf78 100644
> --- a/job.c
> +++ b/job.c
> @@ -136,17 +136,36 @@ static void job_txn_del_job(Job *job)
> }
> }
>
> -static int job_txn_apply(JobTxn *txn, int fn(Job *))
> +static int job_txn_apply(Job *job, int fn(Job *))
> {
> - Job *job, *next;
> + AioContext *inner_ctx;
> + Job *other_job, *next;
> + JobTxn *txn = job->txn;
> int rc = 0;
>
> - QLIST_FOREACH_SAFE(job, &txn->jobs, txn_list, next) {
> - rc = fn(job);
> + /*
> + * Similar to job_completed_txn_abort, we take each job's lock before
> + * applying fn, but since we assume that outer_ctx is held by the caller,
> + * we need to release it here to avoid holding the lock twice - which
> would
> + * break AIO_WAIT_WHILE from within fn.
> + */
> + aio_context_release(job->aio_context);
Hm, is it safe to drop the lock here and then reacquire it later? I.e.,
is the job in a consistent state in between? I don’t know. Looks like
it. Maybe someone else knows better.
(I find the job code rather confusing.)
> +
> + QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
> + inner_ctx = other_job->aio_context;
> + aio_context_acquire(inner_ctx);
> + rc = fn(other_job);
> + aio_context_release(inner_ctx);
> if (rc) {
> break;
> }
> }
> +
> + /*
> + * Note that job->aio_context might have been changed by calling fn, so
> we
> + * can't use a local variable to cache it.
> + */
> + aio_context_acquire(job->aio_context);
But all callers of job_txn_apply() (but job_exit(), which you fix in
this patch) cache it. Won’t they release the wrong context then? Or is
a context change only possible when job_txn_apply() is called from
job_exit()?
Max
signature.asc
Description: OpenPGP digital signature