Re: [PATCH v4 1/3] job: take each job's lock individually in job_txn

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 1/3] job: take each job's lock individually in job_txn_app

From:	Max Reitz
Subject:	Re: [PATCH v4 1/3] job: take each job's lock individually in job_txn_apply
Date:	Thu, 2 Apr 2020 14:33:03 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0

On 01.04.20 10:15, Stefan Reiter wrote:
> All callers of job_txn_apply hold a single job's lock, but different
> jobs within a transaction can have different contexts, thus we need to
> lock each one individually before applying the callback function.
> 
> Similar to job_completed_txn_abort this also requires releasing the
> caller's context before and reacquiring it after to avoid recursive
> locks which might break AIO_WAIT_WHILE in the callback.
> 
> This also brings to light a different issue: When a callback function in
> job_txn_apply moves it's job to a different AIO context, job_exit will
> try to release the wrong lock (now that we re-acquire the lock
> correctly, previously it would just continue with the old lock, leaving
> the job unlocked for the rest of the codepath back to job_exit). Fix
> this by not caching the job's context in job_exit and add a comment
> about why this is done.
> 
> One test needed adapting, since it calls job_finalize directly, so it
> manually needs to acquire the correct context.
> 
> Signed-off-by: Stefan Reiter <address@hidden>
> ---
>  job.c                 | 48 ++++++++++++++++++++++++++++++++++---------
>  tests/test-blockjob.c |  2 ++
>  2 files changed, 40 insertions(+), 10 deletions(-)
> 
> diff --git a/job.c b/job.c
> index 134a07b92e..5fbaaabf78 100644
> --- a/job.c
> +++ b/job.c
> @@ -136,17 +136,36 @@ static void job_txn_del_job(Job *job)
>      }
>  }
>  
> -static int job_txn_apply(JobTxn *txn, int fn(Job *))
> +static int job_txn_apply(Job *job, int fn(Job *))
>  {
> -    Job *job, *next;
> +    AioContext *inner_ctx;
> +    Job *other_job, *next;
> +    JobTxn *txn = job->txn;
>      int rc = 0;
>  
> -    QLIST_FOREACH_SAFE(job, &txn->jobs, txn_list, next) {
> -        rc = fn(job);
> +    /*
> +     * Similar to job_completed_txn_abort, we take each job's lock before
> +     * applying fn, but since we assume that outer_ctx is held by the caller,
> +     * we need to release it here to avoid holding the lock twice - which 
> would
> +     * break AIO_WAIT_WHILE from within fn.
> +     */
> +    aio_context_release(job->aio_context);

Hm, is it safe to drop the lock here and then reacquire it later?  I.e.,
is the job in a consistent state in between?  I don’t know.  Looks like
it.  Maybe someone else knows better.

(I find the job code rather confusing.)

> +
> +    QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
> +        inner_ctx = other_job->aio_context;
> +        aio_context_acquire(inner_ctx);
> +        rc = fn(other_job);
> +        aio_context_release(inner_ctx);
>          if (rc) {
>              break;
>          }
>      }
> +
> +    /*
> +     * Note that job->aio_context might have been changed by calling fn, so 
> we
> +     * can't use a local variable to cache it.
> +     */
> +    aio_context_acquire(job->aio_context);

But all callers of job_txn_apply() (but job_exit(), which you fix in
this patch) cache it.  Won’t they release the wrong context then?  Or is
a context change only possible when job_txn_apply() is called from
job_exit()?

Max

signature.asc
Description: OpenPGP digital signature

[Prev in Thread]

Current Thread

[Next in Thread]

[PATCH v4 0/3] Fix some AIO context locking in jobs, Stefan Reiter, 2020/04/01
- [PATCH v4 2/3] replication: acquire aio context before calling job_cancel_sync, Stefan Reiter, 2020/04/01
  - Re: [PATCH v4 2/3] replication: acquire aio context before calling job_cancel_sync, Max Reitz, 2020/04/02
    - Re: [PATCH v4 2/3] replication: acquire aio context before calling job_cancel_sync, Stefan Reiter, 2020/04/02
- [PATCH v4 3/3] backup: don't acquire aio_context in backup_clean, Stefan Reiter, 2020/04/01
  - Re: [PATCH v4 3/3] backup: don't acquire aio_context in backup_clean, Max Reitz, 2020/04/02
- [PATCH v4 1/3] job: take each job's lock individually in job_txn_apply, Stefan Reiter, 2020/04/01
  - Re: [PATCH v4 1/3] job: take each job's lock individually in job_txn_apply, Max Reitz <=
    - Re: [PATCH v4 1/3] job: take each job's lock individually in job_txn_apply, Stefan Reiter, 2020/04/02
- Re: [PATCH v4 0/3] Fix some AIO context locking in jobs, Kevin Wolf, 2020/04/02
  - Re: [PATCH v4 0/3] Fix some AIO context locking in jobs, John Snow, 2020/04/06

Prev by Date: Re: Function-like macro with the same name as a typedef confuses Coccinelle
Next by Date: Re: [PATCH v3 1/1] vl/s390x: fixup ram sizes for compat machines
Previous by thread: [PATCH v4 1/3] job: take each job's lock individually in job_txn_apply
Next by thread: Re: [PATCH v4 1/3] job: take each job's lock individually in job_txn_apply
Index(es):
- Date
- Thread