
Re: [PATCH v5 08/20] jobs: protect jobs with job_lock/unlock


From: Emanuele Giuseppe Esposito
Subject: Re: [PATCH v5 08/20] jobs: protect jobs with job_lock/unlock
Date: Thu, 24 Feb 2022 17:55:03 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0


On 24/02/2022 17:48, Stefan Hajnoczi wrote:
> On Thu, Feb 24, 2022 at 01:45:48PM +0100, Emanuele Giuseppe Esposito wrote:
>>
>>
>> On 17/02/2022 15:48, Stefan Hajnoczi wrote:
>>> On Tue, Feb 08, 2022 at 09:35:01AM -0500, Emanuele Giuseppe Esposito wrote:
>>>> diff --git a/block/replication.c b/block/replication.c
>>>> index 55c8f894aa..a03b28726e 100644
>>>> --- a/block/replication.c
>>>> +++ b/block/replication.c
>>>> @@ -149,7 +149,9 @@ static void replication_close(BlockDriverState *bs)
>>>>      if (s->stage == BLOCK_REPLICATION_FAILOVER) {
>>>>          commit_job = &s->commit_job->job;
>>>>          assert(commit_job->aio_context == qemu_get_current_aio_context());
>>>
>>> Is it safe to access commit_job->aio_context outside job_mutex?
>>
>> No, but protecting it is currently not done here. Patch 18 takes care
>> of protecting aio_context. Remember again that the job lock API is
>> still a nop.
>>>
>>>> @@ -1838,7 +1840,9 @@ static void drive_backup_abort(BlkActionState *common)
>>>>          aio_context = bdrv_get_aio_context(state->bs);
>>>>          aio_context_acquire(aio_context);
>>>>  
>>>> -        job_cancel_sync(&state->job->job, true);
>>>> +        WITH_JOB_LOCK_GUARD() {
>>>> +            job_cancel_sync(&state->job->job, true);
>>>> +        }
>>>
>>> Maybe job_cancel_sync() should take the lock internally since all
>>> callers in this patch seem to need the lock?
>>
>> The _locked version is useful where lock guards are already present
>> and cover multiple operations. There are only 3 places where a lock
>> guard is added just to cover job_cancel_sync_locked. Is it worth
>> defining yet another function?
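(For illustration only, the split Stefan suggests could look roughly like
the sketch below; the wrapper body is an assumption based on this
discussion, not code from the series:)

int job_cancel_sync(Job *job, bool force)
{
    int ret;

    /*
     * Thin wrapper for callers that do not already hold job_mutex;
     * job_cancel_sync_locked() would be the variant that expects the
     * caller to be inside WITH_JOB_LOCK_GUARD() already.
     */
    WITH_JOB_LOCK_GUARD() {
        ret = job_cancel_sync_locked(job, force);
    }
    return ret;
}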
>>
>>
>>>
>>> I noticed this patch does not add WITH_JOB_LOCK_GUARD() to
>>> tests/unit/test-blockjob.c:cancel_common(). Was that an oversight or is
>>> there a reason why job_mutex is not needed around the job_cancel_sync()
>>> call there?
>>
>> No, locks in unit tests are added in patch 10 "jobs: protect jobs with
>> job_lock/unlock".
> 
> I see, it's a question of how to split up the patches. When patches
> leave the code in a state with broken invariants it becomes difficult to
> review. I can't distinguish between actual bugs and temporary violations
> that will be fixed in a later patch (unless they are clearly marked).
> 
> If you can structure patches so they are self-contained and don't leave
> the broken invariants then that would make review easier, but in this
> case it is tricky so I'll do the best I can to review it if you cannot
> restructure the sequence of commits.

Yes, the main problem is that ideally we want to add the job lock and
remove the AioContext lock. But the two cannot happen together, and just
adding proper locking right away creates a ton of deadlocks, because in
order to maintain the invariants the job lock is sometimes taken inside
the AioContext lock, and other times the opposite happens.
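
(Purely illustrative, with hypothetical call sites, of the inverted
ordering described above: if job_lock() were already a real mutex, these
two existing patterns would ABBA-deadlock against each other.)

/* e.g. a monitor/QMP path: AioContext lock outside, job lock inside */
static void example_qmp_path(Job *job, AioContext *ctx)
{
    aio_context_acquire(ctx);
    WITH_JOB_LOCK_GUARD() {
        job_pause(job);
    }
    aio_context_release(ctx);
}

/* e.g. job-layer code that needs the AioContext while holding the job lock */
static void example_job_path(AioContext *ctx)
{
    WITH_JOB_LOCK_GUARD() {
        aio_context_acquire(ctx);
        /* ... block-layer work ... */
        aio_context_release(ctx);
    }
}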

The way it is done in this series is:
1) create job_lock/unlock as nops
2) make sure the nop job_lock is protecting the Job fields
3) do all the API renaming, according to the locking used above
4) enable job_lock (no longer a nop) and at the same time remove the
AioContext lock
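
(A minimal sketch of what steps 1 and 4 mean in practice; mutex
initialization is omitted and the exact bodies are an assumption, not the
series' code:)

static QemuMutex job_mutex;

void job_lock(void)
{
    /*
     * In steps 1-3 this body is empty, so behaviour does not change yet.
     * Step 4 makes the lock real, together with dropping the AioContext
     * lock from the job layer.
     */
    qemu_mutex_lock(&job_mutex);
}

void job_unlock(void)
{
    qemu_mutex_unlock(&job_mutex);
}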

If you want, a more up-to-date branch with your feedback applied so far
is here:
https://gitlab.com/eesposit/qemu/-/commits/dp_jobs_reviewed

> 
>>>
>>>> @@ -252,7 +258,13 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
>>>>  
>>>>  static void block_job_on_idle(Notifier *n, void *opaque)
>>>>  {
>>>> +    /*
>>>> +     * we can't kick with job_mutex held, but we also want
>>>> +     * to protect the notifier list.
>>>> +     */
>>>> +    job_unlock();
>>>>      aio_wait_kick();
>>>> +    job_lock();
>>>
>>> I don't understand this. aio_wait_kick() looks safe to call with a mutex
>>> held?
>> You are right. It should be safe.
>>
>>>
>>>> @@ -292,7 +304,9 @@ bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
>>>>      job->speed = speed;
>>>>  
>>>>      if (drv->set_speed) {
>>>> +        job_unlock();
>>>>          drv->set_speed(job, speed);
>>>> +        job_lock();
>>>
>>> What guarantees that job stays alive during drv->set_speed(job)? We
>>> don't hold a ref here. Maybe the assumption is that
>>> block_job_set_speed() only gets called from the main loop thread and
>>> nothing else will modify the jobs list while we're in drv->set_speed()?
>>
>> What guaranteed this before? I am not sure.
> 
> I guess the reason is the one I suggested. It should be documented in
> the comments.
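> 
(If something stronger than "only called from the main loop thread" were
wanted, one illustrative option inside block_job_set_speed(), not what the
patch does, would be to pin the job across the unlocked callback:)

    /*
     * Assumption: job_ref()/job_unref() keep the Job alive while
     * job_mutex is dropped around the driver callback.
     */
    job_ref(&job->job);
    job_unlock();
    drv->set_speed(job, speed);
    job_lock();
    job_unref(&job->job);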
> 
>>
>>>
>>>> @@ -545,10 +566,15 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
>>>>                                          action);
>>>>      }
>>>>      if (action == BLOCK_ERROR_ACTION_STOP) {
>>>> -        if (!job->job.user_paused) {
>>>> -            job_pause(&job->job);
>>>> -            /* make the pause user visible, which will be resumed from QMP. */
>>>> -            job->job.user_paused = true;
>>>> +        WITH_JOB_LOCK_GUARD() {
>>>> +            if (!job->job.user_paused) {
>>>> +                job_pause(&job->job);
>>>> +                /*
>>>> +                 * make the pause user visible, which will be
>>>> +                 * resumed from QMP.
>>>> +                 */
>>>> +                job->job.user_paused = true;
>>>> +            }
>>>>          }
>>>>          block_job_iostatus_set_err(job, error);
>>>
>>> Does this need the lock? If not, why is block_job_iostatus_reset()
>>> called with the lock held?
>>>
>> block_job_iostatus_set_err does not touch any Job fields. On the other
>> hand block_job_iostatus_reset reads job.user_paused and job.pause_count.
> 
> BlockJob->iostatus requires no locking?
> 



