[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex
From: |
Stefan Hajnoczi |
Subject: |
Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex |
Date: |
Thu, 15 Jul 2021 14:29:44 +0100 |
On Thu, Jul 15, 2021 at 03:35:37PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 13.07.2021 19:38, Stefan Hajnoczi wrote:
> > On Tue, Jul 13, 2021 at 06:18:39PM +0300, Vladimir Sementsov-Ogievskiy
> > wrote:
> > > 13.07.2021 16:10, Stefan Hajnoczi wrote:
> > > > On Mon, Jul 12, 2021 at 10:41:46AM +0200, Emanuele Giuseppe Esposito
> > > > wrote:
> > > > >
> > > > >
> > > > > On 08/07/2021 15:04, Stefan Hajnoczi wrote:
> > > > > > On Thu, Jul 08, 2021 at 01:32:12PM +0200, Paolo Bonzini wrote:
> > > > > > > On 08/07/21 12:36, Stefan Hajnoczi wrote:
> > > > > > > > > What is very clear from this patch is that it
> > > > > > > > > is strictly related to the brdv_* and lower level calls,
> > > > > > > > > because
> > > > > > > > > they also internally check or even use the aiocontext lock.
> > > > > > > > > Therefore, in order to make it work, I temporarly added some
> > > > > > > > > aiocontext_acquire/release pair around the function that
> > > > > > > > > still assert for them or assume they are hold and temporarly
> > > > > > > > > unlock (unlock() - lock()).
> > > > > > > >
> > > > > > > > Sounds like the issue is that this patch series assumes
> > > > > > > > AioContext locks
> > > > > > > > are no longer required for calling the blk_*()/bdrv_*() APIs?
> > > > > > > > That is
> > > > > > > > not the case yet, so you had to then add those
> > > > > > > > aio_context_lock() calls
> > > > > > > > back in elsewhere. This approach introduces unnecessary risk. I
> > > > > > > > think we
> > > > > > > > should wait until blk_*()/bdrv_*() no longer requires the
> > > > > > > > caller to hold
> > > > > > > > the AioContext lock before applying this series.
> > > > > > >
> > > > > > > In general I'm in favor of pushing the lock further down into
> > > > > > > smaller and
> > > > > > > smaller critical sections; it's a good approach to make further
> > > > > > > audits
> > > > > > > easier until it's "obvious" that the lock is unnecessary. I
> > > > > > > haven't yet
> > > > > > > reviewed Emanuele's patches to see if this is what he's doing
> > > > > > > where he's
> > > > > > > adding the acquire/release calls, but that's my understanding of
> > > > > > > both his
> > > > > > > cover letter and your reply.
> > > > > >
> > > > > > The problem is the unnecessary risk. We know what the goal is for
> > > > > > blk_*()/bdrv_*() but it's not quite there yet. Does making changes
> > > > > > in
> > > > > > block jobs help solve the final issues with blk_*()/bdrv_*()?
> > > > >
> > > > > Correct me if I am wrong, but it seems to me that the bdrv_*()/blk_*()
> > > > > operation mostly take care of building, modifying and walking the bds
> > > > > graph.
> > > > > So since graph nodes can have multiple AioContext, it makes sense
> > > > > that we
> > > > > have a lock when modifying the graph, right?
> > > > >
> > > > > If so, we can simply try to replace the AioContext lock with a graph
> > > > > lock,
> > > > > or something like that. But I am not sure of this.
> > > >
> > > > Block graph manipulation (all_bdrv_states and friends) requires the BQL.
> > > > It has always been this way.
> > > >
> > > > This raises the question: if block graph manipulation is already under
> > > > the BQL and BlockDriver callbacks don't need the AioContext anymore, why
> > >
> > > I don't believe that block drivers are thread-safe now. They have some
> > > mutexes.. But who make an audit of thread-safety?
> >
> > Emanuele :)
> >
> > FWIW I took a look at the stream, mirror, and backup jobs today and
> > couldn't find anything that's unsafe after this series. I was expecting
> > to find issues but I think Emanuele and Paolo have taken care of them.
>
>
> Hmm, do you mean that all jobs are thread-safe?
>
> Looking at the mirror, what protects s->ops_in_flight for example? It's
> accessed from job coroutines and from mirror_top filter write operation.
You're right. I missed the bdrv_mirror_top BlockDriver:
.pwrite_zeroes -> bdrv_mirror_top_pwrite_zeroes -> active_write_prepare ->
QTAILQ_INSERT_TAIL(&s->ops_in_flight, op, next)
This is not thread-safe. A CoMutex is needed here to protect
MirrorBSDOpaque fields.
Stefan
signature.asc
Description: PGP signature
- [RFC PATCH 6/6] jobs: remove unnecessary AioContext aquire/release pairs, (continued)
- [RFC PATCH 6/6] jobs: remove unnecessary AioContext aquire/release pairs, Emanuele Giuseppe Esposito, 2021/07/07
- Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex, Stefan Hajnoczi, 2021/07/08
- Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex, Paolo Bonzini, 2021/07/08
- Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex, Kevin Wolf, 2021/07/08
- Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex, Stefan Hajnoczi, 2021/07/08
- Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex, Emanuele Giuseppe Esposito, 2021/07/12
- Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex, Stefan Hajnoczi, 2021/07/13
- Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex, Vladimir Sementsov-Ogievskiy, 2021/07/13
- Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex, Stefan Hajnoczi, 2021/07/13
- Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex, Vladimir Sementsov-Ogievskiy, 2021/07/15
- Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex,
Stefan Hajnoczi <=
- Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex, Kevin Wolf, 2021/07/16
- Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex, Stefan Hajnoczi, 2021/07/19
- Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex, Kevin Wolf, 2021/07/19
Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex, Stefan Hajnoczi, 2021/07/08