Re: [PATCH] 9pfs: Fix potential deadlock of QEMU mainloop
From: Greg Kurz
Subject: Re: [PATCH] 9pfs: Fix potential deadlock of QEMU mainloop
Date: Thu, 7 May 2020 16:33:28 +0200
On Thu, 07 May 2020 13:37:30 +0200
Christian Schoenebeck <address@hidden> wrote:
> On Mittwoch, 6. Mai 2020 19:49:10 CEST Greg Kurz wrote:
> > > Ok, but why not both? Moving locks to worker thread and QemuMutex ->
> > > CoMutex?
> > Using CoMutex would be mandatory if we leave the locking where it sits
> > today, so that the main thread can switch to other coroutines instead
> > of blocking. We don't have the same requirement with the worker thread:
> > it just needs to do the actual readdir() and then it goes back to the
> > thread pool, waiting to be summoned again for some other work.
>
> Yes, I know.
>
> > So I'd
> > rather use standard mutexes to keep things simple... why would you
> > want to use a CoMutex here?
>
> Like you said, it would not be mandatory, nor a big deal; the idea was just
> that if a lock takes longer than expected, a worker thread could already
> continue with another task. I mean, the number of worker threads is limited;
> they don't grow on demand, do they?
>
Yes, the pool is limited to a fixed number of 64 threads, but...
> I also haven't reviewed QEMU's lock implementations in much detail, but IIRC
> CoMutexes are handled completely in user space, while QemuMutex uses regular
> OS mutexes and hence might cost context switches.
>
... since the locking would only be exercised by a hypothetical client doing
stupid things, this is beginning to look like bike-shedding to me. :)
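
Just to make the trade-off concrete, the two options would look roughly like
this (illustrative sketch only, not from the patch; the lock names and call
sites are made up, and both locks are assumed to be initialised elsewhere with
qemu_co_mutex_init() / qemu_mutex_init()):

#include "qemu/osdep.h"
#include "qemu/thread.h"       /* QemuMutex */
#include "qemu/coroutine.h"    /* CoMutex, coroutine_fn */

static CoMutex readdir_co_lock;     /* hypothetical */
static QemuMutex readdir_mutex;     /* hypothetical */

/* Option 1: keep the lock in the coroutine and make it a CoMutex, so a
 * contended lock yields back to the main loop instead of blocking it. */
static void coroutine_fn readdir_locked_in_coroutine(void)
{
    qemu_co_mutex_lock(&readdir_co_lock);
    /* ... dispatch the actual readdir() to a worker thread ... */
    qemu_co_mutex_unlock(&readdir_co_lock);
}

/* Option 2 (what the patch does): take a plain QemuMutex inside the worker
 * thread itself; blocking there only parks one of the pool's 64 threads. */
static void readdir_locked_in_worker(void)
{
    qemu_mutex_lock(&readdir_mutex);
    /* ... do the actual readdir() ... */
    qemu_mutex_unlock(&readdir_mutex);
}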
> > > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > > > index 9e046f7acb51..ac84ae804496 100644
> > > > --- a/hw/9pfs/9p.c
> > > > +++ b/hw/9pfs/9p.c
> > > > @@ -2170,7 +2170,7 @@ static int coroutine_fn v9fs_do_readdir_with_stat(V9fsPDU *pdu,
> > > >      int32_t count = 0;
> > > >      struct stat stbuf;
> > > >      off_t saved_dir_pos;
> > > > -    struct dirent *dent;
> > > > +    struct dirent dent;
> > > > 
> > > >      /* save the directory position */
> > > >      saved_dir_pos = v9fs_co_telldir(pdu, fidp);
> > > > 
> > > > @@ -2181,13 +2181,11 @@ static int coroutine_fn v9fs_do_readdir_with_stat(V9fsPDU *pdu,
> > > >      while (1) {
> > > >          v9fs_path_init(&path);
> > > > 
> > > > -        v9fs_readdir_lock(&fidp->fs.dir);
> > > > -
> > >
> > > That's the deadlock fix, but ...
> > >
> > > >          err = v9fs_co_readdir(pdu, fidp, &dent);
> > > > -        if (err || !dent) {
> > > > +        if (err <= 0) {
> > > >              break;
> > > >          }
> > >
> > > ... even though this code simplification might make sense, I don't think
> > > it should be mixed together with the deadlock fix in one patch. They are not
> >
> > I could possibly split this in two patches, one for returning a copy
> > and one for moving the locking around, but...
> >
> > > related to each other, nor is the code simplification you are aiming at
> > > trivial
> > ... this assertion is somewhat wrong: moving the locking to
> > v9fs_co_readdir() really requires it to return a copy.
>
> Yeah, I am also not sure whether a split would make it trivial enough in this
> case to be worth the hassle. If you find an acceptable solution, good; if not,
> then leave it as one patch.
>
Another option would be to g_malloc() the dirent in v9fs_co_readdir() and
g_free() it in the callers. This would cause less churn since we could keep
the same function signature.
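
Roughly, that would look something like the following (untested sketch just to
illustrate the idea, not the actual patch; it reuses the existing hw/9pfs
helpers v9fs_co_run_in_worker() and v9fs_readdir_lock()/unlock(), omits the
cancellation check, and leaves the caller responsible for g_free()ing the
returned dirent):

int coroutine_fn v9fs_co_readdir(V9fsPDU *pdu, V9fsFidState *fidp,
                                 struct dirent **dent)
{
    int err = 0;
    V9fsState *s = pdu->s;

    v9fs_co_run_in_worker({
        struct dirent *entry;

        errno = 0;
        v9fs_readdir_lock(&fidp->fs.dir);
        entry = s->ops->readdir(&s->ctx, &fidp->fs);
        if (!entry && errno) {
            err = -errno;
        } else if (entry) {
            /* heap-allocated copy, taken while the lock is held */
            *dent = g_new(struct dirent, 1);
            **dent = *entry;
            err = 1;
        } else {
            *dent = NULL;   /* end of directory */
        }
        v9fs_readdir_unlock(&fidp->fs.dir);
    });
    return err;
}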
> > > enough to justify squashing. The deadlock fix should make it through the
> > > stable branches, while the code simplification should not. So that's
> > > better off as a separate cleanup patch.
> >
> > The issue has been there for such a long time without causing any
> > trouble. Not worth adding churn in stable for a bug that is impossible
> > to hit with a regular Linux guest.
>
> Who knows. There are also other clients out there. A potential deadlock is
> still a serious issue after all.
>
Well, I guess Cc: qemu-stable doesn't cost much, and then I'll let other
people decide. I have enough on my plate with upstream.
> > > > @@ -32,13 +32,20 @@ int coroutine_fn v9fs_co_readdir(V9fsPDU *pdu, V9fsFidState *fidp,
> > > >              struct dirent *entry;
> > > > 
> > > >              errno = 0;
> > > > +
> > > > +            v9fs_readdir_lock(&fidp->fs.dir);
> > > > +
> > > >              entry = s->ops->readdir(&s->ctx, &fidp->fs);
> > > >              if (!entry && errno) {
> > > >                  err = -errno;
> > > > +            } else if (entry) {
> > > > +                memcpy(dent, entry, sizeof(*dent));
> > > > +                err = 1;
> > >
> > > I find using sizeof(*dent) a bit dangerous considering potential type
> > > changes in the future. I would rather use sizeof(struct dirent). It is
> > > also more human-friendly to read IMO.
> >
> > Hmm... I believe it's the opposite actually: with sizeof(*dent), memcpy
> > will always copy the number of bytes that are expected to fit in *dent,
> > no matter the type.
>
> Yes, but what you intend is to flat-copy a structure, not pointers. So no
> matter how the type is going to be changed, what you always actually want
> (semantically) is
>
> copy(sizeof(struct dirent), nelements)
>
> Now it is nelements=1; in the future it might also be nelements>1. But what
> you certainly don't ever want here is
>
> copy(sizeof(void*), nelements)
>
> > But yes, since memcpy() doesn't do any type checking for us, I think
> > I'll just turn this into:
> >
> > *dent = *entry;
>
> Ok
>
> Best regards,
> Christian Schoenebeck
>
>
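
For the record, the type-safety point above boils down to this (illustration
only, not part of the patch):

#include <dirent.h>
#include <string.h>

/* Plain struct assignment: the compiler checks that both operands really
 * are struct dirent and copies exactly sizeof(struct dirent) bytes. */
static void copy_dirent_assign(struct dirent *dst, const struct dirent *src)
{
    *dst = *src;
}

/* memcpy() performs the same copy here, but with no type checking: if dst
 * ever changed type, sizeof(*dst) would silently follow it and the mistake
 * would go unnoticed at compile time. */
static void copy_dirent_memcpy(struct dirent *dst, const struct dirent *src)
{
    memcpy(dst, src, sizeof(*dst));
}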