qemu-devel

Re: [PATCH] 9pfs: Fix potential deadlock of QEMU mainloop


From: Greg Kurz
Subject: Re: [PATCH] 9pfs: Fix potential deadlock of QEMU mainloop
Date: Thu, 7 May 2020 16:33:28 +0200

On Thu, 07 May 2020 13:37:30 +0200
Christian Schoenebeck <address@hidden> wrote:

> On Wednesday, 6 May 2020 19:49:10 CEST Greg Kurz wrote:
> > > Ok, but why not both? Moving locks to worker thread and QemuMutex ->
> > > CoMutex?
> > Using CoMutex would be mandatory if we leave the locking where it sits
> > today, so that the main thread can switch to other coroutines instead
> > of blocking. We don't have the same requirement with the worker thread:
> > it just needs to do the actual readdir() and then it goes back to the
> > thread pool, waiting to be summoned again for some other work. 
> 
> Yes, I know.
> 
> > So I'd
> > rather use standard mutexes to keep things simple... why would you
> > want to use a CoMutex here?
> 
> Like you said, it would not be mandatory, nor a big deal. The idea was just
> that if a lock takes longer than expected, the worker thread could already
> continue with another task. I mean, the number of worker threads is limited;
> they don't grow on demand, do they?
> 

Yes, the pool is limited to a fixed number of 64 threads, but...

> I also haven't reviewed QEMU's lock implementations in much detail, but IIRC
> CoMutexes are handled entirely in user space, while QemuMutex uses regular
> OS mutexes and hence might cost context switches.
> 

... since the locking would only be exercised by a hypothetical
client doing stupid things, this is beginning to look like bike-shedding
to me. :)
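
For illustration only, the difference being discussed looks roughly like this
in use: a QemuMutex parks the calling thread in the kernel when contended,
while a CoMutex merely suspends the calling coroutine so the thread can run
other coroutines in the meantime. The struct and function names below are
made up for the example; they are not from the patch:

#include "qemu/osdep.h"
#include "qemu/thread.h"      /* QemuMutex: wraps an OS mutex */
#include "qemu/coroutine.h"   /* CoMutex: cooperative, coroutine-aware */

/* Hypothetical state guarding a shared directory handle. */
typedef struct ExampleDirState {
    QemuMutex thread_lock;    /* blocks the whole thread when contended */
    CoMutex   co_lock;        /* only suspends the calling coroutine */
} ExampleDirState;

/* From a worker thread, a plain QemuMutex is fine: the worker simply
 * sleeps until the lock becomes free. */
static void example_worker_section(ExampleDirState *s)
{
    qemu_mutex_lock(&s->thread_lock);
    /* ... call readdir() and friends here ... */
    qemu_mutex_unlock(&s->thread_lock);
}

/* From a coroutine on the main loop, a CoMutex yields to other
 * coroutines instead of blocking the main loop thread. */
static void coroutine_fn example_coroutine_section(ExampleDirState *s)
{
    qemu_co_mutex_lock(&s->co_lock);
    /* ... */
    qemu_co_mutex_unlock(&s->co_lock);
}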

> > > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > > > index 9e046f7acb51..ac84ae804496 100644
> > > > --- a/hw/9pfs/9p.c
> > > > +++ b/hw/9pfs/9p.c
> > > > @@ -2170,7 +2170,7 @@ static int coroutine_fn v9fs_do_readdir_with_stat(V9fsPDU *pdu,
> > > >      int32_t count = 0;
> > > >      struct stat stbuf;
> > > >      off_t saved_dir_pos;
> > > > -    struct dirent *dent;
> > > > +    struct dirent dent;
> > > > 
> > > >      /* save the directory position */
> > > >      saved_dir_pos = v9fs_co_telldir(pdu, fidp);
> > > > @@ -2181,13 +2181,11 @@ static int coroutine_fn v9fs_do_readdir_with_stat(V9fsPDU *pdu,
> > > >      while (1) {
> > > >          v9fs_path_init(&path);
> > > > 
> > > > -        v9fs_readdir_lock(&fidp->fs.dir);
> > > > -
> > > 
> > > That's the deadlock fix, but ...
> > > 
> > > >          err = v9fs_co_readdir(pdu, fidp, &dent);
> > > > -        if (err || !dent) {
> > > > +        if (err <= 0) {
> > > >              break;
> > > >          }
> > > 
> > > ... even though this code simplification might make sense, I don't think
> > > it should be mixed together with the deadlock fix in one patch. They are
> > > not
> > 
> > I could possibly split this into two patches, one for returning a copy
> > and one for moving the locking around, but...
> > 
> > > related to each other, nor is the code simplification you are aiming at
> > > trivial
> > ... this assertion is somewhat wrong: moving the locking to
> > v9fs_co_readdir() really requires that it return a copy.
> 
> Yeah, I am also not sure whether a split would make it trivial enough in
> this case to be worth the hassle. If you find an acceptable solution, good;
> if not, then leave it as one patch.
> 

Another option would be to g_malloc() the dirent in v9fs_co_readdir() and
g_free() in the callers. This would cause less churn since we could keep
the same function signature.
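
Roughly something like this (just an untested sketch of that alternative, not
a real patch; the exact EOF/error convention and the unlock placement would
need care):

/* In v9fs_co_readdir(), keeping the struct dirent **dent signature, but
 * handing back a heap copy made while the readdir lock is held: */
v9fs_readdir_lock(&fidp->fs.dir);
entry = s->ops->readdir(&s->ctx, &fidp->fs);
if (!entry && errno) {
    err = -errno;
    *dent = NULL;
} else if (entry) {
    *dent = g_malloc(sizeof(*entry));
    memcpy(*dent, entry, sizeof(*entry));
    err = 0;
} else {
    *dent = NULL;   /* end of directory */
    err = 0;
}
v9fs_readdir_unlock(&fidp->fs.dir);

/* ...and a caller such as v9fs_do_readdir_with_stat() would keep its
 * existing check but free the copy when done with it: */
err = v9fs_co_readdir(pdu, fidp, &dent);
if (err || !dent) {
    break;
}
/* ... use dent ... */
g_free(dent);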

> > > enough to justify squashing. The deadlock fix should make it through the
> > > stable branches, while the code simplification should not. So that's
> > > better off as a separate cleanup patch.
> > 
> > The issue has been there for such a long time without causing any
> > trouble. It's not worth adding churn in stable for a bug that is impossible
> > to hit with a regular Linux guest.
> 
> Who knows. There are also other clients out there. A potential deadlock is 
> still a serious issue after all.
> 

Well, I guess Cc: qemu-stable doesn't cost much, and then I'll let other
people decide. I have enough on my plate with upstream.

> > > > @@ -32,13 +32,20 @@ int coroutine_fn v9fs_co_readdir(V9fsPDU *pdu, V9fsFidState *fidp,
> > > >              struct dirent *entry;
> > > > 
> > > >              errno = 0;
> > > > +
> > > > +            v9fs_readdir_lock(&fidp->fs.dir);
> > > > +
> > > >              entry = s->ops->readdir(&s->ctx, &fidp->fs);
> > > >              if (!entry && errno) {
> > > >                  err = -errno;
> > > > +            } else if (entry) {
> > > > +                memcpy(dent, entry, sizeof(*dent));
> > > > +                err = 1;
> > > 
> > > I find using sizeof(*dent) a bit dangerous considering potential type
> > > changes in the future. I would rather use sizeof(struct dirent). It is
> > > also more human-friendly to read, IMO.
> > 
> > Hmm... I believe it's the opposite actually: with sizeof(*dent), memcpy
> > will always copy the number of bytes that are expected to fit in *dent,
> > no matter the type.
> 
> Yes, but what you intend is to flat-copy a structure, not pointers. So no
> matter how the type changes, what you always actually want (semantically) is
> 
>       copy(sizeof(struct dirent), nelements)
> 
> Right now it is nelements=1; in the future it might also be nelements>1, but
> what you certainly never want here is
> 
>       copy(sizeof(void*), nelements)
> 
> > But yes, since memcpy() doesn't do any type checking for us, I think
> > I'll just turn this into:
> > 
> >                 *dent = *entry;
> 
> Ok
> 
> Best regards,
> Christian Schoenebeck
> 
> 



