qemu-devel

Re: [PATCH 10/13] virtiofsd: Custom threadpool for remote blocking posix


From: Stefan Hajnoczi
Subject: Re: [PATCH 10/13] virtiofsd: Custom threadpool for remote blocking posix locks requests
Date: Wed, 6 Oct 2021 11:26:28 +0100

On Tue, Oct 05, 2021 at 04:09:35PM -0400, Vivek Goyal wrote:
> On Mon, Oct 04, 2021 at 03:54:31PM +0100, Stefan Hajnoczi wrote:
> > On Thu, Sep 30, 2021 at 11:30:34AM -0400, Vivek Goyal wrote:
> > > Add a new custom threadpool using posix threads that specifically
> > > service locking requests.
> > > 
> > > In the case of a fcntl(SETLKW) request, if the guest is waiting
> > > for a lock or locks and issues a hard-reboot through SYSRQ then virtiofsd
> > > unblocks the blocked threads by sending a signal to them and waking
> > > them up.
> > > 
> > > The current threadpool (GThreadPool) is not adequate to service the
> > > locking requests that result in a thread blocking. That is because
> > > GLib does not provide an API to cancel the request while it is
> > > serviced by a thread. In addition, a user might be running virtiofsd
> > > without a threadpool (--thread-pool-size=0), in which case a locking
> > > request that blocks will block the main virtqueue thread and keep it
> > > from servicing any other requests.
> > > 
> > > The only exception occurs when the lock is of type F_UNLCK. In this case
> > > the request is serviced by the main virtqueue thread or a GThreadPool
> > > thread to avoid a deadlock, when all the threads in the custom threadpool
> > > are blocked.
> > > 
> > > Then virtiofsd proceeds to clean up the state of the threads, release
> > > them back to the system and re-initialize.
> > 
> > Is there another way to cancel SETLKW without resorting to a new thread
> > pool? Since this only matters when shutting down or restarting, can we
> > close all plock->fd file descriptors to kick the GThreadPool workers out
> > of fcntl()?
> 
> Ok, I tested this. If a thread is blocked on an OFD lock and another
> thread closes the associated fd, it does not unblock the thread
> which is blocked on the lock. So closing the fd can't be used for
> unblocking a thread.
> 
> Even if it could be, it can't be a replacement for a thread pool
> in general, as we can't block the main thread; otherwise it can
> deadlock. But we could have used another glib thread pool (instead of
> a custom thread pool which can handle signals to unblock threads).
> 
> If you are curious, here is my test program.
> 
> https://github.com/rhvgoyal/misc/blob/master/virtiofs-tests/ofd-lock.c
> 
> Comments in there explain how to use it. It can block on an OFD
> lock and one can send SIGUSR1 which will close fd.

Thanks for investigating this! Too bad that the semantics of SETLKW are
not usable here.

I ran two instances on my system so that the second instance blocks in
SETLKW and found the same thing. Once the first instance releases the
lock, fcntl(fd, F_OFD_SETLKW, &flock) in the second instance returns
success even though the other thread had already closed the fd while the
main thread was blocked in fcntl().
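
For reference, here is a minimal single-process sketch of that
experiment, assuming Linux with OFD lock support. It is not Vivek's
ofd-lock.c: the names (close_while_blocked, wait_for_lock, struct
waiter), the temporary file path and the 200 ms sleeps are made up for
illustration, and the sleeps are only a crude way to let the waiter
reach fcntl() before the fd is closed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Fallback in case _GNU_SOURCE did not take effect; these are the
 * values from Linux's include/uapi/asm-generic/fcntl.h. */
#ifndef F_OFD_SETLKW
#define F_OFD_SETLK  37
#define F_OFD_SETLKW 38
#endif

struct waiter {
    int fd;           /* fd the waiter blocks on */
    int ret;          /* return value of fcntl() */
    int err;          /* errno if fcntl() failed, else 0 */
    atomic_int done;  /* set to 1 once fcntl() has returned */
};

static void *wait_for_lock(void *arg)
{
    struct waiter *w = arg;
    struct flock fl;

    memset(&fl, 0, sizeof(fl));   /* l_pid must be 0 for OFD locks */
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;       /* l_start = l_len = 0: whole file */

    w->ret = fcntl(w->fd, F_OFD_SETLKW, &fl);
    w->err = (w->ret == -1) ? errno : 0;
    atomic_store(&w->done, 1);
    return NULL;
}

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Returns 0 if the experiment ran, -1 if the setup failed. */
static int close_while_blocked(int *still_blocked, int *fcntl_ret)
{
    char path[] = "/tmp/ofd-close-XXXXXX";
    struct waiter w = { .fd = -1 };
    struct flock fl;
    pthread_t tid;
    int holder, fd2;

    holder = mkstemp(path);
    if (holder < 0)
        return -1;
    fd2 = open(path, O_RDWR);     /* second open file description */
    unlink(path);
    if (fd2 < 0) {
        close(holder);
        return -1;
    }

    /* Take the lock on the first open file description. */
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if (fcntl(holder, F_OFD_SETLK, &fl) == -1)
        goto fail;

    w.fd = fd2;
    atomic_init(&w.done, 0);
    if (pthread_create(&tid, NULL, wait_for_lock, &w) != 0)
        goto fail;

    sleep_ms(200);                /* let the waiter block in fcntl() */
    close(fd2);                   /* close the fd it is blocked on */
    sleep_ms(200);
    *still_blocked = !atomic_load(&w.done);

    close(holder);                /* drops the holder's OFD lock */
    pthread_join(tid, NULL);
    *fcntl_ret = w.ret;
    return 0;

fail:
    close(fd2);
    close(holder);
    return -1;
}
```

Calling close_while_blocked(&still_blocked, &ret) from main() should
report that the waiter was still blocked after close(fd2) and that
fcntl() returned 0 once the holder went away, which matches what both
of us observed.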

Here is where it gets weird: lslocks(1) shows the OFD locks that are
acquired (process 1) and waiting (process 2). When process 1 terminates,
process 2 makes progress but lslocks(1) shows there are no OFD locks.

This suggests that when fcntl(2) returns success in process 2, the OFD
lock is immediately released by the kernel since the fd was already
closed beforehand. Process 2 would have no way of releasing the lock
since it already closed its fd. So the 0 return value does not really
mean success - there is no acquired OFD lock when fcntl(2) returns!

The problem is that fcntl(2) doesn't return early with -EBADFD or
similar when the fd is closed while it is blocked, so we cannot use
close(fd) to interrupt it :(.
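
By contrast, a signal does interrupt the blocked call, which is the
mechanism the patch description relies on. Below is a rough sketch of
that behaviour, assuming Linux with OFD lock support and a SIGUSR1
handler installed without SA_RESTART. This is not the virtiofsd code:
interrupt_with_signal, block_on_lock, struct blocked, the choice of
SIGUSR1 and the timings are all hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Fallback in case _GNU_SOURCE did not take effect; these are the
 * values from Linux's include/uapi/asm-generic/fcntl.h. */
#ifndef F_OFD_SETLKW
#define F_OFD_SETLK  37
#define F_OFD_SETLKW 38
#endif

struct blocked {
    int fd;
    int ret;
    int err;
    atomic_int done;
};

/* Empty handler: its only job is to make the blocking call return. */
static void on_sigusr1(int sig)
{
    (void)sig;
}

static void *block_on_lock(void *arg)
{
    struct blocked *b = arg;
    struct flock fl;

    memset(&fl, 0, sizeof(fl));   /* l_pid must be 0 for OFD locks */
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;

    b->ret = fcntl(b->fd, F_OFD_SETLKW, &fl);
    b->err = (b->ret == -1) ? errno : 0;
    atomic_store(&b->done, 1);
    return NULL;
}

static void pause_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Returns 0 if the experiment ran, -1 if the setup failed. */
static int interrupt_with_signal(int *fcntl_ret, int *fcntl_errno)
{
    char path[] = "/tmp/ofd-signal-XXXXXX";
    struct blocked b = { .fd = -1 };
    struct sigaction sa;
    struct flock fl;
    pthread_t tid;
    int holder, fd2;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;   /* sa_flags == 0: no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    holder = mkstemp(path);
    if (holder < 0)
        return -1;
    fd2 = open(path, O_RDWR);     /* second open file description */
    unlink(path);
    if (fd2 < 0) {
        close(holder);
        return -1;
    }

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if (fcntl(holder, F_OFD_SETLK, &fl) == -1)
        goto fail;

    b.fd = fd2;
    atomic_init(&b.done, 0);
    if (pthread_create(&tid, NULL, block_on_lock, &b) != 0)
        goto fail;

    /* Keep signalling until the waiter reports that fcntl() returned;
     * repeating covers a signal that arrives before it blocks. */
    pause_ms(200);
    while (!atomic_load(&b.done)) {
        pthread_kill(tid, SIGUSR1);
        pause_ms(50);
    }
    pthread_join(tid, NULL);

    *fcntl_ret = b.ret;
    *fcntl_errno = b.err;
    close(fd2);
    close(holder);
    return 0;

fail:
    close(fd2);
    close(holder);
    return -1;
}
```

Here the waiter should come back with -1 and errno set to EINTR while
the holder still owns the lock, so no lock is acquired behind our back.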

Stefan
