Re: [PATCH v3 0/9] block-backend: Introduce I/O hang


From: Stefan Hajnoczi
Subject: Re: [PATCH v3 0/9] block-backend: Introduce I/O hang
Date: Fri, 30 Oct 2020 13:21:53 +0000

On Thu, Oct 29, 2020 at 05:42:42PM +0800, cenjiahui wrote:
> 
> On 2020/10/27 0:53, Stefan Hajnoczi wrote:
> > On Thu, Oct 22, 2020 at 09:02:54PM +0800, Jiahui Cen wrote:
> >> A VM in a cloud environment may use a virtual disk as its backend
> >> storage, and there is usually a filesystem on the virtual block device.
> >> When the backend storage is temporarily down, any I/O issued to the
> >> virtual block device will fail. An error in an ext4 filesystem, for
> >> example, makes the filesystem read-only. However, cloud backend storage
> >> can often recover quickly: an IP-SAN may go down due to a network
> >> failure and come back online soon after the network recovers. The
> >> filesystem error, though, may not be recoverable without reattaching
> >> the device or restarting the system. So rehandling the I/O is needed to
> >> implement a self-healing mechanism.
> >>
> >> This patch series proposes a feature called I/O hang. It can rehandle
> >> AIOs that fail with EIO without returning the error to the guest. From
> >> the guest's perspective, the I/O simply hangs and has not yet returned.
> >> With this feature enabled, the guest resumes running smoothly once the
> >> I/O recovers.
> > 
> > Hi,
> > This feature seems like an extension of the existing -drive
> > rerror=/werror= parameters:
> > 
> >   werror=action,rerror=action
> >       Specify which action to take on write and read errors. Valid
> >       actions are: "ignore" (ignore the error and try to continue),
> >       "stop" (pause QEMU), "report" (report the error to the guest),
> >       "enospc" (pause QEMU only if the host disk is full; report the
> >       error to the guest otherwise).  The default setting is
> >       werror=enospc and rerror=report.
> > 
> > That mechanism already has a list of requests to retry and live
> > migration integration. Using the werror=/rerror= mechanism would avoid
> > code duplication between these features. You could add a
> > werror/rerror=retry error action for this feature.
> > 
> > Does that sound good?
> > 
> > Stefan
> > 
> 
> Hi Stefan,
> 
> Thanks for your reply. Extending the rerror=/werror= mechanism is a
> feasible way to implement the retry feature.
>
> However, AFAIK, the rerror=/werror= mechanism in the block-backend layer
> only provides the ACTION, and the actual error handling has to be
> implemented separately in the device layer for each type of device. Our
> I/O Hang mechanism, by contrast, handles AIO errors directly, regardless
> of the device type. Wouldn't it be more general to implement the feature
> in the block-backend layer? In particular, the retry timeout can then be
> kept in the common BlockBackend structure.
>
> Besides, is there a reason why QEMU implements the rerror=/werror=
> mechanism in the device layer rather than in the block-backend layer?

Yes, it's because failed requests can be live-migrated and retried on
the destination host. In other words, live migration still works even
when there are failed requests.
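
Roughly, the device-side pattern looks like this. This is only a sketch
with made-up names (FakeDev, FakeReq, dev_handle_error, dev_resume), not
the actual QEMU code; the real logic lives in the individual device
models, e.g. hw/block/virtio-blk.c:

  /* Sketch only -- invented names, not the actual QEMU implementation. */
  #include <stdbool.h>
  #include <stdio.h>

  typedef enum {
      ACTION_REPORT,            /* rerror/werror=report: fail the request */
      ACTION_IGNORE,            /* rerror/werror=ignore: pretend it worked */
      ACTION_STOP,              /* rerror/werror=stop: pause VM, retry later */
  } ErrorAction;

  typedef struct FakeReq {
      int error;                /* errno from the failed I/O, e.g. EIO */
      bool is_read;
      struct FakeReq *next;
  } FakeReq;

  typedef struct {
      ErrorAction on_read_error;
      ErrorAction on_write_error;
      FakeReq *queued_reqs;     /* failed requests parked for a retry; this
                                 * list travels with the rest of the device
                                 * state during live migration */
  } FakeDev;

  /* Completion path: decide what to do with a request that failed. */
  static void dev_handle_error(FakeDev *dev, FakeReq *req)
  {
      ErrorAction action = req->is_read ? dev->on_read_error
                                        : dev->on_write_error;
      switch (action) {
      case ACTION_REPORT:
          printf("report errno %d to the guest\n", req->error);
          break;
      case ACTION_IGNORE:
          printf("complete the request as if it had succeeded\n");
          break;
      case ACTION_STOP:
          req->next = dev->queued_reqs;   /* park it, do not complete it */
          dev->queued_reqs = req;
          printf("pause the VM and keep the request for a retry\n");
          break;
      }
  }

  /* VM resume path (source host after the storage recovers, or the
   * migration destination): reissue everything that was parked. */
  static void dev_resume(FakeDev *dev)
  {
      while (dev->queued_reqs) {
          FakeReq *req = dev->queued_reqs;
          dev->queued_reqs = req->next;
          printf("resubmit %s request\n", req->is_read ? "read" : "write");
      }
  }

  int main(void)
  {
      FakeDev dev = { .on_read_error = ACTION_STOP,
                      .on_write_error = ACTION_STOP };
      FakeReq failed = { .error = 5 /* EIO */, .is_read = true };

      dev_handle_error(&dev, &failed);    /* backend down: request parked */
      dev_resume(&dev);                   /* backend back: request reissued */
      return 0;
  }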

There may be things that can be refactored so there is less duplication
in devices, but the basic design goal is that the block layer doesn't
keep track of failed requests because they are live migrated together
with the device state.
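
To illustrate what "live migrated together with the device state" means,
here is another sketch, again with invented names and a plain FILE *
stream standing in for QEMU's real migration machinery: the save hook on
the source serializes each parked request, and the load hook on the
destination rebuilds the list so the requests can be reissued once the VM
runs there.

  /* Sketch only -- invented names, not QEMU's VMState/migration API. */
  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>

  typedef struct PendingReq {
      uint64_t offset;          /* where the failed I/O was aimed */
      uint32_t len;
      uint8_t is_read;
      struct PendingReq *next;
  } PendingReq;

  /* Save side: emit a count, then one record per parked request. */
  static void save_pending(FILE *out, const PendingReq *head)
  {
      uint32_t count = 0;
      for (const PendingReq *r = head; r; r = r->next) {
          count++;
      }
      fwrite(&count, sizeof(count), 1, out);
      for (const PendingReq *r = head; r; r = r->next) {
          fwrite(&r->offset, sizeof(r->offset), 1, out);
          fwrite(&r->len, sizeof(r->len), 1, out);
          fwrite(&r->is_read, sizeof(r->is_read), 1, out);
      }
  }

  /* Load side: rebuild the list on the destination; the device reissues
   * these requests when the VM is resumed there. */
  static PendingReq *load_pending(FILE *in)
  {
      uint32_t count;
      PendingReq *head = NULL;

      if (fread(&count, sizeof(count), 1, in) != 1) {
          return NULL;
      }
      for (uint32_t i = 0; i < count; i++) {
          PendingReq *r = calloc(1, sizeof(*r));
          if (!r ||
              fread(&r->offset, sizeof(r->offset), 1, in) != 1 ||
              fread(&r->len, sizeof(r->len), 1, in) != 1 ||
              fread(&r->is_read, sizeof(r->is_read), 1, in) != 1) {
              free(r);
              return head;      /* truncated stream: keep what we have */
          }
          r->next = head;
          head = r;
      }
      return head;
  }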

Maybe Kevin Wolf has more thoughts to share about rerror=/werror=.

Stefan

