A VM in a cloud environment may use a virtual disk as its backend storage,
and there are usually filesystems on the virtual block device. When the
backend storage is temporarily down, any I/O issued to the virtual block
device fails with an error. For example, an I/O error in an ext4 filesystem
can cause the filesystem to be remounted read-only. In a production
environment, cloud backend storage usually recovers quickly. For example,
an IP-SAN may go down due to a network failure and come back online soon
after the network recovers. However, the error state in the guest filesystem
may persist until the device is reattached or the system is restarted. Thus
an I/O retry mechanism is needed to implement a self-healing system.
This patch series proposes extending the werror=/rerror= mechanism with a
'retry' option. It automatically retries failed I/O requests without
reporting the error to the guest, so the guest resumes running smoothly
once I/O has recovered.
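
The retry-with-timeout idea described above can be illustrated with a
minimal, standalone sketch. This is not the code from the series: every
name here (RetryPolicy, FakeBackend, submit_with_retry) is hypothetical,
time is modelled as a counter instead of a real timer, and treating a
timeout of 0 as "retry forever" is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* All names below are hypothetical and exist only for illustration. */

typedef int (*io_fn)(void *opaque);

typedef struct RetryPolicy {
    int64_t interval_ms;  /* delay between two attempts */
    int64_t timeout_ms;   /* give up after this long; 0 = retry forever (assumed) */
} RetryPolicy;

/* Simulated backend: fails with -EIO until failures_left drops to 0. */
typedef struct FakeBackend {
    int failures_left;
    int attempts;
} FakeBackend;

static int fake_backend_io(void *opaque)
{
    FakeBackend *b = opaque;

    b->attempts++;
    if (b->failures_left > 0) {
        b->failures_left--;
        return -EIO;
    }
    return 0;
}

/*
 * Call fn until it succeeds or the next retry would exceed the timeout.
 * Returns 0 on success, otherwise the last error, which is the point at
 * which a real device model would report the failure to the guest.
 */
static int submit_with_retry(io_fn fn, void *opaque, const RetryPolicy *p)
{
    int64_t elapsed_ms = 0;
    int ret;

    assert(p->interval_ms > 0);
    for (;;) {
        ret = fn(opaque);
        if (ret == 0) {
            return 0;  /* transient failures stayed invisible to the caller */
        }
        if (p->timeout_ms > 0 &&
            elapsed_ms + p->interval_ms > p->timeout_ms) {
            return ret;  /* timed out: propagate the error */
        }
        elapsed_ms += p->interval_ms;  /* stands in for arming a retry timer */
    }
}
```

In this sketch a backend that recovers within the timeout completes the
request as if nothing had happened, while one that stays down past the
timeout yields the original error to the caller.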
v3->v4:
* Adapt to werror=/rerror= mechanism.
v2->v3:
* Add a doc to describe I/O hang.
v1->v2:
* Rebase to fix compile problems.
* Fix incorrect removal of rehandle list.
* Provide rehandle pause interface.
REF: https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg06560.html
Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
Signed-off-by: Ying Fang <fangying1@huawei.com>
Jiahui Cen (7):
  qapi/block-core: Add retry option for error action
  block-backend: Introduce retry timer
  block-backend: Add device specific retry callback
  block-backend: Enable retry action on errors
  block-backend: Add timeout support for retry
  block: Add error retry param setting
  virtio_blk: Add support for retry on errors

 block/block-backend.c          | 66 ++++++++++++++++++++
 blockdev.c                     | 52 +++++++++++++++
 hw/block/block.c               | 10 +++
 hw/block/virtio-blk.c          | 19 +++++-
 include/hw/block/block.h       |  7 ++-
 include/sysemu/block-backend.h | 10 +++
 qapi/block-core.json           |  4 +-
 7 files changed, 162 insertions(+), 6 deletions(-)