[Qemu-devel] virtio-scsi and error handling
From: Hannes Reinecke
Subject: [Qemu-devel] virtio-scsi and error handling
Date: Tue, 11 Jun 2013 13:41:38 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130510 Thunderbird/17.0.6
Hi Stefan,
I'm currently playing around with improving SCSI EH, optimizing
command aborts and the like.
Supposing it to be a nice testbed, I tried to make things work
with virtio_scsi.
However, looking at the code there I've found virtscsi_tmf() just
uses 'wait_for_completion', with no timeout specified. So in effect
any abort might stall forever.
Wouldn't it be more sensible to use 'wait_for_completion_timeout'
here, to allow the error escalation to continue?
This would especially be useful when running with multipathing,
as the underlying device might stall, and aio_cancel() doesn't work
reliably, if at all.
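
To illustrate the difference a bounded wait makes, here is a minimal
user-space sketch. It is not the virtio_scsi code: struct
completion_model and its helper functions are hypothetical stand-ins
built on pthreads, modelling only the behaviour that matters here. In
the kernel, wait_for_completion_timeout() returns 0 when the timeout
expires and a non-zero value otherwise, so the caller can tell a stalled
TMF from a finished one and hand control back to the error handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical user-space stand-in for the kernel's struct completion. */
struct completion_model {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void completion_model_init(struct completion_model *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

/* Mark the request as finished, as a TMF response handler would. */
static void completion_model_complete(struct completion_model *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/*
 * Wait at most timeout_ms milliseconds for completion.
 * Returns true if the request completed, false if the wait timed out.
 */
static bool completion_model_wait_timeout(struct completion_model *c,
                                          long timeout_ms)
{
    struct timespec deadline;
    bool done;

    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&c->lock);
    while (!c->done) {
        if (pthread_cond_timedwait(&c->cond, &c->lock, &deadline) == ETIMEDOUT)
            break;
    }
    done = c->done;
    pthread_mutex_unlock(&c->lock);
    return done;
}
```

A caller that gets false back from the bounded wait can report the abort
as failed and let the escalation proceed, instead of blocking
indefinitely on a response that may never arrive.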
Also I've found that there is no host reset. Currently the virtio
semantics seem to require reliable communication, i.e. for every
command sent there _has_ to be a response.
Long and painful experience with RAID HBAs has shown that this model
works okay for the lower-level escalations, but you absolutely need
a host reset to restore communication.
In the case of virtio I would think that a virtio-level reset for
host_reset would be a sensible idea.
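
The escalation argument can be sketched as a simple ladder. This is an
illustration only, not the SCSI midlayer implementation: the enum, the
handler type and the two sample handlers are hypothetical names. The
assumption is that each level either succeeds or fails within bounded
time, and that the last rung (host reset) does not depend on the
command/response channel that the lower levels use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Escalation levels, from least to most disruptive (hypothetical names). */
enum eh_level {
    EH_ABORT,
    EH_DEVICE_RESET,
    EH_TARGET_RESET,
    EH_BUS_RESET,
    EH_HOST_RESET,
    EH_LEVEL_COUNT
};

/* A handler returns true on success; it must not block forever. */
typedef bool (*eh_handler)(void *ctx);

/* Sample handlers for demonstration purposes only. */
static bool eh_always_fail(void *ctx)    { (void)ctx; return false; }
static bool eh_always_succeed(void *ctx) { (void)ctx; return true; }

/*
 * Try each level in order and stop at the first one that succeeds.
 * Returns the level that succeeded, or EH_LEVEL_COUNT if every level
 * failed or was not provided (NULL entry).
 */
static enum eh_level eh_escalate(const eh_handler handlers[EH_LEVEL_COUNT],
                                 void *ctx)
{
    for (int level = EH_ABORT; level < EH_LEVEL_COUNT; level++) {
        if (handlers[level] != NULL && handlers[level](ctx))
            return (enum eh_level)level;
    }
    return EH_LEVEL_COUNT;
}
```

With no handler in the EH_HOST_RESET slot, a failure of all lower levels
leaves nothing to fall back on, which is the gap described above.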
Any opinions from your side?
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
address@hidden +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)