
From: Avi Kivity
Subject: Re: [Qemu-devel] [PATCH v3] Stop VM on ENOSPC error.
Date: Mon, 19 Jan 2009 20:38:14 +0200
User-agent: Thunderbird (X11/20090105)

Ian Jackson wrote:
> Anthony Liguori writes ("Re: [Qemu-devel] [PATCH v3] Stop VM on ENOSPC error."):
>> Ian Jackson wrote:
>>> Once again, this feature should be optional.
>
> Well, three reasons, one general and theoretical, and two practical
> and rather Xen-specific.

This has been tried before, but...

> The theoretical reason is that a guest is in a better position to deal
> with the situation because it knows its access patterns.  Often the
> response to a failing write in a mission-critical system will be some
> kind of fallback behaviour, which is likely to work.

A situation where many writes fail and many writes succeed is unlikely
to have been tested and is therefore unlikely to work, particularly as
some time afterwards all writes start to succeed again as if nothing
had happened.

A single disk guest will thrash its disk, eventually remounting it
read-only (in the case of Linux) and then failing left and right.
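An aside of mine, not from the thread: the remount-ro behaviour described above is ext3's `errors=remount-ro` mount policy, and a management layer can observe it from outside the filesystem by checking the option field in /proc/mounts. A minimal sketch; the helper name and the idea of polling /proc/mounts for this purpose are my own illustration:

```python
# Sketch (not from the thread): check whether a Linux filesystem has
# been remounted read-only, as ext3 does after an I/O error when
# mounted with errors=remount-ro. /proc/mounts lines look like:
#   /dev/vda1 / ext3 rw,errors=remount-ro 0 0
def is_readonly(mounts_text, mountpoint):
    """Return True if `mountpoint` carries the 'ro' mount option."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mountpoint, fstype, options, dump, pass
        if len(fields) >= 4 and fields[1] == mountpoint:
            return "ro" in fields[3].split(",")
    return False
```

Usage would be along the lines of `is_readonly(open("/proc/mounts").read(), "/")`.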

A multiple disk guest in a RAID 5 configuration will enter degraded
mode, and then corrupt data. RAID 5 wasn't designed for multiple disk
failures. By induction, RAID 6 fails as well.

> Stopping the VM
> unconditionally is not something that the guest can cope with.

The guest doesn't need to cope with it; the management system does.

> The practical reasons are that we would want to retain existing
> behaviour unless it was clearly broken (which we don't think it is),
> and that we don't currently have any useful mechanism for reporting
> and dealing with the problem.

> Fundamentally I think we're seeing this differently because the way
> that Xen uses qemu is contextually quite different from the
> `traditional' qemu.  Traditionally qemu is used as a subprogram of
> other tasks, as an interactive debugging or GUI tool, or whatever.

> But in the Xen context, a Xen VM is not a `task' in the same way.
> (Xen users make much less use of the built-in cow formats for this
> reason, often preferring LVM snapshots or even deeper storage magic.)
> We expect the VM to be up and stay up, and if it can't continue it
> needs to fail or crash.

You can resume the guest over the monitor (or xenstore if you insist)
once more storage is allocated, same as everyone else. I don't see how
qemu's role in Xen makes a difference.
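To make the resume step concrete, a minimal sketch (mine, not from the thread) of what a management layer might do, assuming qemu's human monitor is exposed on a Unix socket via something like `-monitor unix:PATH,server,nowait`; the socket path and function names are illustrative assumptions:

```python
import socket

def monitor_command(cmd):
    """Encode one human-monitor command line as sent over the socket."""
    return (cmd + "\n").encode("ascii")

def resume_guest(sock_path):
    """After freeing or growing storage, send 'cont' to resume the VM.

    Assumes qemu was started with -monitor unix:<sock_path>,server,nowait.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(sock_path)
        # 'cont' is the monitor command that resumes a stopped guest;
        # 'stop' is its counterpart.
        s.sendall(monitor_command("cont"))
    finally:
        s.close()
```

The same "pause on error, fix storage, resume" loop works however the monitor is reached; the mechanism is not Xen-specific.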

The only alternative I see to stopping the VM is to offline the disk
for both reads and writes. This at least protects data, and is similar
to controller or cable failure, which guests may have been tested
with. An advantage is that if an unimportant disk fails, the guest can
continue to work.

I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
