Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason


From: Daniel P. Berrange
Subject: Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
Date: Fri, 3 Jun 2011 13:57:51 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Fri, Jun 03, 2011 at 07:43:24AM -0500, Anthony Liguori wrote:
> On 06/03/2011 04:26 AM, Daniel P. Berrange wrote:
> >On Thu, Jun 02, 2011 at 03:01:24PM -0300, Luiz Capitulino wrote:
> >>On Thu, 02 Jun 2011 09:02:30 -0500
> >>Anthony Liguori<address@hidden>  wrote:
> >>
> >>>On 06/02/2011 08:24 AM, Jiri Denemark wrote:
> >>>>On Thu, Jun 02, 2011 at 08:08:35 -0500, Anthony Liguori wrote:
> >>>>>On 06/02/2011 04:06 AM, Daniel P. Berrange wrote:
> >>>>>>>>B. query-stop-reason
> >>>>>>>>--------------------
> >>>>>>>>
> >>>>>>>>I also have a simple solution for item 2. The vm_stop() accepts a reason
> >>>>>>>>argument, so we could store it somewhere and return it as a string, like:
> >>>>>>>>
> >>>>>>>>->     { "execute": "query-stop-reason" }
> >>>>>>>><- { "return": { "reason": "user" } }
> >>>>>>>>
> >>>>>>>>Valid reasons could be: "user", "debug", "shutdown", "diskfull" (hey,
> >>>>>>>>this should be "ioerror", no?), "watchdog", "panic", "savevm", "loadvm",
> >>>>>>>>"migrate".
> >>>>>>>>
> >>>>>>>>Also note that we have a STOP event. It should be extended with the
> >>>>>>>>stop reason too, for completeness.
> >>>>>>>
> >>>>>>>
> >>>>>>>Can we just extend query-block?
> >>>>>>
> >>>>>>Primarily we want 'query-stop-reason' to tell us what caused the VM
> >>>>>>CPUs to stop. If that reason was 'ioerror', then 'query-block' could
> >>>>>>be used to find out which particular block device(s) caused the IO
> >>>>>>error to occur & get the "reason" that was in the BLOCK_IO_ERROR
> >>>>>>event.
> >>>>>
> >>>>>My concern is that we're over abstracting here.  We're not going to add
> >>>>>additional stop reasons in the future.
> >>>>>
> >>>>>Maybe just add an 'io-error': True to query-state.
> >>>>
> >>>>Sure, adding a new field to query-state response would work as well. And it
> >>>>seems like a good idea to me since one already needs to call query-status to
> >>>>check if CPUs are stopped or not so it makes sense to incorporate the
> >>>>additional information there as well. And if you want to be safe for the
> >>>>future, the new field doesn't have to be boolean 'io-error' but it can be the
> >>>>string 'reason' which Luiz suggested above.
> >>>
> >>>
> >>>String enumerations are a Bad Thing.  It's impossible to figure out what
> >>>strings are valid and it lacks type safety.
> >>>
> >>>Adding more booleans provides better type safety, and when we move to
> >>>QAPI with a queryable schema, provides a way to figure out exactly what
> >>>combinations are supported by QEMU.
> >>
> >>To summarize:
> >>
> >>  1. Add a 'io-error' field to query-status (which is only present if
> >>     field 'running' is false)
> >
> >This isn't really enough. There are many reasons why a VM may have
> >transitioned to the paused state, of which IO Error is merely one.
> >The query-status needs to be able to report what the reason for
> >the transitioning to the paused state is.
> 
> No, there's only two reasons:
> 
> 1) IO Error (and user configured pause on I/O error)
> 
> 2) The result of some user action (an explicit stop, live migration, etc.)
> 
> The fact that all of these things call vm_stop() internally is an
> implementation detail.  Adding a string parameter to vm_stop() of a
> reason may seem like an easy thing to do but you're taking something
> that is an internal concept in QEMU and making it part of an
> interface that needs to be supported forever.
> 
> That's why I'm suggesting modelling a user visible concept (I/O
> errors stop a guest) instead of trying to model an internal QEMU
> concept (vm_stop()).
> 
> If you have other user visible concepts that you want to know about,
> please share the use-cases and we can think about how to model it
> such that it's not exposing internal QEMU details.

None of the requested info is exposing internal QEMU impl details
with one exception. The reasons are either administrative commands,
host OS failures, guest OS failures, or the exception, KVM internal
emulation failure.

The core problem is that an app connects to QEMU, finds it is paused,
and wants to decide what action to take. If the guest is paused due
to a previous admin 'stop' command, it will allow resuming. If it is
paused due to guest OS poweroff, it might decide to issue a 'system_reset'
command and then 'resume'. If it is paused due to watchdog, it might
decide it wants to pmemsave the guest OS, and then system_reset+resume.
If it is paused because KVM hit an emulation failure, it may wish to
attach to the debugger interface and capture VM/QEMU state.
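
As a rough sketch of that decision logic (not an agreed interface): the
code below assumes query-status grows a string 'reason' field as Luiz
proposed. The reason names "user", "shutdown" and "watchdog" come from
his list; "internal-error" is a name made up here for the KVM emulation
failure case, and plan_recovery() is a hypothetical helper. The action
names are QMP commands, with 'cont' being the command that resumes the
guest.

```python
import json

# Hypothetical mapping from a proposed stop reason to the QMP commands
# a management app might issue, in order. None of this is an existing
# QEMU interface; the 'reason' field is only a proposal in this thread.
ACTIONS = {
    "user": ["cont"],                                  # admin 'stop': allow resuming
    "shutdown": ["system_reset", "cont"],              # guest OS poweroff
    "watchdog": ["pmemsave", "system_reset", "cont"],  # save guest memory first
    "internal-error": [],                              # stay paused, attach debugger
}

def plan_recovery(status_reply):
    """Return the list of commands to run for a query-status reply.

    An empty list means "do nothing automatically": either the guest is
    running, or the reason is unknown or needs a human to look at it.
    """
    reply = json.loads(status_reply)["return"]
    if reply.get("running", False):
        return []
    return list(ACTIONS.get(reply.get("reason"), []))
```

Without the reason, the app cannot choose between these branches: all
it sees is "running": false.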

The other problem is that a sysadmin finds a guest unexpectedly paused,
the mgmt app can't tell them why, and they want to troubleshoot the
problem. QEMU should be able to tell the sysadmin why it is in this
state, so they can proceed with troubleshooting in a suitable direction,
whether the host OS, KVM itself, the guest OS, or the mgmt tool.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
