qemu-devel

Re: [Qemu-devel] Guest-sync-delimited and sentinel issue


From: Michael Roth
Subject: Re: [Qemu-devel] Guest-sync-delimited and sentinel issue
Date: Fri, 16 Mar 2012 12:26:44 -0500
User-agent: Mutt/1.5.21 (2010-09-15)

On Fri, Mar 16, 2012 at 05:04:22PM +0100, Michal Privoznik wrote:
> On 16.03.2012 15:49, Michael Roth wrote:
> > On Fri, Mar 16, 2012 at 01:47:42PM +0100, Michal Privoznik wrote:
> >> Hi guys,
> >>
> >> I was just implementing support for guest-sync-delimited into libvirt. My 
> >> intent is to issue this command prior to any other command to determine if GA 
> >> is available or not. The big advantage is - it doesn't change the state of 
> >> the guest so from libvirt POV it's harmless. The other big advantage is 
> >> this sentinel byte 0xFF which is supposed to flush all unprocessed (and 
> >> possibly stale) data from previous unsuccessful tries.
> > 
> > Are you opening the qemu-ga socket prior to each command? Or just
> > once at startup? If you're only opening it once, it should be sufficient
> > to do the guest-sync/guest-sync-delimited exchange just once at that time,
> > > since the streams are presumably synced at that point, and after that only
> > > if you get a client-side timeout.
> 
> Only once at the guest startup or libvirt daemon startup if the guest is
> already running.
> > 
> > Issuing it prior to each command doesn't guarantee that the agent will
> > > be available to handle the command, so you still need to be prepared to
> > handle a timeout + re-sync. It does reduce the chances of timing out on
> > doing something that affects guest state though... but if that's the
> > intention here I would recommend just using 'guest-ping', which should
> > work reliably so long as you always re-sync on connect and after
> > client-side timeouts.
> 
> Since GA may come and go I don't see any guaranteed algorithm that would
> work for 100%. And I try to avoid using timeouts for state changing
> commands, because if I choose a timeout which can look reasonable now,
> somebody will hit it sooner or later; take fs-freeze as example.

True, there's still a race between the "ping" response and the guest
agent becoming unresponsive (host issue or guest agent shutdown) during
the subsequent command, but it does reduce the chances of that happening
quite a bit.

I think at least some kind of fail-safe timeout is necessary though. If I
were an evil user I could replace qemu-ga with something that responded to
sync/ping commands but ignored everything else. If we're not careful that
could block libvirt indefinitely.
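
To illustrate the fail-safe timeout, here is a minimal Python sketch (not libvirt's or qemu-ga's code). It assumes the agent channel is an already-connected stream socket and that each reply is one newline-terminated JSON object. The names `execute_with_deadline` and `AgentTimeout` are hypothetical, made up for this example:

```python
import json
import socket
import time


class AgentTimeout(Exception):
    """Hypothetical error: no complete reply arrived before the deadline."""


def execute_with_deadline(sock, command, arguments=None, timeout=5.0):
    """Send one command and wait at most `timeout` seconds for one reply line."""
    request = {"execute": command}
    if arguments is not None:
        request["arguments"] = arguments
    sock.sendall(json.dumps(request).encode("utf-8") + b"\n")

    deadline = time.monotonic() + timeout
    buf = b""
    while b"\n" not in buf:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise AgentTimeout(command)
        sock.settimeout(remaining)
        try:
            data = sock.recv(4096)
        except socket.timeout:
            raise AgentTimeout(command) from None
        if not data:
            raise ConnectionError("agent channel closed")
        buf += data

    # Sketch only: bytes after the first newline are dropped here.
    line, _, _ = buf.partition(b"\n")
    return json.loads(line.decode("utf-8"))
```

A caller that catches `AgentTimeout` should treat the stream as out of sync and redo the sync exchange before sending anything else, since a late reply to the timed-out command may still show up.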

> Moreover, if we timeout on a state changing command, libvirt would have
> to keep track of such command anyway because if/when it returns, libvirt
> must issue counter command right after: fs-freeze vs. fs-thaw. However,
> only iff command returned successfully.

Yah, it's a pain. The good thing is that it's one or the other, and if
we can issue the fs-thaw, we can issue the fs-status as well to figure
out what needs to be done. We just need to take care that qemu-ga lets
that information be reliably retrieved again in the event of a
timeout.

> Things get complicated if two or more commands timeout, e.g. due to GA
> not being running.
> And since GA doesn't support passing ID into GA commands it's hard to
> keep track which of expired commands succeeded and which failed. Okay,
> nowadays GA executes commands sequentially, but that may change in the
> future. We are not allowing timeouts on monitor either.
> 
> Or maybe I am just too timid and see problems which are not really
> problems :)
> 
> > 
> > But it doesn't really matter either way, all I'm really getting at is that
> > scanning for the 0xFF delimiter in the response shouldn't be *necessary* in
> > this case. But there's no harm in using it that way. You don't need to
> > precede the request with 0xFF if you're just using it to probe for the
> > agent though, and you probably wouldn't want to given that it results in
> > 2 responses:
> > 
> >>
> >> As written in documentation, this command will output sentinel byte to the 
> >> guest agent socket. This works perfectly. However, it is advised in the 
> >> very same documentation to prepend this command with the sentinel as well 
> >> allowing GA parser flush. But this doesn't work for me completely. All I 
> >> can get is:
> >>
> >> $ echo -e "\xFF{\"execute\":\"guest-sync-delimited\", \"arguments\":{\"id\":1234}}" | nc -U /tmp/ga.sock | hexdump -C
> >> nc: using stream socket
> >> 00000000  7b 22 65 72 72 6f 72 22  3a 20 7b 22 63 6c 61 73  |{"error": {"clas|
> >> 00000010  73 22 3a 20 22 4a 53 4f  4e 50 61 72 73 69 6e 67  |s": "JSONParsing|
> >> 00000020  22 2c 20 22 64 61 74 61  22 3a 20 7b 7d 7d 7d 0a  |", "data": {}}}.|
> >> 00000030  ff 7b 22 72 65 74 75 72  6e 22 3a 20 31 32 33 34  |.{"return": 1234|
> >> 00000040  7d 0a                                             |}.|
> >> 00000042
> >>
> >> The problem is - GA has difficulties with parsing sentinel, although the 
> >> reply is correct, indeed.
> >> Therefore my question is - should I just drop passing sentinel to GA? And 
> >> even if this is fixed, How should I deal with older releases which have 
> >> this bug?
> > 
> > Sorry, I didn't document this properly. Haven't tested host->guest flush
> > in a while and got it in my head that it was handled silently, but what
> > you're seeing has actually always been the observed/intended behavior.
> > 
> > Depending on how you've implemented guest-sync-delimited it might not make
> > a difference though. If you're just ignoring any data up until you see
> > > the 0xFF sentinel value then the error is silently thrown away as garbage.
> > > The semantics of the command are that you may read garbage prior to the
> > > sentinel+response, it's just that when preceding the request with the
> > > sentinel this is *always* the case.
> > 
> > > Otherwise, if you're handling it like a "normal" request, when sending the
> > > 0xFF you would treat it basically as a "flush" command that always returns a
> > JSONParsing error. It's not pretty because we're relying on the json
> > lexer/parser layer for this handling, but it should work reliably for all
> > current/previous versions of qemu-ga.
> > 
> > I'll make sure to fix up the documentation.
> >>
> >> Regards,
> >> Michal
> >>
> > 
> 
> Anyway, thank you for clarifying; I'll stick to guest-sync then.

Actually, it's possible a previous libvirt instance or something else
wrote to the channel, so the 0xFF at least is important. The flush isn't
actually specific to guest-sync-delimited, I just did a better job of
documenting it when I added that command.

And doing the flush-preceded sync should actually be simpler to
implement using guest-sync-delimited, since the parser error from the
flush can be safely ignored:

def guest_agent_channel_reset():
    write(channel, b'\xff')  # flush: forces the agent's JSON parser to reset
    id = generate_id()
    send_request(channel, 'guest-sync-delimited', {'id': id})
    alarm(30)
    while not got_alarm():
        # there may be more than one of these response sequences due to
        # guest-sync* prematurely timing out previously
        if read(channel) == b'\xff':
            # the agent echoes the id back as the "return" value
            if read_response(channel)['return'] == id:
                return True
    return False
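
For reference, the same reset can be sketched as runnable Python. This is only an illustration under stated assumptions: the channel is a connected stream socket, replies are newline-terminated JSON, and the agent answers with 0xFF followed by `{"return": <id>}` as in the hexdump quoted earlier. The helper name `find_sync_reply` and the use of `os.urandom` to pick an id are choices made for this example, not part of qemu-ga:

```python
import json
import os
import socket
import time

SENTINEL = b"\xff"


def find_sync_reply(buf, sync_id):
    """True if buf holds 0xFF followed by a complete {"return": sync_id} line."""
    # Everything before the first sentinel is garbage (e.g. the parse error
    # triggered by our own flush byte), so chunk 0 is skipped.
    for chunk in buf.split(SENTINEL)[1:]:
        line, newline, _ = chunk.partition(b"\n")
        if not newline:
            continue  # reply not complete yet
        try:
            reply = json.loads(line.decode("utf-8"))
        except ValueError:
            continue  # not valid JSON/UTF-8: treat as garbage
        if isinstance(reply, dict) and reply.get("return") == sync_id:
            return True  # stale replies carry a different id and are skipped
    return False


def guest_agent_channel_reset(sock, timeout=30.0):
    """Flush the agent's parser, then wait for our own sync reply."""
    sync_id = int.from_bytes(os.urandom(4), "big")
    request = {"execute": "guest-sync-delimited", "arguments": {"id": sync_id}}
    sock.sendall(SENTINEL + json.dumps(request).encode("utf-8") + b"\n")

    deadline = time.monotonic() + timeout
    buf = b""
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        sock.settimeout(remaining)
        try:
            data = sock.recv(4096)
        except socket.timeout:
            return False
        if not data:
            return False  # channel closed
        buf += data
        if find_sync_reply(buf, sync_id):
            return True
```

Matching on the id rather than on the first sentinel seen is what lets the loop skip replies left over from earlier sync attempts that timed out.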

> 
> Michal
> 


