Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi


From: Markus Armbruster
Subject: Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
Date: Thu, 22 Oct 2015 13:50:07 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

Valerio Aimale <address@hidden> writes:

> On 10/21/15 4:54 AM, Markus Armbruster wrote:
>> Valerio Aimale <address@hidden> writes:
>>
>>> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>>>> Valerio Aimale <address@hidden> writes:
>>>>
>>>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>>>> address@hidden writes:
>>>>>>
>>>>>>> All-
>>>>>>>
>>>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>>>> introspect QEMU/KVM VMs.
>>>>>>>
>>>>>>> Libvmi has patches for the old qemu-kvm fork, inside its source tree:
>>>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>>>
>>>>>>> This patch adds an HMP and a QMP command, "pmemaccess". When the
>>>>>>> command is invoked with a string argument (a filename), it will open
>>>>>>> a UNIX socket and spawn a listening thread.
>>>>>>>
>>>>>>> The client writes binary commands to the socket, in the form of a C
>>>>>>> structure:
>>>>>>>
>>>>>>> struct request {
>>>>>>>         uint8_t type;   // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>>>         uint64_t address;   // address to read from OR write to
>>>>>>>         uint64_t length;    // number of bytes to read OR write
>>>>>>> };
>>>>>>>
>>>>>>> The client receives as a response either (length+1) bytes, if it is a
>>>>>>> read operation, or 1 byte if it is a write operation.
>>>>>>>
>>>>>>> The last byte of a read operation response indicates success (1 =
>>>>>>> success, 0 = failure). The single byte returned for a write operation
>>>>>>> indicates the same (1 = success, 0 = failure).
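>>>>>>>
>>>>>>> For illustration only (this is not part of the patch; the socket
>>>>>>> path, the example address and the lack of short-read handling are
>>>>>>> all made up), a client read could look roughly like this:
>>>>>>>
>>>>>>> #include <stdint.h>
>>>>>>> #include <stdio.h>
>>>>>>> #include <string.h>
>>>>>>> #include <unistd.h>
>>>>>>> #include <sys/socket.h>
>>>>>>> #include <sys/un.h>
>>>>>>>
>>>>>>> /* Must match the layout used by the QEMU-side thread exactly,
>>>>>>>  * including any compiler padding. */
>>>>>>> struct request {
>>>>>>>     uint8_t type;       /* 0 quit, 1 read, 2 write */
>>>>>>>     uint64_t address;
>>>>>>>     uint64_t length;
>>>>>>> };
>>>>>>>
>>>>>>> int main(void)
>>>>>>> {
>>>>>>>     struct sockaddr_un addr = { .sun_family = AF_UNIX };
>>>>>>>     struct request req = { .type = 1, .address = 0x1000, .length = 4096 };
>>>>>>>     uint8_t buf[4096 + 1];     /* data plus trailing status byte */
>>>>>>>     int fd;
>>>>>>>
>>>>>>>     /* whatever path was passed to the pmemaccess command */
>>>>>>>     strcpy(addr.sun_path, "/tmp/pmemaccess.sock");
>>>>>>>
>>>>>>>     fd = socket(AF_UNIX, SOCK_STREAM, 0);
>>>>>>>     if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
>>>>>>>         perror("connect");
>>>>>>>         return 1;
>>>>>>>     }
>>>>>>>
>>>>>>>     /* read 4096 bytes of physical memory starting at 0x1000 */
>>>>>>>     write(fd, &req, sizeof(req));
>>>>>>>     read(fd, buf, sizeof(buf));
>>>>>>>     printf("read %s\n", buf[4096] ? "succeeded" : "failed");
>>>>>>>
>>>>>>>     close(fd);
>>>>>>>     return 0;
>>>>>>> }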
>>>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>>>> garbage followed by the "it failed" byte?
>>>>> Markus, that appears to be the case. However, I did not write the
>>>>> communication protocol between libvmi and QEMU. I'm assuming that the
>>>>> person who wrote the protocol did not want to bother with
>>>>> over-complicating things.
>>>>>
>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>
>>>>> I'm thinking he assumed reads would be small in size and the price of
>>>>> reading garbage was less than the price of writing a more complicated
>>>>> protocol. I can see his point; confronted with the same problem, I
>>>>> might have done the same.
>>>> All right, the interface is designed for *small* memory blocks then.
>>>>
>>>> Makes me wonder why he needs a separate binary protocol on a separate
>>>> socket.  Small blocks could be done just fine in QMP.
>>> The problem is speed. If one's analyzing the memory space of a running
>>> process (physical and paged), libvmi will make a large number of small
>>> and mid-sized reads. If one uses xp or pmemsave, the overhead is
>>> quite significant. xp has overhead due to encoding, and pmemsave has
>>> overhead due to file open/write (server) and file open/read/close/unlink
>>> (client).
>>>
>>> Others have gone through the problem before me. It appears that
>>> pmemsave and xp are significantly slower than reading memory using a
>>> socket via pmemaccess.
>> That they're slower isn't surprising, but I'd expect the cost of
>> encoding a small block to be insignificant compared to the cost of the
>> network roundtrips.
>>
>> As block size increases, the space overhead of encoding will eventually
>> bite.  But for that usage, the binary protocol appears ill-suited,
>> unless the client can pretty reliably avoid read failure.  I haven't
>> examined its failure modes, yet.
>>
>>> The following data is not mine, but it shows the time, in
>>> milliseconds, required to resolve the content of a paged memory
>>> address via socket (pmemaccess), pmemsave, and xp:
>>>
>>> http://cl.ly/image/322a3s0h1V05
>>>
>>> Again, I did not produce those data points; they come from an old
>>> libvmi thread.
>> 90ms is a very long time.  What exactly was measured?
> That is a fair question to ask. Unfortunately, I extracted that data
> plot from an old thread on some libvmi mailing list. I do not have the
> data and code that produced it. Sifting through the thread, I can see
> the code was never published. I will take it upon myself to produce
> code that compares timing - in a fair fashion - of libvmi doing an
> atomic operation and a larger-scale operation (like listing running
> processes) via gdb, pmemaccess/socket, pmemsave, xp, and hopefully a
> version of xp that returns byte streams of memory regions base64- or
> base85-encoded in JSON strings. I'll publish results and code.
>
> However, given workload and life happening, it will be some time
> before I complete that task.

No problem.  I'd like to have your use case addressed, but there's no
need for haste.

[...]
>>>>>>> Also, the pmemsave command's QAPI should be changed to be usable
>>>>>>> with 64-bit VMs:
>>>>>>>
>>>>>>> in qapi-schema.json
>>>>>>>
>>>>>>> from
>>>>>>>
>>>>>>> ---
>>>>>>> { 'command': 'pmemsave',
>>>>>>>      'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>>> ---
>>>>>>>
>>>>>>> to
>>>>>>>
>>>>>>> ---
>>>>>>> { 'command': 'pmemsave',
>>>>>>>      'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>>>> ---
>>>>>> In the QAPI schema, 'int' is actually an alias for 'int64'.  Yes, that's
>>>>>> confusing.
>>>>> I think it's confusing for the HMP parser too. If you have a VM with
>>>>> 8 GB of RAM and want to snapshot the whole physical memory via HMP
>>>>> over telnet, this is what happens:
>>>>>
>>>>> $ telnet localhost 1234
>>>>> Trying 127.0.0.1...
>>>>> Connected to localhost.
>>>>> Escape character is '^]'.
>>>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>>>> (qemu) help pmemsave
>>>>> pmemsave addr size file -- save to disk physical memory dump starting
>>>>> at 'addr' of size 'size'
>>>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>>>> 'pmemsave' has failed: integer is for 32-bit values
>>>>> Try "help pmemsave" for more information
>>>>> (qemu) quit
>>>> Your change to pmemsave's definition in qapi-schema.json is effectively a
>>>> no-op.
>>>>
>>>> Your example shows *HMP* command pmemsave.  The definition of an HMP
>>>> command is *independent* of the QMP command.  The implementation *uses*
>>>> the QMP command.
>>>>
>>>> QMP pmemsave is defined in qapi-schema.json as
>>>>
>>>>       { 'command': 'pmemsave',
>>>>         'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>
>>>> Its implementation is in cpus.c:
>>>>
>>>>       void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>>>                         Error **errp)
>>>>
>>>> Note the int64_t size.
>>>>
>>>> HMP pmemsave is defined in hmp-commands.hx as
>>>>
>>>>       {
>>>>           .name       = "pmemsave",
>>>>           .args_type  = "val:l,size:i,filename:s",
>>>>           .params     = "addr size file",
>>>>           .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
>>>>           .mhandler.cmd = hmp_pmemsave,
>>>>       },
>>>>
>>>> Its implementation is in hmp.c:
>>>>
>>>>       void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>>>>       {
>>>>           uint32_t size = qdict_get_int(qdict, "size");
>>>>           const char *filename = qdict_get_str(qdict, "filename");
>>>>           uint64_t addr = qdict_get_int(qdict, "val");
>>>>           Error *err = NULL;
>>>>
>>>>           qmp_pmemsave(addr, size, filename, &err);
>>>>           hmp_handle_error(mon, &err);
>>>>       }
>>>>
>>>> Note uint32_t size.
>>>>
>>>> Arguably, the QMP size argument should use 'size' (an alias for
>>>> 'uint64'), and the HMP args_type should use 'size:o'.
>>> I understand all that. Indeed, I've re-implemented 'pmemaccess' the same
>>> way pmemsave is implemented. There is a single function and two entry
>>> points, one for HMP and one for QMP. I think pmemaccess mimics
>>> pmemsave closely.
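>>>
>>> (A minimal illustration of that shape, not the actual patch code; the
>>> command's argument name here is made up:)
>>>
>>>     { 'command': 'pmemaccess',
>>>       'data': {'path': 'str'} }
>>>
>>>     void hmp_pmemaccess(Monitor *mon, const QDict *qdict)
>>>     {
>>>         const char *path = qdict_get_str(qdict, "path");
>>>         Error *err = NULL;
>>>
>>>         qmp_pmemaccess(path, &err);
>>>         hmp_handle_error(mon, &err);
>>>     }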
>>>
>>> However, if one wants to simply dump a memory region via HMP, for
>>> human ease of use/debug/testing purposes, one cannot dump memory
>>> regions that reside higher than 2^32-1.
>> Can you give an example?
> Yes. I was trying to dump the full extent of physical memory of a VM
> that has 8GB memory space (ballooned). I simply did this:
>
> $ telnet localhost 1234
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> QEMU 2.4.0.1 monitor - type 'help' for more information
> (qemu) pmemsave 0 8589934591 "/tmp/memsaved"
> 'pmemsave' has failed: integer is for 32-bit values
>
> Maybe I misunderstood how pmemsave works. Maybe I should have used
> dump-guest-memory.

This is an unnecessary limitation caused by 'size:i' instead of
'size:o'.  Fixable.
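
Roughly, transcribing that suggestion into the definitions quoted above
(untested sketch), in hmp-commands.hx:

        .args_type  = "val:l,size:o,filename:s",

and in hmp.c:

        uint64_t size = qdict_get_int(qdict, "size");

The 'o' argument type accepts 64-bit values (with optional size
suffixes), and qmp_pmemsave() already takes an int64_t size, so nothing
else should need to change.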


