
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi


From: Valerio Aimale
Subject: Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
Date: Thu, 22 Oct 2015 12:43:06 -0600
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 10/21/15 4:54 AM, Markus Armbruster wrote:
Valerio Aimale <address@hidden> writes:

On 10/19/15 1:52 AM, Markus Armbruster wrote:
Valerio Aimale <address@hidden> writes:

On 10/16/15 2:15 AM, Markus Armbruster wrote:
address@hidden writes:

All-

I've produced a patch for the current QEMU HEAD, for libvmi to
introspect QEMU/KVM VMs.

Libvmi has patches for the old qemu-kvm fork, inside its source tree:
https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch

This patch adds an HMP and a QMP command, "pmemaccess". When the
command is invoked with a string argument (a filename), it opens
a UNIX socket and spawns a listening thread.

The client writes binary commands to the socket, in the form of a C
structure:

struct request {
        uint8_t type;   // 0 quit, 1 read, 2 write, ... rest reserved
        uint64_t address;   // address to read from OR write to
        uint64_t length;    // number of bytes to read OR write
};

As a response, the client receives either (length+1) bytes for a read
operation, or 1 byte for a write operation.

The last byte of a read response indicates success (1 success, 0
failure). The single byte returned for a write operation indicates the
same (1 success, 0 failure).
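
For illustration, a minimal client for this protocol might look as
follows. This is only a sketch, not part of the patch; it assumes both
sides share the struct layout above (same compiler and architecture)
and takes the socket path passed to pmemaccess as argv[1]:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

struct request {
    uint8_t type;       /* 0 quit, 1 read, 2 write */
    uint64_t address;   /* address to read from */
    uint64_t length;    /* number of bytes to read */
};

int main(int argc, char **argv)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    struct request req = { .type = 1, .address = 0x1000, .length = 4096 };
    uint8_t *buf = malloc(req.length + 1);      /* data plus trailing status byte */
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (argc < 2) {
        fprintf(stderr, "usage: %s <socket path>\n", argv[0]);
        return 1;
    }
    strncpy(addr.sun_path, argv[1], sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    write(fd, &req, sizeof(req));               /* send the read request */
    read(fd, buf, req.length + 1);              /* data followed by 1/0 status */
    /* (a real client would loop until all expected bytes have arrived) */
    printf("read %s\n", buf[req.length] ? "succeeded" : "failed");

    close(fd);
    free(buf);
    return 0;
}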
So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
garbage followed by the "it failed" byte?
Markus, that appears to be the case. However, I did not write the
communication protocol between libvmi and QEMU. I'm assuming that the
person who wrote the protocol did not want to bother with
overcomplicating things.

https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c

I'm thinking he assumed reads would be small in size and the price of
reading garbage was less than the price of writing a more complicated
protocol. I can see his point; confronted with the same problem, I
might have done the same.
All right, the interface is designed for *small* memory blocks then.

Makes me wonder why he needs a separate binary protocol on a separate
socket.  Small blocks could be done just fine in QMP.
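
For instance, a small block can already be fetched through QMP today
with the existing pmemsave command (the file name here is arbitrary;
the client then reads the bytes back from that file):

{ "execute": "pmemsave",
  "arguments": { "val": 4096, "size": 512, "filename": "/tmp/small-block.bin" } }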
The problem is speed. If one's analyzing the memory space of a running
process (physical and paged), libvmi will make a large number of small
and mid-sized reads. If one uses xp or pmemsave, the overhead is
quite significant: xp has overhead due to encoding, and pmemsave has
overhead due to file open/write (server) and file open/read/close/unlink
(client).

Others have gone through the problem before me. It appears that
pmemsave and xp are significantly slower than reading memory using a
socket via pmemaccess.
That they're slower isn't surprising, but I'd expect the cost of
encoding a small block to be insignificant compared to the cost of the
network roundtrips.

As block size increases, the space overhead of encoding will eventually
bite.  But for that usage, the binary protocol appears ill-suited,
unless the client can pretty reliably avoid read failure.  I haven't
examined its failure modes, yet.

The following data is not mine, but it shows the time, in
milliseconds, required to resolve the content of a paged memory
address via socket (pmemaccess), pmemsave, and xp:

http://cl.ly/image/322a3s0h1V05

Again, I did not produce those data points, they come from an old
libvmi thread.
90ms is a very long time.  What exactly was measured?

I think it might be conceivable that there could be a QMP command that
returns the content of an arbitrarily sized memory region as a base64
or base85 JSON string. It would still have both time overhead (due to
encoding/decoding) and space overhead (base64 adds 33%, base85 would
add 7%), plus JSON encoding/decoding overhead. It might still be the
case that the socket would outperform such a command as well,
speed-wise. I don't think it would be any faster than xp.
A special-purpose binary protocol over a dedicated socket will always do
less than a QMP solution (ignoring foolishness like transmitting crap on
read error the client is then expected to throw away).  The question is
whether the difference in work translates to a worthwhile difference in
performance.

The larger question is actually whether we have an existing interface
that can serve the libvmi's needs.  We've discussed monitor commands
like xp, pmemsave, pmemread.  There's another existing interface: the
GDB stub.  Have you considered it?
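
For reference, reading guest memory through the GDB stub could look
roughly like this (assuming QEMU was started with "-gdb tcp::1234" or
"-s"; note that the stub resolves addresses through the current vCPU's
address space rather than raw physical addresses):

(gdb) target remote localhost:1234
(gdb) x/16xb 0x1000
(gdb) dump binary memory /tmp/guest.bin 0x1000 0x2000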

There's also a similar patch, floating around the internet, that uses
shared memory, instead of sockets, as the inter-process communication
between libvmi and QEMU. I've never used it.
By the time you built a working IPC mechanism on top of shared memory,
you're often no better off than with AF_LOCAL sockets.

Crazy idea: can we allocate guest memory in a way that support sharing
it with another process?  Eduardo, can -mem-path do such wild things?
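
(For what it's worth, one existing mechanism in this direction is a
file-backed memory backend, as used for vhost-user; whether it fits the
introspection use case is exactly the question:

qemu-system-x86_64 -m 8G \
    -object memory-backend-file,id=mem,size=8G,mem-path=/dev/shm,share=on \
    -numa node,memdev=mem ...

Another process can then mmap the backing file to see guest RAM.)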
Markus, your suggestion led to a lightbulb going off in my head.

What if there was a QMP command, say 'pmemmap', that when invoked performs the following:

qmp_pmemmap( [...]) {

    char template[] = "/tmp/QEM_mmap_XXXXXX";  /* mkstemp needs a writable buffer */
    size_t ram_size = 8589934592;              /* assuming a VM with 8 GiB of RAM */
    int mmap_fd;
    uint8_t *local_memspace;

    mmap_fd = mkstemp(template);
    ftruncate(mmap_fd, ram_size);              /* size the backing file */

    /* map the file shared, so a client mapping the same file sees the data */
    local_memspace = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
                          MAP_SHARED, mmap_fd, 0);

    /* copy guest physical memory into the mapping
       (no write support for now; will discuss write later; whether
       cpu_physical_memory_rw can handle all of guest RAM in one call
       is the open question below) */
    cpu_physical_memory_rw((hwaddr) 0, local_memspace, ram_size, 0);

    /* etc */

}

pmemmap would return the following JSON:

{
    "success": true,
    "map_filename": "/tmp/QEM_mmap_1234567"
}

The QMP client/caller would then mmap() the file '/tmp/QEM_mmap_1234567' into a region of its own address space equal to the size of the file. It would then have (read, maybe write?) access to the full extent of the guest memory without making any other QMP call. I think it would be fast, with low memory overhead, as mmap() is pretty efficient.
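
A minimal sketch of that client side (not part of any patch; it simply
assumes the hypothetical 'map_filename' returned above, passed here as
argv[1], and an 8 GiB guest):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    size_t ram_size = 8589934592ULL;            /* must match the guest RAM size */
    int fd;
    uint8_t *guest;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <map_filename>\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_RDONLY);
    guest = mmap(NULL, ram_size, PROT_READ, MAP_SHARED, fd, 0);
    if (guest == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* the whole of guest physical memory is now directly readable */
    printf("first byte of guest RAM: 0x%02x\n", guest[0]);

    munmap(guest, ram_size);
    close(fd);
    return 0;
}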

Of course, there would be a 'pmemunmap' QMP command that would perform the cleanup:

/* etc. */
munmap(local_memspace, ram_size);
close(mmap_fd);
unlink(template);
/* etc. */

Would that work? Is copying the full extent of the guest RAM too much to ask of cpu_physical_memory_rw()?

The socket API was written by the libvmi author, and it works with the
current libvmi version. The libvmi client-side implementation is at:

https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c

As many use KVM VMs for introspection, malware, and security analysis,
it might be worth thinking about making pmemaccess a permanent
HMP/QMP command, as opposed to having to produce a patch for each QEMU
point release.
Related existing commands: memsave, pmemsave, dump-guest-memory.

Can you explain why these won't do for your use case?
For people who do security analysis there are two use cases, static
and dynamic analysis. With memsave, pmemsave and dump-guest-memory one
can do static analysis, i.e. snapshotting a VM and seeing what was
happening at that point in time.
Dynamic analysis requires being able to 'introspect' a VM while it's running.

If you take a snapshot of two people exchanging a glass of water, and
you happen to take it at the very moment both persons have their hands
on the glass, it's hard to tell who passed the glass to whom. If you
have a movie of the same scene, it's obvious who's the giver and who's
the receiver. Same use case.
I understand the need for introspecting a running guest.  What exactly
makes the existing commands unsuitable for that?
Speed. See discussion above.
More to the point, there's a host of C and Python frameworks to
dynamically analyze VMs: volatility, rekall, "drakvuf", etc. They all
build on top of libvmi. I did not want to reinvent the wheel.
Fair enough.

Front page http://libvmi.com/ claims "Works with Xen, KVM, Qemu, and Raw
memory files."  What exactly is missing for KVM?
When they say they support KVM, what they really mean is that they
support the (retired, I understand) qemu-kvm fork via a patch that is
provided in the libvmi source tree. I think the most recent qemu-kvm
supported is 1.6.0

https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch

I wanted to bring support to the head revision of QEMU, to bring
libvmi up to the level of more modern QEMU.

Maybe the solution is simply to put this patch in the libvmi source
tree,  which I've already asked to do via pull request, leaving QEMU
alone.
However, the patch has to be updated at every QEMU point release. I
wanted to avoid that, if at all possible.

Mind you, 99.9% of people that do dynamic VM analysis use Xen. They
contend that Xen has better introspection support. In my case, I did
not want to bother with dedicating a full server to be a Xen domain
0. I just wanted to do a quick test by standing up a QEMU/KVM VM on
an otherwise-purposed server.
I'm not at all against better introspection support in QEMU.  I'm just
trying to understand the problem you're trying to solve with your
patches.
What all users of libvmi would love to have is super-high-speed access
to VM physical memory as part of the QEMU source tree, and not
supported via a patch. Implemented as the QEMU owners see fit, as long
as it is blazing fast and easily accessed via a client library or
inter-process communication.
The use case makes sense to me, we just need to figure out how we want
to serve it in QEMU.

My gut feeling is that it has to bypass QMP protocol/encoding/file
access/JSON to be fast, but it is just a gut feeling - worth nothing.
My gut feeling is that QMP should do fine in overhead compared to other
solutions involving socket I/O as long as the data sizes are *small*.
Latency might be an issue, though: QMP commands are processed from the
main loop.  A dedicated server thread can be more responsive, but
letting it write to shared resources could be "interesting".

Also, pmemsave's QAPI definition should be changed to be usable with
64-bit VMs

in qapi-schema.json

from

---
{ 'command': 'pmemsave',
     'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
---

to

---
{ 'command': 'pmemsave',
     'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
---
In the QAPI schema, 'int' is actually an alias for 'int64'.  Yes, that's
confusing.
I think it's confusing for the HMP parser too. If you have a VM with
8 GiB of RAM and want to snapshot the whole physical memory, via HMP
over telnet this is what happens:

$ telnet localhost 1234
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
QEMU 2.4.0.1 monitor - type 'help' for more information
(qemu) help pmemsave
pmemsave addr size file -- save to disk physical memory dump starting
at 'addr' of size 'size'
(qemu) pmemsave 0 8589934591 "/tmp/memorydump"
'pmemsave' has failed: integer is for 32-bit values
Try "help pmemsave" for more information
(qemu) quit
Your change to pmemsave's definition in qapi-schema.json is effectively a
no-op.

Your example shows *HMP* command pmemsave.  The definition of an HMP
command is *independent* of the QMP command.  The implementation *uses*
the QMP command.

QMP pmemsave is defined in qapi-schema.json as

      { 'command': 'pmemsave',
        'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }

Its implementation is in cpus.c:

      void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
                        Error **errp)

Note the int64_t size.

HMP pmemsave is defined in hmp-commands.hx as

      {
          .name       = "pmemsave",
          .args_type  = "val:l,size:i,filename:s",
          .params     = "addr size file",
          .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
          .mhandler.cmd = hmp_pmemsave,
      },

Its implementation is in hmp.c:

      void hmp_pmemsave(Monitor *mon, const QDict *qdict)
      {
          uint32_t size = qdict_get_int(qdict, "size");
          const char *filename = qdict_get_str(qdict, "filename");
          uint64_t addr = qdict_get_int(qdict, "val");
          Error *err = NULL;

          qmp_pmemsave(addr, size, filename, &err);
          hmp_handle_error(mon, &err);
      }

Note uint32_t size.

Arguably, the QMP size argument should use 'size' (an alias for
'uint64'), and the HMP args_type should use 'size:o'.
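
For illustration, that suggestion would amount to something like this
in qapi-schema.json:

      { 'command': 'pmemsave',
        'data': {'val': 'int', 'size': 'size', 'filename': 'str'} }

and, in hmp-commands.hx:

          .args_type  = "val:l,size:o,filename:s",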
I understand all that. Indeed, I've re-implemented 'pmemaccess' the same
way pmemsave is implemented. There is a single function and two
points of entry, one for HMP and one for QMP. I think pmemaccess
mimics pmemsave closely.

However, if one wants to simply dump a memory region via HMP, for
human ease of use/debugging/testing purposes, one cannot dump memory
regions that reside higher than 2^32-1.
Can you give an example?

[...]



