[Qemu-devel] Moving beyond image files

From: Anthony Liguori
Subject: [Qemu-devel] Moving beyond image files
Date: Mon, 21 Mar 2011 10:05:20 -0500

We've been evaluating block migration in a real environment to try to understand its overhead compared to normal migration. The results so far are pretty disappointing. The speed of local disks ends up becoming a big bottleneck even before the network does.

This has got me thinking about what we could do to avoid local I/O via deduplication and other techniques. This has led me to wonder if it's time to move beyond simple image files into something a bit more sophisticated.

Ideally, I'd want a full Content Addressable Storage database like Venti but there are lots of performance concerns with something like that.

I've been thinking about a middle ground and am looking for some feedback. Here's my current thinking:

1) All block I/O goes through a daemon. There may be more than one daemon to support multi-tenancy.

2) The daemon maintains metadata for each image that includes an extent mapping and then a cluster allocation bitmap within each extent (similar to FVD).

At this point, it's basically sparse raw but through a single daemon.

3) All writes result in a sha1 being calculated before the write is completed. The daemon maintains a mapping of sha1's -> clusters. A single sha1 may map to many clusters. The sha1 mapping can be made eventually consistent using a journal or even a dirty bitmap. It can be partially rebuilt easily.
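
To make points 2 and 3 a bit more concrete, here is a rough in-memory sketch of what the daemon's per-image bookkeeping might look like. It is only an illustration: the names (ClusterIndex, CLUSTER_SIZE, clusters_of) and the 64 KiB cluster size are assumptions, a plain list stands in for the per-extent allocation bitmap, a dict stands in for the disk, and the sha1 mapping is updated synchronously rather than through a journal or dirty bitmap.

```python
import hashlib
from collections import defaultdict

CLUSTER_SIZE = 64 * 1024  # assumed cluster size; not specified in the proposal


class ClusterIndex:
    """Toy model of one image: allocation state plus a sha1 -> clusters map."""

    def __init__(self, num_clusters):
        self.allocated = [False] * num_clusters  # stand-in for the allocation bitmap
        self.data = {}                           # cluster number -> bytes (stand-in for disk)
        self.digest_of = {}                      # cluster number -> current sha1
        self.clusters_of = defaultdict(set)      # sha1 -> clusters holding that content

    def write(self, cluster, payload):
        if len(payload) != CLUSTER_SIZE:
            raise ValueError("payload must be exactly one cluster")
        # The hash is computed before the write is considered complete.
        digest = hashlib.sha1(payload).hexdigest()
        # Drop the stale mapping if this cluster previously held other content.
        old = self.digest_of.get(cluster)
        if old is not None:
            self.clusters_of[old].discard(cluster)
            if not self.clusters_of[old]:
                del self.clusters_of[old]
        self.data[cluster] = payload
        self.allocated[cluster] = True
        self.digest_of[cluster] = digest
        self.clusters_of[digest].add(cluster)
        return digest

    def read(self, cluster):
        # Unallocated clusters read back as zeros, as with a sparse raw image.
        if not self.allocated[cluster]:
            return bytes(CLUSTER_SIZE)
        return self.data[cluster]
```

Writing the same payload to two clusters leaves a single sha1 mapped to both of them, which is the "one sha1, many clusters" case described in point 3.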

I think this is where v1 stops. With just this level of functionality, I think we have some very interesting properties:

a) Performance should be pretty close to raw

b) Without doing any (significant) disk I/O, we know exactly what data an image is composed of. This means we can do an rsync style image streaming that uses potentially much less network I/O and potentially much less disk I/O.
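
As a hedged illustration of point b, the destination could plan a transfer purely from hash lists, without reading any cluster data. The function name plan_transfer and the idea of shipping the source image as an ordered list of per-cluster sha1's are assumptions made for this sketch, not a worked-out protocol.

```python
def plan_transfer(source_digests, local_digests):
    """Split source clusters into those to fetch and those satisfiable locally.

    source_digests: list of sha1 hex strings, one per source cluster, in order.
    local_digests:  set of sha1 hex strings already stored on the destination.
    Returns (fetch, reuse) as lists of source cluster numbers.
    """
    fetch = []
    reuse = []
    known = set(local_digests)
    for cluster, digest in enumerate(source_digests):
        if digest in known:
            # Content is already here (or was fetched for an earlier cluster).
            reuse.append(cluster)
        else:
            fetch.append(cluster)
            known.add(digest)  # later duplicates need no second network fetch
    return fetch, reuse
```

For example, with source content [A, B, A, C] and a destination that already holds B, only clusters 0 and 3 would cross the network; clusters 1 and 2 can be satisfied from data that is already local or already fetched.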

In a v2, I think you can add some interesting features that take advantage of the hashing. For instance:

4) If you run out of disk space, you can look for a hash with a refcount > 1, and split off a reference making it copy-on-write. Then you can treat the remaining references as free list entries.

5) Copy-on-write references potentially become very interesting for image streaming because you can avoid any I/O for blocks that are already stored locally.
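
One possible reading of point 4, sketched below under stated assumptions: for every hash stored in more than one cluster, keep a single physical copy, redirect the other clusters to it as copy-on-write references, and hand the duplicates' space to the free list. The function name reclaim_duplicates and the choice of the lowest-numbered cluster as the surviving copy are arbitrary choices for the example.

```python
def reclaim_duplicates(clusters_of):
    """Find space that can be reclaimed from duplicate content.

    clusters_of: dict mapping sha1 -> set of cluster numbers with that content.
    Returns (remap, free_list): remap sends each duplicate cluster to the
    surviving copy it should share copy-on-write; free_list holds the cluster
    numbers whose storage can be reused.
    """
    remap = {}
    free_list = []
    for digest, clusters in clusters_of.items():
        if len(clusters) < 2:
            continue  # refcount of 1: nothing to reclaim
        keeper, *duplicates = sorted(clusters)
        for cluster in duplicates:
            remap[cluster] = keeper
            free_list.append(cluster)
    return remap, sorted(free_list)
```

A later write to a remapped cluster would first have to allocate fresh space for it, which is the copy-on-write cost this scheme trades for the reclaimed clusters.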

This is not fully baked yet but I thought I'd at least throw it out there as a topic for discussion. I think we've focused almost entirely on single images so I think it's worth thinking a little about different storage models.


Anthony Liguori
