qemu-devel
From: Frediano Ziglio
Subject: Re: [Qemu-devel] [RFC] qcow2: 2 way to improve performance updating refcount
Date: Fri, 22 Jul 2011 11:13:54 +0200

2011/7/22 Kevin Wolf <address@hidden>:
> Am 21.07.2011 18:17, schrieb Frediano Ziglio:
>> Hi,
>>   after a snapshot is taken, many write operations are currently
>> quite slow due to:
>> - refcount updates (decrement old and increment new)
>> - cluster allocation and file expansion
>> - read-modify-write on partial clusters
>>
>> I found 2 ways to improve refcount performance
>>
>> Method 1 - Lazy count
>> Mainly, do not take the current snapshot into account; that is, the
>> current snapshot counts as 0. This would require adding a
>> current_snapshot field to the header and updating refcounts when the
>> current snapshot is changed. So for these operations:
>> - creating a snapshot: performance is the same, just increment for
>> the old snapshot instead of the new one
>> - normal write operations: as the current snapshot counts as 0, there
>> is nothing to do here, so no refcount data is written
>> - changing the current snapshot: this is the worst case, you have to
>> increment for the current snapshot and decrement for the new one, so
>> it will take twice as long
>> - deleting a snapshot: if it is the current one, just set
>> current_snapshot to a dummy non-existing value; if it is not the
>> current one, just decrement counters, no performance change
>
> How would you do cluster allocation if you don't have refcounts any more
> that can tell you if a cluster is used or not?
>

You still have refcounts; it is only that the current snapshot counts
as 0. An example may help: start with snapshot "A". "A" counts as
zero, so all refcounts are 0. Now we create a snapshot "B" and make it
current, so refcounts are 1:

A --- B

If you change a cluster in snapshot "B", counts are still 1. If you go
back to "A", counters are incremented (because you leave B) and then
decremented (because you enter A).

Perhaps the problem is how to distinguish a 0 meaning "allocated in
the current snapshot" from "not allocated". Yes, with the scheme I
propose above that is a problem, but we can easily use -1 for "not
allocated". If a cluster belongs to the current snapshot and has
refcount 0, mark it as -1; if it is not in the current snapshot, we
would have to increment the counters of the current snapshot, mark the
current entries as -1, then decrement for the deletion, so yes, in
this case you have twice the time.
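To make the lazy-count idea above concrete, here is a toy sketch in
plain Python. All names (LazyRefcountTable, UNALLOCATED, and so on) are
invented for illustration; this is not the real qcow2 refcount layout,
just a model of the accounting rules described above:

```python
# Toy model of "lazy count": the current snapshot contributes 0 to
# every refcount, and -1 marks a cluster that is not allocated at all.

UNALLOCATED = -1

class LazyRefcountTable:
    def __init__(self, n_clusters):
        self.refcount = [UNALLOCATED] * n_clusters

    def allocate_for_current(self, cluster):
        # A write in the current snapshot: the cluster becomes
        # "allocated in current", which counts as 0 -- no further
        # refcount update is needed for normal writes.
        assert self.refcount[cluster] == UNALLOCATED
        self.refcount[cluster] = 0

    def take_snapshot(self, clusters_of_old_snapshot):
        # Creating a snapshot: increment for the *old* snapshot instead
        # of the new one; the new current snapshot still counts as 0.
        for c in clusters_of_old_snapshot:
            if self.refcount[c] != UNALLOCATED:
                self.refcount[c] += 1

    def switch_current(self, leave_clusters, enter_clusters):
        # Worst case: increment for the snapshot we leave, decrement
        # for the one we enter -- twice the work of a single pass.
        for c in leave_clusters:
            self.refcount[c] += 1
        for c in enter_clusters:
            self.refcount[c] -= 1
```

The point of the scheme shows up in allocate_for_current: ordinary
writes in the current snapshot never produce a refcount table write.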

>> Method 2 - Read-only parent
>> Here parents are read-only; instead of storing a refcount, store a
>> numeric id of the owner. If the owner is not the current snapshot,
>> copy the cluster and change the copy. Consider this situation:
>>
>> A --- B --- C
>>
>> B cannot be changed, so in order to "change" B you have to create a
>> new snapshot
>>
>> A --- B --- C
>>          \--- D
>>
>> and change D. It can take more space because in this case you have
>> an additional snapshot.
>>
>> Operations:
>> - creating a snapshot: really fast, as you don't have to change any
>> ownership
>> - normal write operations: if the owner is not the current snapshot,
>> allocate a new cluster and just store the current owner for the new
>> cluster. Ownership for past-the-end clusters could also all be set
>> to the current owner in order to collapse allocations
>> - changing the current snapshot: no changes required for owners
>> - deleting a snapshot: only possible if it has no child or a single
>> child. It will require scanning all L2 tables and merging and
>> updating owners.
>
> I think this has similar characteristics to what we have with external
> snapshots (i.e. backing files). The advantage of applying it to
> internal snapshots is that when deleting a snapshot you don't have to
> copy around all the data.
>
> Probably this change could even be done transparently for the user, so
> that B still appears to be writable, but in fact refers to D now.
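Replying inline: the owner-id scheme can also be sketched as a toy
model, again in plain Python with invented names (OwnerTable, _alloc),
purely as an illustration of the copy-on-write behaviour described
above, not as real qcow2 code:

```python
# Toy model of "read-only parent": each cluster records the numeric id
# of the snapshot that owns it instead of a refcount.  A write to a
# cluster owned by another (read-only) snapshot copies it.

class OwnerTable:
    def __init__(self):
        self.owner = {}        # cluster index -> owning snapshot id
        self.data = {}         # cluster index -> payload
        self.next_cluster = 0

    def _alloc(self, snapshot_id, payload):
        c = self.next_cluster
        self.next_cluster += 1
        self.owner[c] = snapshot_id
        self.data[c] = payload
        return c

    def write(self, cluster, current_id, payload):
        """Return the cluster actually written (maybe a fresh copy)."""
        if self.owner.get(cluster) == current_id:
            # Owned by the current snapshot: write in place.
            self.data[cluster] = payload
            return cluster
        # Owned by a read-only parent: allocate a copy for the current
        # snapshot and leave the parent's cluster untouched.
        return self._alloc(current_id, payload)
```

Note that creating a snapshot in this model touches no per-cluster
state at all, which is why it is the fast operation of the scheme.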
>
>
> Anyway, have you checked how bad the refcount work really is? I think
> that writing the VM state takes a lot longer, so that optimising the
> refcount update may be the wrong approach, especially if it requires a
> format change. My results with qemu-img snapshot suggest that it's not
> worth it:
>
> address@hidden:~/images$ ~/source/qemu/qemu-img info scratch.qcow2
> image: scratch.qcow2
> file format: qcow2
> virtual size: 8.0G (8589934592 bytes)
> disk size: 4.0G
> cluster_size: 65536
> address@hidden:~/images$ time ~/source/qemu/qemu-img snapshot -c test
> scratch.qcow2
>
> real    0m0.116s
> user    0m0.009s
> sys     0m0.040s
> address@hidden:~/images$ time ~/source/qemu/qemu-img snapshot -d test
> scratch.qcow2
>
> real    0m0.084s
> user    0m0.011s
> sys     0m0.044s
>
> Kevin
>

I'm not worried about the time it takes to create a snapshot, but
about the many write operations that come after taking one, during
normal use. As you stated, while taking a snapshot you can disable
writethrough caching, making it very fast, but during normal operation
you can't.

Personally, I'm also pondering a log to allow collapsing metadata
updates, or even an external full log (in another file, with data) to
try to reduce the overhead caused by read-modify-write during partial
cluster updates and to reduce file fragmentation. But as you can see
from my patches, I'm still familiarizing myself with the QEMU code.
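What I mean by collapsing metadata updates through a log can be
sketched roughly like this (toy Python, every name invented; the real
thing would of course have to journal to disk and replay on crash):

```python
# Toy sketch of log-based collapsing of metadata updates: updates are
# buffered keyed by table offset, so several updates to the same entry
# become a single write when the log is flushed.

class MetadataLog:
    def __init__(self):
        self.pending = {}      # offset -> latest value
        self.flushes = 0       # writes actually issued

    def record(self, offset, value):
        # A later update to the same offset overwrites the earlier one.
        self.pending[offset] = value

    def flush(self, table):
        # One write per distinct offset, no matter how many updates
        # were recorded for it in between.
        for offset, value in self.pending.items():
            table[offset] = value
            self.flushes += 1
        self.pending.clear()
```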

Regards
  Frediano


