Re: [Qemu-devel] [PATCH] docs: clarify that qcow2 file size is not always a cluster multiple
Wed, 28 May 2014 11:18:19 +0200
On Tue, May 27, 2014 at 05:56:35PM +0200, Markus Armbruster wrote:
> Benoît Canet <address@hidden> writes:
> > On Tuesday 27 May 2014 at 15:24:00 (+0200), Stefan Hajnoczi wrote:
> >> On Mon, May 26, 2014 at 04:36:15PM +0200, Benoît Canet wrote:
> >> > On Thursday 22 May 2014 at 11:42:50 (+0200), Stefan Hajnoczi wrote:
> >> > > Normally one would expect that qcow2 image file lengths are multiples
> >> > > of the cluster size. This is not true in all cases and the spec should
> >> > > document this so implementers remember to accept such files.
> >> > >
> >> > > $ qemu-img create -f qcow2 foo.qcow2 2G
> >> > > Formatting 'foo.qcow2', fmt=qcow2 size=2147483648 encryption=off cluster_size=65536 lazy_refcounts=off
> >> > > $ ls -l foo.qcow2
> >> > > -rw-r--r-- 1 stefanha stefanha 197120 May 22 11:40 foo.qcow2
> >> > > $ bc -q
> >> > > 3 * (64 * 1024) + 512
> >> > > 197120
> >> > >
> >> > > The extra sector holds the 4 L1 table entries that a 2 GB disk with 64 KB
> >> > > cluster size needs. The rest of the L1 table is omitted from the file
> >> > > but allocation will continue at the next cluster boundary.
> >> >
> >> > I think we should fix this to allocate a whole extra cluster instead
> >> > of 512B.
> >> > These days most SSD pages (smallest write unit) are 4KB or 16KB.
> >> > Having most of the file shifted by 512B means that the SSD controller
> >> > will have to do read/modify/write cycles for most writes, hence impacting
> >> > performance and SSD life.
> >> It's not shifted. I thought the last sentence explains this:
> >> "The rest of the L1 table is omitted from the file but allocation will
> >> continue at the next cluster boundary."
> >> Are you worried that the host file system will lay out data poorly
> >> because the file looks like this?
> >> | header (1C) | refcounts (2C) | L1 (512B) | hole | Next cluster |
> >> B = bytes
> >> C = clusters
> >> My guess is the next cluster will be aligned to a reasonable boundary on
> >> the physical disk.
> > I have some doubts. Does anyone know a filesystem guru?
> I'm not one, but here goes anyway.
> Aligning to a multiple of the SSD's erase block size can only help. A
> common erase block size today is 128KiB. The going recommendation for
> *partition* alignment (which should also be aligned to erase block size)
> is 1MiB. What this means for QCOW2 I'll leave to the good folks working
> on it.
> Here's some (dated) advice on aligning for SSDs from a real filesystem
> guru: http://tytso.livejournal.com/2009/02/20/
We need to do this in two steps:
1. Update the qcow2 specification to clarify that existing files may not
be multiples of cluster size.
2. Update QEMU implementation to write full clusters to the file.
That way we get the performance benefits but warn implementors that
files might not be multiples of cluster size.
This patch addresses #1, so unless anyone objects to the spec wording, I
think it should be merged.
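
For implementers who want to check the numbers, here is a rough Python
sketch of the arithmetic behind the quoted 2 GB example. It is not QEMU
code. The function names are made up for illustration, the three
metadata clusters come from the layout diagram quoted above (1 header
cluster plus 2 refcount clusters), and the rounding of the L1 table to a
512-byte sector is an assumption inferred from the observed file size.

```python
SECTOR_SIZE = 512      # bytes; assumed write granularity for the L1 table
L1_ENTRY_SIZE = 8      # bytes per L1 entry (64-bit offset)
L2_ENTRY_SIZE = 8      # bytes per L2 entry (64-bit offset)


def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return -(-value // multiple) * multiple


def l1_entries(virtual_size, cluster_size):
    """Number of L1 entries needed to map virtual_size bytes."""
    l2_entries_per_table = cluster_size // L2_ENTRY_SIZE
    # One L1 entry points at one L2 table, which maps this many bytes.
    bytes_per_l1_entry = l2_entries_per_table * cluster_size
    return -(-virtual_size // bytes_per_l1_entry)


def initial_file_size(virtual_size, cluster_size, metadata_clusters=3):
    """File length right after creation, per the layout quoted above.

    metadata_clusters=3 is the header (1C) plus refcounts (2C) from the
    diagram; treat it as an assumption about this particular image.
    """
    l1_bytes = l1_entries(virtual_size, cluster_size) * L1_ENTRY_SIZE
    return metadata_clusters * cluster_size + round_up(l1_bytes, SECTOR_SIZE)


if __name__ == "__main__":
    cluster = 64 * 1024
    size = initial_file_size(2 * 1024**3, cluster)
    print(size)                     # 197120, matches the ls -l output
    print(size % cluster)           # 512, so not a cluster multiple
    print(round_up(size, cluster))  # 262144, where allocation continues
```

A reader of existing files should accept the 197120-byte length. A
writer that pads the L1 table to a full cluster (step 2) would produce a
262144-byte file for the same image under these assumptions.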