qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for bac


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices.
Date: Tue, 13 Jan 2015 18:27:26 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Stefan Hajnoczi <address@hidden> writes:

> On Tue, Jan 13, 2015 at 11:51:51AM +0100, Markus Armbruster wrote:
>> Christian Borntraeger <address@hidden> writes:
>> 
>> > Am 02.01.2015 um 13:57 schrieb Stefan Hajnoczi:
>> >> On Thu, Dec 18, 2014 at 04:59:50PM +0100, Christian Borntraeger wrote:
>> >>> Are you ok with the patches? If yes, can you take care of these
>> >>> patches in the block tree?
>> >> 
>> >> This series looks close, I've left comments on the patches.
>> >
>> > OK, so you would take care of the series in your tree if the next spin
>> > addresses your comments, right?
>> >> 
>> >> The series is fine for command-line QEMU users where probing makes the
>> >> command-line more convenient, so we can merge it.  But the approach is
>> >> fundamentally wrong for stacks where libvirt is in use.
>> >> Libvirt is unaware of the guest geometry and block sizes that are probed
>> >> in QEMU by this patch series.  This breaks non-shared storage migration
>> >> and also means libvirt-based tools that manipulate drives on a guest may
>> >> inadvertently change the guest-visible geometry and cause disk problems.
>> >> 
>> >> For example, what happens when you copy the disk image off a host DASD
>> >> and onto NFS?  QEMU no longer probes the geometry and the disk geometry
>> >> has changed.
>> >> 
>> >> The right place to tackle guest-visible geometry is in libvirt, not in
>> >> QEMU, because it is guest state the needs to be captured in domain XML
>> >> so that migration and tooling can preserve it when manipulating guests.
>> >> 
>> >> Stefan
>> >> 
>> >
>> > I agree that this is not perfect and has obvious holes, but  it works 
>> > just fine with libvirt stacks that are typical on s390 (*). scsi disks
>> > (emulated and real) are not affected and work just fine. DASD disks are
>> > special anyway - please consider this as some kind of real HW pass-through
>> > even when we use virtio-blk. We can assume that admins can provide
>> > shared access between source and target. If you look at the real HW (LPAR
>> > and z/VM) DASDs are always attached via fibre channel (FICON protocol)
>> > (often with SAN switches) so shared access is quite common even over longer
>> > distances.
>> 
>> I wouldn't quite call it pass-through, but I agree that DASDs are
>> special.
>> 
>> > And yes, using NFS or image file will break unless the user specifies the
>> > geometry in libvirt. Setting these values is possible as of today
>> > in libvirts
>> > XML. But what programmatic way should libvirt use to detect that
>> > information
>> > itself? On an image file libvirt doesnt know either. This would be somewhat
>> > possible on an upper level like open stack and that upper level would then
>> > need to talk with the DS8000 storage subsystems and the system z
>> > hardware but
>> > even then it would fail when creating new DASD like images.  (This
>> > would boil
>> > down to provide an interface like "create an image file that looks like as
>> > DASD of type 3390/xx formatted with dasdfmt and the following parameters).
>> 
>> Disk geometry is really a device model property.  For historical
>> reasons, we conjure up default geometries in the block layer.  For
>> convenience, we special-case DASDs (this series).
>> 
>> Very few guests care about device geometry.
>> 
>> If your guest does, relying on default geometry has always exposed you
>> to the risk of an unwanted geometry change.  So far largely theoretical,
>> as the default geometry guessing algorithm has been unlikely to change
>> incompatibily.  To eliminate the risk, specify the geometry explicitly.
>> With libvirt, that means putting it into domain XML.
>> 
>> This series changes *does* change the guessing algorithm, but for DASDs
>> only.  Thus, the risk becomes real, but for users of DASDs only.  Since
>> said users of DASDs ask for it...  let them have rope, I say.
>> 
>> Libvirt could default the geometry to whatever QEMU comes up with, then
>> fix it forever in domain XML.  Doubtful whether it's worth the bother,
>> as very few guests care, and their users are probably (painfully) aware
>> of geometry pitfalls anyway.
>> 
>> > If we talk about cloud/openstack etc we do not have pass through
>> > devices anyway
>> > and only image files. If somebody asks me, I would create all
>> > image files as
>> > SCSI images anyway, all the DASD stuff makes sense if you have
>> > DASD hardware
>> > (or if you really know what you are going to do).
>> >
>> > What is your opinion on 
>> > a) migrating disk properties like geometry
>> > b) comparing the detected disk properties and reject migration if they
>> >    dont match?
>> > Both changes  should provide a safety net and could be added at a later
>> > point in time.
>> 
>> If I remember correctly I suggested to investigate in that direction,
>> but then we concluded it wasn't worth the bother.
>
> Migrating disk geometry violates how live migration works:
>
> 1. Run-time device state like device registers and memory contents are
>    live-migrated by QEMU.
>
> 2. Machine configuration like the list of emulated devices and
>    read-only properties on those devices are not live migrated by QEMU.

Correct.  Frontend configuration could be migrated, but we don't.

> Disk geometry is a property of the emulated storage device, so it is not
> live migrated.

Yes.

> Why is migrating disk geometry incorrect?  Because if you live migrate
> it in QEMU there is no way to persist it on the destination host.  You
> will be unable to cold boot the guest on the destination because you
> lose the machine configuration when the guest shuts down.
>
> The machine configuration is persisted by the management tool or
> manually by the user.  That is where the disk geometry needs to go
> during migration.

Yes, it's actually a bad idea, not just not worth the bother.

> I'm really starting to get worried that you are going to break things.
> This DASD hack is a layering violation but okay, go ahead if you want.

The DASD hack is on top of ancient and equally unclean hacks: geometry
is guessed from the MS-DOS partition table if present and sane, else
from the device size.  A blank disk obviously gets the latter.  Have the
guest write a partition table, then reboot, and you get the former.  If
it's different, and the guest cares, poof!  Fortunately, very few guests
care.

> But now you are also thinking about breaking live migration.

I suspect he's merely thinking about ways to get this series committed,
and now regrets having mentioned migration again ;)

> The proper thing to do is to introduce libvirt XML syntax for DASD.
> That way the geometry can be handled as part of the machine
> configuration.  Then live migration and storage management tools can do
> the right thing.
>
> I've said this should be done in libvirt repeatedly but you keep wanting
> to hack QEMU instead of doing this cleanly :(.

As I explained, the problem isn't new.  This series merely makes it
bigger for DASD.  If that's what DASD users want... let them have rope.

For pretty much everyone else, the problem has been and remains
insignificant.  Does geometry really change when there's no observer?
Hard to get excited about doing geometry right in libvirt...

> If you have plans to expand on this hack, please scrap this series and
> introduce libvirt XML syntax instead.

I wouldn't object to "instead", but I wouldn't object to "in addition
to", either.

> Does what I'm saying make sense?

I don't think you're wrong, we're mostly weighing pros and cons
differently.  But you're the maintainer, so it's ultimately your call.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]