[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive protoc

From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive protocol documentation
Date: Tue, 16 Apr 2013 01:20:26 +0300

On Mon, Apr 15, 2013 at 09:07:01AM -0400, Michael R. Hines wrote:
> On 04/15/2013 02:00 AM, Michael S. Tsirkin wrote:
> >On Sun, Apr 14, 2013 at 09:06:36PM -0400, Michael R. Hines wrote:
> >>On 04/14/2013 05:10 PM, Michael S. Tsirkin wrote:
> >>>On Sun, Apr 14, 2013 at 03:06:11PM -0400, Michael R. Hines wrote:
> >>>>On 04/14/2013 02:30 PM, Michael S. Tsirkin wrote:
> >>>>>On Sun, Apr 14, 2013 at 12:40:10PM -0400, Michael R. Hines wrote:
> >>>>>>On 04/14/2013 12:03 PM, Michael S. Tsirkin wrote:
> >>>>>>>On Sun, Apr 14, 2013 at 10:27:24AM -0400, Michael R. Hines wrote:
> >>>>>>>>On 04/14/2013 07:59 AM, Michael S. Tsirkin wrote:
> >>>>>>>>>On Fri, Apr 12, 2013 at 04:43:54PM +0200, Paolo Bonzini wrote:
> >>>>>>>>>>Il 12/04/2013 13:25, Michael S. Tsirkin ha scritto:
> >>>>>>>>>>>On Fri, Apr 12, 2013 at 12:53:11PM +0200, Paolo Bonzini wrote:
> >>>>>>>>>>>>Il 12/04/2013 12:48, Michael S. Tsirkin ha scritto:
> >>>>>>>>>>>>>1.  You have two protocols already and this does not make sense 
> >>>>>>>>>>>>>in
> >>>>>>>>>>>>>version 1 of the patch.
> >>>>>>>>>>>>It makes sense if we consider it experimental (add x- in front of
> >>>>>>>>>>>>transport and capability) and would like people to play with it.
> >>>>>>>>>>>>
> >>>>>>>>>>>>Paolo
> >>>>>>>>>>>But it's not testable yet.  I see problems just reading the
> >>>>>>>>>>>documentation.  Author thinks "ulimit -l 10000000000" on both 
> >>>>>>>>>>>source and
> >>>>>>>>>>>destination is just fine.  This can easily crash host or cause OOM
> >>>>>>>>>>>killer to kill QEMU.  So why is there any need for extra testers?  
> >>>>>>>>>>>Fix
> >>>>>>>>>>>the major bugs first.
> >>>>>>>>>>>
> >>>>>>>>>>>There's a similar issue with device assignment - we can't fix it 
> >>>>>>>>>>>there,
> >>>>>>>>>>>and despite being available for years, this was one of two reasons 
> >>>>>>>>>>>that
> >>>>>>>>>>>has kept this feature out of hands of lots of users (and assuming 
> >>>>>>>>>>>guest
> >>>>>>>>>>>has lots of zero pages won't work: balloon is not widely used 
> >>>>>>>>>>>either
> >>>>>>>>>>>since it depends on a well-behaved guest to work correctly).
> >>>>>>>>>>I agree assuming guest has lots of zero pages won't work, but I 
> >>>>>>>>>>think
> >>>>>>>>>>you are overstating the importance of overcommit.  Let's mark the 
> >>>>>>>>>>damn
> >>>>>>>>>>thing as experimental, and stop making perfect the enemy of good.
> >>>>>>>>>>
> >>>>>>>>>>Paolo
> >>>>>>>>>It looks like we have to decide, before merging, whether migration 
> >>>>>>>>>with
> >>>>>>>>>rdma that breaks overcommit is worth it or not.  Since the author 
> >>>>>>>>>made
> >>>>>>>>>it very clear he does not intend to make it work with overcommit, 
> >>>>>>>>>ever.
> >>>>>>>>>
> >>>>>>>>That depends entirely as what you define as overcommit.
> >>>>>>>You don't get to define your own terms.  Look it up in wikipedia or
> >>>>>>>something.
> >>>>>>>
> >>>>>>>>The pages do get unregistered at the end of the migration =)
> >>>>>>>>
> >>>>>>>>- Michael
> >>>>>>>The limitations are pretty clear, and you really should document them:
> >>>>>>>
> >>>>>>>1. run qemu as root, or under ulimit -l <total guest memory> on both 
> >>>>>>>source and
> >>>>>>>   destination
> >>>>>>>
> >>>>>>>2. expect that as much as that amount of memory is pinned
> >>>>>>>   and unvailable to host kernel and applications for
> >>>>>>>   arbitrarily long time.
> >>>>>>>   Make sure you have much more RAM in host or QEMU will get killed.
> >>>>>>>
> >>>>>>>To me, especially 1 is an unacceptable security tradeoff.
> >>>>>>>It is entirely fixable but we both have other priorities,
> >>>>>>>so it'll stay broken.
> >>>>>>>
> >>>>>>I've modified the beginning of docs/rdma.txt to say the following:
> >>>>>It really should say this, in a very prominent place:
> >>>>>
> >>>>>BUGS:
> >>>>Not a bug. We'll have to agree to disagree. Please drop this.
> >>>It's not a feature, it makes management harder and
> >>>will bite some users who are not careful enough
> >>>to read documentation and know what to expect.
> >>Something that does not exist cannot be a bug. That's called a
> >>non-existent optimization.
> >No because overcommit already exists, and works with migration.  It's
> >your patch that breaks it.  We already have a ton of migration variants
> >and they all work fine.  So in 2013 overcommit is a given.
> >
> >Look we can include code with known bugs, but we have to be very
> >explicit about them, because someone *will* be confused.  If it's a hard
> >bug to fix it won't get solved quickly but please stop pretending it's
> >perfect.
> >
> Setting aside RDMA for the moment, Are you trying to tell me that
> someone would
> *willingly* migrate a VM to a hypervisor without first validating
> (programmatically)
> whether or not the machine already has enough memory to support the entire
> footprint of the VM?
> If you answer yes to that question is yes, it's a bug.

enough virtual memory. not physical memory.

> That also means *any* use of RDMA in any application in the universe is also
> a bug and it also means that any HPC application running against cgroups is
> also buggy.

no, people don't normally lock up gigabytes of memory in HPC either.

> I categorically refuse to believe that someone runs a datacenter in
> this manner.

so you don't believe people overcommit memory.

> >
> >>>>>1. You must run qemu as root, or under
> >>>>>    ulimit -l <total guest memory> on both source and destination
> >>>>Good, will update the documentation now.
> >>>>>2. Expect that as much as that amount of memory to be locked
> >>>>>    and unvailable to host kernel and applications for
> >>>>>    arbitrarily long time.
> >>>>>    Make sure you have much more RAM in host otherwise QEMU,
> >>>>>    or some other arbitrary application on same host, will get killed.
> >>>>This is implied already. The docs say "If you don't want pinning,
> >>>>then use TCP".
> >>>>That's enough warning.
> >>>No it's not. Pinning is jargon, and does not mean locking
> >>>up gigabytes.  Why are you using jargon?
> >>>Explain the limitation in plain English so people know
> >>>when to expect things to work.
> >>Already done.
> >>
> >>>>>3. Migration with RDMA support is experimental and unsupported.
> >>>>>    In particular, please do not expect it to work across qemu versions,
> >>>>>    and do not expect the management interface to be stable.
> >>>>The only correct statement here is that it's experimental.
> >>>>
> >>>>I will update the docs to reflect that.
> >>>>
> >>>>>>$ cat docs/rdma.txt
> >>>>>>
> >>>>>>... snip ..
> >>>>>>
> >>>>>>===============
> >>>>>>
> >>>>>>Use of RDMA requires pinning and registering memory with the
> >>>>>>hardware. If this is not acceptable for your application or
> >>>>>>product, then the use of RDMA is strongly discouraged and you
> >>>>>>should revert back to standard TCP-based migration.
> >>>>>No one knows of should know what "pinning and registering" means.
> >>>>I will define it in the docs, then.
> >>>Keep it simple. Just tell people what they need to know.
> >>>It's silly to expect users to understand internals of
> >>>the product before they even try it for the first time.
> >>Agreed.
> >>
> >>>>>For which applications and products is it appropriate?
> >>>>That's up to the vendor or user to decide, not us.
> >>>With zero information so far, no one will be
> >>>able to decide.
> >>There is plenty of information. Including this email thread.
> >Nowhere in this email thread or in your patchset did you tell anyone for
> >which applications and products is it appropriate.  You also expect
> >someone to answer this question before they run your code.  It looks
> >like the purpose of this phrase is to assign blame rather than to
> >inform.
> >>>>>Also, you are talking about current QEMU
> >>>>>code using RDMA for migration but say "RDMA" generally.
> >>>>Sure, I will fix the docs.
> >>>>
> >>>>>>Next, decide if you want dynamic page registration on the server-side.
> >>>>>>For example, if you have an 8GB RAM virtual machine, but only 1GB
> >>>>>>is in active use, then disabling this feature will cause all 8GB to
> >>>>>>be pinned and resident in memory. This feature mostly affects the
> >>>>>>bulk-phase round of the migration and can be disabled for extremely
> >>>>>>high-performance RDMA hardware using the following command:
> >>>>>>QEMU Monitor Command:
> >>>>>>$ migrate_set_capability chunk_register_destination off # enabled by 
> >>>>>>default
> >>>>>>
> >>>>>>Performing this action will cause all 8GB to be pinned, so if that's
> >>>>>>not what you want, then please ignore this step altogether.
> >>>>>This does not make it clear what is the benefit of disabling this
> >>>>>capability. I think it's best to avoid options, just use chunk
> >>>>>based always.
> >>>>>If it's here "so people can play with it" then please rename
> >>>>>it to something like "x-unsupported-chunk_register_destination"
> >>>>>so people know this is unsupported and not to be used for production.
> >>>>Again, please drop the request for removing chunking.
> >>>>
> >>>>Paolo already told me to use "x-rdma" - so that's enough for now.
> >>>>
> >>>>- Michael
> >>>You are adding a new command that's also experimental, so you must tag
> >>>it explicitly too.
> >>The entire migration is experimental - which by extension makes the
> >>capability experimental.
> >Again the purpose of documentation is not to educate people about
> >qemu or rdma internals but to educate them how to use a feature.
> >It doesn't even mention rdma anywhere in the name of the capability.
> >Users won't make the connection.  You also didn't bother telling anyone
> >when to set the option.  Is it here "to be able to play with it"?  Does
> >it have any purpose for users not in a playful mood?  If yes your
> >documentation should say what it is, if no mention that.
> >
> The purpose of the capability is made blatantly clear in the documentation.
> - Michael

I know it's not clear to me.

You ask people to decide whether to use it or not basically first
or second thing. So tell them, in plain English, then and there,
what they need to know in order to decide. Not ten pages
down in the middle of a description of qemu internals.
Or just drop one of the variants - how much speed difference
is there between them?

reply via email to

[Prev in Thread] Current Thread [Next in Thread]