qemu-devel

Re: [Qemu-devel] [Qemu-ppc] Migrating decrementer


From: David Gibson
Subject: Re: [Qemu-devel] [Qemu-ppc] Migrating decrementer
Date: Thu, 10 Mar 2016 15:57:25 +1100
User-agent: Mutt/1.5.24 (2015-08-30)

On Mon, Feb 29, 2016 at 08:21:39PM +0000, Mark Cave-Ayland wrote:
> On 29/02/16 03:57, David Gibson wrote:
> 
> > On Fri, Feb 26, 2016 at 12:29:51PM +0000, Mark Cave-Ayland wrote:
> >> On 26/02/16 04:35, David Gibson wrote:
> >>
> >>>> Sigh. And let me try that again, this time after caffeine:
> >>>>
> >>>> cpu_start/resume():
> >>>>     cpu->tb_env->tb_offset =
> >>>>         muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
> >>>>                  cpu->tb_env->tb_freq, NANOSECONDS_PER_SECOND) +
> >>>>             cpu->tb_env->tb_offset -
> >>>>         cpu_get_host_ticks();
> >>>>
> >>>> This should translate to: at CPU start, calculate the difference between
> >>>> the current guest virtual timebase and the host timebase, storing the
> >>>> difference in cpu->tb_env->tb_offset.
> >>>
> >>> Ummm... I think that's right.  Except that you need to make sure you
> >>> calculate the tb_offset just once, and set the same value to all guest
> >>> CPUs.  Otherwise the guest TBs may be slightly out of sync with each
> >>> other, which is bad (the host should have already ensured that all host
> >>> TBs are in sync with each other).
> >>
> >> Nods. The reason I really like this solution is because it correctly
> >> handles both paused/live machines and simplifies the migration logic
> >> significantly. In fact, with this solution the only information you
> >> would need in vmstate_ppc_timebase for migration would be the current
> >> tb_offset so the receiving host can calculate the guest timebase from
> >> the virtual clock accordingly.
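
To illustrate the arithmetic discussed above, here is a minimal sketch in
plain C. It is not QEMU code: struct vcpu_tb, ns_to_ticks() and
sync_tb_offsets() are hypothetical names, and the clock values are passed in
as plain integers rather than read from QEMU_CLOCK_VIRTUAL or the host
timebase. It shows the guest timebase being derived from a virtual clock plus
a migrated offset, and a single host-relative offset being computed once and
applied to every vCPU so that the guest TBs stay in sync.

```c
#include <stddef.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Hypothetical stand-in for per-vCPU timebase state (not QEMU's type). */
struct vcpu_tb {
    uint32_t tb_freq;   /* timebase frequency in Hz */
    int64_t  tb_offset; /* guest TB minus host TB, in ticks */
};

/*
 * Convert nanoseconds to timebase ticks.  Splitting into whole seconds
 * and a sub-second remainder avoids 64-bit overflow for realistic
 * uptimes (the remainder product stays below 2^62).
 */
static uint64_t ns_to_ticks(uint64_t ns, uint32_t freq)
{
    return (ns / NS_PER_SEC) * freq + (ns % NS_PER_SEC) * freq / NS_PER_SEC;
}

/* Guest timebase as derived from the virtual clock plus a migrated offset. */
static uint64_t guest_tb_from_vclock(uint64_t vclock_ns, uint32_t freq,
                                     int64_t migrated_offset)
{
    return ns_to_ticks(vclock_ns, freq) + (uint64_t)migrated_offset;
}

/*
 * Compute the guest-minus-host offset exactly once and store the same
 * value in every vCPU, so no two guest TBs can drift apart.
 */
static int64_t sync_tb_offsets(struct vcpu_tb *cpus, size_t n,
                               uint64_t guest_tb, uint64_t host_tb)
{
    int64_t offset = (int64_t)(guest_tb - host_tb);

    for (size_t i = 0; i < n; i++) {
        cpus[i].tb_offset = offset;
    }
    return offset;
}
```

The point of taking guest_tb and host_tb as arguments is that both are
sampled once by the caller; sampling them per vCPU inside the loop is what
would let the guest TBs end up slightly out of sync.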
> > 
> >>> We really should make helper routines that each Power machine type can
> >>> use for this.  Unfortunately we can't put it directly into the common
> >>> ppc cpu migration code because of the requirement to keep the TBs
> >>> synced across the machine.
> >>
> >> Effectively I believe there are 2 cases here: TCG and KVM. For TCG all
> >> of this is a no-op since tb/decr are already derived from the virtual
> >> clock + tb_offset, so that really just leaves KVM.
> >>
> >> From what you're saying we would need 2 hooks for KVM here: one on
> >> machine start/resume which updates tb_offset for all vCPUs and one for
> >> vCPU resume which updates just that one particular vCPU.
> >>
> >> Just curious as to your comment regarding helper routines for each Power
> >> machine type - is there any reason not to enable this globally for all
> >> Power machine types? If tb_offset isn't supported by the guest then the
> >> problem with not being able to handle a jump in timebase post-migration
> >> still remains exactly the same.
> > 
> > Well, I can't see a place to put it globally.  We can't put it in the
> > general vCPU stuff, because that would migrate each CPU's timebase
> > independently, but we want to migrate as a system wide operation, to
> > ensure the TBs are all synchronized in the destination guest.
> > 
> > To do the platform wide stuff, it pretty much has to be in the machine
> > type.
> 
> (goes and looks)
> 
> It strikes me that a good solution here would be to introduce a new
> PPCMachineClass from which all of the PPC machines could derive instead
> of each different machine being a direct subclass of MachineClass (this
> is not dissimilar to the existing PCMachineClass) and move the
> timebase and decrementer information into it. With this then all of the
> PPC machine types can pick up the changes automatically.

Um.. maybe, yes.  There might be some gotchas in attempting that
(particularly maintaining backwards compat for migration), but it
could be worth a shot.

> >> The other question of course relates to the reason this thread was
> >> started which is will this approach still support migrating the
> >> decrementer? My feeling is that this would still work in that we would
> >> encode the number of ticks before the decrementer reaches its interrupt
> >> value into vmstate_ppc_timebase, whether high or low. For TCG it is easy
> >> to ensure that the decrementer will still fire in n ticks time
> >> post-migration (which solves my particular use case), but I don't have a
> >> feeling as to whether this is possible, or indeed desirable for KVM.
> > 
> > Yes, for TCG it should be fairly straightforward.  The DECR should be
> > calculated from the timebase.  We do need to check it on incoming
> > migration though, and check when we need to refire the decrementer
> > interrupt.
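
One way to picture "encode the number of ticks before the decrementer
fires" is to migrate the guest TB value at which DECR reaches zero, then
recompute DECR from the timebase on the destination. The sketch below is
only an illustration under assumptions: a 32-bit decrementer, hypothetical
names (struct decr_migration, decr_save(), decr_load(), decr_expired()), and
no relation to the actual vmstate_ppc_timebase layout. It also ignores the
case where so much time has passed that the remaining count no longer fits
in 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical migration record: not the real vmstate layout. */
struct decr_migration {
    uint64_t expiry_tb; /* guest TB value at which DECR reaches 0 */
};

/* Source side: turn the current DECR into an absolute guest TB deadline. */
static struct decr_migration decr_save(uint64_t guest_tb, uint32_t decr)
{
    /*
     * DECR is treated as a signed 32-bit count.  Sign-extending means an
     * already-negative DECR yields a deadline in the past.
     */
    struct decr_migration m = {
        .expiry_tb = guest_tb + (uint64_t)(int64_t)(int32_t)decr,
    };
    return m;
}

/* Destination side: recompute DECR from the deadline and the current TB. */
static uint32_t decr_load(const struct decr_migration *m, uint64_t guest_tb_now)
{
    return (uint32_t)(m->expiry_tb - guest_tb_now);
}

/* True if the deadline passed, i.e. the interrupt may need refiring. */
static bool decr_expired(const struct decr_migration *m, uint64_t guest_tb_now)
{
    return (int64_t)(m->expiry_tb - guest_tb_now) < 0;
}
```

Whether an expired deadline should actually raise the interrupt again on
the destination depends on the trigger semantics of the CPU model, which
this sketch deliberately leaves out.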
> 
> So just to confirm that while reads from the timebase are not privileged
> (and so cannot be intercepted between host and guest), we still have
> individual control of the per-guest decrementer interrupts?

I'm not entirely sure I understand the question, but I think the
answer is yes.

> > For KVM we'll need to load an appropriate value into the real
> > decrementer.  We probably want to migrate a difference between the TB
> > and the decrementer.  What could get hairy here is that there are a
> > number of different variants between ppc models on how exactly the
> > decrementer interrupt triggers: is it edge-triggered on 1->0
> > transition, edge-triggered on 0->-1 transition, or level triggered on
> > the DECR's sign bit.  
> 
> I don't think that is too much of a problem, since for TCG the logic is
> already encapsulated in hw/ppc/ppc.c's __cpu_ppc_store_decr(). It should
> be possible to move this logic into a shared helper function to keep
> everything in one place.
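
For reference, the three trigger variants mentioned above could be
captured in a shared helper along these lines. This is a sketch only: the
enum and decr_should_fire() are hypothetical names, not the contents of
__cpu_ppc_store_decr(), and it assumes a 32-bit decrementer and models just
the change from one DECR value to the next.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of decrementer interrupt behaviour. */
enum decr_trigger {
    DECR_EDGE_1_TO_0,    /* edge-triggered on the 1 -> 0 transition */
    DECR_EDGE_0_TO_NEG1, /* edge-triggered on the 0 -> -1 transition */
    DECR_LEVEL_SIGN_BIT, /* level-triggered while DECR's sign bit is set */
};

/*
 * Decide whether a change of DECR from old_val to new_val should raise
 * the decrementer interrupt under the given trigger model.
 */
static bool decr_should_fire(enum decr_trigger mode,
                             uint32_t old_val, uint32_t new_val)
{
    switch (mode) {
    case DECR_EDGE_1_TO_0:
        return old_val == 1 && new_val == 0;
    case DECR_EDGE_0_TO_NEG1:
        return old_val == 0 && new_val == UINT32_MAX;
    case DECR_LEVEL_SIGN_BIT:
        /* The level-triggered case ignores how the value was reached. */
        return (new_val & 0x80000000u) != 0;
    }
    return false;
}
```

Only the level-triggered model can be re-evaluated from the migrated DECR
value alone; the two edge-triggered models also need to know whether the
edge already happened on the source.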
> 
> Finally just to re-iterate that while I can write and compile-test a
> potential patchset, I have no way to test the KVM parts. If I were to
> dedicate some time to this, would yourself/Alex/Alexey be willing to
> help test and debug these changes?

Um.. to some extent.  I can test that you haven't broken spapr easily
enough.  Testing that the Mac machines are working under KVM PR is
trickier, since I'm not really set up for testing the Mac machine
types.  It might work if you can supply VM images and scripts, which I
can then execute on a ppc machine.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
