qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] Migrating decrementer (was: Re: [PATCH 4/4] target-ppc:


From: David Gibson
Subject: Re: [Qemu-devel] Migrating decrementer (was: Re: [PATCH 4/4] target-ppc: ensure we include the decrementer value during migration)
Date: Mon, 25 Jan 2016 22:10:08 +1100
User-agent: Mutt/1.5.24 (2015-08-30)

On Mon, Jan 25, 2016 at 05:48:02AM +0000, Mark Cave-Ayland wrote:
> On 18/01/16 04:51, David Gibson wrote:
> 
> > On Fri, Jan 15, 2016 at 05:46:10PM +0000, Mark Cave-Ayland wrote:
> >> On 12/01/16 02:44, David Gibson wrote:
> >>
> >>>>> In other words, isn't this just skipping the decrementer interrupts at
> >>>>> the qemu level rather than the guest level?
> >>>>>
> >>>>> It seems that instead we should be reconstructing the decrementer on
> >>>>> the destination based on an offset from the timebase.
> >>>>
> >>>> Well I haven't really looked at how time warping works during in
> >>>> migration for QEMU, however this seems to be the method used by
> >>>> hw/ppc/ppc.c's timebase_post_load() function but my understanding is
> >>>> that this isn't currently available for the g3beige/mac99 machines?
> >>>
> >>> Ah.. yes, it looks like the timebase migration stuff is only hooked in
> >>> on the pseries machine type.  As far as I can tell it should be
> >>> trivial to add it to other machines though - it doesn't appear to rely
> >>> on anything outside the common ppc timebase stuff.
> >>>
> >>>> Should the patch in fact do this but also add decrementer support? And
> >>>> if it did, would this have a negative effect on pseries?
> >>>
> >>> Yes, I think that's the right approach.  Note that rather than
> >>> duplicating the logic to adjust the decrementer over migration, it
> >>> should be possible to encode the decrementer as a diff from the
> >>> timebase across the migration.
> >>>
> >>> In fact.. I'm not sure it ever makes sense to store the decrementer
> >>> value as a direct value, since it's constantly changing - probably
> >>> makes more sense to derive it from the timebase whenever it is needed.
> >>>
> >>> As far as I know that should be fine for pseries.  I think the current
> >>> behaviour is probably technically wrong for pseries as well, but the
> >>> timing code of our Linux guests is robust enough to handle a small
> >>> displacement to the time of the next decrementer interrupt.
> >>
> >> I've had a bit of an experiment trying to implement something suitable,
> >> but I'm not 100% certain I've got this right.
> >>
> >> >From the code my understanding is that the timebase is effectively
> >> free-running and so if a migration takes 5s then you use tb_offset to
> >> calculate the difference between the timebase before migration, and
> >> subsequently apply the offset for all future reads of the timebase for
> >> the lifetime of the CPU (i.e. the migrated guest is effectively living
> >> at a point in the past where the timebase is consistent).
> > 
> > Um.. no.  At least in the usual configuration, the timebase represents
> > real, wall-clock time, so we expect it to jump forward across the
> > migration downtime.  This is important because the guest will use the
> > timebase to calculate real time differences.
> > 
> > However, the absolute value of the timebase may be different on the
> > *host* between source and destination for migration.  So what we need
> > to do is before migration we work out the delta between host and guest
> > notions of wall clock time (as defined by the guest timebase), and
> > transfer that in the migration stream.
> > 
> > On the destination we initialize the guest timebase so that the guest
> > maintains the same realtime offset from the host.  This means that as
> > long as source and destination system time is synchronized, guest
> > real-time tracking will continue correctly across the migration.
> > 
> > We do also make sure that the guest timebase never goes backwards, but
> > that would only happen if the source and destination host times were
> > badly out of sync.
> 
> I had a poke at trying to include the timebase state in the Mac machines
> but found that enabling this caused migration to freeze, even on top of
> my existing (known working) patchset.
> 
> Looking closer at the code, I don't think it can work on an x86 TCG host
> in its current form since the host/guest clocks are used interchangeably
> in places, e.g. in timebase_pre_save() we have this:
> 
> uint64_t ticks = cpu_get_host_ticks();
> ...
> tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
> 
> So this implies that guest_timebase is set in higher resolution host
> ticks. But then in timebase_post_load() we have this:
> 
> migration_duration_tb = muldiv64(migration_duration_ns, freq,
>     NANOSECONDS_PER_SECOND);
> 
> which calculates the migration time in timebase ticks but then:
> 
> guest_tb = tb_remote->guest_timebase + MIN(0, migration_duration_tb);
> tb_off_adj = guest_tb - cpu_get_host_ticks();
> 
> which mixes tb_remote->guest_timebase in host ticks with
> migration_duration_tb in timebase ticks.
> 
> I think that tb_off_adj should be in host ticks so should
> migration_duration_tb be dropped and guest_tb be calculated from
> migration_duration_ns instead? At least given the current mixing of host
> ticks and timebase ticks, I think this can only work correctly on PPC
> where the two are the same.
> 
> One other thing I noticed is that this code "broke" my normal testing
> pattern when I tend to launch qemu-system-ppc with -loadvm foo -S with
> the machine stopped. In this case the migration duration calculation is
> performed at the moment state is restored, and not the point where the
> CPU is started.
> 
> Should the adjustment calculation not take place in a cpu_start()
> function instead? Otherwise at the point when I resume the paused
> machine the tb_off_adj calculation will be based in the past and
> therefore wrong.

Um.. so the migration duration is a complete red herring, regardless
of the units.

Remember, we only ever compute the guest timebase value at the moment
the guest requests it - actually maintaining a current timebase value
makes sense in hardware, but would be nuts in software.

The timebase is a function of real, wall-clock time, and the migration
destination has a notion of wall-clock time without reference to the
source.

So what you need to transmit for migration is enough information to
compute the guest timebase from real-time - essentially just an offset
between real-time and the timebase.

The guest can potentially observe the migration duration as a jump in
timebase values, but qemu doesn't need to do any calculations with it.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]