From: Mark Cave-Ayland
Subject: [Qemu-devel] Migrating decrementer (was: Re: [PATCH 4/4] target-ppc: ensure we include the decrementer value during migration)
Date: Mon, 25 Jan 2016 05:48:02 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.5.0

On 18/01/16 04:51, David Gibson wrote:

> On Fri, Jan 15, 2016 at 05:46:10PM +0000, Mark Cave-Ayland wrote:
>> On 12/01/16 02:44, David Gibson wrote:
>>
>>>>> In other words, isn't this just skipping the decrementer interrupts at
>>>>> the qemu level rather than the guest level?
>>>>>
>>>>> It seems that instead we should be reconstructing the decrementer on
>>>>> the destination based on an offset from the timebase.
>>>>
>>>> Well I haven't really looked at how time warping works during
>>>> migration in QEMU, however this seems to be the method used by
>>>> hw/ppc/ppc.c's timebase_post_load() function, but my understanding is
>>>> that this isn't currently available for the g3beige/mac99 machines?
>>>
>>> Ah.. yes, it looks like the timebase migration stuff is only hooked in
>>> on the pseries machine type.  As far as I can tell it should be
>>> trivial to add it to other machines though - it doesn't appear to rely
>>> on anything outside the common ppc timebase stuff.
>>>
>>>> Should the patch in fact do this but also add decrementer support? And
>>>> if it did, would this have a negative effect on pseries?
>>>
>>> Yes, I think that's the right approach.  Note that rather than
>>> duplicating the logic to adjust the decrementer over migration, it
>>> should be possible to encode the decrementer as a diff from the
>>> timebase across the migration.
>>>
>>> In fact.. I'm not sure it ever makes sense to store the decrementer
>>> value as a direct value, since it's constantly changing - probably
>>> makes more sense to derive it from the timebase whenever it is needed.
>>>
>>> As far as I know that should be fine for pseries.  I think the current
>>> behaviour is probably technically wrong for pseries as well, but the
>>> timing code of our Linux guests is robust enough to handle a small
>>> displacement to the time of the next decrementer interrupt.
>>
>> I've had a bit of an experiment trying to implement something suitable,
>> but I'm not 100% certain I've got this right.
>>
>> From the code my understanding is that the timebase is effectively
>> free-running and so if a migration takes 5s then you use tb_offset to
>> calculate the difference between the timebase before migration, and
>> subsequently apply the offset for all future reads of the timebase for
>> the lifetime of the CPU (i.e. the migrated guest is effectively living
>> at a point in the past where the timebase is consistent).
> 
> Um.. no.  At least in the usual configuration, the timebase represents
> real, wall-clock time, so we expect it to jump forward across the
> migration downtime.  This is important because the guest will use the
> timebase to calculate real time differences.
> 
> However, the absolute value of the timebase may be different on the
> *host* between source and destination for migration.  So what we need
> to do is before migration we work out the delta between host and guest
> notions of wall clock time (as defined by the guest timebase), and
> transfer that in the migration stream.
> 
> On the destination we initialize the guest timebase so that the guest
> maintains the same realtime offset from the host.  This means that as
> long as source and destination system time is synchronized, guest
> real-time tracking will continue correctly across the migration.
> 
> We do also make sure that the guest timebase never goes backwards, but
> that would only happen if the source and destination host times were
> badly out of sync.

I had a poke at trying to include the timebase state in the Mac machines
but found that enabling this caused migration to freeze, even on top of
my existing (known working) patchset.

Looking closer at the code, I don't think it can work on an x86 TCG host
in its current form since the host/guest clocks are used interchangeably
in places, e.g. in timebase_pre_save() we have this:

uint64_t ticks = cpu_get_host_ticks();
...
tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;

So this implies that guest_timebase is stored in (higher-resolution)
host ticks. But then in timebase_post_load() we have this:

migration_duration_tb = muldiv64(migration_duration_ns, freq,
    NANOSECONDS_PER_SECOND);

which calculates the migration time in timebase ticks but then:

guest_tb = tb_remote->guest_timebase + MIN(0, migration_duration_tb);
tb_off_adj = guest_tb - cpu_get_host_ticks();

which mixes tb_remote->guest_timebase in host ticks with
migration_duration_tb in timebase ticks.

I think that tb_off_adj should be in host ticks, so should
migration_duration_tb be dropped and guest_tb be calculated from
migration_duration_ns instead? At least given the current mixing of host
ticks and timebase ticks, I think this can only work correctly on PPC
hosts, where the two are the same.

One other thing I noticed is that this code "broke" my normal testing
pattern, where I tend to launch qemu-system-ppc with -loadvm foo -S so
the machine starts stopped. In this case the migration duration is
calculated at the moment the state is restored, and not at the point
where the CPU is started.

Should the adjustment calculation not take place in a cpu_start()
function instead? Otherwise, by the time I resume the paused machine,
the tb_off_adj calculation will be based on a point in the past and
therefore wrong.


ATB,

Mark.



