Subject: Re: [PATCH 0/3] migration: Downtime tracepoints
From: Joao Martins
Date: Thu, 26 Oct 2023 20:33:13 +0100
User-agent: Mozilla Thunderbird

On 26/10/2023 19:18, Peter Xu wrote:
> On Thu, Oct 26, 2023 at 01:03:57PM -0400, Peter Xu wrote:
>> On Thu, Oct 26, 2023 at 05:06:37PM +0100, Joao Martins wrote:
>>> On 26/10/2023 16:53, Peter Xu wrote:
>>>> This small series (actually only the last patch; first two are cleanups)
>>>> wants to improve QEMU's downtime analysis abilities, similarly to what Joao
>>>> proposed here earlier:
>>>>
>>>>
>>>> https://lore.kernel.org/r/20230926161841.98464-1-joao.m.martins@oracle.com
>>>>
>>> Thanks for following up on the idea; it's been hard to have enough
>>> bandwidth for everything over the past few weeks :(
>>
>> Yeah, totally understood. I think our QE team pushed me towards some
>> series like this, while my plan was to wait for your new version. :)
>>
On my end it was similar (though not driven by QE/QA), with folks drawing a
blank when they see a bigger downtime. Having an explainer/breakdown makes it
much easier to poke holes into where the problems are.
>> Then when I started I decided to go into per-device. I was thinking of
>> also persisting that information, but then I remembered some ppc guests can
>> have ~40,000 vmstates... and the memory to maintain that may or may not regress
>> a ppc user. So I figured I should first keep it simple with tracepoints.
>>
Yeah, I should have removed that last QAPI patch. The per-vmstate handling was
something I wasn't quite happy with how it looked, but I think you managed to
square it in a relatively clean way in that last patch.
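To make that per-vmstate granularity concrete (this is only a rough,
self-contained sketch of the measurement idea, not code from either series;
every name in it is made up), it boils down to timestamping each device's save
and reporting the cost by name in microseconds:

/* Sketch only: time each device's state save and report it by name, in
 * microseconds.  In QEMU proper the printf would presumably be a trace
 * event keyed by the vmstate name instead. */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static int64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Stand-in for a device's vmstate save callback. */
static void fake_device_save(const char *name)
{
    (void)name;
    for (volatile int i = 0; i < 1000000; i++) {
        /* pretend to serialize some state */
    }
}

int main(void)
{
    const char *devices[] = { "ram", "virtio-net", "vfio-pci" };

    for (size_t i = 0; i < sizeof(devices) / sizeof(devices[0]); i++) {
        int64_t start = now_us();
        fake_device_save(devices[i]);
        printf("save %-12s %8lld us\n", devices[i],
               (long long)(now_us() - start));
    }
    return 0;
}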
>>>
>>>> But with a few differences:
>>>>
>>>> - Nothing exported yet to qapi, all tracepoints so far
>>>>
>>>> - Instead of major checkpoints (stop, iterable, non-iterable, resume-rp),
>>>>   finer granularity, by providing downtime measurements for each vmstate (I
>>>>   made microseconds the unit, to be accurate). So far it seems the
>>>>   iterable / non-iterable phases are the core of the problem, and I want to
>>>>   nail it down per-device.
>>>>
>>>> - Trace dest QEMU too
>>>>
>>>> For the last bullet: consider the case where a device save() can be super
>>>> fast, while load() can actually be super slow. Both of them will
>>>> contribute to the ultimate downtime, but not as a simple sum: when src
>>>> QEMU is save()ing device1, dst QEMU can be load()ing device2, so
>>>> they can run in parallel. However, the only way to figure out all the
>>>> components of the downtime is to record both sides.
>>>>
>>>> Please have a look, thanks.
>>>>
>>>
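(A toy illustration of the overlap point above, made-up numbers and all, not
from the series: with the source saving devices one after another and the
destination loading each device as soon as its data has arrived and the
previous load has finished, the observed downtime is a pipelined end-to-end
time, not the plain sum of every save and load cost.)

#include <stdio.h>

int main(void)
{
    /* made-up per-device costs in milliseconds: {name, save, load} */
    struct { const char *name; int save_ms, load_ms; } dev[] = {
        { "ram tail",   30,  5 },
        { "virtio-net",  5, 10 },
        { "vfio-pci",   10, 40 },
    };
    int n = sizeof(dev) / sizeof(dev[0]);

    int save_end = 0, load_end = 0, naive_sum = 0;
    for (int i = 0; i < n; i++) {
        save_end += dev[i].save_ms;                  /* src side is serial    */
        load_end = (save_end > load_end ? save_end : load_end)
                   + dev[i].load_ms;                 /* dst overlaps with src */
        naive_sum += dev[i].save_ms + dev[i].load_ms;
    }
    /* prints ~85 ms pipelined vs a 100 ms naive sum for these numbers */
    printf("pipelined downtime ~%d ms vs naive save+load sum %d ms\n",
           load_end, naive_sum);
    return 0;
}

Which is exactly why only recording both sides shows where the real
contributors are.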
>>> I like your series, as it allows a user to pinpoint one particular bad
>>> device, while covering the load side too. The checkpoints of migration, on
>>> the other hand, were useful -- while also a bit ugly -- for the sort of big
>>> picture of how downtime breaks down. Perhaps we could add those /also/ as
>>> tracepoints, without specifically committing to exposing them in QAPI.
>>>
>>> More fundamentally, how can one capture the 'stop' part? There's also time
>>> spent there, e.g. quiescing/stopping vhost-net workers, or suspending the VF
>>> device. All of it is likely as costly as the device-state/RAM related parts
>>> those tracepoints cover (the iterable and non-iterable portions).
>>
>> Yeah that's a good point. I didn't cover "stop" yet because I think it's
>> just more tricky and I didn't think it all through, yet.
>>
It could follow your previous line of thought where you do it per vmstate.
But the catch is that vm state change handlers are nameless, so the tracepoints
wouldn't be able to tell which handler each chunk of time is being spent in.
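Purely to illustrate that (a self-contained sketch, not the existing QEMU API;
all names here are invented), carrying a name at registration time is what
would let a tracepoint attribute the cost:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

typedef void (*vm_state_cb)(void *opaque, int running);

typedef struct {
    const char *name;   /* the piece today's handlers don't carry */
    vm_state_cb cb;
    void *opaque;
} NamedStateHandler;

static int64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

static void vhost_net_stop_cb(void *opaque, int running)
{
    (void)opaque; (void)running;   /* pretend to quiesce the workers */
}

int main(void)
{
    NamedStateHandler handlers[] = {
        { "vhost-net", vhost_net_stop_cb, NULL },
    };

    for (size_t i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++) {
        int64_t start = now_us();
        handlers[i].cb(handlers[i].opaque, 0 /* !running */);
        /* with a name attached this can become a per-handler trace event */
        printf("vm-state handler %-12s %8lld us\n", handlers[i].name,
               (long long)(now_us() - start));
    }
    return 0;
}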
>> The first question is, when stopping some backends, the vCPUs are still
>> running, so it's not 100% clear to me which of those should be counted as
>> part of the real downtime.
>
> I was wrong.. we always stop vcpus first.
>
I was about to say this, but I guess you figured it out. Even if the vCPUs
weren't stopped first, the external I/O threads (QEMU or kernel) wouldn't be
servicing the guest's own I/O, which is also a portion of the outage.
> If you won't mind, I can add some tracepoints for all those spots in this
> series to cover your other series. I'll also make sure I do that for both
> sides.
>
Sure. For the fourth patch, feel free to add a Suggested-by and/or a Link,
considering it started from the other series (if you also agree that is right).
The patches are of course entirely different, but I like to believe the ideas
initially presented and then subsequently improved are what led to the downtime
observability improvements in this series.
Joao