qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Kernel patch cases qemu live migration failed.


From: Dave Martin
Subject: Re: Kernel patch cases qemu live migration failed.
Date: Mon, 19 Oct 2020 17:36:49 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

On Mon, Oct 19, 2020 at 05:23:11PM +0200, Andrew Jones wrote:
> On Mon, Oct 19, 2020 at 03:58:40PM +0100, Dave Martin wrote:
> > On Mon, Oct 19, 2020 at 03:18:11PM +0100, Peter Maydell wrote:
> > > On Mon, 19 Oct 2020 at 14:40, Andrew Jones <drjones@redhat.com> wrote:
> > > >
> > > > On Mon, Oct 19, 2020 at 12:43:33PM +0100, Peter Maydell wrote:
> > > > > Well, ID regs are special in the architecture -- they always exist
> > > > > and must RAZ/WI, even if they're not actually given any fields yet.
> > > > > This is different from other "unused" parts of the system register
> > > > > encoding space, which UNDEF.
> > > >
> > > > Table D12-2 confirms the register should be RAZ, as it says the register
> > > > is "RO, but RAZ if SVE is not implemented". Does "RO" imply "WI", 
> > > > though?
> > > > For the guest we inject an exception on writes, and for userspace we
> > > > require the value to be preserved on write.
> > > 
> > > Sorry, I mis-spoke. They're RAZ, but not WI, just RO (which is to say
> > > they'll UNDEF if you try to write to them).
> > > 
> > > > I think we should follow the spec, even for userspace access, and be RAZ
> > > > for when the feature isn't implemented. As for writes, assuming the
> > > > exception injection is what we want for the guest (not WI), then that's
> > > > correct. For userspace, I think we should continue forcing preservation
> > > > (which will force preservation of zero when it's RAZ).
> > > 
> > > Yes, that sounds right.
> > 
> > [...]
> > 
> > > > > The problem is that you've actually removed registers from
> > > > > the list that were previously in it (because pre-SVE
> > > > > kernels put this ID register in the list as a RAZ/WI register,
> > > > > and now it's not in the list if SVE isn't supported).
> > 
> > Define "previously", though.  IIUC, the full enumeration was added in
> > v4.15 (with ID_AA64ZFR0_EL1 still not supported at all):
> > 
> > v4.15-rc1~110^2~27
> > 93390c0a1b20 ("arm64: KVM: Hide unsupported AArch64 CPU features from 
> > guests")
> > 
> > 
> > And then ID_AA64FZR0_EL1 was removed from the enumeration, also in
> > v4.15:
> > 
> > v4.15-rc1~110^2~5
> > 07d79fe7c223 ("arm64/sve: KVM: Hide SVE from CPU features exposed to 
> > guests")
> > 
> > 
> > So, are there really two upstram kernel tags that are mismatched on
> > this, or is this just a bisectability issue in v4.14..v4.15?
> > 
> > It's a while since I looked at this, and I may have misunderstood the
> > timeline.
> > 
> > 
> > > > > > So, I think that instead of changing the ID_AA64ZFR0_EL1 behaviour,
> > > > > > parhaps we should move all ID_UNALLOCATED() regs (and possibly
> > > > > > ID_HIDDEN(), not sure about that) to have REG_HIDDEN_USER 
> > > > > > visibility.
> > > > >
> > > > > What does this do as far as the user-facing list-of-registers
> > > > > is concerned? All these registers need to remain in the
> > > > > KVM_GET_REG_LIST list, or you break migration from an old
> > > > > kernel to a new one.
> > 
> > OK, I think I see where you are coming from, now.
> > 
> > It may make sense to get rid of the REG_HIDDEN_GUEST / REG_HIDDEN_USER
> > distinction, and provide the same visibility for userspace as for MSR/
> > MRS all the time.  This would restore ID_AA64ZFR0_EL1 into the userspace
> > view, and may also allow a bit of simplification in the code.
> > 
> > Won't this will still break migration from the resulting kernel to a
> > current kernel that hides ID_AA64ZFR0_EL1?  Or have I misunderstood
> > something.
> >
> 
> Yes, but, while neither direction old -> new nor new -> old is actually
> something that people should do when using host cpu passthrough (they
> should only ever migrate between identical hosts, both hardware and
> host kernel version), migrating from old -> new makes more sense, as
> that's the upgrade path, and it's more supportable - we can workaround
> things on the new side. So, long story short, new -> old will fail due
> to making this change, but it's still probably the right thing to do,
> as we'll be defining a better pattern for ID registers going forward,
> and we never claimed new -> old migrations would work anyway with host
> passthrough.
> 
> Thanks,
> drew

Ack, just wanted to make sure I understood the implications correctly.

I'm still not sure I fully understand why we hit this problem (i.e.,
ZFR0 enumeration mismatch between host and guest) in the first place,
unless I've misunderstood which patches make these changes, or unless
RHEL has cherry-picked odd patches that weren't intended to be applied
separately...

Cheers
---Dave



reply via email to

[Prev in Thread] Current Thread [Next in Thread]