qemu-devel
From: Isaku Yamahata
Subject: Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to create restricted user memory
Date: Wed, 18 Jan 2023 02:17:23 -0800

On Wed, Jan 18, 2023 at 04:16:41PM +0800,
Chao Peng <chao.p.peng@linux.intel.com> wrote:

> On Tue, Jan 17, 2023 at 04:34:15PM +0000, Sean Christopherson wrote:
> > On Tue, Jan 17, 2023, Chao Peng wrote:
> > > On Fri, Jan 13, 2023 at 09:54:41PM +0000, Sean Christopherson wrote:
> > > > > +     list_for_each_entry(notifier, &data->notifiers, list) {
> > > > > +             notifier->ops->invalidate_start(notifier, start, end);
> > > > 
> > > > Two major design issues that we overlooked long ago:
> > > > 
> > > >   1. Blindly invoking notifiers will not scale.  E.g. if userspace configures a
> > > >      VM with a large number of convertible memslots that are all backed by a
> > > >      single large restrictedmem instance, then converting a single page will
> > > >      result in a linear walk through all memslots.  I don't expect anyone to
> > > >      actually do something silly like that, but I also never expected there to be
> > > >      a legitimate usecase for thousands of memslots.
> > > > 
> > > >   2. This approach fails to provide the ability for KVM to ensure a guest has
> > > >      exclusive access to a page.  As discussed in the past, the kernel can rely
> > > >      on hardware (and maybe ARM's pKVM implementation?) for those guarantees, but
> > > >      only for SNP and TDX VMs.  For VMs where userspace is trusted to some extent,
> > > >      e.g. SEV, there is value in ensuring a 1:1 association.
> > > > 
> > > >      And probably more importantly, relying on hardware for SNP and TDX yields a
> > > >      poor ABI and complicates KVM's internals.  If the kernel doesn't guarantee a
> > > >      page is exclusive to a guest, i.e. if userspace can hand out the same page
> > > >      from a restrictedmem instance to multiple VMs, then failure will occur only
> > > >      when KVM tries to assign the page to the second VM.  That will happen deep
> > > >      in KVM, which means KVM needs to gracefully handle such errors, and it means
> > > >      that KVM's ABI effectively allows plumbing garbage into its memslots.
> > > 
> > > It may not be a valid usage, but in my TDX environment I do hit the
> > > issue below.
> > > 
> > > kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0x80000000 ua=0x7fe1ebfff000 ret=0
> > > kvm_set_user_memory AddrSpace#0 Slot#1 flags=0x4 gpa=0xffc00000 size=0x400000 ua=0x7fe271579000 ret=0
> > > kvm_set_user_memory AddrSpace#0 Slot#2 flags=0x4 gpa=0xfeda0000 size=0x20000 ua=0x7fe1ec09f000 ret=-22
> > > 
> > > Slot#2 ('SMRAM') is actually an alias into system memory (Slot#0) in
> > > QEMU, and Slot#2 fails due to the exclusive check below.
> > > 
> > > For now I changed the QEMU code to mark these alias slots as shared
> > > instead of private, but I'm not 100% confident that is the correct fix.
> > 
> > That's a QEMU bug of sorts.  SMM is mutually exclusive with TDX, QEMU shouldn't
> > be configuring SMRAM (or any SMM memslots for that matter) for TDX guests.
> 
> Thanks for the confirmation. As long as we only bind one notifier for
> each address, using an xarray does make things simple.

In the past I had patches for QEMU to disable PAM and SMRAM, but they were
dropped for simplicity: SMRAM/PAM are disabled at reset, with only an unused
memslot registered, and the TDX guest BIOS (TDVF or EDK2) doesn't enable them.
Now we can revive those patches.
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


