Re: [PATCH v4 01/12] mm/shmem: Introduce F_SEAL

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 01/12] mm/shmem: Introduce F_SEAL_INACCESSIBLE

From:	Andy Lutomirski
Subject:	Re: [PATCH v4 01/12] mm/shmem: Introduce F_SEAL_INACCESSIBLE
Date:	Fri, 4 Mar 2022 11:24:30 -0800
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.0

On 2/23/22 04:05, Steven Price wrote:

On 23/02/2022 11:49, Chao Peng wrote:

On Thu, Feb 17, 2022 at 11:09:35AM -0800, Andy Lutomirski wrote:

On Thu, Feb 17, 2022, at 5:06 AM, Chao Peng wrote:

On Fri, Feb 11, 2022 at 03:33:35PM -0800, Andy Lutomirski wrote:

On 1/18/22 05:21, Chao Peng wrote:

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Introduce a new seal F_SEAL_INACCESSIBLE indicating the content of
the file is inaccessible from userspace through ordinary MMU access
(e.g., read/write/mmap). However, the file content can be accessed
via a different mechanism (e.g. KVM MMU) indirectly.

It provides semantics required for KVM guest private memory support
that a file descriptor with this seal set is going to be used as the
source of guest memory in confidential computing environments such
as Intel TDX/AMD SEV but may not be accessible from host userspace.

At this time only shmem implements this seal.


I don't dislike this *that* much, but I do dislike this. F_SEAL_INACCESSIBLE
essentially transmutes a memfd into a different type of object.  While this
can apparently be done successfully and without races (as in this code),
it's at least awkward.  I think that either creating a special inaccessible
memfd should be a single operation that create the correct type of object or
there should be a clear justification for why it's a two-step process.


Now one justification maybe from Stever's comment to patch-00: for ARM
usage it can be used with creating a normal memfd, (partially)populate
it with initial guest memory content (e.g. firmware), and then
F_SEAL_INACCESSIBLE it just before the first time lunch of the guest in
KVM (definitely the current code needs to be changed to support that).


Except we don't allow F_SEAL_INACCESSIBLE on a non-empty file, right?  So this 
won't work.


Hmm, right, if we set F_SEAL_INACCESSIBLE on a non-empty file, we will
need to make sure access to existing mmap-ed area should be prevented,
but that is hard.


In any case, the whole confidential VM initialization story is a bit buddy.  
From the earlier emails, it sounds like ARM expects the host to fill in guest 
memory and measure it.  From my recollection of Intel's scheme (which may well 
be wrong, and I could easily be confusing it with SGX), TDX instead measures 
what is essentially a transcript of the series of operations that initializes 
the VM.  These are fundamentally not the same thing even if they accomplish the 
same end goal.  For TDX, we unavoidably need an operation (ioctl or similar) 
that initializes things according to the VM's instructions, and ARM ought to be 
able to use roughly the same mechanism.


Yes, TDX requires a ioctl. Steven may comment on the ARM part.


The Arm story is evolving so I can't give a definite answer yet. Our
current prototyping works by creating the initial VM content in a
memslot as with a normal VM and then calling an ioctl which throws the
big switch and converts all the (populated) pages to be protected. At
this point the RMM performs a measurement of the data that the VM is
being populated with.

The above (in our prototype) suffers from all the expected problems with
a malicious VMM being able to trick the host kernel into accessing those
pages after they have been protected (causing a fault detected by the
hardware).

The ideal (from our perspective) approach would be to follow the same
flow but where the VMM populates a memfd rather than normal anonymous
pages. The memfd could then be sealed and the pages converted to
protected ones (with the RMM measuring them in the process).

The question becomes how is that memfd populated? It would be nice if
that could be done using normal operations on a memfd (i.e. using
mmap()) and therefore this code could be (relatively) portable. This
would mean that any pages mapped from the memfd would either need to
block the sealing or be revoked at the time of sealing.

The other approach is we could of course implement a special ioctl which
effectively does a memcpy into the (created empty and sealed) memfd and
does the necessary dance with the RMM to measure the contents. This
would match the "transcript of the series of operations" described above
- but seems much less ideal from the viewpoint of the VMM.

A VMM that supports Other Vendors will need to understand this sort ofmodel regardless.

I don't particularly mind the idea of having the kernel consume a normalmemfd and spit out a new object, but I find the concept of changing thetype of the object in place, even if it has other references, and tryingto control all the resulting races to be somewhat alarming.


In pseudo-Rust, this is the difference between:

fn convert_to_private(in: &mut Memfd)

and

fn convert_to_private(in: Memfd) -> PrivateMemoryFd

This doesn't map particularly nicely to the kernel, though.

--Andy\

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [PATCH v4 01/12] mm/shmem: Introduce F_SEAL_INACCESSIBLE, Andy Lutomirski <=
- Re: [PATCH v4 01/12] mm/shmem: Introduce F_SEAL_INACCESSIBLE, Chao Peng, 2022/03/07
  - Re: [PATCH v4 01/12] mm/shmem: Introduce F_SEAL_INACCESSIBLE, Paolo Bonzini, 2022/03/08

Prev by Date: Re: [PULL 00/11] QEMU changes for 2021-03-02
Next by Date: Re: [PATCH] hw/block: m25p80: Add support for w25q01jvq
Previous by thread: [PULL v2 00/21] tcg patch queue
Next by thread: Re: [PATCH v4 01/12] mm/shmem: Introduce F_SEAL_INACCESSIBLE
Index(es):
- Date
- Thread