qemu-devel

Re: [PATCH 2/2] tpm: add backend for mssim


From: Stefan Berger
Subject: Re: [PATCH 2/2] tpm: add backend for mssim
Date: Mon, 9 Jan 2023 16:06:32 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.0



On 1/9/23 14:01, Stefan Berger wrote:


On 1/9/23 13:51, James Bottomley wrote:
On Mon, 2023-01-09 at 13:34 -0500, Stefan Berger wrote:


On 1/9/23 12:55, James Bottomley wrote:
On Mon, 2023-01-09 at 17:52 +0000, Dr. David Alan Gilbert wrote:
* James Bottomley (jejb@linux.ibm.com) wrote:
[...]
external MSSIM TPM emulator has to be kept running to preserve the state. If you restart it, the migration will fail.

Document that and we're getting there.


The documentation in the current patch series says

---- The mssim backend supports snapshotting and migration,
but the state of the Microsoft Simulator server must be
preserved (or the server kept running) outside of QEMU for
restore to be successful. ----

What, beyond this would you want to see?

mssim today lacks the functionality of marshalling and unmarshalling the permanent and volatile state of the TPM 2, which are both needed for snapshot support. How does this work with mssim?

You preserve the state by keeping the simulator running, as the above says. As long as you can preserve the state, there's no maximum time between snapshots. There's no need to marshal/unmarshal if you do this.

From https://lists.gnu.org/archive/html/qemu-devel/2022-12/msg03146.html

"VM snapshotting is basically VM suspend / resume on steroids requiring permanent and volatile state to be saved and restoreable from possible very different points in time with possibly different seeds, NVRAM locations etc. How the mssim protocol does this is non-obvious to me and how one coordinates the restoring and saving of the TPM's state without direct coordination by QEMU is also non-obvious."

One thing, though: I am aware of the issues that may arise due to
support for TPM state migration. However, whether TPM state migration
becomes an issue depends on how you use the TPM 2.

If the use case is to use the TPM 2 as a local crypto device, then state
migration is likely not an issue. You may have different keys in the TPM 2 at
different points in time, and even snapshotting may not be an issue but possibly
quite a welcome feature to have, along with support for scenarios such as
VM suspend + host upgrade + host reboot + VM resume.

If you use the TPM 2 for attestation, then certain TPM 2 state migration
scenarios may become problematic. One could construct a scenario where
attestation precedes some action that requires trust to have been established
in the system in the preceding attestation step. Support for snapshotting the
state of the TPM 2 could then become an issue: I could wait for the attestation
to conclude and then quickly restart a different snapshot that is not
trustworthy, while the client proceeds thinking that the system is trustworthy
(maybe a few SYNs from the client went into the void).

Eliminating TPM 2 state migration is probably not a good idea, because
environments where attestation may occur may also support VM suspend/resume
along with upgrading and rebooting the host, or VM migration for some sort of
host evacuation before an upgrade.


When it comes to snapshotting and using the TPM 2 as a crypto device, just
saying that VM snapshots are supported by leaving the TPM 2 running and not
touching it doesn't make this function correctly for all scenarios where the
TPM 2 may have had different keys loaded. It is an even worse idea for
attestation, where I could construct a snapshot A, wait until the attestation
has passed, and then resume with a snapshot A' that runs untrustworthy software
but uses the state of the TPM 2 from the time of snapshot A and remains happy
to quote the state of the PCRs from before. If launching a snapshot also
restores the state of the PCRs that goes along with the state of the system at
that time, then that at least allows quotes to carry PCR contents that reflect
the system state at snapshot A'.
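
To make the snapshot A / A' scenario concrete, here is a minimal toy model in
Python. It is only a sketch under stated assumptions: ToyTPM, the measurement
strings and the unsigned quote() are hypothetical stand-ins, not the mssim or
swtpm interfaces. The only TPM behaviour it models is the PCR extend operation
(new PCR = hash of old PCR concatenated with the measurement digest).

```python
import hashlib

PCR_INIT = bytes(32)  # a SHA-256 PCR starts out as all zeroes


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """PCR extend: new = SHA-256(old || SHA-256(measurement))."""
    return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()


class ToyTPM:
    """Hypothetical stand-in for a TPM 2 with a single PCR."""

    def __init__(self) -> None:
        self.pcr = PCR_INIT

    def measure(self, software: bytes) -> None:
        self.pcr = extend(self.pcr, software)

    def quote(self) -> bytes:
        # A real quote is signed and covers a nonce; omitted in this toy model.
        return self.pcr


# What the verifier expects to see for a trustworthy system.
expected = extend(PCR_INIT, b"trusted-kernel")

# Case 1: one external TPM is left running and is not tied to any snapshot.
external_tpm = ToyTPM()
external_tpm.measure(b"trusted-kernel")  # snapshot A boots and is measured
# ... attestation passes, then snapshot A' (untrusted software) is resumed,
# but nothing touches the external TPM, so it still holds A's PCR value.
stale_quote = external_tpm.quote()

# Case 2: the TPM state is saved and restored together with each snapshot.
tpm_state_a_prime = ToyTPM()
tpm_state_a_prime.measure(b"untrusted-kernel")  # state captured with A'
restored_quote = tpm_state_a_prime.quote()

verifier_fooled = stale_quote == expected        # stale state still looks good
mismatch_visible = restored_quote != expected    # restored state exposes A'
```

In the first case the verifier cannot tell A' from A, because the quote comes
from state that belongs to A. In the second case the quote at least reflects
what was measured for A'.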

Kexec also comes to mind in this context, where I could quickly start a new
system post attestation. So a physical system could possibly be used for
fooling clients as well.

A solution for how to resolve this may involve some sort of protocol and a
connection that may not be broken *while* the system needs to be in a trusted
state. The protocol would have to help detect substantial changes of state,
such as the resume of some snapshot or a kexec into a new system. Repeated
attestation (with correctly restored TPM 2 state) may also help resolve the
issue.
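
A rough sketch of what such repeated attestation could look like, again in
Python and again purely illustrative. Everything here is an assumption made for
the example: the Quote layout, the shared HMAC key standing in for an
attestation key (a real TPM 2 signs quotes with an asymmetric key), and the
"epoch" field, a hypothetical counter that is assumed to change whenever a
snapshot is resumed or the system kexecs.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Optional

KEY = b"toy-attestation-key"  # assumption: stands in for the attestation key


@dataclass
class Quote:
    nonce: bytes
    pcr_digest: bytes
    epoch: int  # hypothetical counter, assumed to change on resume or kexec
    mac: bytes


def _message(nonce: bytes, pcr_digest: bytes, epoch: int) -> bytes:
    return nonce + pcr_digest + epoch.to_bytes(8, "big")


def make_quote(nonce: bytes, pcr_digest: bytes, epoch: int) -> Quote:
    mac = hmac.new(KEY, _message(nonce, pcr_digest, epoch), hashlib.sha256).digest()
    return Quote(nonce, pcr_digest, epoch, mac)


class Verifier:
    """Re-attests with a fresh nonce and rejects any change of epoch."""

    def __init__(self, expected_pcr_digest: bytes) -> None:
        self.expected = expected_pcr_digest
        self.nonce: Optional[bytes] = None
        self.last_epoch: Optional[int] = None

    def challenge(self) -> bytes:
        self.nonce = os.urandom(16)  # fresh nonce per round prevents replay
        return self.nonce

    def check(self, q: Quote) -> bool:
        good_mac = hmac.new(
            KEY, _message(q.nonce, q.pcr_digest, q.epoch), hashlib.sha256
        ).digest()
        if not hmac.compare_digest(q.mac, good_mac):
            return False
        if q.nonce != self.nonce or q.pcr_digest != self.expected:
            return False
        if self.last_epoch is not None and q.epoch != self.last_epoch:
            return False  # state changed since the last successful round
        self.last_epoch = q.epoch
        return True


good = hashlib.sha256(b"trusted").digest()
verifier = Verifier(good)

first_nonce = verifier.challenge()
first_ok = verifier.check(make_quote(first_nonce, good, epoch=1))
second_ok = verifier.check(make_quote(verifier.challenge(), good, epoch=1))
after_resume_ok = verifier.check(make_quote(verifier.challenge(), good, epoch=2))

verifier.challenge()  # new round; a quote over the old nonce is now stale
replay_ok = verifier.check(make_quote(first_nonce, good, epoch=1))
```

Whether a counter like this can be provided in a trustworthy way is exactly
the open question above; the sketch only shows where the verifier would check
it.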

Cheers!
  Stefan
