From: Stefan Berger
Subject: Re: [PATCH 2/2] tpm: add backend for mssim
Date: Mon, 9 Jan 2023 16:06:32 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.0

On 1/9/23 14:01, Stefan Berger wrote:
> On 1/9/23 13:51, James Bottomley wrote:
>> On Mon, 2023-01-09 at 13:34 -0500, Stefan Berger wrote:
>>> On 1/9/23 12:55, James Bottomley wrote:
>>>> On Mon, 2023-01-09 at 17:52 +0000, Dr. David Alan Gilbert wrote:
>>>>> * James Bottomley (jejb@linux.ibm.com) wrote:
>>>>> [...]
>>>>>> external MSSIM TPM emulator has to be kept running to preserve
>>>>>> the state. If you restart it, the migration will fail.
>>>>>
>>>>> Document that and we're getting there.
>>>>
>>>> The documentation in the current patch series says
>>>>
>>>> ----
>>>> The mssim backend supports snapshotting and migration, but the state
>>>> of the Microsoft Simulator server must be preserved (or the server
>>>> kept running) outside of QEMU for restore to be successful.
>>>> ----
>>>>
>>>> What, beyond this would you want to see?
>>>
>>> mssim today lacks the functionality of marshalling and unmarshalling
>>> the permanent and volatile state of the TPM 2, which are both needed
>>> for snapshot support. How does this work with mssim?
>>
>> You preserve the state by keeping the simulator running as the above
>> says. As long as you can preserve the state, there's no maximum time
>> between snapshots. There's no need of marshal/unmarshal if you do this
>
> From
> https://lists.gnu.org/archive/html/qemu-devel/2022-12/msg03146.html
>
> "VM snapshotting is basically VM suspend / resume on steroids requiring
> permanent and volatile state to be saved and restoreable from possible
> very different points in time with possibly different seeds, NVRAM
> locations etc. How the mssim protocol does this is non-obvious to me
> and how one coordinates the restoring and saving of the TPM's state
> without direct coordination by QEMU is also non-obvious."

One thing, though: I am aware of the issues that may arise due to support for TPM state migration. However, whether TPM state migration becomes an issue depends on how you use the TPM 2. If the use case is to use the TPM 2 as a local crypto device, then state migration is likely not an issue. You may have different keys in the TPM 2 at different points in time, and even snapshotting may not be an issue but possibly quite a welcome feature to have, along with support for scenarios of VM suspend + host upgrade + host reboot + VM resume.

If you use the TPM 2 for attestation, then certain TPM 2 state migration scenarios may become problematic. One could construct a scenario where attestation precedes some action that requires trust to have been established in the system by that preceding attestation step. Support for snapshotting the state of the TPM 2 could then become an issue: I could wait for the attestation to conclude, then quickly restart a different snapshot that is not trustworthy, and the client proceeds thinking that the system is trustworthy (maybe a few SYNs from the client went into the void).

Eliminating TPM 2 state migration is probably not a good idea, because environments where attestation may occur may also support VM suspend/resume along with upgrading and rebooting a host, or VM migration for some sort of host evacuation before an upgrade.

When it comes to snapshotting and using the TPM 2 as a crypto device, just saying that VM snapshot is supported by leaving the TPM 2 running and not touching it doesn't make this function correctly for all scenarios where the TPM 2 may have had different keys loaded. It is an even worse idea for attestation, where I could construct a snapshot A, wait until the attestation has passed, and then resume with a snapshot A' that runs untrustworthy software but uses the state of the TPM 2 from the time of snapshot A and remains happy to quote the state of the PCRs from before.
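
To make the A / A' scenario concrete, here is a toy model in Python. It is only a sketch under loud assumptions: it is not mssim, swtpm, or QEMU code; `ToyTPM`, `expected_pcr`, and the component names are made up for illustration; a real quote is signed and carries a nonce, which is omitted here. It only shows that a TPM left running untouched keeps quoting the PCRs of snapshot A after A' has been resumed, while a TPM whose state is restored together with the snapshot does not.

```python
import hashlib

PCR_SIZE = 32  # one SHA-256 PCR, starting at all zeros


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """Simplified TPM-style extend: new = H(old || H(measurement))."""
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr + digest).digest()


class ToyTPM:
    """Hypothetical stand-in for an external TPM emulator."""

    def __init__(self) -> None:
        self.pcr = bytes(PCR_SIZE)

    def measure(self, component: bytes) -> None:
        self.pcr = extend(self.pcr, component)

    def quote(self) -> bytes:
        # A real quote is signed and includes a verifier nonce.
        return self.pcr

    def save_state(self) -> bytes:
        return self.pcr

    def load_state(self, blob: bytes) -> None:
        self.pcr = blob


def expected_pcr(components: list[bytes]) -> bytes:
    """PCR value the verifier expects for a given boot chain."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = extend(pcr, component)
    return pcr


trusted = [b"firmware", b"trusted-kernel"]
untrusted = [b"firmware", b"untrusted-kernel"]

# Snapshot A: trusted software was measured into the external TPM.
external_tpm = ToyTPM()
for component in trusted:
    external_tpm.measure(component)

# Snapshot A': untrusted software, with the TPM state that belongs to it.
tpm_at_a_prime = ToyTPM()
for component in untrusted:
    tpm_at_a_prime.measure(component)
state_a_prime = tpm_at_a_prime.save_state()

# Case 1: the VM resumes A', the TPM is left running and untouched.
# The quote still matches the trusted boot chain of snapshot A.
stale_quote_passes = external_tpm.quote() == expected_pcr(trusted)

# Case 2: the TPM state is restored along with snapshot A'.
# The quote now reflects what actually runs, so verification fails.
external_tpm.load_state(state_a_prime)
restored_quote_passes = external_tpm.quote() == expected_pcr(trusted)

print("stale TPM state passes attestation:   ", stale_quote_passes)
print("restored TPM state passes attestation:", restored_quote_passes)
```

In this model the first check passes and the second fails, which is the difference between keeping the emulator running and saving/restoring its state with the snapshot.
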
If launching a snapshot also restores the state of the PCRs that goes along with the state of the system at that time, then that at least allows quotes to carry PCR contents that reflect the system state at snapshot A'.

Kexec also comes to mind in this context, where I could quickly start a new system post attestation. So a physical system could possibly be used for fooling clients as well.

A solution may involve some sort of protocol and a connection that must not be broken *while* the system needs to be in a trusted state. The protocol would have to help detect substantial changes of state, such as the resume of some snapshot or a kexec into a system. Repeated attestation (with correctly restored TPM 2 state) may also help resolve the issue.

Cheers!
   Stefan