
From: Klaus Jensen
Subject: Re: [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs()
Date: Tue, 20 Feb 2024 21:40:38 +0100

On Feb 21 00:33, Akihiko Odaki wrote:
> On 2024/02/20 23:53, Kevin Wolf wrote:
> > Am 20.02.2024 um 15:29 hat Kevin Wolf geschrieben:
> > > Am 20.02.2024 um 13:24 hat Akihiko Odaki geschrieben:
> > > > nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
> > > > configurations to know the number of VFs being disabled due to SR-IOV
> > > > configuration writes, but the logic was flawed and resulted in
> > > > out-of-bound memory access.
> > > > 
> > > > It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
> > > > VFs, but it actually doesn't in the following cases:
> > > > - PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
> > > > - PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
> > > > - VFs were only partially enabled because of realization failure.
> > > > 
> > > > It is a responsibility of pcie_sriov to interpret SR-IOV configurations
> > > > and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
> > > > provides, to get the number of enabled VFs before and after SR-IOV
> > > > configuration writes.
> > > > 
> > > > Cc: qemu-stable@nongnu.org
> > > > Fixes: CVE-2024-26328
> > > > Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command")
> > > > Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > ---
> > > >   hw/nvme/ctrl.c | 26 ++++++++------------------
> > > >   1 file changed, 8 insertions(+), 18 deletions(-)
> > > > 
> > > > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > > > index f026245d1e9e..7a56e7b79b4d 100644
> > > > --- a/hw/nvme/ctrl.c
> > > > +++ b/hw/nvme/ctrl.c
> > > > @@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
> > > >       nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
> > > >   }
> > > > 
> > > > -static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
> > > > -                                      uint32_t val, int len)
> > > > +static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
> > > >   {
> > > >       NvmeCtrl *n = NVME(dev);
> > > >       NvmeSecCtrlEntry *sctrl;
> > > > -    uint16_t sriov_cap = dev->exp.sriov_cap;
> > > > -    uint32_t off = address - sriov_cap;
> > > > -    int i, num_vfs;
> > > > +    int i;
> > > > 
> > > > -    if (!sriov_cap) {
> > > > -        return;
> > > > -    }
> > > > -
> > > > -    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> > > > -        if (!(val & PCI_SRIOV_CTRL_VFE)) {
> > > > -            num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
> > > > -            for (i = 0; i < num_vfs; i++) {
> > > > -                sctrl = &n->sec_ctrl_list.sec[i];
> > > > -                nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> > > > -            }
> > > > -        }
> > > > +    for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
> > > > +        sctrl = &n->sec_ctrl_list.sec[i];
> > > > +        nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> > > >       }
> > > >   }
> > > 
> > > Maybe I'm missing something, but if the concern is that 'i' could run
> > > beyond the end of the array, I don't see anything that limits
> > > pcie_sriov_num_vfs() to the static size of 127 that n->sec_ctrl_list.sec
> > > has. register_vfs() seems to just take whatever 16 bit value the guest
> > > wrote without imposing additional restrictions.
> > > 
> > > If there is some mechanism that makes register_vfs() fail if we exceed
> > > the limit, maybe an assertion with a comment would be in order because
> > > it doesn't seem obvious. I couldn't find any code that enforces it,
> > > sriov_max_vfs only ever seems to be used as a hint for the guest.
> > > 
> > > If not, do we need another check that fails gracefully in the error
> > > case?
> > 
> > Ok, I see now that patch 2 fixes this. But then the commit message is
> > wrong because it implies that this patch is the only thing you need to
> > fix the problem with nvme. You can't say "Fixes: CVE-2024-26328" if half
> > of the fix is still missing.
> 
> I didn't assign CVE-2024-26328 to the case where the value of
> PCI_SRIOV_NUM_VF is greater than PCI_SRIOV_TOTAL_VF; that is what
> CVE-2024-26327 deals with.
> 
> The problem I dealt with here is that the value of PCI_SRIOV_NUM_VF may
> not represent the actual number of enabled VFs because, for example,
> another register bit (PCI_SRIOV_CTRL_VFE) is not set.
> 
> If an assertion is to be added, I think it should be in
> pcie_sriov_num_vfs() and ensure the returned value is less than the value
> of PCI_SRIOV_TOTAL_VF (aka sriov_max_vfs in hw/nvme/ctrl.c), but I think
> it's fine without it.
> 
> > 
> > Maybe asserting old_num_vfs < n->sec_ctrl_list.numcntl would still be a
> > good idea. But looking at this one, it seems to me that numcntl isn't
> > completely correct either:
> > 
> >      list->numcntl = cpu_to_le16(max_vfs);
> > 
> > Both list->numcntl and max_vfs are uint8_t, so I think this will always
> > be 0 on big endian hosts?
> 
> Indeed it looks wrong. Will you write a patch?
> 
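
To make the bounds question above concrete, here is a small, self-contained sketch in plain C. It is not QEMU code; the type and helper names (PfState, model_num_vfs, disable_vfs) are invented for illustration. It only models the invariant being discussed: whatever the SR-IOV emulation reports as "enabled VFs" must stay within the allocated secondary-controller array before it is used as a loop bound.

/* Standalone illustration only -- not the actual QEMU implementation. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_VFS 127u                 /* like sriov_max_vfs / the sec[] array size */

typedef struct {
    uint16_t scid;                   /* secondary controller id */
} SecCtrlEntry;

typedef struct {
    uint16_t num_vfs;                /* VFs the emulation actually enabled */
    SecCtrlEntry sec[MAX_VFS];
} PfState;

/* Toy counterpart of pcie_sriov_num_vfs(): callers rely on the returned
 * value never exceeding the total supported VFs, whatever the guest wrote
 * into PCI_SRIOV_NUM_VF. */
static uint16_t model_num_vfs(const PfState *pf)
{
    assert(pf->num_vfs <= MAX_VFS);
    return pf->num_vfs;
}

static void disable_vfs(PfState *pf, uint16_t old_num_vfs)
{
    /* Mirrors the fixed loop: handle entries [new_num_vfs, old_num_vfs). */
    for (uint16_t i = model_num_vfs(pf); i < old_num_vfs; i++) {
        printf("disabling secondary controller %u\n", pf->sec[i].scid);
    }
}

int main(void)
{
    PfState pf = { .num_vfs = 0 };

    for (uint16_t i = 0; i < MAX_VFS; i++) {
        pf.sec[i].scid = i + 1;
    }

    pf.num_vfs = 3;                  /* pretend 3 VFs were enabled */
    uint16_t old = model_num_vfs(&pf);

    pf.num_vfs = 0;                  /* guest clears VFE; all VFs go away */
    disable_vfs(&pf, old);           /* prints the 3 disabled controllers */
    return 0;
}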

I'll fix it. And give the SR-IOV parts of hw/nvme some love all around.
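
For reference, the numcntl observation above can be reproduced with a few lines of standalone C. The byte-swap helper below stands in for how cpu_to_le16() behaves on a big-endian host; it is not the QEMU macro itself.

#include <stdint.h>
#include <stdio.h>

/* Stand-in for cpu_to_le16() on a big-endian host: swap the two bytes.
 * On a little-endian host the real macro is a no-op, so the bug stays
 * hidden there. */
static uint16_t cpu_to_le16_on_be(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

int main(void)
{
    uint8_t max_vfs = 127;          /* e.g. an 8-bit sriov_max_vfs value */
    uint8_t numcntl;                /* 8-bit field, like list->numcntl */

    /* 127 becomes 0x7f00 after the swap; truncating to uint8_t keeps only
     * the low byte, so numcntl ends up as 0. */
    numcntl = (uint8_t)cpu_to_le16_on_be(max_vfs);
    printf("numcntl = %u\n", numcntl);  /* prints: numcntl = 0 */
    return 0;
}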
