qemu-devel


From: Jonathan Cameron
Subject: Re: [RFC v4 3/3] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)
Date: Tue, 3 Jan 2023 18:15:24 +0000

On Tue, 3 Jan 2023 11:02:31 -0500
Gregory Price <gourry.memverge@gmail.com> wrote:

> The fine grained control would be a precursor to an emulated pooling
> device.  If you can demonstrate it with a singleton attached device, you
> could just implement an exclusivity table in a shared file, and set the
> shared memory to a file backend as well.  Boom, shared memory pool across
> qemu instances.

For Dynamic Capacity Device (DCD) based pooling I agree, but a lot of work is
needed to make that function correctly.  The partitioning support is a much
nearer term target and there is no real need for the two to look similar.  On
the kernel side of things I'm not yet convinced it's even sensible to make them
look similar, as DCD is far less constrained than partitioning and the expected
use cases are probably entirely different.

Hotplug based pooling (the CXL 2.0 approach) is much simpler in the OS because
in that case we always get full blown hotplug events.

Whilst I agree that pooling is interesting to emulate, my preference for the
initial case would be moving devices between virtual PCIe hierarchies attached
to different root ports / host bridges on a single host.  That may be simpler
to get going / test than multiple hosts.

Right now I'd just like a static device with a mixture of pmem / volatile
memory plus 2+ HDM decoders, and kernel support for that.
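
As a rough illustration of that target, the sketch below models a static device
with a volatile and a persistent partition behind two HDM decoders.  It is not
QEMU or kernel code: the names HdmDecoder, Type3Device, hpa_to_dpa and backing
are hypothetical, interleaving is ignored, and the layout (volatile capacity
first in device physical address space, persistent capacity after it) is an
assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

MiB = 1 << 20


@dataclass
class HdmDecoder:
    """One committed decoder: a contiguous host physical address (HPA) window."""
    hpa_base: int
    size: int


@dataclass
class Type3Device:
    vmem_size: int              # volatile capacity, assumed to start at DPA 0
    pmem_size: int              # persistent capacity, assumed to follow it
    decoders: List[HdmDecoder]  # in programming order

    def hpa_to_dpa(self, hpa: int) -> int:
        """Translate an HPA to a device physical address (no interleave)."""
        dpa_base = 0
        for dec in self.decoders:
            if dec.hpa_base <= hpa < dec.hpa_base + dec.size:
                return dpa_base + (hpa - dec.hpa_base)
            dpa_base += dec.size  # later decoders cover later DPA ranges
        raise ValueError(f"HPA {hpa:#x} is not covered by any decoder")

    def backing(self, dpa: int) -> Tuple[str, int]:
        """Pick the backing region and the offset within it for a DPA."""
        if dpa < self.vmem_size:
            return ("volatile", dpa)
        if dpa < self.vmem_size + self.pmem_size:
            return ("persistent", dpa - self.vmem_size)
        raise ValueError(f"DPA {dpa:#x} is beyond device capacity")


dev = Type3Device(
    vmem_size=512 * MiB,
    pmem_size=512 * MiB,
    decoders=[
        HdmDecoder(hpa_base=0x1_0000_0000, size=512 * MiB),  # volatile window
        HdmDecoder(hpa_base=0x2_0000_0000, size=512 * MiB),  # persistent window
    ],
)

print(dev.backing(dev.hpa_to_dpa(0x1_0000_1000)))  # ('volatile', 4096)
print(dev.backing(dev.hpa_to_dpa(0x2_0000_1000)))  # ('persistent', 4096)
```

With a single decoder only one of the two windows could be mapped at a time,
which is why 2+ decoders matter for a mixed device.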

Jonathan

> 
> On Tue, Jan 3, 2023, 10:56 AM Jonathan Cameron <Jonathan.Cameron@huawei.com>
> wrote:
> 
> > On Tue, 20 Dec 2022 14:27:31 -0500
> > Gregory Price <gregory.price@memverge.com> wrote:
> >  
> > > On Tue, Dec 20, 2022 at 03:34:53PM +0000, Jonathan Cameron wrote:  
> > > > > However I don't think this is successful in creating the dax devices,
> > > > > and therefore the reconfiguring into ram.  
> > > >
> > > > Sure. I only bothered testing it in some dax modes rather than via kmem.
> > > > It 'should' work but more testing needed there.
> > > >
> > > > However as you've noted, that only applies to the pmem regions at the
> > > > moment.  I wondered if you'd scripted the HDM decoder setup etc for test
> > > > purposes (so what the driver will do). Alternative to that would be
> > > > enabling the driver support. Not sure if anyone is looking at that yet.
> > > > Final alternative would be to port the existing EDK2 based support to
> > > > work on QEMU.  All non trivial jobs so may take a while.
> > > >
> > > > Jonathan  
> > >
> > > Also, I'm relatively new to this corner of the kernel (mm, regions, dax,
> > > etc), so I need to spend a week or two with uninterrupted tinkering with
> > > how adding new memory regions from these devices is actually "supposed
> > > to work" in a dynamic-capacity world.
> > >
> > > At least in theory, the partitioning of persistent and volatile memory
> > > regions on one of these type-3 devices should end up looking a bit like
> > > dynamic capacity when doing runtime reconfiguring.
> > >
> > > For example, considering
> > >
> > > Device(512mb PMEM, 512mb VMEM), I'd want, at least I think
> > >
> > > CFMW-Volatile:    max window size(1024mb) - Numa 2
> > > CFMW-Persistent:  max window size(512mb)  - Numa 3
> > >
> > > Then we'd need the kernel support for
> > >
> > > 1) Online 2x256mb volatile regions in Numa 2
> > > 2) Online 2x256mb persistent regions in Numa 3
> > > 3) Offline persistent region (256mb:512mb)
> > > 4) Reconfigure device to 256Pmem/768Volatile
> > >    a) change decoders in device accordingly
> > > 5) Online 1x256mb volatile region in Numa 2
> > >
> > > The question is whether you can do this without offlining the other
> > > adjacent regions.  I just don't know enough about the region subsystem
> > > to say what is "correct" behavior here.  
> >
> > Whilst you probably 'can' do fine grained offline / online (to some
> > degree anyway) I'm not sure if people consider it an important
> > usecase. If decoder reprogramming is involved things will get very fiddly
> > so at least in first instance I'd advocate just ripping it all down and
> > building up again.  Or in the simple case, just block attempts to
> > reconfigure the partitioning if either side is in use.
> >  
> > >
> > > On the device side, I need to go look at the mailbox commands to go
> > > about implementing the reconfiguration / decoder reprogramming.
> > >
> > > I guess the "decoder" reprogramming is essentially changing the
> > > read/write commands to adjust based on v/pmem_active vs v/pmem_size?  
> >
> > Yup.  We also need multiple decoder support in general in QEMU.
> > It's not that high on my list as my main focus this cycle is going
> > to be on reducing the out of tree patch set by upstreaming stuff.
> >  
> > >
> > > I suppose I can look at this chunk next.  
> >
> > Great.
> >
> > Jonathan
> >
> >
> >  
> 
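
The reconfiguration sequence in the quoted text (512mb pmem / 512mb volatile
changed to 256mb pmem / 768mb volatile), combined with the simple policy of
refusing to move the partition boundary while either side is in use, can be
sketched roughly as follows.  This is a toy model, not kernel or QEMU code:
PartitionedDevice, online, offline and set_partition are hypothetical names,
regions are tracked only as (offset, size) pairs, and overlap between regions
is not checked.

```python
MiB = 1 << 20


class PartitionedDevice:
    def __init__(self, total: int, vmem_size: int):
        self.total = total
        self.vmem_size = vmem_size  # volatile partition size; the rest is pmem
        self.online_regions = {"volatile": set(), "persistent": set()}

    def pmem_size(self) -> int:
        return self.total - self.vmem_size

    def _capacity(self, kind: str) -> int:
        return self.vmem_size if kind == "volatile" else self.pmem_size()

    def online(self, kind: str, offset: int, size: int) -> None:
        """Bring a region online; offset is relative to its partition."""
        if offset + size > self._capacity(kind):
            raise ValueError(f"{kind} region exceeds its partition")
        self.online_regions[kind].add((offset, size))

    def offline(self, kind: str, offset: int, size: int) -> None:
        self.online_regions[kind].remove((offset, size))

    def set_partition(self, new_vmem_size: int) -> None:
        """Move the boundary; refused while any region is online."""
        if any(self.online_regions.values()):
            raise RuntimeError("partition in use; offline all regions first")
        if not 0 <= new_vmem_size <= self.total:
            raise ValueError("new volatile size out of range")
        self.vmem_size = new_vmem_size


dev = PartitionedDevice(total=1024 * MiB, vmem_size=512 * MiB)
for off in (0, 256 * MiB):
    dev.online("volatile", off, 256 * MiB)        # step 1
    dev.online("persistent", off, 256 * MiB)      # step 2
dev.offline("persistent", 256 * MiB, 256 * MiB)   # step 3

try:
    dev.set_partition(768 * MiB)                  # step 4, fine grained attempt
except RuntimeError as err:
    print("refused:", err)

# Rip it all down, repartition, then build up again.
for regions in dev.online_regions.values():
    regions.clear()
dev.set_partition(768 * MiB)
for off in (0, 256 * MiB, 512 * MiB):
    dev.online("volatile", off, 256 * MiB)        # includes step 5
dev.online("persistent", 0, 256 * MiB)
```

Under this strict policy the fine grained attempt at step 4 is refused because
other regions are still online; only the tear-down-and-rebuild path succeeds.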



