qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] hostmem-file: add a property 'notrunc' to avoid


From: Igor Mammedov
Subject: Re: [Qemu-devel] [PATCH] hostmem-file: add a property 'notrunc' to avoid data corruption
Date: Fri, 21 Oct 2016 15:26:37 +0200

On Fri, 21 Oct 2016 09:53:43 -0200
Eduardo Habkost <address@hidden> wrote:

> On Fri, Oct 21, 2016 at 11:31:41AM +0200, Igor Mammedov wrote:
> > On Thu, 20 Oct 2016 14:56:58 -0200
> > Eduardo Habkost <address@hidden> wrote:
> >   
> > > On Thu, Oct 20, 2016 at 05:35:38PM +0200, Igor Mammedov wrote:  
> > > > On Thu, 20 Oct 2016 12:47:34 -0200
> > > > Eduardo Habkost <address@hidden> wrote:
> > > >     
> > > > > On Thu, Oct 20, 2016 at 04:15:21PM +0200, Igor Mammedov wrote:    
> > > > > > On Thu, 20 Oct 2016 11:56:10 -0200
> > > > > > Eduardo Habkost <address@hidden> wrote:
> > > > > >       
> > > > > > > On Thu, Oct 20, 2016 at 03:42:15PM +0200, Igor Mammedov wrote:    
> > > > > > >   
> > > > > > > > On Thu, 20 Oct 2016 21:11:38 +0800
> > > > > > > > Haozhong Zhang <address@hidden> wrote:
> > > > > > > >         
> > > > > > > > > On 10/20/16 14:34 +0200, Igor Mammedov wrote:        
> > > > > > > > > >On Thu, 20 Oct 2016 14:13:01 +0800
> > > > > > > > > >Haozhong Zhang <address@hidden> wrote:
> > > > > > > > > >          
> > > > > > > > > >> If a file is used as the backend of memory-backend-file 
> > > > > > > > > >> and its size is
> > > > > > > > > >> not identical to the property 'size', the file will be 
> > > > > > > > > >> truncated. For a
> > > > > > > > > >> file used as the backend of vNVDIMM, its data is expected 
> > > > > > > > > >> to be
> > > > > > > > > >> persistent and the truncation may corrupt the existing 
> > > > > > > > > >> data.          
> > > > > > > > > >I wonder if it's possible just skip 'size' property in your 
> > > > > > > > > >case instead
> > > > > > > > > >'notrunc' property. That way if size is not present one'd 
> > > > > > > > > >get actual size
> > > > > > > > > >using get_file_size() and set 'size' to it?
> > > > > > > > > >And if 'size' is provided and 'size' != file_size then error 
> > > > > > > > > >out.
> > > > > > > > > >          
> > > > > > > > > 
> > > > > > > > > I don't know how this can be implemented in QEMU. Specially, 
> > > > > > > > > how does
> > > > > > > > > the memory-backend-file know it's used for vNVDIMM, so that 
> > > > > > > > > it can
> > > > > > > > > skip the 'size' property?        
> > > > > > > > Does memory-backend-file needs to know that it's used by NVDIMM?
> > > > > > > > Looking at nvdimm_realize it doesn't as it's assumes 
> > > > > > > >   hostemem_size == pmem_size + label_size
> > > > > > > > 
> > > > > > > > I'd make hostmem_file.size optional and take size from file
> > > > > > > > and if 'size' is specified explictly require it to mach file 
> > > > > > > > size.
> > > > > > > > It's generic and has nothing to do with nvdimm.        
> > > > > > > 
> > > > > > > We can take size from file, or take size from the
> > > > > > > host_memory_backend_get_memory() callers.
> > > > > > > 
> > > > > > > Enumerating all sizes that QEMU can use as input:
> > > > > > > 
> > > > > > > A) Backend file size
> > > > > > > B) memory backend "size" option
> > > > > > > C) frontend-provided size (-numa size, -m, or pc-dimm "size"
> > > > > > >     property)      
> > > > > > -numa size affect only anon memory not backend backed one, for
> > > > > > backend baked memory we use memdev where size comes from backend
> > > > > > 
> > > > > > pc-dimm.size is readonly and isn't supposed to influence 
> > > > > > backend.size
> > > > > > 
> > > > > > I'd drop C option      
> > > > > 
> > > > > If C is not present, it should be, as it affects the guest ABI
> > > > > (and the ABI must never depend on the host you are running or
> > > > > backend configuration, only on the frontend configuration).    
> > > > I've meant that C should not affect behavior of backend.
> > > >     
> > > > > If we are dropping -numa size in favor of the
> > > > > memory-backend-provided size, that's a bug.    
> > > > -numa size is not applicable here as it's not using backends,
> > > > when backends are used it's -numa memdev instead in which case
> > > >    numa_info[nodenr].node_mem = object_property_get_int(o, "size", 
> > > > NULL);    
> > > 
> > > OK. My suggestion is to change this to not ignore the size
> > > option, but to validate it and/or do whatever necessary to get a
> > > MemoryRegion of the right size (your suggestion below to split
> > > the memory region would work too).
> > > 
> > > In other words, this should work:
> > > 
> > >   $ qemu -object 
> > > memory-backend-file,id=mem0,mem-path=/tmp/mempath,size=2G \
> > >          -numa node,size=2G,memdev=mem0 -m 2G
> > > 
> > > this must error out:
> > > 
> > >   $ qemu -object 
> > > memory-backend-file,id=mem0,mem-path=/tmp/mempath,size=2G \
> > >          -numa node,size=4G,memdev=mem0 -m 2G
> > > 
> > > and this should either error out, or result in a 1GB NUMA node
> > > (but never in a 2G NUMA node):
> > > 
> > >   $ qemu -object 
> > > memory-backend-file,id=mem0,mem-path=/tmp/mempath,size=2G \
> > >          -numa node,size=1G,memdev=mem0 -m 1G  
> > Looks like a bit of over-engineering and mixing together backend with 
> > frontend,
> > I don't see a compelling reason to support
> >   -numa size=X,memdev=foo
> > as just memdev=foo is sufficient
> > 
> > so I'd rather error out if user adds 'size' to -numa memdev=foo  
> 
> Erroring out would be good enough. I won't argue to allow people
> to use only a portion of the backend-provided memory unless we
> have a real use case for it.
> 
> > 
> > [...]
> >   
> > > Probably there's only one case where behavior would be different
> > > than what I was thinking. I would like to be possible to specify
> > > only C (frontend size), and omit both A and B. For example:
> > > 
> > >   $ mkdir /tmp/mempath
> > >   $ qemu -object memory-backend-file,id=mem0,mem-path=/tmp/mempath \
> > >          -numa node,size=1G,memdev=mem0 -m 1G
> > > 
> > > I would like this command to create a 1G file inside /tmp/mempath
> > > automatically, without the need for an explicit size=1G argument
> > > to memory-backend-file.  
> > This is still C where frontend manages allocation of backend
> > (i.e. changes backend's 'size' property) vs being just consumer
> > of whatever backend provides.  
> 
> I don't want it to change the 'size' property of the backend, I
> just want to allow the backend to allocate memory on demand in
> case size was not specified, to make the command-line interface
> simpler and less error-prone.
Allocation on demand is changing backend.size property and at that
possibly making frontend to dictate backing file size. And that's
brings more confusion since we would have user settable backend.size
and frontend.size.

That looks to me more error prone and more complicated.
Currently it's clear filename/size and memory region that
represents that file/size comes only from backend.
And in frontends (pc-dimm/nvdimm) size is what is mapped into
guest's GPA, which is full or subset of backends size depending
on frontend device and it's options (optional nvdimm.label-size for example)


> > Purpose of backend is to create complete/initialized hostside
> > object regardless of which frontend uses it.
> > I see all properties of backend (including sizes) as belonging
> > to hostside which shouldn't be influenced by whatever frontend
> > it's used. And these properties are specified/managed only by mgmt
> > side explicitly when creating a backend (either via CLI or monitor/QMP).  
> 
> I agree completely with the above paragraph, but we probably have
> different definitions of "complete/initialized". To me,
> "complete" doesn't need to include allocation (unless prealloc is
> required), and it's OK to let the backend allocate memory on
> demand to simplify the command-line interface.
Yep, I include allocation into definition of completeness.

Now lets imagine allocation on demand when frontend calls
backend.get_memory_region() callback and hotplug flow in case
of allocation failure:

# object_add memory-backend-file,id=mem0,mem-path=/tmp
# device_add pc-dimm,size=foo,memdev=mem0
 qemu would fail here and now pc-dimm code should take care
 of rolling back whatever changes it done to machine
 and then management side would need to delete backend
 as cleanup.

vs current way where object_add returned backend includes allocation:
# object_add memory-backend-file,id=mem0,mem-path=/tmp,size=foo
   fail here without need to do cleanups and not requiring from
   frontends to handle allocation failure.

Maybe I'm saying it badly but what I'm trying to say is that
 * backend should manage/acquire host resources
 * fontend should manage device representation to guest
pretty clear separation of roles

and I see setting host filenames, filesizes, mapping resources
into QEMU, ...  is backend's task.

Otherwise we could just merge frontends with backends and end up
with only one option instead of separate options for backend and
frontend creation, but then it would rather confusing what
properties a hostside vs guestside.


> > 
> > In your example I'd error out with:
> >   "memory-backend-file: requires 'size' to create file in /tmp/mempath"  
> 
> This is what should happen today, yes. But I would like to make
> 'size' optional later, to make the interface simpler.
I'm fine with 'size' being optional as far as it's
implicitly derived from backing file.


> Anyway, let's wait until I have an actual RFC (which may take a
> while as it's not high priority to me), otherwise it would be
> just a theoretical discussion.
> 
> In the meantime, we can change the code to error out on the
> examples I gave.
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]