qemu-devel

Re: [RFC 0/8] ioregionfd introduction


From: Stefan Hajnoczi
Subject: Re: [RFC 0/8] ioregionfd introduction
Date: Wed, 16 Feb 2022 11:20:28 +0000

On Tue, Feb 15, 2022 at 10:16:04AM -0800, Elena wrote:
> On Mon, Feb 14, 2022 at 02:52:29PM +0000, Stefan Hajnoczi wrote:
> > On Mon, Feb 07, 2022 at 11:22:14PM -0800, Elena Ufimtseva wrote:
> > > This patchset is an RFC version for the ioregionfd implementation
> > > in QEMU. The kernel patches are to be posted with some fixes as a v4.
> > > 
> > > For this implementation version 3 of the posted kernel patches was used:
> > > https://lore.kernel.org/kvm/cover.1613828726.git.eafanasova@gmail.com/
> > > 
> > > The future version will include support for vfio/libvfio-user.
> > > Please refer to the design discussion here proposed by Stefan:
> > > https://lore.kernel.org/all/YXpb1f3KicZxj1oj@stefanha-x1.localdomain/T/
> > > 
> > > The vfio-user version needed some bug-fixing and it was decided to send
> > > this for multiprocess first.
> > > 
> > > The ioregionfd is currently configured through the command line and each
> > > ioregionfd represents an object. This allows for easy parsing and does
> > > not require device/remote object command line option modifications.
> > > 
> > > The following command line can be used to specify ioregionfd:
> > > <snip>
> > >   '-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
> > >   '-object', 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1',\
> > >   '-object', 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\
> > 
> 
> Hi Stefan
> 
> Thank you for taking a look!
> 
> > Explicit configuration of ioregionfd-object is okay for early
> > prototyping, but what is the plan for integrating this? I guess
> > x-remote-object would query the remote device to find out which
> > ioregionfds need to be registered and the user wouldn't need to specify
> > ioregionfds on the command-line?
> 
> Yes, this can be done. For some reason I thought that the user would be
> able to configure the number/size of the regions to be configured as
> ioregionfds.
> 
> > 
> > > </snip>
> > > 
> > > Proxy side of ioregionfd in this version uses only one file descriptor:
> > > <snip>
> > >   '-device', 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()), \
> > > </snip>
> > 
> > This raises the question of the ioregionfd file descriptor lifecycle. In
> > the end I think it shouldn't be specified on the command-line. Instead
> > the remote device should create it and pass it to QEMU over the
> > mpqemu/remote fd?
> 
> Yes, this will be the same as what vfio-user does.
> > 
> > > 
> > > This is done for the RFC version and my thought was that the next
> > > version will be for vfio-user, so I have not dedicated much effort to
> > > these command line options.
> > > 
> > > The multiprocess messaging protocol was extended to support inquiries
> > > by the proxy about whether the device has any ioregionfds.
> > > This RFC implements inquiries by the proxy about whether a BAR is an
> > > ioregionfd or not and about its type (memory/io).
> > > 
> > > Currently there are a few limitations in this version of ioregionfd.
> > >  - one ioregionfd per BAR, only full BAR size is supported;
> > >  - one file descriptor per device for all of its ioregionfds;
> > >  - each remote device runs the fd handler for all its BARs in one IOThread;
> > >  - proxy supports only one fd.
> > > 
> > > Some of these limitations will be dropped in a future version.
> > > This RFC is intended to gather feedback/suggestions from the community
> > > on the general approach.
> > > 
> > > A quick performance test was done for the remote lsi device with and
> > > without ioregionfd for both mem BARs (1 and 2) using the fio tool:
> > > 
> > > Random R/W:
> > > 
> > >                read IOPS   read BW     write IOPS   write BW
> > > no ioregionfd  889         3559KiB/s   890          3561KiB/s
> > > ioregionfd     938         3756KiB/s   939          3757KiB/s
> > 
> > This is extremely slow, even for random I/O. How does this compare to
> > QEMU running the LSI device without multi-process mode?
> 
> These tests used iodepth=256. I have changed this to 1 and tested
> without multiprocess, with multiprocess, and with multiprocess with both
> mmio regions as ioregionfds:
> 
>                           read IOPS   read BW (KiB/s)   write IOPS   write BW (KiB/s)
> no multiprocess           89          358               90           360
> multiprocess              138         556               139          557
> multiprocess ioregionfd   174         698               173          693
> 
> The fio config for randomrw:
> [global]
> bs=4K
> iodepth=1
> direct=0

Please set direct=1 so the guest page cache does not affect the I/O
pattern.

The host --drive option also needs cache.direct=on to avoid host page
cache effects.

The reason for benchmarking with direct=1 is to ensure that every I/O
request submitted by fio is forwarded to the underlying disk. Otherwise
the benchmark may be comparing guest page cache or host page cache hits,
which do not involve the disk.

Page cache read-ahead and write-behind may involve large block sizes and
therefore change the I/O pattern specified on the fio command-line. This
interferes with the benchmark and is another reason to use direct=1.
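
As a rough illustration of the two settings above, here is a minimal Python
sketch in the style of the quoted launcher script. The helper names
fio_job_text() and drive_option() are hypothetical and are not part of QEMU
or fio; only the option names direct=1 (fio) and cache.direct=on (QEMU
-drive) come from the advice above, and the remaining values are copied from
the quoted fio job and command lines.

```python
# Hypothetical helpers for illustration only. The option names direct=1 (fio)
# and cache.direct=on (QEMU -drive) are the settings recommended above.


def fio_job_text(filename, direct=True):
    """Render a fio job file like the quoted one, with direct I/O selectable."""
    global_opts = [
        "bs=4K",
        "iodepth=1",
        f"direct={1 if direct else 0}",  # 1 = O_DIRECT, bypasses the guest page cache
        "ioengine=libaio",
        "group_reporting",
        "time_based",
        "runtime=240",
        "numjobs=1",
        "name=raw-randreadwrite",
        "rw=randrw",
        "size=8G",
    ]
    lines = ["[global]", *global_opts, "[job1]", f"filename={filename}"]
    return "\n".join(lines) + "\n"


def drive_option(drive_id, image_path, extra=()):
    """Build a -drive value that opens the image without the host page cache."""
    parts = [f"id={drive_id}", *extra, f"file={image_path}", "cache.direct=on"]
    return ",".join(parts)


if __name__ == "__main__":
    print(fio_job_text("/fio/randomrw"), end="")
    print(["-drive", drive_option("drive_image1", "/home/homedir/10gb.qcow2",
                                  extra=("if=none",))])
```

The resulting '-drive' value can replace the corresponding string in the
command lists, so that both the guest and the host page cache are bypassed
in every configuration being compared.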

> ioengine=libaio
> group_reporting
> time_based
> runtime=240
> numjobs=1
> name=raw-randreadwrite
> rw=randrw
> size=8G
> [job1]
> filename=/fio/randomrw
> 
> And the QEMU command line for non-multiprocess:
> 
> /usr/local/bin/qemu-system-x86_64 -name "OL7.4" -machine q35,accel=kvm \
>     -smp sockets=1,cores=2,threads=2 -m 2048 \
>     -hda /home/homedir/ol7u9boot.img -boot d -vnc :0 \
>     -chardev stdio,id=seabios \
>     -device isa-debugcon,iobase=0x402,chardev=seabios \
>     -device lsi53c895a,id=lsi1 \
>     -drive id=drive_image1,if=none,file=/home/homedir/10gb.qcow2 \
>     -device scsi-hd,id=drive1,drive=drive_image1,bus=lsi1.0,scsi-id=0
> 
> QEMU command line for multiprocess:
> 
> remote_cmd = [ PROC_QEMU, \
>                '-machine', 'x-remote', \
>                '-device', 'lsi53c895a,id=lsi0', \
>                '-drive', 'id=drive_image1,file=/home/homedir/10gb.qcow2', \
>                '-device', 'scsi-hd,id=drive2,drive=drive_image1,bus=lsi0.0,' \
>                           'scsi-id=0', \
>                '-nographic', \
>                '-monitor', 'unix:/home/homedir/rem-sock,server,nowait', \
>                '-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()), \
>                '-object', 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1,', \
>                '-object', 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2', \
>                ]
> proxy_cmd = [ PROC_QEMU, \
>               '-D', '/tmp/qemu-debug-log', \
>               '-name', 'OL7.4', \
>               '-machine', 'pc,accel=kvm', \
>               '-smp', 'sockets=1,cores=2,threads=2', \
>               '-m', '2048', \
>               '-object', 'memory-backend-memfd,id=sysmem-file,size=2G', \
>               '-numa', 'node,memdev=sysmem-file', \
>               '-hda','/home/homedir/ol7u9boot.img', \
>               '-boot', 'd', \
>               '-vnc', ':0', \
>               '-device', 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()), \
>               '-monitor', 'unix:/home/homedir/qemu-sock,server,nowait', \
>               '-netdev','tap,id=mynet0,ifname=tap0,script=no,downscript=no', \
>               '-device','e1000,netdev=mynet0,mac=52:55:00:d1:55:01', \
>             ]
> 
> For the test without ioregionfds, the ioregionfd options are commented out.
> 
> I am doing more testing as I see some inconsistent results.

Thanks for the benchmark details!

Stefan
