From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [Virtio-fs] [PATCH 0/4] virtiofsd: multithreading preparation part 3
Date: Thu, 8 Aug 2019 10:02:13 +0100
User-agent: Mutt/1.12.0 (2019-05-25)

On Wed, Aug 07, 2019 at 04:57:15PM -0400, Vivek Goyal wrote:
> Kernel also serializes MAP/UNMAP on one inode. So you will need to run
> multiple jobs operating on different inodes to see parallel MAP/UNMAP
> (at least from kernel's point of view).

Okay, there is still room to experiment with how MAP and UNMAP are
handled by virtiofsd and QEMU even if the host kernel ultimately becomes
the bottleneck.

One possible optimization is to eliminate REMOVEMAPPING requests when
the guest driver knows a SETUPMAPPING will follow immediately.  I see
the following request pattern in a fio randread iodepth=64 job:

  unique: 995348, opcode: SETUPMAPPING (48), nodeid: 135, insize: 80, pid: 1351
  lo_setupmapping(ino=135, fi=0x(nil), foffset=3860856832, len=2097152, moffset=859832320, flags=0)
     unique: 995348, success, outsize: 16
  unique: 995350, opcode: REMOVEMAPPING (49), nodeid: 135, insize: 60, pid: 12
     unique: 995350, success, outsize: 16
  unique: 995352, opcode: SETUPMAPPING (48), nodeid: 135, insize: 80, pid: 1351
  lo_setupmapping(ino=135, fi=0x(nil), foffset=16777216, len=2097152, moffset=861929472, flags=0)
     unique: 995352, success, outsize: 16
  unique: 995354, opcode: REMOVEMAPPING (49), nodeid: 135, insize: 60, pid: 12
     unique: 995354, success, outsize: 16
  virtio_send_msg: elem 9: with 1 in desc of length 16
  unique: 995356, opcode: SETUPMAPPING (48), nodeid: 135, insize: 80, pid: 1351
  lo_setupmapping(ino=135, fi=0x(nil), foffset=383778816, len=2097152, moffset=864026624, flags=0)
     unique: 995356, success, outsize: 16
  unique: 995358, opcode: REMOVEMAPPING (49), nodeid: 135, insize: 60, pid: 12

The REMOVEMAPPING requests are unnecessary since we can map over the top
of the old mapping instead of taking the extra step of removing it
first.
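
To illustrate what "map over the top" means at the mmap(2) level, here is a
minimal standalone sketch. It assumes the DAX window behaves like an ordinary
reserved virtual address range and that mappings are installed with
MAP_FIXED; it is not taken from virtiofsd or QEMU, and the helper names
(make_filled_file, map_over, remap_demo) are hypothetical. On Linux,
mmap() with MAP_FIXED discards any existing mapping in the target range
as part of the call, so no munmap() is needed in between.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked temp file of `len` bytes,
 * every byte set to `fill`. Returns an open fd or -1 on error. */
static int make_filled_file(char fill, size_t len)
{
    char path[] = "/tmp/dax-sketch-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path); /* the file lives on until the fd is closed */

    char *buf = malloc(len);
    if (!buf) {
        close(fd);
        return -1;
    }
    memset(buf, fill, len);
    ssize_t n = write(fd, buf, len);
    free(buf);
    if (n != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map `fd` at exactly `window`, replacing whatever is mapped there. */
static int map_over(void *window, size_t len, int fd)
{
    void *p = mmap(window, len, PROT_READ, MAP_SHARED | MAP_FIXED, fd, 0);
    return p == window ? 0 : -1;
}

/* Returns 0 if the window contents switch from file A to file B
 * without an intermediate munmap(). */
static int remap_demo(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int fd_a = make_filled_file('A', len);
    int fd_b = make_filled_file('B', len);
    if (fd_a < 0 || fd_b < 0) {
        return -1;
    }

    /* Reserve address space, standing in for one slot of the DAX window. */
    char *window = mmap(NULL, len, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (window == MAP_FAILED) {
        return -1;
    }

    /* First "SETUPMAPPING". */
    if (map_over(window, len, fd_a) != 0 || window[0] != 'A') {
        return -1;
    }

    /* Second "SETUPMAPPING" over the same slot, no "REMOVEMAPPING" first. */
    if (map_over(window, len, fd_b) != 0 || window[0] != 'B') {
        return -1;
    }

    munmap(window, len);
    close(fd_a);
    close(fd_b);
    return 0;
}
```

Calling remap_demo() from a test program should return 0 on a Linux host
with a writable /tmp. Whether the guest driver can safely skip
REMOVEMAPPING also depends on its own bookkeeping for the slot, which this
sketch does not model.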

Some more questions to consider for DAX performance optimization:

1. Is FUSE_READ/FUSE_WRITE more efficient than DAX for some I/O patterns?
2. Can MAP/UNMAP be performed directly in QEMU via a separate virtqueue?
3. Can READ/WRITE be performed directly in QEMU via a separate virtqueue
   to eliminate the bad address problem?
4. Can OPEN+MAP be fused into a single request for small files, avoiding
   the 2nd request?

I'm not going to tackle DAX optimization myself right now but wanted to
share these ideas.

Stefan
