qemu-block

Re: Re: Re: [PATCH 0/1] introduce nvmf block driver


From: zhenwei pi
Subject: Re: Re: Re: [PATCH 0/1] introduce nvmf block driver
Date: Tue, 8 Jun 2021 20:19:20 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 6/8/21 4:07 PM, Stefan Hajnoczi wrote:
On Tue, Jun 08, 2021 at 10:52:05AM +0800, zhenwei pi wrote:
On 6/7/21 11:08 PM, Stefan Hajnoczi wrote:
On Mon, Jun 07, 2021 at 09:32:52PM +0800, zhenwei pi wrote:
Since 2020, I have been developing a userspace NVMF initiator library:
https://github.com/bytedance/libnvmf
and released v0.1 recently.

I also developed a block driver for the QEMU side:
https://github.com/pizhenwei/qemu/tree/block-nvmf

Tested against the Linux kernel NVMF target (TCP), QEMU gets about 220K IOPS,
which seems good.

How does the performance compare to the Linux kernel NVMeoF initiator?

In case you're interested, some Red Hat developers have started
working on a new library called libblkio. For now it supports io_uring
but PCI NVMe and virtio-blk are on the roadmap. The library supports
blocking, event-driven, and polling modes. There isn't a direct overlap
with libnvmf but maybe they can learn from each other.
https://gitlab.com/libblkio/libblkio/-/blob/main/docs/blkio.rst

Stefan


I'm sorry that I didn't provide enough information about the QEMU block nvmf
driver and libnvmf.

Kernel initiator & userspace initiator
Rather than the io_uring/libaio + kernel initiator solution (read 500K+ IOPS &
write 200K+ IOPS), I prefer QEMU block nvmf + libnvmf (RW 200K+ IOPS):
1. I don't have to upgrade the host kernel, and it also runs on older kernels.
2. During re-connection, if the target side hits a panic, the initiator side
does not get stuck in the 'D' state (uninterruptible sleep in the kernel), so
QEMU can always be killed.
3. It's easier to troubleshoot a userspace application.

I see, thanks for sharing.

Default NVMe-oF IO queues
The mechanism of QEMU + libnvmf:
1. The QEMU iothread creates a request and dispatches it to an NVMe-oF IO queue
thread via a lockless list.
2. The QEMU iothread then tries to kick the NVMe-oF IO queue thread.
3. The NVMe-oF IO queue thread processes the request and returns the response
to the QEMU iothread.

When a single QEMU iothread reaches its limit, 4 NVMe-oF IO queues give better
performance.
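
To make the handoff concrete, here is a minimal sketch of such a lockless
dispatch path, assuming an MPSC list and an eventfd as the kick; the names
(struct nvmf_queue, nvmf_queue_submit, nvmf_queue_drain) are illustrative,
not the actual libnvmf API:

#include <stdatomic.h>
#include <stdint.h>
#include <unistd.h>

struct nvmf_request {
    struct nvmf_request *next;
    /* NVMe command, data buffers, completion callback, ... */
};

struct nvmf_queue {
    _Atomic(struct nvmf_request *) pending; /* lockless MPSC list head */
    int kick_fd;                            /* eventfd that wakes the queue thread */
};

/* Steps 1 and 2: runs in the QEMU iothread. */
void nvmf_queue_submit(struct nvmf_queue *q, struct nvmf_request *req)
{
    struct nvmf_request *old = atomic_load_explicit(&q->pending,
                                                    memory_order_relaxed);
    do {
        req->next = old;
    } while (!atomic_compare_exchange_weak_explicit(&q->pending, &old, req,
                                                    memory_order_release,
                                                    memory_order_relaxed));

    uint64_t one = 1;
    if (write(q->kick_fd, &one, sizeof(one)) < 0) {
        /* eventfd write failed; ignore in this sketch */
    }
}

/* Step 3: runs in the NVMe-oF IO queue thread after a kick. */
void nvmf_queue_drain(struct nvmf_queue *q)
{
    /* Take the whole pending list in one atomic swap. */
    struct nvmf_request *req = atomic_exchange_explicit(&q->pending, NULL,
                                                        memory_order_acquire);

    while (req) {
        struct nvmf_request *next = req->next;
        /* Encode an NVMe/TCP PDU, send it on this queue's socket, and later
         * complete the request back to the submitting iothread.  Note the
         * list comes back in LIFO order; a real implementation would reverse
         * it (or keep a tail) to preserve submission order. */
        req = next;
    }
}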

Can you explain this bottleneck? Even with 4 NVMe-oF IO queues there is
still just 1 IOThread submitting requests, so why are 4 IO queues faster
than 1?

Stefan


The QEMU + libiscsi solution uses the iothread to send/recv on the TCP socket and process iSCSI PDUs directly; it gets about 60K IOPS. Let's look at the perf report of the iothread:
+   35.06%      [k] entry_SYSCALL_64_after_hwframe
+   33.13%      [k] do_syscall_64
+   19.70%      [.] 0x0000000100000000
+   18.31%      [.] __libc_send
+   18.02%      [.] iscsi_tcp_service
+   16.30%      [k] __x64_sys_sendto
+   16.24%      [k] __sys_sendto
+   15.69%      [k] sock_sendmsg
+   15.56%      [k] tcp_sendmsg
+   14.25%      [k] __tcp_transmit_skb
+   13.94%      [k] 0x0000000000001000
+   13.78%      [k] tcp_sendmsg_locked
+   13.67%      [k] __ip_queue_xmit
+   13.00%      [k] tcp_write_xmit
+   12.07%      [k] __tcp_push_pending_frames
+   11.91%      [k] inet_recvmsg
+   11.78%      [k] tcp_recvmsg
+   11.73%      [k] ip_output

The bottleneck in this case is TCP processing, so libnvmf dispatches requests to other threads via a lockless list to move the TCP overhead off the iothread. This makes it more efficient to process requests from the guest.
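
For reference, here is a rough sketch of what one NVMe-oF IO queue thread's
main loop could look like, assuming each queue owns its own TCP socket plus
the eventfd kick from the sketch above; the helper names are hypothetical,
not the actual libnvmf code. The point is that the TCP send/recv work that
dominates the iothread in the libiscsi profile runs in these per-queue
threads instead:

#include <poll.h>
#include <stdint.h>
#include <unistd.h>

struct nvmf_queue;  /* the queue object from the sketch above */

/* Hypothetical helpers: pop the lockless list and send PDUs, and read
 * completions off the socket, respectively. */
void nvmf_queue_drain(struct nvmf_queue *q);
void nvmf_queue_recv_completions(struct nvmf_queue *q, int tcp_sock);

/* Main loop of one NVMe-oF IO queue thread. */
void nvmf_queue_thread_run(struct nvmf_queue *q, int kick_fd, int tcp_sock)
{
    struct pollfd fds[2] = {
        { .fd = kick_fd,  .events = POLLIN }, /* new requests from the iothread */
        { .fd = tcp_sock, .events = POLLIN }, /* responses from the NVMe-oF target */
    };

    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            continue; /* e.g. EINTR; keep the sketch simple */
        }
        if (fds[0].revents & POLLIN) {
            uint64_t cnt;
            if (read(kick_fd, &cnt, sizeof(cnt)) > 0) {
                nvmf_queue_drain(q); /* encode + send NVMe/TCP PDUs on tcp_sock */
            }
        }
        if (fds[1].revents & POLLIN) {
            nvmf_queue_recv_completions(q, tcp_sock); /* complete back to the iothread */
        }
    }
}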


--
zhenwei pi


