
Re: [Qemu-devel] [sneak preview] major scsi overhaul


From: Gerd Hoffmann
Subject: Re: [Qemu-devel] [sneak preview] major scsi overhaul
Date: Wed, 11 Nov 2009 10:41:25 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.4pre) Gecko/20091014 Fedora/3.0-2.8.b4.fc11 Lightning/1.0pre Thunderbird/3.0b4

On 11/11/09 05:06, Paul Brook wrote:
On Friday 06 November 2009, Gerd Hoffmann wrote:
    Hi,

http://repo.or.cz/w/qemu/kraxel.git/shortlog/refs/heads/scsi.v6

What is in there?
   (3) New interface for HBA <=> SCSIDevice interaction:
       * building on the new SCSIRequest struct.
       * the silly read_data/write_data ping-pong game is gone.
       * support for scatter lists and zerocopy I/O is there.

The "silly ping-pong game" is there for a reason.

Your code assumes that the whole transfer will be completed in a single
request.

... to the qemu block layer.  Yes.

The HBA <=> guest driver communication is a completely different story though.

This is not true for any of the current HBAs.  Expecting the HBA to
cache the entire response is simply not acceptable.

The current qemu code *does* cache the response. scsi-disk caps the buffer at 128k (which is big enough for any request I've seen in my testing). scsi-generic has no cap.

With the old interface scsi-disk and scsi-generic do the caching. Unconditionally.

With the new interface the HBA has to handle the caching if needed. But the HBA also has the option to pass scatter lists, in which case qemu doesn't do any caching; the data is transferred directly from/to guest memory, which is clearly an improvement IMO.
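To make the two options concrete, here is a minimal sketch. SCSIRequest and the scsi_req_* names are from the branch; the struct fields and the exact signatures shown are assumptions for illustration only:

/* Sketch only: SCSIRequest and the scsi_req_* names come from the
 * scsi.v6 branch; the fields and signatures here are assumptions.
 * SCSIDevice and QEMUSGList are the existing qemu types. */
typedef struct SCSIRequest {
    SCSIDevice *dev;     /* target device */
    uint32_t tag;        /* HBA-assigned tag, used for TCQ */
    size_t xfer_len;     /* transfer length, known after parsing */
    QEMUSGList sg;       /* scatter list for the zerocopy path */
} SCSIRequest;

/* Buffered path: the HBA owns the bounce buffer, so any caching
 * policy (and any cap on its size) is the HBA's decision. */
int scsi_req_buf(SCSIRequest *req, uint8_t *buf, size_t len);

/* Zerocopy path: the HBA passes guest DMA addresses; data moves
 * directly between guest memory and the block layer. */
int scsi_req_sgl(SCSIRequest *req, QEMUSGList *sg);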

Remember that a single transfer can be very large (greater than the available
RAM on the host). Even with an SG-capable HBA, you cannot assume all the data
will be transferred in a single request.

scsi-generic: It must be a single request anyway, and it already is today.

scsi-disk: dma_bdrv_{read,write} will split it into smaller chunks if needed.
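For reference, the dma helper entry point in qemu's dma-helpers.c looks roughly like this (2009-era signature, details approximate). It walks the QEMUSGList and issues bounded bdrv_aio_readv calls, so a transfer larger than host RAM never needs one host buffer of the same size:

/* Approximate signature from dma-helpers.c. The helper splits the
 * scatter-list transfer into chunks and submits them to the block
 * layer, invoking cb once the whole transfer has completed. */
BlockDriverAIOCB *dma_bdrv_read(BlockDriverState *bs, QEMUSGList *sg,
                                uint64_t sector,
                                BlockDriverCompletionFunc *cb, void *opaque);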

You should also consider how this interacts with command queueing.
IIUC an Initiator (HBA) typically sends a tagged command, then disconnects
from the target (disk). At some later time the target reconnects, and the
initiator starts the DMA transfer. By my reading your code does not issue any
IO requests until after the HBA starts transferring data.

Hmm?  Which code are you looking at?

For esp and usb-storage, reads and writes are handled slightly differently. They work roughly like this (a code sketch follows the two lists):

read requests:
  - allocate + parse scsi command.      scsi_req_get+scsi_req_parse
  - submit I/O to qemu block layer.     scsi_req_buf
  - copy data to guest.
  - return status, release request      scsi_req_put

write requests:
  - allocate + parse scsi command.      scsi_req_get+scsi_req_parse
  - copy data from guest.
  - submit I/O to qemu block layer.     scsi_req_buf
  - return status, release request      scsi_req_put
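In code, the read flow could look roughly like this. Only the scsi_req_* names are from the branch; the function names, signatures, and the HBA-owned bounce_buf are hypothetical:

/* Sketch of the esp/usb-storage read flow. Only the scsi_req_*
 * names are from the branch; everything else is assumed. */
static void hba_do_read(SCSIDevice *dev, uint8_t *cdb, uint32_t tag)
{
    SCSIRequest *req = scsi_req_get(dev, tag);    /* allocate request */
    scsi_req_parse(req, cdb);                     /* parse scsi command */
    scsi_req_buf(req, bounce_buf, req->xfer_len); /* submit I/O */
}

static void hba_read_complete(SCSIRequest *req)
{
    /* copy the buffered data from bounce_buf to the guest,
     * then return status to the guest driver */
    scsi_req_put(req);                            /* release request */
}

The write flow is the same with the guest-to-buffer copy done before scsi_req_buf instead of after completion.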

Oh, and neither of them supports command queueing anyway.

lsi (the only in-tree HBA with TCQ support) works like this (see the sketch after the list):

- allocate + parse scsi command.        scsi_req_get+scsi_req_parse
- continue script processing, collect
  DMA addresses and stick them into
  a scatter list until it is complete.
- queue command and disconnect.
- submit I/O to the qemu block layer    scsi_req_sgl

*can process more scsi commands here*

- when I/O is finished, reselect with the tag
- return status, release request.       scsi_req_put
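Sketched in code, with the scsi_req_* names from the branch and qemu_sglist_add from dma-helpers.c; the surrounding function names and the more_dma_segments/addr/len helpers are hypothetical:

/* Sketch of the lsi-style zerocopy path with command queueing. */
static void lsi_start_command(SCSIDevice *dev, uint8_t *cdb, uint32_t tag)
{
    SCSIRequest *req = scsi_req_get(dev, tag);
    scsi_req_parse(req, cdb);

    /* SCRIPTS processing collects guest DMA addresses into the
     * scatter list until the whole transfer is mapped
     * (more_dma_segments/addr/len are placeholders) */
    while (more_dma_segments())
        qemu_sglist_add(&req->sg, addr, len);

    scsi_req_sgl(req, &req->sg);  /* submit I/O; target disconnects */
    /* more scsi commands can be accepted and processed here */
}

static void lsi_command_complete(SCSIRequest *req)
{
    /* reselect with the tag, return status to the guest */
    scsi_req_put(req);            /* release request */
}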

[ Yes, this should go to the changelog.  As mentioned in the
  announcement the commit comments need some work ... ]

The only way to
achieve this is for the target to pretend it has data available immediately,
at which point the transfer stalls and we lose the opportunity for parallel
command queueing.

Note that command parsing and I/O submission are separate now, so the HBA knows how much data is going to be transferred by the command before actually submitting the I/O.
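Concretely (the xfer_len field and the HBA_BOUNCE_MAX cap are assumed names), the HBA can inspect the parsed size and pick a submission path before any I/O happens:

SCSIRequest *req = scsi_req_get(dev, tag);
scsi_req_parse(req, cdb);                      /* no I/O submitted yet */
if (req->xfer_len > HBA_BOUNCE_MAX)            /* names assumed */
    scsi_req_sgl(req, &req->sg);               /* large: go zerocopy */
else
    scsi_req_buf(req, bounce_buf, req->xfer_len);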

cheers,
  Gerd




