Re: [Qemu-devel] [sneak preview] major scsi overhaul

From: Paul Brook
Subject: Re: [Qemu-devel] [sneak preview] major scsi overhaul
Date: Wed, 11 Nov 2009 14:13:09 +0000
User-agent: KMail/1.12.2 (Linux/2.6.30-2-amd64; KDE/4.3.2; x86_64; ; )

> The current qemu code *does* cache the response.  scsi-disk caps the
> buffer at 128k (which is big enough for any request I've seen in my
> testing).  scsi-generic has no cap.

That cap is important.
For scsi-generic you probably don't have a choice because of the way the 
kernel interface works.

> With the new interface the HBA has to handle the caching if needed.  But
> the HBA also has the option to pass scatter lists, in which case qemu
> doesn't do any caching; the data is transferred directly from/to guest
> memory.  Which is clearly an improvement IMO.

> > Remember that a single
> > transfer can be very large (greater than available ram on the host). Even
> > with an SG-capable HBA, you cannot assume all the data will be
> > transferred in a single request.
> scsi-generic: It must be a single request anyway, and it already is today.
> scsi-disk: dma_bdrv_{read,write} will split it into smaller chunks if
> needed.

You seem to be assuming the HBA knows where it's going to put the data before 
it issues the command. This is not true (see below).

> > You should also consider how this interacts with command queueing.
> > IIUC an Initiator (HBA) typically sends a tagged command, then
> > disconnects from the target (disk). At some later time the target
> > reconnects, and the initiator starts the DMA transfer. By my reading your
> > code does not issue any IO requests until after the HBA starts
> > transferring data.
> Hmm? What code are you looking at?
> For esp and usb-storage, reads and writes are handled slightly differently.
>   They roughly work like this:

Neither ESP nor usb-storage implements command queueing, so they aren't 
interesting here.

> lsi (the only in-tree HBA with TCQ support) works like this:
> - allocate + parse scsi command.        scsi_req_get+scsi_req_parse
> - continue script processing, collect
>    DMA addresses and stick them into
>    a scatter list until it is complete.
> - queue command and disconnect.
> - submit I/O to the qemu block layer    scsi_req_sgl
> *can process more scsi commands here*
> - when I/O is finished reselect tag
> - return status, release request.       scsi_req_put

I'm pretty sure this is wrong, and what actually happens is:

1) Wait for a device to reconnect (goto 5), or for commands from the host (goto 2).

2) SCRIPTS connect to the device and send the command.
3) If the device has data immediately (metadata command), goto 6.
4) Device disconnects. goto 1

5) Device has data ready, and reconnects.
6) SCRIPTS locate the next DMA block for this command, and initiate a (linear) 
DMA transfer.
7) Data is transferred. Note that DMA stalls the SCRIPTS processor until the 
transfer completes.
8) If the device still has data, goto 6.
9) If the device runs out of data before the command completes, goto 3.
10) Command complete. goto 1

Note that the IO command is parsed at stage 2, but the data transfer is not 
requested until stage 6, i.e. after the command has partially completed. This 
window between issuing the command and transferring its data is where other 
commands are issued.

The only way to make your API work is to skip straight from step 3 to step 6, 
which effectively loses the command queueing capability. It may be that it's 
hard or impossible to get both command queueing and zero-copy. In that case I 
say command queueing wins.

Also note that use of self-modifying SCRIPTS is common.

