qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI
Date: Mon, 12 Sep 2011 10:14:08 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Sat, Sep 10, 2011 at 02:23:30PM +1000, Ronnie Sahlberg wrote:

Looking good.  I think this is worth merging because it does offer
benefits over host iSCSI.

> +static void
> +iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void 
> *command_data,
> +                    void *private_data)
> +{
> +}
> +
> +static void
> +iscsi_aio_cancel(BlockDriverAIOCB *blockacb)
> +{
> +    IscsiAIOCB *acb = (IscsiAIOCB *)blockacb;
> +    IscsiLun *iscsilun = acb->iscsilun;
> +
> +    acb->status = -ECANCELED;
> +    acb->common.cb(acb->common.opaque, acb->status);
> +    acb->canceled = 1;
> +
> +    iscsi_task_mgmt_abort_task_async(iscsilun->iscsi, acb->task,
> +                                     iscsi_abort_task_cb, NULL);
> +}

The asynchronous abort task call looks odd.  If a caller allocates a
buffer and issues a read request, then we need to make sure that the
request is really aborted by the time .bdrv_aio_cancel() returns.

If I understand the code correctly, iscsi_aio_cancel() returns
immediately but the read request will still be in progress.  That means
the caller could now free their data buffer and the read request will
overwrite that unallocated memory.

> +static void
> +iscsi_aio_write10_cb(struct iscsi_context *iscsi, int status,
> +                     void *command_data, void *opaque)
> +{
> +    IscsiAIOCB *acb = opaque;
> +
> +    trace_iscsi_aio_write10_cb(iscsi, status, acb, acb->canceled);
> +
> +    if (acb->buf != NULL) {
> +        free(acb->buf);
> +    }

Please just use g_free(acb->buf).  g_free(NULL) is defined as a nop so
the check isn't necessary.  Also, this code uses free(3) when it should
use g_free(3).

> +
> +    if (acb->canceled != 0) {
> +        qemu_aio_release(acb);
> +        scsi_free_scsi_task(acb->task);
> +        acb->task = NULL;
> +        return;
> +    }
> +
> +    acb->status = 0;
> +    if (status < 0) {
> +        error_report("Failed to write10 data to iSCSI lun. %s",
> +                     iscsi_get_error(iscsi));
> +        acb->status = -EIO;
> +    }
> +
> +    iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb);
> +    scsi_free_scsi_task(acb->task);
> +    acb->task = NULL;
> +}
> +
> +static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun)
> +{
> +    return sector * BDRV_SECTOR_SIZE / iscsilun->block_size;
> +}
> +
> +static BlockDriverAIOCB *
> +iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num,
> +                 QEMUIOVector *qiov, int nb_sectors,
> +                 BlockDriverCompletionFunc *cb,
> +                 void *opaque)
> +{
> +    IscsiLun *iscsilun = bs->opaque;
> +    struct iscsi_context *iscsi = iscsilun->iscsi;
> +    IscsiAIOCB *acb;
> +    size_t size;
> +    int fua = 0;
> +
> +    /* set FUA on writes when cache mode is write through */
> +    if (!(bs->open_flags & (BDRV_O_CACHE_WB|BDRV_O_NOCACHE))) {
> +        fua = 1;
> +    }

FUA needs to reflect the guest semantics - does this disk have an
enabled write cache?  When bs->open_flags has BDRV_O_CACHE_WB, then the
guest knows it needs to send flushes because there is a write cache:

if (!(bs->open_flags & BDRV_O_CACHE_WB)) {
    fua = 1;
}

BDRV_O_NOCACHE is just for local files and sets the O_DIRECT hint.  It
doesn't affect the cache semantics that the guest sees.

> +/*
> + * We support iscsi url's on the form
> + * iscsi://[<username>%<password>@]<host>[:<port>]/<targetname>/<lun>
> + */
> +static int iscsi_open(BlockDriverState *bs, const char *filename, int flags)
> +{
> +    IscsiLun *iscsilun = bs->opaque;
> +    struct iscsi_context *iscsi = NULL;
> +    struct iscsi_url *iscsi_url = NULL;
> +    struct IscsiTask task;
> +    int ret;
> +
> +    if ((BDRV_SECTOR_SIZE % 512) != 0) {
> +        error_report("iSCSI: Invalid BDRV_SECTOR_SIZE. "
> +                     "BDRV_SECTOR_SIZE(%lld) is not a multiple "
> +                     "of 512", BDRV_SECTOR_SIZE);
> +        return -EINVAL;
> +    }

Another way of saying this is:
QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE % 512 != 0);

The advantage is that the build fails instead of waiting until iscsi is
used at runtime until the failure is detected.

What will happen if BDRV_SECTOR_SIZE is not a multiple of 512?

> +
> +    memset(iscsilun, 0, sizeof(IscsiLun));
> +
> +    /* Should really append the KVM name after the ':' here */
> +    iscsi = iscsi_create_context("iqn.2008-11.org.linux-kvm:");
> +    if (iscsi == NULL) {
> +        error_report("iSCSI: Failed to create iSCSI context.");
> +        ret = -ENOMEM;
> +        goto failed;
> +    }
> +
> +    iscsi_url = iscsi_parse_full_url(iscsi, filename);
> +    if (iscsi_url == NULL) {
> +        error_report("Failed to parse URL : %s %s", filename,
> +                     iscsi_get_error(iscsi));
> +        ret = -ENOMEM;

-EINVAL?

> +static BlockDriver bdrv_iscsi = {
> +    .format_name     = "iscsi",
> +    .protocol_name   = "iscsi",
> +
> +    .instance_size   = sizeof(IscsiLun),
> +    .bdrv_file_open  = iscsi_open,
> +    .bdrv_close      = iscsi_close,
> +
> +    .bdrv_getlength  = iscsi_getlength,
> +
> +    .bdrv_aio_readv  = iscsi_aio_readv,
> +    .bdrv_aio_writev = iscsi_aio_writev,
> +    .bdrv_aio_flush  = iscsi_aio_flush,

block.c does not emulate .bdrv_flush() using your .bdrv_aio_flush()
implementation.  I have sent a patch to add this emulation.  This will
turn bdrv_flush(iscsi_bs) into an actual operation instead of a nop :).

> diff --git a/trace-events b/trace-events
> index 3fdd60f..4e461e5 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -494,3 +494,10 @@ escc_sunkbd_event_in(int ch) "Untranslated keycode %2.2x"
>  escc_sunkbd_event_out(int ch) "Translated keycode %2.2x"
>  escc_kbd_command(int val) "Command %d"
>  escc_sunmouse_event(int dx, int dy, int buttons_state) "dx=%d dy=%d 
> buttons=%01x"
> +
> +# block/iscsi.c
> +disable iscsi_aio_write10_cb(void *iscsi, int status, void *acb, int 
> canceled) "iscsi %p status %d acb %p canceled %d"
> +disable iscsi_aio_writev(void *iscsi, int64_t sector_num, int nb_sectors, 
> void *opaque, void *acb) "iscsi %p sector_num %"PRId64" nb_sectors %d opaque 
> %p acb %p"
> +disable iscsi_aio_read10_cb(void *iscsi, int status, void *acb, int 
> canceled) "iscsi %p status %d acb %p canceled %d"
> +disable iscsi_aio_readv(void *iscsi, int64_t sector_num, int nb_sectors, 
> void *opaque, void *acb) "iscsi %p sector_num %"PRId64" nb_sectors %d opaque 
> %p acb %p"

It is no longer necessary to prefix trace event definitions with
"disable".

The stderr and simple backends now start up with all events disabled by
default.  The set of events can be set using the -trace
events=<events-file> option or on the QEMU monitor using set trace-event
<name> on.

Stefan



reply via email to

[Prev in Thread] Current Thread [Next in Thread]