[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI
From: |
ronnie sahlberg |
Subject: |
Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI |
Date: |
Wed, 21 Sep 2011 19:48:22 +1000 |
Stefan,
Thanks for your review.
I feel that iscsi would be beneficial for several usecases.
I have implemented all changes you pointed out, except two, and resent
an updated patch to the list.
Please see below.
regards
ronnie sahlberg
On Mon, Sep 12, 2011 at 7:14 PM, Stefan Hajnoczi
<address@hidden> wrote:
> On Sat, Sep 10, 2011 at 02:23:30PM +1000, Ronnie Sahlberg wrote:
>
> Looking good. I think this is worth merging because it does offer
> benefits over host iSCSI.
>
>> +static void
>> +iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void
>> *command_data,
>> + void *private_data)
>> +{
>> +}
>> +
>> +static void
>> +iscsi_aio_cancel(BlockDriverAIOCB *blockacb)
>> +{
>> + IscsiAIOCB *acb = (IscsiAIOCB *)blockacb;
>> + IscsiLun *iscsilun = acb->iscsilun;
>> +
>> + acb->status = -ECANCELED;
>> + acb->common.cb(acb->common.opaque, acb->status);
>> + acb->canceled = 1;
>> +
>> + iscsi_task_mgmt_abort_task_async(iscsilun->iscsi, acb->task,
>> + iscsi_abort_task_cb, NULL);
>> +}
>
> The asynchronous abort task call looks odd. If a caller allocates a
> buffer and issues a read request, then we need to make sure that the
> request is really aborted by the time .bdrv_aio_cancel() returns.
>
> If I understand the code correctly, iscsi_aio_cancel() returns
> immediately but the read request will still be in progress. That means
> the caller could now free their data buffer and the read request will
> overwrite that unallocated memory.
You are right.
I have added a new function to libiscsi that will also release the
task structure from libiscsi as well.
So we should no longer have a window where we might have a cancelled
task in flight writing the data to an already released buffer once the
cancelled data buffers start arriving on the socket.
>
>> +static void
>> +iscsi_aio_write10_cb(struct iscsi_context *iscsi, int status,
>> + void *command_data, void *opaque)
>> +{
>> + IscsiAIOCB *acb = opaque;
>> +
>> + trace_iscsi_aio_write10_cb(iscsi, status, acb, acb->canceled);
>> +
>> + if (acb->buf != NULL) {
>> + free(acb->buf);
>> + }
>
> Please just use g_free(acb->buf). g_free(NULL) is defined as a nop so
> the check isn't necessary. Also, this code uses free(3) when it shoulde.
> use g_free(3).
>
Done.
>> +
>> + if (acb->canceled != 0) {
>> + qemu_aio_release(acb);
>> + scsi_free_scsi_task(acb->task);
>> + acb->task = NULL;
>> + return;
>> + }
>> +
>> + acb->status = 0;
>> + if (status < 0) {
>> + error_report("Failed to write10 data to iSCSI lun. %s",
>> + iscsi_get_error(iscsi));
>> + acb->status = -EIO;
>> + }
>> +
>> + iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb);
>> + scsi_free_scsi_task(acb->task);
>> + acb->task = NULL;
>> +}
>> +
>> +static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun)
>> +{
>> + return sector * BDRV_SECTOR_SIZE / iscsilun->block_size;
>> +}
>> +
>> +static BlockDriverAIOCB *
>> +iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num,
>> + QEMUIOVector *qiov, int nb_sectors,
>> + BlockDriverCompletionFunc *cb,
>> + void *opaque)
>> +{
>> + IscsiLun *iscsilun = bs->opaque;
>> + struct iscsi_context *iscsi = iscsilun->iscsi;
>> + IscsiAIOCB *acb;
>> + size_t size;
>> + int fua = 0;
>> +
>> + /* set FUA on writes when cache mode is write through */
>> + if (!(bs->open_flags & (BDRV_O_CACHE_WB|BDRV_O_NOCACHE))) {
>> + fua = 1;
>> + }
>
> FUA needs to reflect the guest semantics - does this disk have an
> enabled write cache? When bs->open_flags has BDRV_O_CACHE_WB, then the
> guest knows it needs to send flushes because there is a write cache:
>
> if (!(bs->open_flags & BDRV_O_CACHE_WB)) {
> fua = 1;
> }
>
> BDRV_O_NOCACHE is just for local files and sets the O_DIRECT hint. It
> doesn't affect the cache semantics that the guest sees.
>
Done. When I first started building the patch a while ago, both flags
were needed. I missed when the second flag became obsolete upstream.
>> +/*
>> + * We support iscsi url's on the form
>> + * iscsi://[<username>%<password>@]<host>[:<port>]/<targetname>/<lun>
>> + */
>> +static int iscsi_open(BlockDriverState *bs, const char *filename, int flags)
>> +{
>> + IscsiLun *iscsilun = bs->opaque;
>> + struct iscsi_context *iscsi = NULL;
>> + struct iscsi_url *iscsi_url = NULL;
>> + struct IscsiTask task;
>> + int ret;
>> +
>> + if ((BDRV_SECTOR_SIZE % 512) != 0) {
>> + error_report("iSCSI: Invalid BDRV_SECTOR_SIZE. "
>> + "BDRV_SECTOR_SIZE(%lld) is not a multiple "
>> + "of 512", BDRV_SECTOR_SIZE);
>> + return -EINVAL;
>> + }
>
> Another way of saying this is:
> QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE % 512 != 0);
>
> The advantage is that the build fails instead of waiting until iscsi is
> used at runtime until the failure is detected.
>
> What will happen if BDRV_SECTOR_SIZE is not a multiple of 512?
>
I did not do this change.
If blocksize is not multiple of 512, the iscsi backend will just
refuse to work since having a read-modify-write cycle across the
network would be extremely costly.
I dont think iscsi should cause the entire build to fail.
While it is difficult to imagine a different blocksize, it is not
impossible I guess.
If someone comes up with a non-512-multiple blocksize and a good
reason for one, I dont want to be the guy that broke the build.
>> +
>> + memset(iscsilun, 0, sizeof(IscsiLun));
>> +
>> + /* Should really append the KVM name after the ':' here */
>> + iscsi = iscsi_create_context("iqn.2008-11.org.linux-kvm:");
>> + if (iscsi == NULL) {
>> + error_report("iSCSI: Failed to create iSCSI context.");
>> + ret = -ENOMEM;
>> + goto failed;
>> + }
>> +
>> + iscsi_url = iscsi_parse_full_url(iscsi, filename);
>> + if (iscsi_url == NULL) {
>> + error_report("Failed to parse URL : %s %s", filename,
>> + iscsi_get_error(iscsi));
>> + ret = -ENOMEM;
>
> -EINVAL?
Done.
>
>> +static BlockDriver bdrv_iscsi = {
>> + .format_name = "iscsi",
>> + .protocol_name = "iscsi",
>> +
>> + .instance_size = sizeof(IscsiLun),
>> + .bdrv_file_open = iscsi_open,
>> + .bdrv_close = iscsi_close,
>> +
>> + .bdrv_getlength = iscsi_getlength,
>> +
>> + .bdrv_aio_readv = iscsi_aio_readv,
>> + .bdrv_aio_writev = iscsi_aio_writev,
>> + .bdrv_aio_flush = iscsi_aio_flush,
>
> block.c does not emulate .bdrv_flush() using your .bdrv_aio_flush()
> implementation. I have sent a patch to add this emulation. This will
> turn bdrv_flush(iscsi_bs) into an actual operation instead of a nop :).
I did not implement this since I understood from the discussion that a
patch is imminent and it is required for existing backend(s?) anyway.
>
>> diff --git a/trace-events b/trace-events
>> index 3fdd60f..4e461e5 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -494,3 +494,10 @@ escc_sunkbd_event_in(int ch) "Untranslated keycode
>> %2.2x"
>> escc_sunkbd_event_out(int ch) "Translated keycode %2.2x"
>> escc_kbd_command(int val) "Command %d"
>> escc_sunmouse_event(int dx, int dy, int buttons_state) "dx=%d dy=%d
>> buttons=%01x"
>> +
>> +# block/iscsi.c
>> +disable iscsi_aio_write10_cb(void *iscsi, int status, void *acb, int
>> canceled) "iscsi %p status %d acb %p canceled %d"
>> +disable iscsi_aio_writev(void *iscsi, int64_t sector_num, int nb_sectors,
>> void *opaque, void *acb) "iscsi %p sector_num %"PRId64" nb_sectors %d opaque
>> %p acb %p"
>> +disable iscsi_aio_read10_cb(void *iscsi, int status, void *acb, int
>> canceled) "iscsi %p status %d acb %p canceled %d"
>> +disable iscsi_aio_readv(void *iscsi, int64_t sector_num, int nb_sectors,
>> void *opaque, void *acb) "iscsi %p sector_num %"PRId64" nb_sectors %d opaque
>> %p acb %p"
>
> It is no longer necessary to prefix trace event definitions with
> "disable".
Done.
>
> The stderr and simple backends now start up with all events disabled by
> default. The set of events can be set using the -trace
> events=<events-file> option or on the QEMU monitor using set trace-event
> <name> on.
>
> Stefan
>
- Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI, (continued)
- Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI, Paolo Bonzini, 2011/09/15
- Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI, Orit Wasserman, 2011/09/15
- Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI, Paolo Bonzini, 2011/09/15
- Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI, Orit Wasserman, 2011/09/15
- Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI, Daniel P. Berrange, 2011/09/15
- Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI, Kevin Wolf, 2011/09/15
- Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI, Paolo Bonzini, 2011/09/15
- Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI,
ronnie sahlberg <=
Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI, Mark Wu, 2011/09/23
Re: [Qemu-devel] [PATCH] Add iSCSI support for QEMU, Kevin Wolf, 2011/09/12
- Re: [Qemu-devel] [PATCH] Add iSCSI support for QEMU, Orit Wasserman, 2011/09/14
- Re: [Qemu-devel] [PATCH] Add iSCSI support for QEMU, Christoph Hellwig, 2011/09/14
- Re: [Qemu-devel] [PATCH] Add iSCSI support for QEMU, Christoph Hellwig, 2011/09/14
- Re: [Qemu-devel] [PATCH] Add iSCSI support for QEMU, Stefan Hajnoczi, 2011/09/14
- Re: [Qemu-devel] [PATCH] Add iSCSI support for QEMU, Christoph Hellwig, 2011/09/14
- Re: [Qemu-devel] [PATCH] Add iSCSI support for QEMU, Stefan Hajnoczi, 2011/09/14
- Re: [Qemu-devel] [PATCH] Add iSCSI support for QEMU, Paolo Bonzini, 2011/09/14