Re: [Qemu-devel] [PATCH] HLFS driver for QEMU


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH] HLFS driver for QEMU
Date: Mon, 18 Mar 2013 12:10:28 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Mar 18, 2013 at 02:47:44PM +0800, address@hidden wrote:
> From: Harry Wei <address@hidden>
> 
> HLFS is an HDFS-based (Hadoop Distributed File System) Log-Structured
> File System. Strictly speaking, HLFS is currently not a file system but
> a block-storage system: we simplified LFS to fit block-level storage,
> so HLFS can also be called HLBS (HDFS-based Log-Structured
> Block-storage System). HLFS has two modes: local mode and HDFS mode.
> HDFS is write-once/read-many, so HLFS implements LBS (Log-Structured
> Block-storage System) to support random reads and writes. LBS builds on
> LFS's basic theory but differs from LFS in that it fits block storage
> better. See http://code.google.com/p/cloudxy/wiki/WHAT_IS_CLOUDXY
> for details about HLFS.
> 
> Currently, HLFS supports the following features:
> 
> 1. Portions of POSIX --- only the interfaces a VM image needs.
> 2. Random reads/writes.
> 3. Large file storage (TB scale).
> 4. Snapshots (linear and tree snapshots), clone, block compression,
>    cache, etc.
> 5. Multiple replicas of the data.
> 6. Dynamically expandable clusters.
> ...
> 
> Signed-off-by: Harry Wei <address@hidden>

Is HLFS making releases that distros can package?  I don't see packages
in Debian or Fedora.

Block drivers in qemu.git should have active and sustainable communities
behind them.  For example, if this is a research project where the team
will move on within a year, then it may not be appropriate to merge the
code into QEMU.  Can you share some background on the HLFS community and
where the project is heading?

> ---
>  block/Makefile.objs |    2 +-
>  block/hlfs.c        |  515 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  configure           |   51 +++++
>  3 files changed, 567 insertions(+), 1 deletion(-)
>  create mode 100644 block/hlfs.c
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index c067f38..723c7a5 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -8,7 +8,7 @@ block-obj-$(CONFIG_POSIX) += raw-posix.o
>  block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
>  
>  ifeq ($(CONFIG_POSIX),y)
> -block-obj-y += nbd.o sheepdog.o
> +block-obj-y += nbd.o sheepdog.o hlfs.o

Missing CONFIG_HLFS to compile out hlfs.o.
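
Something along these lines, mirroring the CONFIG_LINUX_AIO line above
(configure would then define CONFIG_HLFS when the probe succeeds):

  block-obj-$(CONFIG_HLFS) += hlfs.o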

> +static int hlbs_write(BlockDriverState *bs, int64_t sector_num, const uint8_t
> +        *buf, int nb_sectors)
> +{
> +    int ret;
> +    BDRVHLBSState *s = bs->opaque;
> +    uint32_t write_size = nb_sectors * 512;
> +    uint64_t offset = sector_num * 512;
> +    ret = hlfs_write(s->hctrl, (char *)buf, write_size, offset);
> +    if (ret != write_size) {
> +        return -EIO;
> +    }
> +    return 0;
> +}
> +
> +static int hlbs_read(BlockDriverState *bs, int64_t sector_num, uint8_t *buf,
> +        int nb_sectors)
> +{
> +    int ret;
> +    BDRVHLBSState *s = bs->opaque;
> +    uint32_t read_size = nb_sectors * 512;
> +    uint64_t offset = sector_num * 512;
> +    ret = hlfs_read(s->hctrl, (char *)buf, read_size, offset);
> +    if (ret != read_size) {
> +        return -EIO;
> +    }
> +    return 0;
> +}
> +
> +static int hlbs_flush(BlockDriverState *bs)
> +{
> +    int ret;
> +    BDRVHLBSState *s = bs->opaque;
> +    ret = hlfs_flush(s->hctrl);
> +    return ret;
> +}

read/write/flush should be either .bdrv_co_* or .bdrv_aio_*.

The current code pauses the guest while I/O is in progress!  Try running
disk I/O benchmarks inside the guest and you'll see that performance and
interactivity are poor.

QEMU has two ways of implementing efficient block drivers:

1. Coroutines - .bdrv_co_*

Good for image formats or block drivers that have internal logic.

Each request runs inside a coroutine - see include/block/coroutine.h.
In order to wait for I/O, submit an asynchronous request or worker
thread and yield the coroutine.  When the request completes, re-enter
the coroutine and continue.

Examples: block/sheepdog.c or block/qcow2.c.
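
To make this concrete, here is a minimal sketch of the coroutine
pattern.  HLBSRequest and the worker-thread submission are hypothetical;
only qemu_coroutine_self()/yield()/enter() are the real QEMU APIs:

  typedef struct {
      Coroutine *co;   /* coroutine to re-enter when the request finishes */
      int ret;
  } HLBSRequest;

  /* Runs in the main loop (e.g. scheduled as a bottom half) */
  static void hlbs_request_done(void *opaque)
  {
      HLBSRequest *req = opaque;
      qemu_coroutine_enter(req->co, NULL);     /* resume the yielded request */
  }

  static coroutine_fn int hlbs_co_readv(BlockDriverState *bs,
                                        int64_t sector_num, int nb_sectors,
                                        QEMUIOVector *qiov)
  {
      HLBSRequest req = { .co = qemu_coroutine_self() };

      /* Hand the blocking hlfs_read() to a worker thread (not shown) and
       * arrange for hlbs_request_done() to run when it completes... */
      qemu_coroutine_yield();                  /* main loop keeps running */
      return req.ret;
  }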

2. Asynchronous I/O - .bdrv_aio_*

Good for low-level block drivers that have little logic.

The request processing code is split into callbacks.  I/O requests are
submitted and then the code returns back to QEMU's main loop.  When the
I/O request completes, a callback is invoked.

Examples: block/rbd.c or block/qed.c.
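
The callback pattern looks roughly like this.  HLBSAIOCB and the hlfs
submission are again hypothetical; qemu_aio_get()/qemu_aio_release() are
the standard AIOCB helpers:

  typedef struct {
      BlockDriverAIOCB common;   /* must be the first field */
      int ret;
  } HLBSAIOCB;

  static const AIOCBInfo hlbs_aiocb_info = {
      .aiocb_size = sizeof(HLBSAIOCB),
  };

  /* Runs in the main loop once hlfs signals completion */
  static void hlbs_readv_done(void *opaque)
  {
      HLBSAIOCB *acb = opaque;
      acb->common.cb(acb->common.opaque, acb->ret);   /* invoke caller's cb */
      qemu_aio_release(acb);
  }

  static BlockDriverAIOCB *hlbs_aio_readv(BlockDriverState *bs,
                                          int64_t sector_num,
                                          QEMUIOVector *qiov, int nb_sectors,
                                          BlockDriverCompletionFunc *cb,
                                          void *opaque)
  {
      HLBSAIOCB *acb = qemu_aio_get(&hlbs_aiocb_info, bs, cb, opaque);

      /* Submit the request to hlfs without blocking (not shown), then
       * return to the main loop; hlbs_readv_done() reports the result. */
      return &acb->common;
  }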

> +static int hlbs_snapshot_delete(BlockDriverState *bs, const char *snapshot)
> +{
> +    /* FIXME: Delete specified snapshot id.  */

Outdated comment?

> +    BDRVHLBSState *s = bs->opaque;
> +    int ret = 0;
> +    ret = hlfs_rm_snapshot(s->hctrl->storage->uri, snapshot);
> +    return ret;
> +}
> +
> +static int hlbs_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab)
> +{
> +    BDRVHLBSState *s = bs->opaque;
> +    /*Fixed: num_entries must be inited 0*/

Comment can be dropped?

> +    int num_entries = 0;
> +    struct snapshot *snapshots = hlfs_get_all_snapshots(s->hctrl->storage->uri,

I suggest using "hlfs_" as a prefix for libhlfs structs.  Otherwise you
risk name collisions.

> +        &num_entries);
> +    dprintf("snapshot count:%d\n", num_entries);
> +    /*Fixed: snapshots is NULL condition*/

Comment can be dropped?

> +    if (NULL == snapshots) {
> +        dprintf("snapshots is NULL, may be no snapshots, check it 
> please!\n");
> +        return num_entries;
> +    }
> +    QEMUSnapshotInfo *sn_tab = NULL;
> +    struct snapshot *snapshot = snapshots;
> +    sn_tab = g_malloc0(num_entries * sizeof(*sn_tab));
> +    int i;
> +    for (i = 0; i < num_entries; i++) {
> +        printf("---snapshot:%s----", snapshot->sname);

Debugging code?

> +        sn_tab[i].date_sec = snapshot->timestamp * 1000;
> +        sn_tab[i].date_nsec = 0;
> +        sn_tab[i].vm_state_size = 0;
> +        sn_tab[i].vm_clock_nsec = 0;
> +        snprintf(sn_tab[i].id_str, sizeof(sn_tab[i].id_str), "%u", 0);
> +        strncpy(sn_tab[i].name, snapshot->sname, MIN(sizeof(sn_tab[i].name),
> +                    strlen(snapshot->sname)));

Please use snprintf() instead.  strncpy() does not NUL-terminate when
strlen(input) >= bufsize.
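
For example:

  snprintf(sn_tab[i].name, sizeof(sn_tab[i].name), "%s", snapshot->sname);

This always NUL-terminates (and truncates instead of overflowing).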

> +        snapshot++;
> +    }
> +    *psn_tab = sn_tab;
> +    if (snapshots != NULL) {
> +        g_free(snapshots);
> +    }

g_free(NULL) is a nop, the if is not needed.
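Just:

  g_free(snapshots);   /* safe even when snapshots == NULL */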

> +    return num_entries;
> +}
> +
> +static QEMUOptionParameter hlbs_create_options[] = {
> +    {
> +        .name = BLOCK_OPT_SIZE,
> +        .type = OPT_SIZE,
> +        .help = "Virtual disk size"
> +    }, {
> +        .name = BLOCK_OPT_BACKING_FILE,
> +        .type = OPT_STRING,
> +        .help = "File name of a base image"
> +    }, {
> +        .name = BLOCK_OPT_PREALLOC,
> +        .type = OPT_STRING,
> +        .help = "Preallocation mode (allowed values: off, full)"
> +    }, { NULL }
> +};
> +
> +static BlockDriver bdrv_hlbs = {
> +    .format_name    = "hlfs",
> +    .protocol_name  = "hlfs",
> +    .instance_size  = sizeof(BDRVHLBSState),
> +    .bdrv_file_open = hlbs_open,
> +    .bdrv_close     = hlbs_close,
> +    .bdrv_create    = hlbs_create,
> +    .bdrv_getlength = hlbs_getlength,
> +    .bdrv_read  = hlbs_read,
> +    .bdrv_write = hlbs_write,

.bdrv_flush is missing.

> +    .bdrv_get_allocated_file_size  = hlbs_get_allocated_file_size,
> +    .bdrv_snapshot_create   = hlbs_snapshot_create,
> +    .bdrv_snapshot_goto     = hlbs_snapshot_goto,
> +    .bdrv_snapshot_delete   = hlbs_snapshot_delete,
> +    .bdrv_snapshot_list     = hlbs_snapshot_list,
> +    .create_options = hlbs_create_options,
> +};
> +
> +static void bdrv_hlbs_init(void)
> +{
> +    if (log4c_init()) {
> +        g_message("log4c_init error!");
> +    }

This seems to be for libhlfs since the QEMU code doesn't use log4c.  If
it's safe to call this function more than once, please put it into
libhlfs instead of requiring the program that links to your library to
call it for you.

> @@ -2813,6 +2820,45 @@ if compile_prog "" "" ; then
>  fi
>  
>  ##########################################
> +# hlfs probe
> +if test "$hlfs" != "no" ; then
> +    cat > $TMPC <<EOF
> +#include <stdio.h>
> +#include <api/hlfs.h>
> +    int main(void) {
> +            return 0;
> +    }
> +EOF
> +    INC_GLIB_DIR=`pkg-config --cflags glib-2.0`

QEMU uses gthread-2.0 instead of the single-threaded glib-2.0.

You can reuse glib_cflags instead of probing again here.
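
Something like this should be enough (assuming configure's existing
$glib_cflags variable, which already comes from the gthread-2.0 probe):

  CFLAGS="$glib_cflags $CFLAGS"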

> +    HLFS_DIR=`echo $HLFS_INSTALL_PATH`
> +    JVM_DIR=`echo $JAVA_HOME`
> +
> +    if [ `getconf LONG_BIT` -eq "64" ];then
> +    CLIBS="-L$JVM_DIR/jre/lib/amd64/server/ $CLIBS"
> +    fi
> +    if [ `getconf LONG_BIT` -eq "32" ];then
> +    CLIBS="-L$JVM_DIR/jre/lib/i386/server/ $CLIBS"
> +    fi

Hopefully there is a cleaner and portable way to do this (e.g.
pkg-config).

> +    CLIBS="-L$HLFS_DIR/lib/ $CLIBS"
> +    CFLAGS="-I$HLFS_DIR/include $CFLAGS"
> +    CFLAGS="$INC_GLIB_DIR $CFLAGS"
> +
> +    hlfs_libs="$CLIBS -lhlfs -llog4c -lglib-2.0 -lgthread-2.0 -lrt -lhdfs -ljvm"

libhlfs should support pkg-config so that QEMU does not need to know
about -llog4c -lhdfs -ljvm.
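
With a hypothetical hlfs.pc along these lines installed by libhlfs:

  prefix=/usr
  libdir=${prefix}/lib
  includedir=${prefix}/include

  Name: hlfs
  Description: HDFS-based log-structured block storage
  Version: 0.1
  Requires.private: gthread-2.0
  Libs: -L${libdir} -lhlfs
  Libs.private: -llog4c -lhdfs -ljvm
  Cflags: -I${includedir}

the whole probe above collapses to pkg-config --cflags/--libs hlfs.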

> +    echo "999 CLIBS is $CLIBS\n"
> +    echo "999 CFLAGS is $CFLAGS\n"

Debugging?


