Re: [Qemu-devel] [PATCH] migration: Fix multi-thread compression bug
From: Li, Liang Z
Subject: Re: [Qemu-devel] [PATCH] migration: Fix multi-thread compression bug
Date: Wed, 4 May 2016 10:03:06 +0000
> To: Li, Liang Z
> Cc: address@hidden; address@hidden; address@hidden
> Subject: Re: [Qemu-devel] [PATCH] migration: Fix multi-thread compression
> bug
>
> Liang Li <address@hidden> wrote:
> > Recently, a bug in the multi-thread compression feature for live
> > migration was reported. The destination side can block during live
> > migration when the host is under heavy load and the guest runs a
> > memory-intensive workload; this is most likely to happen when there
> > is only one decompression thread.
> >
> > Some parts of the decompression code are incorrect:
> > 1. The main thread, which receives data from the source side, enters a
> > busy loop to wait for a free decompression thread.
> > 2. A lock is needed to protect decomp_param[idx].start, because it is
> > checked in the main thread and updated in the decompression thread.
> >
> > Fix these two issues by following the code pattern for compression.
> >
> > Reported-by: Daniel P. Berrange <address@hidden>
> > Signed-off-by: Liang Li <address@hidden>
>
> step in the right direction, so:
> Reviewed-by: Juan Quintela <address@hidden>
>
> but I am still not sure that this is enough. If you have the chance, look
> at the multiple-fd code that I posted; it is very similar here.
>
>
> > struct DecompressParam {
>
> What protects start, and what protects done?
>
decomp_param[i].mutex protects start, and decomp_done_lock protects done.
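
To make that ownership rule concrete, here is a minimal sketch of the same idea, assuming plain pthreads instead of QEMU's qemu_mutex_*/qemu_cond_* wrappers. The names Worker, dispatch() and run_demo() are hypothetical and are not taken from the patch; the doubling step only stands in for the real decompression. Unlike the patch, the sketch never nests the two locks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_WORKERS 2            /* hypothetical thread count */

typedef struct {
    pthread_t thread;
    pthread_mutex_t mutex;       /* protects start, quit and input */
    pthread_cond_t cond;
    bool start, quit;
    int input;
    bool done;                   /* protected by done_lock, NOT by mutex */
} Worker;

static Worker workers[NUM_WORKERS];
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static long total;               /* demo result, protected by done_lock */

static void *worker_main(void *opaque)
{
    Worker *w = opaque;
    for (;;) {
        pthread_mutex_lock(&w->mutex);
        while (!w->start && !w->quit) {
            pthread_cond_wait(&w->cond, &w->mutex);
        }
        if (w->quit) {
            pthread_mutex_unlock(&w->mutex);
            return NULL;
        }
        int value = w->input;
        w->start = false;
        pthread_mutex_unlock(&w->mutex);

        long result = (long)value * 2;   /* stand-in for the slow work */

        pthread_mutex_lock(&done_lock);
        total += result;
        w->done = true;                  /* written only under done_lock */
        pthread_cond_signal(&done_cond);
        pthread_mutex_unlock(&done_lock);
    }
}

/* Sleep on done_cond until some worker is idle, then hand it one value. */
static void dispatch(int value)
{
    int idx;
    pthread_mutex_lock(&done_lock);
    for (;;) {
        for (idx = 0; idx < NUM_WORKERS && !workers[idx].done; idx++) {
        }
        if (idx < NUM_WORKERS) {
            break;
        }
        pthread_cond_wait(&done_cond, &done_lock);   /* no busy loop */
    }
    workers[idx].done = false;
    pthread_mutex_unlock(&done_lock);

    pthread_mutex_lock(&workers[idx].mutex);
    workers[idx].input = value;
    workers[idx].start = true;
    pthread_cond_signal(&workers[idx].cond);
    pthread_mutex_unlock(&workers[idx].mutex);
}

/* Feed 1..jobs through the workers and return the sum of the results. */
static long run_demo(int jobs)
{
    total = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        Worker *w = &workers[i];
        pthread_mutex_init(&w->mutex, NULL);
        pthread_cond_init(&w->cond, NULL);
        w->start = w->quit = false;
        w->done = true;                  /* idle workers start as "done" */
        int rc = pthread_create(&w->thread, NULL, worker_main, w);
        assert(rc == 0);
    }
    for (int i = 1; i <= jobs; i++) {
        dispatch(i);
    }
    pthread_mutex_lock(&done_lock);      /* wait for every worker to idle */
    for (int i = 0; i < NUM_WORKERS; i++) {
        while (!workers[i].done) {
            pthread_cond_wait(&done_cond, &done_lock);
        }
    }
    long result = total;
    pthread_mutex_unlock(&done_lock);
    for (int i = 0; i < NUM_WORKERS; i++) {
        Worker *w = &workers[i];
        pthread_mutex_lock(&w->mutex);
        w->quit = true;
        pthread_cond_signal(&w->cond);
        pthread_mutex_unlock(&w->mutex);
        pthread_join(w->thread, NULL);
        pthread_cond_destroy(&w->cond);
        pthread_mutex_destroy(&w->mutex);
    }
    return result;
}
```

Calling run_demo(100) pushes the values 1..100 through two workers. Each flag is read and written under exactly one lock, so the dispatcher can sleep on done_cond rather than poll.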
>
> > bool start;
> > + bool done;
> > QemuMutex mutex;
> > QemuCond cond;
> > void *des;
> > @@ -287,6 +288,8 @@ static bool quit_comp_thread;
> > static bool quit_decomp_thread;
> > static DecompressParam *decomp_param;
> > static QemuThread *decompress_threads;
> > +static QemuMutex decomp_done_lock;
> > +static QemuCond decomp_done_cond;
> >
> > static int do_compress_ram_page(CompressParam *param);
> >
> > @@ -834,6 +837,7 @@ static inline void start_compression(CompressParam *param)
> >
> > static inline void start_decompression(DecompressParam *param)
> > {
>
> Here nothing protects done
start_decompression() is called with decomp_done_lock held.
>
> > + param->done = false;
> > qemu_mutex_lock(&param->mutex);
> > param->start = true;
> > qemu_cond_signal(&param->cond);
> > @@ -2193,19 +2197,24 @@ static void *do_data_decompress(void *opaque)
> > qemu_mutex_lock(&param->mutex);
>
> we are looking at quit_decomp_thread and nothing protects it
>
>
> > while (!param->start && !quit_decomp_thread) {
> > qemu_cond_wait(&param->cond, &param->mutex);
> > + }
> > + if (!quit_decomp_thread) {
> > pagesize = TARGET_PAGE_SIZE;
> > - if (!quit_decomp_thread) {
> > - /* uncompress() will return failed in some case, especially
> > - * when the page is dirted when doing the compression, it's
> > - * not a problem because the dirty page will be retransferred
> > - * and uncompress() won't break the data in other pages.
> > - */
> > - uncompress((Bytef *)param->des, &pagesize,
> > - (const Bytef *)param->compbuf, param->len);
> > - }
> > - param->start = false;
> > + /* uncompress() will return failed in some case, especially
> > + * when the page is dirted when doing the compression, it's
> > + * not a problem because the dirty page will be retransferred
> > + * and uncompress() won't break the data in other pages.
> > + */
> > + uncompress((Bytef *)param->des, &pagesize,
> > + (const Bytef *)param->compbuf, param->len);
>
> We are calling uncompress (a slow operation) with param->mutex taken, is
> there any reason why we can't just put the param->* vars in locals?
>
> > }
> > + param->start = false;
>
> Why are we setting start to false when we _are_ not decompressing a page?
> I think this line should be inside the loop.
>
> > qemu_mutex_unlock(&param->mutex);
> > +
> > + qemu_mutex_lock(&decomp_done_lock);
> > + param->done = true;
>
> here param->done is protected by decomp_done_lock.
>
> > + qemu_cond_signal(&decomp_done_cond);
> > + qemu_mutex_unlock(&decomp_done_lock);
> > }
> >
> > return NULL;
> > @@ -2219,10 +2228,13 @@ void migrate_decompress_threads_create(void)
> > decompress_threads = g_new0(QemuThread, thread_count);
> > decomp_param = g_new0(DecompressParam, thread_count);
> > quit_decomp_thread = false;
> > + qemu_mutex_init(&decomp_done_lock);
> > + qemu_cond_init(&decomp_done_cond);
> > for (i = 0; i < thread_count; i++) {
> > qemu_mutex_init(&decomp_param[i].mutex);
> > qemu_cond_init(&decomp_param[i].cond);
> > decomp_param[i].compbuf =
> > g_malloc0(compressBound(TARGET_PAGE_SIZE));
> > + decomp_param[i].done = true;
> > qemu_thread_create(decompress_threads + i, "decompress",
> > do_data_decompress, decomp_param + i,
> > QEMU_THREAD_JOINABLE);
> > @@ -2258,9 +2270,10 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
> > int idx, thread_count;
> >
> > thread_count = migrate_decompress_threads();
> > + qemu_mutex_lock(&decomp_done_lock);
>
> we took decomp_done_lock
>
> > while (true) {
> > for (idx = 0; idx < thread_count; idx++) {
> > - if (!decomp_param[idx].start) {
> > + if (decomp_param[idx].done) {
>
> and we can protect done with it.
>
> > qemu_get_buffer(f, decomp_param[idx].compbuf, len);
> > decomp_param[idx].des = host;
> > decomp_param[idx].len = len;
>
> but these should be protected by decomp_param[idx].mutex, no?
The code works correctly, but it looks confusing; I should make the
locking clearer.
I will try to rework it with reference to your multi-fd code. Thanks!
Liang
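
One way the clearer locking could look, as a minimal sketch only: the producer fills the job fields under the per-job mutex, and the worker copies them into locals before releasing the mutex, so the slow step runs without any lock held. This assumes plain pthreads rather than QEMU's wrappers, uses memcpy() as a stand-in for uncompress(), and the names Job, job_worker() and copy_with_worker() are hypothetical, not from the patch or the multi-fd series.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    pthread_mutex_t mutex;       /* protects every field below */
    pthread_cond_t start_cond;
    pthread_cond_t done_cond;
    bool start;
    bool done;
    const char *src;
    char *dst;
    size_t len;
} Job;

static void *job_worker(void *opaque)
{
    Job *job = opaque;

    pthread_mutex_lock(&job->mutex);
    while (!job->start) {
        pthread_cond_wait(&job->start_cond, &job->mutex);
    }
    /* Snapshot the request into locals while the mutex is held. */
    const char *src = job->src;
    char *dst = job->dst;
    size_t len = job->len;
    job->start = false;
    pthread_mutex_unlock(&job->mutex);

    /* Slow step works on the locals only; no lock is held here. */
    memcpy(dst, src, len);

    pthread_mutex_lock(&job->mutex);
    job->done = true;
    pthread_cond_signal(&job->done_cond);
    pthread_mutex_unlock(&job->mutex);
    return NULL;
}

/* Hand one buffer to a worker thread and wait until it has been copied. */
static void copy_with_worker(const char *src, char *dst, size_t len)
{
    Job job = { .start = false, .done = false };
    pthread_t thread;

    pthread_mutex_init(&job.mutex, NULL);
    pthread_cond_init(&job.start_cond, NULL);
    pthread_cond_init(&job.done_cond, NULL);
    int rc = pthread_create(&thread, NULL, job_worker, &job);
    assert(rc == 0);

    pthread_mutex_lock(&job.mutex);
    job.src = src;               /* request fields are written under mutex */
    job.dst = dst;
    job.len = len;
    job.start = true;
    pthread_cond_signal(&job.start_cond);
    while (!job.done) {
        pthread_cond_wait(&job.done_cond, &job.mutex);
    }
    pthread_mutex_unlock(&job.mutex);

    pthread_join(thread, NULL);
    pthread_cond_destroy(&job.done_cond);
    pthread_cond_destroy(&job.start_cond);
    pthread_mutex_destroy(&job.mutex);
}
```

For example, copy_with_worker("hello", out, 6) fills a six-byte buffer out with the string and its terminator. A single mutex covers all fields here for brevity; splitting done onto its own lock, as the patch does, is a separate design choice.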
>
> > @@ -2270,8 +2283,11 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
> > }
> > if (idx < thread_count) {
> > break;
> > + } else {
> > + qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
> > }
> > }
> > + qemu_mutex_unlock(&decomp_done_lock);
> > }
> >
> > /*
>
> Thanks, Juan.