From: Hailiang Zhang
Subject: Re: [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case
Date: Thu, 19 Jan 2017 10:50:19 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1

On 2017/1/13 21:41, Stefan Hajnoczi wrote:
On Mon, Dec 05, 2016 at 04:34:59PM +0800, zhanghailiang wrote:
+Issue qmp command:
+  { 'execute': 'blockdev-add',
+    'arguments': {
+        'driver': 'replication',
+        'node-name': 'rep',
+        'mode': 'primary',
+        'shared-disk-id': 'primary_disk0',
+        'shared-disk': true,
+        'file': {
+            'driver': 'nbd',
+            'export': 'hidden_disk0',
+            'server': {
+                'type': 'inet',
+                'data': {
+                    'host': 'xxx.xxx.xxx.xxx',
+                    'port': 'yyy'
+                }
+            }

block/nbd.c does not have good error handling and recovery in case there is
a network issue.  There are no reconnection attempts or timeouts that
deal with a temporary loss of network connectivity.

This is a general problem with block/nbd.c and not something to solve in
this patch series.  I'm just mentioning it because it may affect COLO.

I'm sure these limitations in block/nbd.c can be fixed but it will take
some effort.  Maybe block/sheepdog.c, net/socket.c, and other network
code could also benefit from generic network connection recovery.

Hmm, good suggestion, but IMHO COLO is a little different from other
scenarios here: even if a reconnection method is implemented, COLO
still needs a mechanism to distinguish a temporary loss of network
connectivity from a genuinely broken connection.
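
To illustrate the distinction above, here is a minimal sketch of one
possible policy: keep retrying for a bounded time, and treat the outage
as temporary only if a retry succeeds before the deadline. This is not
how block/nbd.c or COLO behaves; classify_outage(), try_connect,
deadline_s and interval_s are hypothetical names chosen for this example.

```python
import time


def classify_outage(try_connect, deadline_s=5.0, interval_s=0.5,
                    clock=time.monotonic, sleep=time.sleep):
    """Retry try_connect() until it succeeds or deadline_s has elapsed.

    Returns "temporary" if a retry succeeds before the deadline and
    "broken" otherwise. try_connect is a hypothetical callable that
    raises OSError while the link is down.
    """
    start = clock()
    while True:
        try:
            try_connect()
            return "temporary"
        except OSError:
            pass  # link still down; fall through to the deadline check
        if clock() - start >= deadline_s:
            return "broken"
        sleep(interval_s)
```

With a policy like this, the caller could keep waiting while the result
is still undecided and only trigger failover once "broken" is returned.
The right deadline is workload-specific and is left open here.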

I did a simple test: I used ifconfig to bring down the network card used
by block replication. It seems that NBD in QEMU has no way to detect
that the connection has been broken. No error was reported, and COLO
just got stuck in vm_stop(), where it called aio_poll().
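
For context, a silent hang like this is typical of a TCP socket with no
liveness checks: when the interface goes down, no FIN or RST arrives, so
the endpoint keeps waiting. The sketch below shows the generic
socket-level options that bound this wait. It is an illustration in
Python, not a description of what block/nbd.c does, and the function
name and timeout values are assumptions made for the example.

```python
import socket


def enable_dead_peer_detection(sock, idle_s=5, interval_s=2, probes=3):
    """Ask the kernel to notice a silently dead TCP peer.

    Keepalive probes cover an idle connection. TCP_USER_TIMEOUT
    (Linux-specific) bounds how long sent data may stay unacknowledged.
    Options missing on the current platform are skipped.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is reported dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    if hasattr(socket, "TCP_USER_TIMEOUT"):
        # Milliseconds that transmitted data may remain unacknowledged.
        timeout_ms = (idle_s + interval_s * probes) * 1000
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT,
                        timeout_ms)
    return sock
```

Once such a timeout fires, pending I/O on the socket fails with an error
instead of blocking indefinitely, which gives the caller something to
act on.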


Reviewed-by: Stefan Hajnoczi <address@hidden>
