qemu-devel

Re: [Qemu-devel] COLO: how to flip a secondary to a primary?


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] COLO: how to flip a secondary to a primary?
Date: Mon, 25 Jan 2016 18:59:13 +0000
User-agent: Mutt/1.5.24 (2015-08-30)

* Wen Congyang (address@hidden) wrote:
> On 01/23/2016 03:35 AM, Dr. David Alan Gilbert wrote:
> > Hi,
> >   I've been looking at what's needed to add a new secondary after
> > a primary failed; from the block side it doesn't look as hard
> > as I'd expected; perhaps you can tell me if I'm missing something!
> > 
> > The normal primary setup is:
> > 
> >    quorum
> >       Real disk
> >       nbd client
> 
> quorum
>    real disk
>    replication
>       nbd client
> 
> > 
> > The normal secondary setup is:
> >    replication
> >       active-disk
> >       hidden-disk
> >       Real-disk
> 
> IIRC, we can do it like this:
> quorum
>    replication
>       active-disk
>       hidden-disk
>       real-disk

Yes.

> > With a couple of minor code hacks; I changed the secondary to be:
> > 
> >    quorum
> >       replication
> >         active-disk
> >         hidden-disk
> >         Real-disk
> >       dummy-disk
> 
> after failover,
> quorum
>    replication(old, mode is secondary)
>      active-disk
>      hidden-disk*
>      real-disk*
>    replication(new, mode is primary)
>      nbd-client

Do you need to keep the old secondary-replication?
Does that just pass straight through?

> In the newest version, we do an active commit of active-disk into real-disk.
> So it will be:
> quorum
>    replication(old, mode is secondary)
>      active-disk (it is the real disk now)
>    replication(new, mode is primary)
>      nbd-client

How does that active-commit work?  I didn't think you
could change the real disk until you had the full checkpoint,
since you don't know whether the primary's or the secondary's
changes need to be written?

> > and then after the primary fails, I start a new secondary
> > on another host and then on the old secondary do:
> > 
> >   nbd_server_stop
> >   stop
> >   x_block_change top-quorum -d children.0         # deletes use of real disk, leaves dummy
> >   drive_del active-disk0
> >   x_block_change top-quorum -a node-real-disk
> >   x_block_change top-quorum -d children.1         # Seems to have deleted the dummy?! The disk is now child 0
> >   drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
> >   x_block_change top-quorum -a nbd-client
> >   c
> >   migrate_set_capability x-colo on
> >   migrate -d -b tcp:ibpair:8888
> > 
> > and I think that means what was the secondary now has the same disk
> > structure as a normal primary.
> > That's not quite happy yet, and I've not figured out why - but the
> > order/structure of the block devices looks right?
> > 
> > Notes:
> >    a) The dummy serves two purposes, 1) it works around the segfault
> >       I reported in the other mail, 2) when I delete the real disk in the
> >       first x_block_change it means the quorum still has 1 disk so doesn't
> >       get upset.
> 
> I don't understand the purpose 2.

quorum won't allow you to delete all its members ('The number of children
cannot be lower than the vote threshold 1'),
and it's very tricky getting the order correct with add/delete; for example
I tried:

drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
# gets children.1
x_block_change top-quorum -a nbd-client
# deletes the secondary replication
x_block_change top-quorum -d children.0
drive_del active-disk0
# ends up as children.0 but in the 2nd slot
x_block_change top-quorum -a node-real-disk

info block shows me:
top-quorum (#block615): json:{"children": [
    {"driver": "replication", "mode": "primary", "file": {"port": "8889", "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}},
    {"driver": "raw", "file": {"driver": "file", "filename": "/home/localvms/bugzilla.raw"}}
   ],
   "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, "vote-threshold": 1} (quorum)
    Cache mode:       writeback

that has the replication first and the file second; that's the opposite
from the normal primary startup - does it matter?
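To check which child sits in which slot, the 'json:{...}' pseudo-filename that 'info block' prints can be decoded mechanically. A minimal Python sketch, assuming the string has been copied by hand without the trailing ' (quorum)' annotation; the function name is hypothetical and this is not a QEMU tool. It recovers only the list order, not the children.N names.

```python
import json


def quorum_children(info_json: str) -> list[str]:
    """List a quorum node's children, in order, as 'driver/inner-driver'.

    info_json is the 'json:{...}' pseudo-filename printed by 'info block'
    (hypothetical helper; copied by hand, without the ' (quorum)' suffix).
    """
    prefix = "json:"
    if not info_json.startswith(prefix):
        raise ValueError("expected a 'json:{...}' pseudo-filename")
    desc = json.loads(info_json[len(prefix):])
    result = []
    for child in desc["children"]:
        inner = child.get("file", {})
        result.append(f"{child['driver']}/{inner.get('driver', '?')}")
    return result


# The 'info block' output quoted above, re-joined onto one string.
OBSERVED = (
    'json:{"children": ['
    '{"driver": "replication", "mode": "primary", "file": {"port": "8889", '
    '"host": "ibpair", "driver": "nbd", "export": "colo-disk0"}}, '
    '{"driver": "raw", "file": {"driver": "file", '
    '"filename": "/home/localvms/bugzilla.raw"}}], '
    '"driver": "quorum", "blkverify": false, "rewrite-corrupted": false, '
    '"vote-threshold": 1}'
)

if __name__ == "__main__":
    print(quorum_children(OBSERVED))  # ['replication/nbd', 'raw/file']
```

This confirms the NBD replication child is listed before the raw file child, the reverse of the normal primary layout.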

I can't add node-real-disk until I drive_del active-disk0 (which
previously used it); and I can't drive_del active-disk0 until I remove the
secondary replication from the quorum; but I can't remove that from the
quorum first, because that leaves an empty quorum.
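The ordering deadlock above, and why the dummy child breaks it, can be sketched as a toy Python model. This is not QEMU code: the method names child_add, child_del and drive_del are hypothetical stand-ins for 'x_block_change -a', 'x_block_change -d' and 'drive_del', and 'active-disk0' stands for the whole secondary replication stack that still holds node-real-disk. The only rule taken from QEMU itself is the vote-threshold error quoted above.

```python
class ToyQuorum:
    """Toy model (not QEMU code) of the three constraints described above."""

    def __init__(self, children, vote_threshold=1):
        self.children = list(children)
        self.vote_threshold = vote_threshold
        # Assumed dependency: node -> stack that still holds it.
        self.held_by = {"node-real-disk": "active-disk0"}

    def child_del(self, node):
        # Mirrors the quoted error: quorum refuses to drop below the threshold.
        if len(self.children) - 1 < self.vote_threshold:
            raise RuntimeError("The number of children cannot be lower than "
                               f"the vote threshold {self.vote_threshold}")
        self.children.remove(node)

    def drive_del(self, stack):
        # The stack can only be deleted once it is no longer a quorum child.
        if stack in self.children:
            raise RuntimeError(f"{stack} is still a quorum child")
        self.held_by = {n: s for n, s in self.held_by.items() if s != stack}

    def child_add(self, node):
        # A node still held by another stack cannot be added.
        if node in self.held_by:
            raise RuntimeError(f"{node} is still used by {self.held_by[node]}")
        self.children.append(node)


def run(children, plan):
    """Apply (operation, argument) steps; return final children or the error."""
    q = ToyQuorum(children)
    try:
        for op, arg in plan:
            getattr(q, op)(arg)
    except RuntimeError as err:
        return f"failed: {err}"
    return q.children


if __name__ == "__main__":
    # Without a dummy, every possible first step is refused.
    for first in [("child_add", "node-real-disk"),
                  ("drive_del", "active-disk0"),
                  ("child_del", "active-disk0")]:
        print(run(["active-disk0"], [first]))

    # With a dummy child, the sequence from the original mail goes through.
    print(run(["active-disk0", "dummy"], [
        ("child_del", "active-disk0"),
        ("drive_del", "active-disk0"),
        ("child_add", "node-real-disk"),
        ("child_del", "dummy"),
        ("child_add", "nbd-client"),
    ]))  # ['node-real-disk', 'nbd-client']
```

Adding nbd-client before deleting the secondary replication (the alternative tried above) plays the same role as the dummy, which is why that order also works, just with a different final child order.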

> >    b) I had to remove the restriction in quorum_start_replication
> >       on which mode it would run in. 
> 
> IIRC, this check will be removed.
> 
> >    c) I'm not really sure everything knows it's in secondary mode yet, and
> >       I'm not convinced whether the replication is doing the right thing.
> >    d) The migrate -d -b eventually fails on the destination; I've not
> >       worked out why yet.
> 
> Can you give me the error message?

I need to repeat it to check; it was something like a bad flag from the
block migration code, and it happened after the block migration hit 100%.

> >    e) Adding/deleting children on quorum is hard when you have to use the
> >       children.0/1 notation after adding children by node name; it's
> >       worrying not knowing which number is which. Is there a way to give
> >       them a name?
> 
> No. I think we can improve the 'info block' output.

Yes, that would be good. I thought the number was the position in the list,
but after debugging it today I'm not convinced it is; I think each child
always keeps the same name. So, for example, if you start off with
[children.0, children.1] and then delete children.0, you now have [children.1];
if you then add a new child I *think* it becomes children.0, but you end up
with [children.1, children.0].
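A toy Python model of that guess, assuming a new child takes the lowest unused children.N name and is appended to the end of the list, with deletion by name rather than by position. It is not the quorum driver's code and the real allocation rule may differ, but under those assumptions it reproduces both command sequences in this thread, including the 'info block' order with the NBD replication first.

```python
class ChildNames:
    """Toy model (not QEMU code) of the naming behaviour guessed above.

    Assumptions: a new child takes the lowest unused children.N name and is
    appended to the end of the list; deletion is by name, not by position.
    """

    def __init__(self, nodes):
        self.children = [(f"children.{i}", node) for i, node in enumerate(nodes)]

    def add(self, node):
        used = {name for name, _ in self.children}
        i = 0
        while f"children.{i}" in used:
            i += 1
        self.children.append((f"children.{i}", node))

    def delete(self, name):
        self.children = [(n, node) for n, node in self.children if n != name]

    def __repr__(self):
        return repr(self.children)


if __name__ == "__main__":
    # Sequence with the dummy, from the original mail.
    q = ChildNames(["secondary-replication", "dummy"])
    q.delete("children.0")      # drops the secondary replication
    q.add("node-real-disk")     # becomes children.0, but in the 2nd slot
    q.delete("children.1")      # so this drops the dummy
    q.add("nbd-client")
    print(q)  # [('children.0', 'node-real-disk'), ('children.1', 'nbd-client')]

    # The alternative ordering tried above, without the dummy.
    q = ChildNames(["secondary-replication"])
    q.add("nbd-client")         # gets children.1
    q.delete("children.0")
    q.add("node-real-disk")     # children.0, appended after nbd-client
    print(q)  # [('children.1', 'nbd-client'), ('children.0', 'node-real-disk')]
```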

> >    f) I've not thought about the colo-proxy that much yet - I guess that
> >       existing connections need to keep their sequence number offset but
> >       new connections made by what is now the primary don't need to do
> >       anything special.
> 
> Hailiang or Zhijian can answer this question.

Thanks,

> Thanks
> Wen Congyang
> 
> > 
> > Dave
> > --
> > Dr. David Alan Gilbert / address@hidden / Manchester, UK
> > 
> > 
> > 
> 
> 
> 
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


