[Gluster-devel] RFC on posix locks migration to new graph after a switch
From: Raghavendra Gowdappa
Subject: [Gluster-devel] RFC on posix locks migration to new graph after a switch
Date: Wed, 20 Jun 2012 15:37:11 -0400 (EDT)
Avati,
We had relied on posix lock healing (hereafter, "locks" refers to posix locks)
done by protocol/client to migrate locks to a new graph. Lock healing is a
feature implemented by protocol/client which simply reacquires all the granted
locks stored in the fd context after a reconnect to the server. We leverage
this feature to migrate posix locks to a new graph as follows: we migrate fds
to the new graph by opening a new fd on the same file in the new graph (with
the fd context copied over from the old graph), and protocol/client then
reacquires all the granted locks recorded in that context. However, this
solution has the following issues:
1. If we open fds in the new graph before the old transport is cleaned up,
lock requests sent by protocol/client as part of healing will conflict with
the locks held on the old transport and hence will fail. (Note that with a
client-side-only graph switch there is a single inode on the server
corresponding to two inodes on the client - one for each of the old and new
graphs.) As a result, locks are not migrated. The problem could have been
solved if protocol/client had issued SETLKW requests instead of SETLK (the
lock requests issued as part of healing would then be granted when the old
transport eventually disconnects), but that has a different set of issues.
Even then, it is not a fool-proof solution, since there might already be other
conflicting lock requests in the lock wait queue when protocol/client starts
lock healing, resulting in failure of the lock heal.
2. If we open fds in the new graph after the old transport is cleaned up,
there is a window of time between old-transport cleanup and the lock heal in
the new graph during which potentially conflicting lock requests could be
granted, thereby causing the lock requests sent as part of lock healing to
fail.
One solution I can think of is to introduce a new lock command, SETLK_MIGRATE.
SETLK_MIGRATE takes a transport identifier as a parameter, along with the
usual arguments SETLK/SETLKW take (lock range, lock-owner etc.). The command
migrates a lock from the transport passed as the parameter to the transport on
which the request came in, provided the two locks conflict only because they
came from two different transports (everything else - lock range, lock-owner
etc. - being the same). In the absence of any live locks, SETLK_MIGRATE
behaves like a plain SETLK.
protocol/client can make use of this SETLK_MIGRATE command in the lock
requests it sends as part of the lock heal during the open fop, to migrate
locks to the new graph. Assuming the old transport has not been cleaned up at
the time of the lock heal, SETLK_MIGRATE atomically migrates locks from the
old transport to the new transport (on the server). Now, the difficulty is in
getting the identifier of the server-side old transport on which the locks are
currently held. This can be solved if we store the peer transport identifier
in the lk context on the client (it can easily be obtained from an lk reply).
We can then pass the same transport identifier to the server during healing.
I haven't yet completely thought through some issues, like whether
protocol/client can unconditionally use SETLK_MIGRATE in all the lock requests
it sends as part of healing, or whether it should use SETLK_MIGRATE only
during the first attempt at healing after a graph switch. However, even if
protocol/client wants to make such a distinction, it can easily be worked out
(either by fuse setting a special "migrate" key in the xdata of the open calls
it sends as part of fd migration, or by some other mechanism).
Please let me know your thoughts on this.
regards,
Raghavendra.