[Gluster-devel] brick half offline
From: Emmanuel Dreyfus
Subject: [Gluster-devel] brick half offline
Date: Sun, 29 Jul 2012 04:37:48 +0000
User-agent: Mutt/1.5.21 (2010-09-15)
Hi
I hit another rare problem, which seems reproducible within 4 hours of usage:
one brick goes down, but not completely. It will not create a file, for
instance, but it still participates in file locking and makes the lock fail,
because it never created the file.
Here is the final symptom (this creates the file, then locks it):
client# echo "xxx"|cat -l > /gfs/foo
cat: stdout: No such file or directory
brick1# ls -l /export/gfs1/foo
-rw-r--r-- 2 root wheel 0 Jul 29 06:18 /export/gfs1/foo
brick2# ls -l /export/gfs1/foo
ls: /export/gfs1/foo: No such file or directory
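For anyone without NetBSD's cat -l at hand, here is a minimal sketch of the same create-then-lock sequence in Python. It assumes cat -l takes an exclusive advisory lock on stdout with fcntl(2) F_SETLKW, as the NetBSD cat(1) man page describes; the function name create_and_lock is hypothetical and only for illustration.

```python
import fcntl
import os


def create_and_lock(path, data=b"xxx\n"):
    """Create (or truncate) path, take an exclusive POSIX lock, then write.

    Raises OSError if the create, the lock, or the write fails.
    """
    # Same effect as the shell redirection "> path": create or truncate.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Blocking exclusive lock on the whole file (F_SETLKW under the hood).
        # This is the step expected to fail on the affected volume.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        os.write(fd, data)
        fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Calling create_and_lock("/gfs/foo") on the client should, if the assumption above holds, raise an OSError at the lockf call once the volume is in the half-offline state, while the file itself gets created on brick1 only.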
client log for this operation:
[2012-07-29 06:18:10.430637] W [client3_1-fops.c:2186:client3_1_lk_cbk]
0-gfs-client-1: remote operation failed: No such file or directory
[2012-07-29 06:18:10.431628] W [fuse-bridge.c:3196:fuse_setlk_cbk]
0-glusterfs-fuse: 11781877: ERR => -1 (No such file or directory)
[2012-07-29 06:18:10.434844] W [client3_1-fops.c:2186:client3_1_lk_cbk]
0-gfs-client-1: remote operation failed: No such file or directory
[2012-07-29 06:18:10.435939] W [fuse-bridge.c:3196:fuse_setlk_cbk]
0-glusterfs-fuse: 11781880: ERR => -1 (No such file or directory)
brick1 logs nothing.
brick2 log for this operation:
[2012-07-29 06:18:10.430151] I [server3_1-fops.c:203:server_lk_cbk]
0-gfs-server: 2017229: LK -2 (--) ==> -1 (No such file or directory)
[2012-07-29 06:18:10.434281] I [server3_1-fops.c:203:server_lk_cbk]
0-gfs-server: 2017231: LK -2 (--) ==> -1 (No such file or directory)
But this is only the consequence of an earlier problem, where brick2
went half-offline: enough to refuse creating files, but not enough to
be excluded from locking operations. Here is how it happened:
brick2 log
[2012-07-28 22:30:08.024578] E [event.c:346:event_dispatch_poll_handler]
0-poll: index not found for fd=15 (idx_hint=6)
[2012-07-28 22:30:18.418768] I [server-handshake.c:571:server_setvolume]
0-gfs-server: accepted client from
client-18310-2012/07/27-03:03:28:140183437669610-gfs-client-1-0
(version: 3.3git)
client log
[2012-07-28 22:30:08.026975] W [socket.c:1512:__socket_proto_state_machine]
0-gfs-client-1: reading from socket failed. Error (Socket is not
connected), peer (192.0.2.98:24010)
[2012-07-28 22:30:08.027050] E [rpc-clnt.c:373:saved_frames_unwind]
0-gfs-client-1: forced unwinding frame type(GlusterFS 3.1)
op(WRITE(13)) called at 2012-07-28 22:30:08.026783 (xid=0x1990324x)
[2012-07-28 22:30:08.027224] W [client3_1-fops.c:821:client3_1_writev_cbk]
0-gfs-client-1: remote operation failed: Socket is not connected
[2012-07-28 22:30:08.027396] I [client.c:2090:client_rpc_notify]
0-gfs-client-1: disconnected
[2012-07-28 22:30:08.027553] W [client3_1-fops.c:4929:client3_1_fxattrop]
0-gfs-client-1: (366a4c92-d167-48e7-844a-9dc43602ecc5) remote_fd is -1.
EBADFD
[2012-07-28 22:30:08.030501] W [client3_1-fops.c:5306:client3_1_finodelk]
0-gfs-client-1: (366a4c92-d167-48e7-844a-9dc43602ecc5) remote_fd is -1.
EBADFD
[2012-07-28 22:30:18.419800] I
[client-handshake.c:1636:select_server_supported_programs] 0-gfs-client-1:
Using Program GlusterFS 3.3git, Num (1298437), Version (330)
[2012-07-28 22:30:18.420716] I [client-handshake.c:1433:client_setvolume_cbk]
0-gfs-client-1: Connected to 192.0.2.98:24010, attached to remote volume
'/export/gfs1'.
[2012-07-28 22:30:18.420768] I [client-handshake.c:1454:client_setvolume_cbk]
0-gfs-client-1: Server and Client lk-version numbers are same, no need
to reopen the fds
Both client and server report that the reconnection went through without
a hitch, but it seems glusterfs did not really recover.
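To look for earlier disconnect/reconnect windows like this one in a long client log, a rough sketch follows. It assumes each log entry sits on a single line in the real file (the entries above are wrapped by the mail), and it matches only on the two messages quoted above, "disconnected" from client_rpc_notify and "Connected to" from client_setvolume_cbk. The function name reconnect_gaps is hypothetical.

```python
import re
from datetime import datetime

# "[timestamp] LEVEL [file:line:function] message", as in the logs above.
ENTRY = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] (\w) \[([^\]]+)\] (.*)$"
)


def reconnect_gaps(lines):
    """Return (translator, disconnect_time, reconnect_time) tuples."""
    gaps = []
    down = {}  # translator name -> time of its last disconnect
    for line in lines:
        m = ENTRY.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        origin, msg = m.group(3), m.group(4)
        name = msg.split(":", 1)[0]  # e.g. "0-gfs-client-1"
        if "client_rpc_notify" in origin and msg.endswith("disconnected"):
            down[name] = ts
        elif ("client_setvolume_cbk" in origin
              and "Connected to" in msg and name in down):
            gaps.append((name, down.pop(name), ts))
    return gaps
```

Fed the client log above, this should report a single window of about ten seconds for 0-gfs-client-1; any file created on the volume during or after such a window is a candidate for the missing-on-brick2 symptom.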
--
Emmanuel Dreyfus
address@hidden