
Re: [Gluster-devel] a bug when reading files in a symbolic-link directory


From: Vijay Bellur
Subject: Re: [Gluster-devel] a bug when reading files in a symbolic-link directory
Date: Mon, 07 Sep 2009 10:40:26 +0530
User-agent: Thunderbird 2.0.0.22 (X11/20090608)

Hi He,

Can you please re-create the problem with -L DEBUG and post both the client and server side logs?
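
For anyone following the thread, here is roughly what that could look like with the 2.x command line. Only "-L DEBUG" is taken from this thread; the volfile paths, log paths, and mount point below are placeholders, so adjust them to the actual setup:

  # Server side, with placeholder volfile and log paths:
  glusterfsd -f /etc/glusterfs/server.vol -L DEBUG -l /var/log/glusterfs/server.log

  # Client side, mounting at the path used later in this thread:
  glusterfs -f /etc/glusterfs/client.vol -L DEBUG -l /var/log/glusterfs/client.log /mnt/glusterfs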

Thanks,
Vijay


He Xiaobin wrote:

I use glusterfs in a cluster system (configured as dht->afr->client->server->iothreads->locks->posix). After days of running it is stable, but the performance is poor (slower than NFS exported from a single server), and, most importantly, a bug has shown up these days. This is really an emergency, so I need your help!
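
To make that stack concrete, here is a rough sketch of what such a configuration could look like as GlusterFS 2.x volfiles; the hostnames, export directory, and the use of a single two-brick replica set are illustrative placeholders, not details taken from this report.

  # Server-side volfile on each brick: posix -> locks -> io-threads -> server
  volume posix
    type storage/posix
    option directory /data/export          # placeholder export path
  end-volume

  volume locks
    type features/locks
    subvolumes posix
  end-volume

  volume iothreads
    type performance/io-threads
    subvolumes locks
  end-volume

  volume server
    type protocol/server
    option transport-type tcp
    option auth.addr.iothreads.allow *
    subvolumes iothreads
  end-volume

  # Client-side volfile: protocol/client bricks -> replicate (afr) -> distribute (dht)
  volume brick1
    type protocol/client
    option transport-type tcp
    option remote-host server1             # placeholder hostname
    option remote-subvolume iothreads
  end-volume

  volume brick2
    type protocol/client
    option transport-type tcp
    option remote-host server2             # placeholder hostname
    option remote-subvolume iothreads
  end-volume

  volume afr0
    type cluster/replicate
    subvolumes brick1 brick2
  end-volume

  volume dht0
    type cluster/distribute
    subvolumes afr0                        # further replica sets would be listed here
  end-volume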

What is the BUG? In this system I use mvapich+blcr for task checkpoint and restore. I don't know exactly how mvapich works internally, but I am sure it uses glusterfs in my case. When checkpointing a task on glusterfs, it creates one ckpt file for each process of the task, places all the ckpt files in a directory called 1, and creates a symbolic link called 0 pointing to directory 1. Here is an example: fortest is the username, .ckpt is the ckpt directory for this user, 1972 is the task id, 0 is the symbolic link, and bt.C.64-19.ckpt is the ckpt file for the task's 19th process:
address@hidden 1972]$ pwd
/mnt/glusterfs/.ckpt/1972
address@hidden 1972]$ ll
total 132
lrwxrwxrwx 1 fortest fortest 31 Sep 4 17:09 0 -> /mnt/glusterfs/fortest/.ckpt/1972/1
drwx------ 2 fortest fortest 65536 Sep  4 20:06 1
address@hidden 1972]$ ls 1/
bt.C.64-0.ckpt bt.C.64-21.ckpt bt.C.64-33.ckpt bt.C.64-45.ckpt bt.C.64-57.ckpt bt.C.64-10.ckpt bt.C.64-22.ckpt bt.C.64-34.ckpt bt.C.64-46.ckpt bt.C.64-58.ckpt bt.C.64-11.ckpt bt.C.64-23.ckpt bt.C.64-35.ckpt bt.C.64-47.ckpt bt.C.64-59.ckpt bt.C.64-12.ckpt bt.C.64-24.ckpt bt.C.64-36.ckpt bt.C.64-48.ckpt bt.C.64-5.ckpt bt.C.64-13.ckpt bt.C.64-25.ckpt bt.C.64-37.ckpt bt.C.64-49.ckpt bt.C.64-60.ckpt bt.C.64-14.ckpt bt.C.64-26.ckpt bt.C.64-38.ckpt bt.C.64-4.ckpt bt.C.64-61.ckpt bt.C.64-15.ckpt bt.C.64-27.ckpt bt.C.64-39.ckpt bt.C.64-50.ckpt bt.C.64-62.ckpt bt.C.64-16.ckpt bt.C.64-28.ckpt bt.C.64-3.ckpt bt.C.64-51.ckpt bt.C.64-63.ckpt bt.C.64-17.ckpt bt.C.64-29.ckpt bt.C.64-40.ckpt bt.C.64-52.ckpt bt.C.64-6.ckpt bt.C.64-18.ckpt bt.C.64-2.ckpt bt.C.64-41.ckpt bt.C.64-53.ckpt bt.C.64-7.ckpt bt.C.64-19.ckpt bt.C.64-30.ckpt bt.C.64-42.ckpt bt.C.64-54.ckpt bt.C.64-8.ckpt bt.C.64-1.ckpt bt.C.64-31.ckpt bt.C.64-43.ckpt bt.C.64-55.ckpt bt.C.64-9.ckpt
bt.C.64-20.ckpt  bt.C.64-32.ckpt  bt.C.64-44.ckpt  bt.C.64-56.ckpt
When the task needs to be restored, mvapich reads the ckpt files through 0 (the symbolic link) and restores the task. All of this works smoothly on NFS, but on glusterfs it produces the messages below. Sometimes the restore still finishes in the end, while other times it cannot, with almost the same messages in both cases. I have verified that the files mvapich reports as missing are in fact there. Another useful hint: the fewer gluster clients running the task, the less often this bug shows up during the restore. Starting glusterfs without direct-io did not help either. (A standalone reproduction sketch follows the output below.)
OUTPUT OF THE TASK WHEN RESTORING:

19: Restart: path /mnt/glusterfs/fortest/.ckpt/1972/0/bt.C.64-19.ckpt: No such file or directory
20: Restart: path /mnt/glusterfs/fortest/.ckpt/1972/0/bt.C.64-20.ckpt: No such file or directory
srun: error: gfsclient10: task[19-20]: Exited with exit code 1
21: Restart: path /mnt/glusterfs/fortest/.ckpt/1972/0/bt.C.64-21.ckpt: No such file or directory
18: Restart: path /mnt/glusterfs/fortest/.ckpt/1972/0/bt.C.64-18.ckpt: No such file or directory
srun: error: gfsclient10: task21: Exited with exit code 1
srun: error: cn010: task18: Exited with exit code 1
17: Restart: path /mnt/glusterfs/fortest/.ckpt/1972/0/bt.C.64-17.ckpt: No such file or directory
srun: error: gfsclient10: task17: Exited with exit code 1
23: Restart: path /mnt/glusterfs/fortest/.ckpt/1972/0/bt.C.64-23.ckpt: No such file or directory
22: Restart: path /mnt/glusterfs/fortest/.ckpt/1972/0/bt.C.64-22.ckpt: No such file or directory
srun: error: gfsclient10: task23: Exited with exit code 1
srun: error: cn010: task[16,22]: Exited with exit code 1
16: Restart: path /mnt/glusterfs/fortest/.ckpt/1972/0/bt.C.64-16.ckpt: No such file or directory
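
For what it is worth, here is a rough way to try to reproduce this outside of mvapich/blcr, based only on the layout described above (the task id, file count, and file sizes are made up):

  #!/bin/sh
  # Hypothetical reproducer: recreate the ckpt layout on the glusterfs mount,
  # then read every file back through the symbolic link.
  DIR=/mnt/glusterfs/fortest/.ckpt/9999     # made-up task id
  mkdir -p "$DIR/1"
  for i in $(seq 0 63); do
      dd if=/dev/zero of="$DIR/1/bt.C.64-$i.ckpt" bs=1M count=1 2>/dev/null
  done
  ln -s "$DIR/1" "$DIR/0"
  # Ideally run this second loop from a different client node:
  for i in $(seq 0 63); do
      cat "$DIR/0/bt.C.64-$i.ckpt" >/dev/null || echo "missing: bt.C.64-$i.ckpt"
  done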


I use "debug/trace" and start the gluster with "-L DEBUG", and got the following logs when the ckpt can't to be found:

[2009-09-04 17:12:35] N [trace.c:1290:trace_readlink] tr0: 174536: (loc {path=/fortest/.ckpt/1972/0, ino=1380450540}, size=4096)
[2009-09-04 17:12:35] N [trace.c:484:trace_readlink_cbk] tr0: 174536: (op_ret=31, op_errno=0, buf=/mnt/glusterfs/fortest/.ckpt/1972/1)
[2009-09-04 17:12:35] E [fuse-bridge.c:987:fuse_readlink_cbk] glusterfs-fuse: 174536: /fortest/.ckpt/1972/0 => /mnt/glusterfs/fortest/.ckpt/1972/1 @ 1252055555
[2009-09-04 17:12:35] N [trace.c:1245:trace_lookup] tr0: 174537: (loc {path=/fortest/.ckpt/1972/1, ino=0})
[2009-09-04 17:12:35] N [trace.c:513:trace_lookup_cbk] tr0: 174508: (op_ret=0, ino=0, *buf {st_dev=2065, st_ino=7068450884, st_mode=40700, st_nlink=2, st_uid=1001, st_gid=1001, st_rdev=0, st_size=65536, st_blksize=4096, st_blocks=256})
[2009-09-04 17:12:35] E [fuse-bridge.c:255:fuse_loc_fill] glusterfs-fuse: inode_path failed for 8003256399/bt.C.64-22.ckpt @ 1252055555
[2009-09-04 17:12:35] W [fuse-bridge.c:436:fuse_lookup] glusterfs-fuse: 174539: LOOKUP 8003256399/bt.C.64-22.ckpt (fuse_loc_fill() failed)
[2009-09-04 17:12:35] N [trace.c:513:trace_lookup_cbk] tr0: 174522: (op_ret=0, ino=0, *buf {st_dev=2065, st_ino=7068450884, st_mode=40700, st_nlink=2, st_uid=1001, st_gid=1001, st_rdev=0, st_size=65536, st_blksize=4096, st_blocks=256})
[2009-09-04 17:12:35] E [fuse-bridge.c:255:fuse_loc_fill] glusterfs-fuse: inode_path failed for 8003256399/bt.C.64-16.ckpt @ 1252055555