cfservd thrashes, nodes fail to get anything
From: Yaroslav Halchenko
Subject: cfservd thrashes, nodes fail to get anything
Date: Sat, 7 May 2005 11:50:59 -0400
User-agent: Mutt/1.5.8i
Dear All,
Yesterday one of the users filled up /tmp on a main node with junk, which
rendered cfengine unusable. First it reported:
daemon.log:May 6 21:11:23 ravana cfservd[16657]: Couldn't open checksum
database /tmp/testDATABASEcache
daemon.log:May 6 21:11:23 ravana cfservd[16657]: db_open: No space left on
device
and it seems that after that, whenever any node connects to it, cfservd
becomes extremely busy and then finally fails, with the following message
reported by the nodes:
cfengine:node20: Received signal 13 (SIGPIPE) while doing [no_active_lock]
cfengine:node20: Logical start time Fri May 6 23:51:10 2005
cfengine:node20: This sub-task started really at Fri May 6 23:51:10 2005
or, as of now, for some reason without a node name:
cfengine:: Received signal 13 (SIGPIPE) while doing [pre-lock-state]
cfengine:: Logical start time Sat May 7 11:00:33 2005
cfengine:: This sub-task started really at Sat May 7 11:00:33 2005
and then another one, reporting that a copy was refused:
cfengine:: Transmission refused or failed statting
/etc/cfengine/inputs/CVS/Repository
Got:
cfengine:: Received signal 13 (SIGPIPE) while doing
[lock.cfagent_conf.node2.copy.copy_3343]
cfengine:: Logical start time Sat May 7 04:30:29 2005
cfengine:: This sub-task started really at Sat May 7 04:30:29 2005
I've tried restarting the cfengine components on both ends, but that doesn't
help. Running cfservd with -d2 while trying to run the update script
(copy /etc/cfengine/input files across the nodes into /etc/cfengine)
gave the following:
----------------------------------------
...
Access privileges - match found
cfservd: Host node2.ravana.rutgers.edu granted access to
/etc/cfengine/inputs/CVS/Root
Clocks were off by 0
StatFile(/etc/cfengine/inputs/CVS/Root)
OK: type=0
mode=644
lmode=0
uid=0
gid=0
size=10
atime=1115477605
mtime=1067285389
Transaction Send[t 65][Packed text]
Attempting to send 73 bytes
SendSocketStream, sent 73
Transaction Send[t 3][Packed text]
Attempting to send 11 bytes
SendSocketStream, sent 11
RecvSocketStream(8)
(Concatenated 8 from stream)
Transaction Receive [t 51][]
RecvSocketStream(51)
(Concatenated 51 from stream)
Received: [MD5 /etc/cfengine/inputs/CVS/Root] on socket 5
CompareLocalChecksums(/etc/cfengine/inputs/CVS/Root,MD5=05e8d918529f204488a626792c4f8a6f)
ChecksumChanged: key /etc/cfengine/inputs/CVS/Root with data
MD5=05e8d918529f204488a626792c4f8a6f
<At this point it stalls for a minute or two, although cfservd is
running busy>
IPV4 address
sockaddr_ntop(10.0.0.2)
Obtained IP address of 10.0.0.2 on socket 7 from accept
FuzzyItemIn(LIST,10.0.0.2)
Purging Old Connections...
Done purging
FuzzyItemIn(LIST,10.0.0.2)
cfservd: Denying repeated connection from 10.0.0.2
----------------------------------------
From the client (cfagent) side it looks like this:
----------------------------------------
Compare binary sums on ravana:/etc/cfengine/inputs/CVS/Root &
/var/lib/cfengine2/inputs/CVS/Root
Using network md5 checksum instead
ChecksumFile(m,/var/lib/cfengine2/inputs/CVS/Root)
Send digest of /var/lib/cfengine2/inputs/CVS/Root to server,
MD5=05e8d918529f204488a626792c4f8a6f
Transaction Send[t 51][Packed text]
Attempting to send 59 bytes
SendSocketStream, sent 59
RecvSocketStream(8)
<STALLS HERE and I got bored waiting for it to die... maybe it never
dies this time>
----------------------------------------
So here are the questions:
1. How do I fix the current situation?
Clearly something is broken in the current state, so maybe I can
clean out the cfengine state and start from a clean one. I wouldn't
mind if the first run takes longer ;-) Sure, I could completely
reinstall, and I believe it would then work, but...
2. What would be a nice policy to enforce over /tmp so that I don't
remove anything valuable (like ssh-agent sockets and other stuff
opened by running programs)? I'm thinking about something like: files
and directories that are large (>1M) should be forbidden if they are
older than an hour. I'm not sure I can discard data solely on age, so
age+size sounds good to me.
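
To make the age+size idea in question 2 concrete, here is a minimal
report-only sketch in Python. It is not a cfengine policy, and the
function name find_candidates and the two constants are made up for this
illustration. It assumes "older" means "not modified for an hour" (mtime)
and it looks at individual regular files only, not at the total size of
a directory. Sockets such as the ssh-agent ones are skipped because they
are not regular files. A large regular file that a running program still
holds open but no longer writes to would still be listed, so the output
should be reviewed before anything is deleted.

```python
import os
import stat
import time

SIZE_LIMIT = 1024 * 1024   # 1M, the size threshold suggested above
AGE_LIMIT = 60 * 60        # one hour, in seconds


def find_candidates(root="/tmp", size_limit=SIZE_LIMIT,
                    age_limit=AGE_LIMIT, now=None):
    """Return regular files under root that are larger than size_limit
    bytes and were last modified more than age_limit seconds ago."""
    now = time.time() if now is None else now
    candidates = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Only regular files qualify, so sockets, FIFOs, device
            # nodes and symlinks are never reported.
            if not stat.S_ISREG(st.st_mode):
                continue
            if st.st_size > size_limit and now - st.st_mtime > age_limit:
                candidates.append(path)
    return candidates


if __name__ == "__main__":
    # Report only: print the candidates, delete nothing.
    for path in find_candidates():
        print(path)
```

Run by hand or from cron, this only prints paths; whether to delete them,
and whether to judge age by mtime or atime instead, is a separate decision.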
--
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07105
Student Ph.D. @ CS Dept. NJIT