Re: monitoring (was: [Savannah-hackers-public] admin help)

From: Sylvain Beucler
Subject: Re: monitoring (was: [Savannah-hackers-public] admin help)
Date: Wed, 5 Apr 2006 00:30:59 +0200
User-agent: Mutt/1.5.11+cvs20060126

Apart from Savannah, the FSF network is managed by FSF
employees. Savannah has SNMP installed, but I don't know more than that.

Knowing what is running or down is not a big issue: users will
probably notice before the monitoring tool does and tell us about
it, so the delay is just our response time. That said, we currently
have a weekly issue with the ViewCVS service and nobody has ever reported
it, so monitoring would be useful anyway, I suppose.

It would be interesting, though, to set up some security checks, such
as: is the /home directory really read-only when accessed through the
arch sftp service? Is it possible for a project member to commit to
CVSROOT/? etc. I have a small CVS 'test suite' at

I'm also concerned about usage statistics. For example, the other day
the load went up to 20 and I have almost no clue what caused
it. Mathieu Roy from Gna! told me about heavy SSH robot attacks that
could be rejected more cheaply using dynamic IP-based restrictions and

Users may also be interested in SCM-related stats. Download stats are
very useful in the long run because we plan to mirror the download

Aside from that, I have never used Nagios or Cfengine, so any
discussion around those tools will come in handy :)


On Tue, Apr 04, 2006 at 12:36:13PM +0100, Dave Love wrote:
> Sylvain Beucler <address@hidden> writes:
> >> OK, I can doubtless do some of that.  I can also do sysadmin jobs if
> >> that's any use.  [Do you have a monitoring system in place for the
> >> services?  I have some experience of that if it's of any interest.]
> >
> > That's of interest :)
>
> OK, I assume you're not doing specific monitoring at present.  Sorry
> if this is the sort of thing you do know.  I haven't tried to look at
> the system yet.
>
> What I've done before is used Nagios (nagios2 is in Debian now, I
> think).  It runs on a separate server (preferably) to monitor whatever
> services you're providing and alerts groups of people by email (or SMS
> etc.) in a flexible way if something goes down.
>
> You can write `plugins' (e.g. in sh) to run specific tests if there
> isn't one available for what you need.  As far as I know there isn't a
> standard CVS pserver plugin, for instance, but you can write something
> to test checking out an arbitrary file.  Previously I just used the
> built-in port check to test that pserver was running.
> Other things are canned, like testing an https URL (including
> validating the certificate).
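
To make the plugin idea above concrete, here is a minimal sketch of a
Nagios-style port check in Python. It relies on the usual plugin
convention (one status line on stdout, exit code 0 = OK, 1 = WARNING,
2 = CRITICAL, 3 = UNKNOWN). The names check_tcp and main, the 5-second
timeout, and the command-line layout are assumptions made for
illustration; this is not the built-in check mentioned above.

```python
import socket

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=5.0):
    """Try a plain TCP connect and return (exit_code, status_line)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} accepts connections"
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution errors.
        return CRITICAL, f"TCP CRITICAL - {host}:{port}: {exc}"


def main(argv):
    """Hypothetical command line: check_tcp_port.py HOST PORT."""
    if len(argv) != 3:
        print("TCP UNKNOWN - usage: check_tcp_port.py HOST PORT")
        return UNKNOWN
    try:
        port = int(argv[2])
    except ValueError:
        print(f"TCP UNKNOWN - invalid port: {argv[2]!r}")
        return UNKNOWN
    code, message = check_tcp(argv[1], port)
    print(message)
    return code
```

A real plugin script would finish with sys.exit(main(sys.argv)) so that
Nagios sees the exit code. For a CVS pserver the port is conventionally
2401; a connect test only shows that something is listening, so a
checkout of a known file would be a stronger test.
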
> As well as monitoring things remotely, you can test locally on the
> server you're monitoring and report them either via a separate agent
> or SNMP, for instance.  (Nearly-full filesystems is a useful test in
> my experience!)
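
A local test for nearly-full filesystems could look roughly like the
sketch below, again following the usual plugin exit-code convention.
The 80% and 90% thresholds and the function names are assumptions
chosen for illustration; how the result gets reported back (separate
agent or SNMP) is left out.

```python
import shutil

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to an exit code (thresholds are assumed)."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def check_disk(path, warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for the filesystem holding path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    if usage.total == 0:
        # Some pseudo filesystems report a size of zero.
        return UNKNOWN, f"DISK UNKNOWN - {path}: filesystem reports size 0"
    used_percent = 100.0 * usage.used / usage.total
    code = classify(used_percent, warn, crit)
    return code, f"DISK {LABELS[code]} - {path} is {used_percent:.1f}% full"
```

Calling check_disk("/home") returns the code and the status line; a
wrapper script would print the line and exit with the code.
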
> You can get statistics on the services, which can sometimes be useful.
> You can expose status information to users from the web interface, if
> appropriate; it might avoid storms of reports when something fails if
> users can check that someone will have been alerted.
>
> It would make most sense to have Nagios (or something similar) set up
> to monitor all GNU services, though, and perhaps there's already
> something running which could be used.  I don't know how the GNU
> systems are actually adminned overall these days.
>
> I could look into setting up such a system if you think it would be
> sufficiently useful.  Maybe it isn't so useful if savannah is fairly
> reliable and problems always get spotted quickly these days, and it
> would be better to work on other things.
>
> Something like Cfengine might also be useful for helping with some of
> the issues in the task list, though I got rather disillusioned by
> Cfengine itself.  It can do things like tidy up lock files and manage
> daemon processes.
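
As an illustration of the lock-file tidying mentioned above, here is a
small Python stand-in. It is not Cfengine configuration syntax, and the
"*.lock" pattern, the one-hour age limit and the function names are all
hypothetical; real lock files on the system may be named differently.

```python
import time
from pathlib import Path


def find_stale_locks(root, pattern="*.lock", max_age_seconds=3600, now=None):
    """Return files under root matching pattern whose mtime is too old."""
    now = time.time() if now is None else now
    stale = []
    for path in Path(root).rglob(pattern):
        try:
            if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
                stale.append(path)
        except OSError:
            # The file vanished or is unreadable; skip it.
            continue
    return sorted(stale)


def tidy_locks(root, dry_run=True, **kwargs):
    """List stale lock files and delete them unless dry_run is set."""
    stale = find_stale_locks(root, **kwargs)
    if not dry_run:
        for path in stale:
            try:
                path.unlink()
            except OSError:
                pass  # Already gone or not removable; leave it.
    return stale
```

Defaulting to dry_run=True is a deliberate choice here: removing a lock
that is still in use can be worse than leaving a stale one, so the
sketch only reports candidates unless told otherwise.
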
