Re: monit 4.2 release?
From: Martin Pala
Subject: Re: monit 4.2 release?
Date: Thu, 12 Feb 2004 23:21:35 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040122 Debian/1.6-1
Jan-Henrik Haukeland wrote:
Unless you plan to refactor the whole event engine, it shouldn't
require too much code to add an up-event in the existing code? If you
are planning to refactor, please let us know your ideas first. The
reason I say this is that I'm unsure if and what needs refactoring in
the event engine.
I'm trying to refactor the whole engine. I will try to describe the current
idea (partially done). The following are just development ideas (not clean
or fully functional code):
0.) new syntax:
---------------
An ELSE statement is added to the IF...THEN statement; it specifies an
action in the common syntax. A dummy example:
# if failed port 443 type tcpssl proto http with timeout 15 seconds
# then restart
# else alert
=> if the connection fails, monit will restart the service and
send an alert. As soon as the service is up again, it will send another alert.
Note: the "else alert" part is implicit => it does not need to be written.
The ELSE statement makes sense when you need to do some other action
(for example EXEC) once the service is up again, like:
# if failed port 22 proto ssh with timeout 15 seconds
# then restart
# else exec "/bin/sms_send 'ssh is up again'"
1.) New structures:
-------------------
Structures provide common objects for all services and service tests.
The EventAction_T object replaces the recurring Command_T, event_handled,
event_flag, etc. It contains the action specification (including an optional
Command_T for execution) for the service and test "up" and "down" states.
Events affect either the primary monitored service object or particular
tests, depending on the event type. For example, NONEXIST means that
the service (for example a process) does not exist at all, so the global
s->action is taken. On the other hand, a CONNECTION event allows a
test-specific s->portlist->action for the given test instance (each connection
test has its own action). EventList is global.
/** Defines an event action object */
typedef struct myaction {
  int       action;  /**< Action to be done */
  Command_T exec;    /**< Optional command to be executed */
} *Action_T;

/** Defines event's up and down actions */
typedef struct myeventaction {
  Action_T down;  /**< Action in the case of failure */
  Action_T up;    /**< Action in the case of recovery */
} *EventAction_T;
/** Defines an event */
typedef struct myevent {
  int           id;       /**< The event identification */
  short         state;    /**< TRUE for failed, FALSE for passed event state */
  unsigned long counter;  /**< Event occurrence counter */
  EventAction_T action;   /**< Description of the event action */
  char         *message;  /**< Optional message describing the event */
} *Event_T;

/** Defines a pending events list */
typedef struct myeventlist {
  Event_T entry;  /**< Pending events list object */
  /** For internal use */
  Event_T next;   /**< next event in chain */
} *EventList_T;
...
/** Defines gid object */
typedef struct mygid {
  gid_t         gid;        /**< Owner's gid */
  int           has_error;  /**< TRUE if the service has a GID error */
  EventAction_T action;     /**< Description of the action upon event occurrence */
} *Gid_T;
...
/** Defines service data */
typedef struct myservice {
  ...
  EventAction_T action;         /**< Description of the action upon event occurrence */
  ...
  EventList_T   eventspending;  /**< Pending events list */
  ...
} *Service_T;
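To make the up/down pairing concrete, here is a minimal, self-contained sketch of how a rule such as "then restart else exec ..." might be turned into an EventAction_T. It is only an illustration: the ACTION_* codes and the helpers action_new, eventaction_new and eventaction_select are hypothetical names, and a plain string stands in for Command_T.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical action codes; the names are illustrative only. */
enum { ACTION_IGNORE = 0, ACTION_ALERT, ACTION_RESTART, ACTION_EXEC };

typedef struct myaction {
  int action;        /* action to be done */
  const char *exec;  /* stand-in for Command_T: optional command line */
} *Action_T;

typedef struct myeventaction {
  Action_T down;     /* action when the test fails */
  Action_T up;       /* action when the test passes again */
} *EventAction_T;

/* Allocate one action; returns NULL on allocation failure. */
static Action_T action_new(int action, const char *exec) {
  Action_T a = calloc(1, sizeof *a);
  if (a) {
    a->action = action;
    a->exec = exec;
  }
  return a;
}

/* Build the pair for "then <down> else <up>" (hypothetical helper). */
static EventAction_T eventaction_new(int down, int up, const char *up_exec) {
  EventAction_T ea = calloc(1, sizeof *ea);
  if (!ea)
    return NULL;
  ea->down = action_new(down, NULL);
  ea->up = action_new(up, up_exec);
  if (!ea->down || !ea->up) {
    free(ea->down);
    free(ea->up);
    free(ea);
    return NULL;
  }
  return ea;
}

/* Pick the action for a given state: nonzero means the test failed. */
static Action_T eventaction_select(EventAction_T ea, short state) {
  assert(ea);
  return state ? ea->down : ea->up;
}
```

With this layout the implicit "else alert" would simply be eventaction_new(ACTION_RESTART, ACTION_ALERT, NULL), and each test instance keeps its own pair.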
2.) Event posting interface
---------------------------
Prototype:
/**
* Post a new Event
* @param service The Service the event belongs to
* @param id The event identification
* @param state TRUE for failed, FALSE for passed event state
* @param action Description of the event action
* @param s Optional message describing the event
*/
void Event_post(Service_T service, long id, short state,
                EventAction_T action, char *s, ...);
Example usage:
static int check_directory(Service_T s) {
  struct stat stat_buf;
  char report[STRLEN]= {0};

  /* The directory must exist */
  if(stat(s->path, &stat_buf) != 0) {
    Event_post(s, EVENT_NONEXIST, TRUE, s->action,
               "Event: directory '%s' doesn't exist", s->name);
    return FALSE;
  } else {
    Event_post(s, EVENT_NONEXIST, FALSE, s->action,
               "Event: directory '%s' exists", s->name);
  }

  /* The path must really be a directory */
  if(!S_ISDIR(stat_buf.st_mode)) {
    Event_post(s, EVENT_INVALID, TRUE, s->action,
               "Event: '%s' is not a directory", s->name);
    return FALSE;
  } else {
    Event_post(s, EVENT_INVALID, FALSE, s->action,
               "Event: '%s' is a directory", s->name);
  }

  /* The permission test has its own action object */
  if(check_perm(s, stat_buf.st_mode, report)) {
    /* Pass report as an argument so it is never parsed as a format string */
    Event_post(s, EVENT_PERMISSION, TRUE, s->perm->action, "%s", report);
  } else {
    Event_post(s, EVENT_PERMISSION, FALSE, s->perm->action,
               "Event: '%s' permission passed", s->name);
  }
  ...
3.) New events:
---------------
NONEXISTENT ... for a critical service failure which causes a restart by
default: process not running, file doesn't exist, etc.
INVALID ... bad type (path is not a file in a file service check, etc.)
4.) State machine:
------------------
Each service has its own global events list. Events are either positive
(in the case of failure) or negative (in the case that the test passed). The
list is freed on configuration reload.
An event is identified by its EventAction_T pointer address, which is unique
for each object that supports an action (=> can be a source of events). When
an event occurs, monit goes through the list and tries to find an event
with the same source (action pointer address) and type. If no such event
exists, monit adds it and sets the event counter to 1 occurrence.
If another event with the same "polarity" (positive or
negative) arrives, monit increments the counter.
When an event with the opposite polarity appears, monit finds it in the
queue, changes the polarity and resets the counter to 1.
The event handler is called for the event at the end of event posting.
It sends an alert if counter == 1. The event contains the event
identification and polarity => the event handler will do the required action
specific to the given combination.
void Event_post(Service_T service, long id, short state,
                EventAction_T action, char *s, ...) {
  Event_T e = NULL;

  ASSERT(service);
  ASSERT(action);

  if(service->eventspending == NULL)
  {
    /* initialize event list and add first event */
    NEW(service->eventspending);
    NEW(e);
    e->id = id;
    e->state = state;
    e->counter = 1;
    e->action = action;
    if(s)
    {
      long l;
      va_list ap;
      va_start(ap, s);
      e->message = format(s, ap, &l);
      va_end(ap);
    }
    service->eventspending->entry = e;
  }
  else
  {
    /* in the case that event list exists it should contain at least
     * one event already */
    e = service->eventspending->entry;
    /* Try to find the event with the same origin and type identification.
     * Each service and each test have its own custom actions object, so
     * we use actions object address for event source identification.
     * Note: the chain is walked through e->next, so the event object
     * needs a next pointer. */
    do
    {
      if(e->action == action && e->id == id)
      {
        if(e->state == state)
        {
          /* recurrent event */
          e->counter++;
          break;
        }
        else
        {
          /* event state changed */
          e->state = state;
          e->counter = 1;
          break;
        }
      }
      e = e->next;
    }
    while(e);

    if(!e)
    {
      /* event was not found in pending events list, we will add it */
      NEW(e);
      e->id = id;
      e->state = state;
      e->counter = 1;
      e->action = action;
      if(s)
      {
        long l;
        va_list ap;
        va_start(ap, s);
        e->message = format(s, ap, &l);
        va_end(ap);
      }
      /* insert at the head of the pending events list */
      e->next = service->eventspending->entry;
      service->eventspending->entry = e;
    }
  }

  /* FIXME: adjust the behavior according to the new model */
  handle_event(e);
}
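For readers who want to try the counter and polarity logic in isolation, the following is a reduced, self-contained sketch of the same state machine. It is not the monit code: the Event type, event_record and event_needs_alert are hypothetical names, messages and actions are left out, and an opaque source pointer stands in for the EventAction_T address.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct event {
  int id;                 /* event type */
  short state;            /* 1 = failed, 0 = passed */
  unsigned long counter;  /* consecutive occurrences of the current state */
  const void *source;     /* stand-in for the EventAction_T address */
  struct event *next;     /* next pending event */
} Event;

/* Record one test result in the pending list.
 * Returns the matching event, or NULL if memory ran out. */
static Event *event_record(Event **list, const void *source, int id, short state) {
  Event *e;

  assert(list);
  for (e = *list; e; e = e->next) {
    if (e->source == source && e->id == id) {
      if (e->state == state) {
        e->counter++;        /* recurrent event */
      } else {
        e->state = state;    /* polarity changed */
        e->counter = 1;
      }
      return e;
    }
  }

  /* Not pending yet: add it at the head of the list. */
  e = calloc(1, sizeof *e);
  if (!e)
    return NULL;
  e->id = id;
  e->state = state;
  e->counter = 1;
  e->source = source;
  e->next = *list;
  *list = e;
  return e;
}

/* An alert is due only on the first occurrence of a state. */
static int event_needs_alert(const Event *e) {
  return e && e->counter == 1;
}
```

Posting the same failure twice leaves the counter at 2 and raises no second alert; the first passing result afterwards flips the state, resets the counter to 1 and is therefore reported once.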
SUMMARY:
--------
There are a lot of ideas and issues which I'm thinking about. The above
model is not final; major modifications are very possible. It will need
changes to support the following rule:
IF x RESTARTS WITHIN y CYCLES THEN TIMEOUT
I prefer to prepare the model so that the following general syntax will be
possible:
IF x event WITHIN y CYCLES THEN action
These actions should be stackable and in this way support "hard"
and "soft" error levels (action depending on the error condition level).
This is just an add-on: backward-compatible implicit rules are predefined,
so if you do not specify it, monit will behave as usual.
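One possible way to evaluate an "IF x event WITHIN y CYCLES" condition is sketched below. This is only an assumption about how it could work, not part of the proposed model: WindowRule, popcount64 and window_rule_tick are hypothetical names, and the window is limited to 64 cycles because the history is kept as a bitmask with one bit per poll cycle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rule state: one bit per poll cycle, newest in bit 0. */
typedef struct {
  uint64_t history;  /* a set bit means the event occurred in that cycle */
  unsigned count;    /* x: occurrences required */
  unsigned cycles;   /* y: window length, 1..64 */
} WindowRule;

/* Count the set bits in v. */
static unsigned popcount64(uint64_t v) {
  unsigned n = 0;
  while (v) {
    v &= v - 1;  /* clear the lowest set bit */
    n++;
  }
  return n;
}

/* Advance the rule by one cycle.
 * Returns 1 if the event occurred at least count times
 * within the last cycles cycles, otherwise 0. */
static int window_rule_tick(WindowRule *r, int occurred) {
  uint64_t mask;

  assert(r && r->cycles >= 1 && r->cycles <= 64);
  r->history = (r->history << 1) | (occurred ? 1u : 0u);
  mask = (r->cycles == 64) ? UINT64_MAX : ((UINT64_C(1) << r->cycles) - 1);
  return popcount64(r->history & mask) >= r->count;
}
```

Stacking "soft" and "hard" levels could then mean keeping several such rules per test, for example one with a small x that alerts and one with a larger x that triggers a timeout.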
... a lot of work, insufficient time. I didn't want to spend much time
on it now (so as not to freeze my project), so if you prefer a simpler
solution, I'll be happy :)
Martin