Re: monit 4.2 release?

monit-dev

From:	Martin Pala
Subject:	Re: monit 4.2 release?
Date:	Thu, 12 Feb 2004 23:21:35 +0100
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040122 Debian/1.6-1

Jan-Henrik Haukeland wrote:

> Unless you plan to refactor the whole event engine, it shouldn't
> require too much code to add an up-event in the existing code? If you
> are planning to refactor, please let us know your ideas first. The
> reason I say this is that I'm unsure if and what needs refactoring in
> the event engine.

I'm trying to refactor the whole engine. I will try to describe my present idea (partially done). The following are just development ideas (not clean or fully functional code):




0.) New syntax:
---------------

An ELSE clause is added to the IF...THEN statement. It specifies, in the usual action syntax, what to do once the test passes again. Dummy example:



#    if failed port 443 type tcpssl proto http with timeout 15 seconds
#       then restart
#       else alert

=> if the connection fails, monit will restart the service and send an alert. As soon as the service is up again, it will send an alert.

Note: the "else alert" part is implicit, so it doesn't need to be written. The ELSE statement makes sense when you need a different action (for example EXEC) once the service is up again, like:


#    if failed port 22 proto ssh with timeout 15 seconds
#       then restart
#       else exec "/bin/sms_send 'ssh is up again'"
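A minimal sketch of how a parsed rule could map its THEN and ELSE clauses to the failed and passed states, with "else alert" as the implicit default. The names `Rule_T`, `rule_new()`, `rule_action()` and the `ACTION_*` codes are hypothetical, not monit's actual identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical action codes; ACTION_NONE means "clause not written" */
enum { ACTION_NONE = 0, ACTION_ALERT, ACTION_RESTART, ACTION_EXEC };

/* Hypothetical parsed rule: THEN applies on failure, ELSE on recovery */
typedef struct {
  int then_action;
  int else_action;
  const char *else_exec;   /* command for "else exec", or NULL */
} Rule_T;

/* Build a rule; a missing ELSE clause defaults to "else alert" */
static Rule_T rule_new(int then_action, int else_action, const char *else_exec) {
  Rule_T r;
  r.then_action = then_action;
  r.else_action = (else_action == ACTION_NONE) ? ACTION_ALERT : else_action;
  r.else_exec = else_exec;
  return r;
}

/* Select the action for the current test result: failed -> THEN, passed -> ELSE */
static int rule_action(const Rule_T *r, int failed) {
  assert(r != NULL);
  return failed ? r->then_action : r->else_action;
}
```

For the first example above, `rule_new(ACTION_RESTART, ACTION_NONE, NULL)` yields restart on failure and alert on recovery; the sketch only selects the action and does not model the alert that accompanies a restart.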



1.) New structures:
-------------------

Structures provide common objects for all services and service tests. The EventAction_T object replaces the recurring Command_T, event_handled, event_flag, etc. It contains the action specification (including an optional Command_T for execution) for the service's and test's "up" and "down" states. Events affect either the primary monitored service object or particular tests, depending on the event type. For example, NONEXIST means that the service (for example a process) does not exist at all, so the global s->action is taken. On the other hand, a CONNECTION event allows the test-specific s->portlist->action to be used for the given test instance (each connection test has its own action). EventList is global.



/** Defines an event action object */
typedef struct myaction {

  int action;         /**< Action to be done */
  Command_T exec;     /**< Optional command to be executed */

} *Action_T;


/** Defines event's up and down actions */
typedef struct myeventaction {

  Action_T down;      /**< Action in the case of failure */
  Action_T up;        /**< Action in the case the service is up again */

} *EventAction_T;


/** Defines an event */
typedef struct myevent {

  int id;                 /**< The event identification */
  short state;            /**< TRUE for failed, FALSE for passed event state */
  unsigned long counter;  /**< Event occurrence counter */
  EventAction_T action;   /**< Description of the event action */
  char *message;          /**< Optional message describing the event */

} *Event_T;


/** Defines a pending events list */
typedef struct myeventlist {

  Event_T entry;              /**< Pending events list object */

  /** For internal use */
  struct myeventlist *next;   /**< Next event in chain */

} *EventList_T;

...

/** Defines gid object */
typedef struct mygid {

  gid_t gid;              /**< Owner's gid */
  int has_error;          /**< TRUE if the service has a GID error */
  EventAction_T action;   /**< Description of the action upon event occurrence */

} *Gid_T;

...

/** Defines service data */
typedef struct myservice {
...
  EventAction_T action;        /**< Description of the action upon event occurrence */
...
  EventList_T eventspending;   /**< Pending events list */
...
} *Service_T;
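To illustrate why a per-object EventAction_T is enough to tell event sources apart, here is a trimmed-down sketch: service-wide events use s->action, while a connection event uses the action of the particular port test. The `Port_T` layout, the `EVENT_*` values and `action_for_event()` are hypothetical simplifications, not the proposed structures verbatim.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event ids */
enum { EVENT_NONEXIST = 1, EVENT_CONNECTION };

typedef struct myaction { int action; } *Action_T;   /* Command_T omitted */
typedef struct myeventaction { Action_T down; Action_T up; } *EventAction_T;

/* Hypothetical port test: each instance owns its own action object */
typedef struct myport {
  int port;
  EventAction_T action;
  struct myport *next;
} *Port_T;

typedef struct myservice {
  EventAction_T action;   /* service-wide action */
  Port_T portlist;        /* connection tests */
} *Service_T;

/* Pick the action object an event belongs to; its address later serves
 * as the event source identification. */
static EventAction_T action_for_event(Service_T s, int id, Port_T p) {
  assert(s != NULL);
  if (id == EVENT_CONNECTION) {
    assert(p != NULL);
    return p->action;     /* test-specific */
  }
  return s->action;       /* e.g. NONEXIST: whole service */
}
```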



2.) Event posting interface
---------------------------

prototype:

/**
 * Post a new Event
 * @param service The Service the event belongs to
 * @param id The event identification
 * @param state TRUE for failed, FALSE for passed event state
 * @param action Description of the event action
 * @param s Optional message describing the event
 */

void Event_post(Service_T service, long id, short state, EventAction_T action, char *s, ...);
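The trailing `char *s, ...` parameters take a printf-style message. A minimal standalone sketch of turning them into an allocated string, using standard `vsnprintf()` and `va_copy()` in place of monit's own formatting helper; `event_vmessage()` and `event_message()` are hypothetical names.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format a printf-style message into a newly allocated string.
 * Returns NULL if s is NULL or formatting/allocation fails. */
static char *event_vmessage(const char *s, va_list ap) {
  va_list copy;
  int len;
  char *buf;

  if (s == NULL)
    return NULL;
  va_copy(copy, ap);                   /* first pass only measures the length */
  len = vsnprintf(NULL, 0, s, copy);
  va_end(copy);
  if (len < 0)
    return NULL;
  buf = malloc((size_t)len + 1);
  if (buf != NULL)
    vsnprintf(buf, (size_t)len + 1, s, ap);
  return buf;
}

/* Variadic front end with the same trailing parameters as Event_post() */
static char *event_message(const char *s, ...) {
  va_list ap;
  char *m;

  va_start(ap, s);
  m = event_vmessage(s, ap);
  va_end(ap);
  return m;
}
```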


Example usage:


static int check_directory(Service_T s) {

  struct stat stat_buf;
  char report[STRLEN]= {0};

  if(stat(s->path, &stat_buf) != 0) {
    Event_post(s, EVENT_NONEXIST, TRUE, s->action, "Event: directory '%s' doesn't exist", s->name);
    return FALSE;
  } else {
    Event_post(s, EVENT_NONEXIST, FALSE, s->action, "Event: directory '%s' exists", s->name);
  }

  if(!S_ISDIR(stat_buf.st_mode)) {
    Event_post(s, EVENT_INVALID, TRUE, s->action, "Event: '%s' is not a directory", s->name);
    return FALSE;
  } else {
    Event_post(s, EVENT_INVALID, FALSE, s->action, "Event: '%s' is a directory", s->name);
  }

  if(check_perm(s, stat_buf.st_mode, report)) {
    /* report is data, not a format string */
    Event_post(s, EVENT_PERMISSION, TRUE, s->perm->action, "%s", report);
  } else {
    Event_post(s, EVENT_PERMISSION, FALSE, s->perm->action, "Event: '%s' permission passed", s->name);
  }
...




3.) New events:
---------------

NONEXISTENT ... critical service failure which causes a restart by default: process not running, file doesn't exist, etc.


INVALID ... wrong type (the path is not a file in a file service check, etc.)



4.) State machine:
------------------

Each service has its own global events list. Events are either positive (in the case of failure) or negative (in the case that the test passed). The list is freed on configuration reload.
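Freeing the list on reload could look like the following sketch. It assumes the list is made of EventList_T nodes linked through `next`, each owning one Event_T and its optional message; `eventlist_free()` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct myevent {
  long id;
  short state;
  unsigned long counter;
  char *message;            /* optional, heap allocated */
} *Event_T;

typedef struct myeventlist {
  Event_T entry;
  struct myeventlist *next;
} *EventList_T;

/* Free every pending event of a service (e.g. on configuration reload)
 * and reset the caller's list pointer so the list can be rebuilt. */
static void eventlist_free(EventList_T *list) {
  EventList_T l, next;

  assert(list != NULL);
  for (l = *list; l != NULL; l = next) {
    next = l->next;
    if (l->entry != NULL) {
      free(l->entry->message);   /* free(NULL) is a no-op */
      free(l->entry);
    }
    free(l);
  }
  *list = NULL;
}
```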

An event is identified by its EventAction_T pointer address, which is unique for each object that supports an action (and thus can be a source of events), together with its type. When an event occurs, monit goes through the list and tries to find an event with the same source (action pointer address) and type. If no such event exists, monit adds it and sets its counter to 1 occurrence. For each further event with the same "polarity" (positive or negative), monit increments the counter.

When an event with the opposite polarity appears, monit finds it in the queue, changes its polarity and resets the counter to 1.

The event handler is called for the event at the end of event posting. It sends an alert when counter == 1. The event carries its identification and polarity, so the event handler can perform the action specific to that combination.
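One possible shape for the handler described in the previous paragraph, as a hedged sketch: it acts only on the first occurrence after a state change, running the "down" action for a failed event and the "up" action for a passed one. The `ACTION_*` codes, the `alerts_sent` counter and the return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TRUE 1
#define FALSE 0

/* Hypothetical action codes */
enum { ACTION_ALERT = 1, ACTION_RESTART, ACTION_EXEC };

typedef struct myaction { int action; } *Action_T;
typedef struct myeventaction { Action_T down; Action_T up; } *EventAction_T;
typedef struct myevent {
  long id;
  short state;              /* TRUE = failed, FALSE = passed */
  unsigned long counter;
  EventAction_T action;
} *Event_T;

static int alerts_sent = 0; /* stands in for actually sending mail */

/* Returns the action code executed, or 0 if nothing was done */
static int handle_event(Event_T e) {
  Action_T a;

  assert(e != NULL && e->action != NULL);
  if (e->counter != 1)      /* recurrent event: already handled */
    return 0;
  a = e->state ? e->action->down : e->action->up;
  if (a == NULL)
    return 0;
  alerts_sent++;            /* an alert accompanies every state change */
  return a->action;
}
```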

void Event_post(Service_T service, long id, short state, EventAction_T action, char *s, ...) {

  EventList_T l = NULL;
  Event_T e = NULL;

  ASSERT(service);
  ASSERT(action);

  /* Try to find the event with the same origin and type identification.
   * Each service and each test have its own custom actions object, so
   * we use actions object address for event source identification. */
  for(l = service->eventspending; l; l = l->next)
  {
    if(l->entry->action == action && l->entry->id == id)
    {
      e = l->entry;
      if(e->state == state)
      {
        /* recurrent event */
        e->counter++;
      }
      else
      {
        /* event state changed */
        e->state = state;
        e->counter = 1;
      }
      break;
    }
  }

  if(!e)
  {
    /* event was not found in the pending events list (or the list is
     * still empty), so we add it to the head of the list */
    NEW(e);
    e->id = id;
    e->state = state;
    e->counter = 1;
    e->action = action;
    if(s)
    {
      long len;
      va_list ap;

      va_start(ap, s);
      e->message = format(s, ap, &len);
      va_end(ap);
    }
    NEW(l);
    l->entry = e;
    l->next = service->eventspending;
    service->eventspending = l;
  }

  /* FIXME: adjust the behavior according to the new model */

  handle_event(e);

}
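The find-or-update logic of Event_post() can be exercised standalone with the sketch below: calloc() replaces NEW, the message handling is left out, and `event_post()` is a hypothetical name that returns the event instead of calling handle_event().

```c
#include <assert.h>
#include <stdlib.h>

#define TRUE 1
#define FALSE 0

/* Simplified: only the address of the action object matters here */
typedef struct myeventaction { int dummy; } *EventAction_T;

typedef struct myevent {
  long id;
  short state;
  unsigned long counter;
  EventAction_T action;
} *Event_T;

typedef struct myeventlist {
  Event_T entry;
  struct myeventlist *next;
} *EventList_T;

/* Same source and id: bump the counter on a repeat, reset it to 1 on a
 * state change. Unknown event: prepend it with counter 1.
 * Returns the event, or NULL if out of memory. */
static Event_T event_post(EventList_T *list, long id, short state, EventAction_T action) {
  EventList_T l;
  Event_T e;

  assert(list != NULL && action != NULL);
  for (l = *list; l != NULL; l = l->next) {
    e = l->entry;
    if (e->action == action && e->id == id) {
      if (e->state == state) {
        e->counter++;
      } else {
        e->state = state;
        e->counter = 1;
      }
      return e;
    }
  }
  if ((e = calloc(1, sizeof *e)) == NULL)
    return NULL;
  if ((l = calloc(1, sizeof *l)) == NULL) {
    free(e);
    return NULL;
  }
  e->id = id;
  e->state = state;
  e->counter = 1;
  e->action = action;
  l->entry = e;
  l->next = *list;
  *list = l;
  return e;
}
```

Two connection tests with the same event id stay separate because their action objects have different addresses.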



SUMMARY:
--------

There are a lot of ideas and issues I'm thinking about. The above model is not final; major modifications are quite possible. It will need changes to support the following rule:


IF x RESTARTS WITHIN y CYCLES THEN TIMEOUT

I prefer to prepare the model so that the following general syntax will be possible:


IF x event WITHIN y CYCLES THEN action

It should be possible to stack these actions and so support "hard" and "soft" error levels (the action depends on the error condition level).

This is just an add-on: backward-compatible implicit rules are predefined, so if you don't specify any, monit will behave as usual.

... a lot of work, not enough time. I didn't want to spend much time on this now (so as not to freeze my project), so if you prefer a simpler solution, I'll be happy :)



Martin

Current Thread

monit 4.2 release?, Martin Pala, 2004/02/12
- Re: monit 4.2 release?, Jan-Henrik Haukeland, 2004/02/12
  - Re: monit 4.2 release?, Martin Pala <=
    - Re: monit 4.2 release?, Jan-Henrik Haukeland, 2004/02/16
    - Re: monit 4.2 release?, Jan-Henrik Haukeland, 2004/02/16
    - Re: monit 4.2 release?, Michael Shigorin, 2004/02/17
