We are seeing some error messages when we restart applications.
The error message we see is:
monit: action failed -- Other action already in progress -- please try again later
To give a little context:
We don't use monit to start and stop the applications. We use it to monitor them, alert, and restart them if needed, so each application calls "monit monitor" when it starts and "monit unmonitor" when it stops.
The applications can come and go (be installed or uninstalled), so each one has its own config file in the /etc/monit.d directory.
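A per-application config file might be installed and removed roughly as follows. This is a hypothetical sketch, not our actual installer. It assumes the main monit control file pulls the directory in with an "include /etc/monit.d/*" line. The directory defaults to /etc/monit.d and can be overridden with MONIT_CONF_DIR. The pidfile path and the init_<app> script path are invented for illustration. "monit reload" asks the running daemon to re-read its configuration.

```shell
#!/bin/sh
# Hypothetical sketch: install or remove one application's monit entry.
# Assumptions (not from the post): the include directory, the pidfile
# path and the /etc/init.d/init_<app> script path used below.

MONIT=${MONIT:-monit}
MONIT_CONF_DIR=${MONIT_CONF_DIR:-/etc/monit.d}

# Write a "check process" entry for the application, then tell the
# running monit daemon to re-read its configuration.
register_app() {
    app=$1
    cat > "$MONIT_CONF_DIR/$app" <<EOF
check process $app with pidfile /var/run/$app.pid
    start program = "/etc/init.d/init_$app start"
    stop program = "/etc/init.d/init_$app stop"
EOF
    $MONIT reload
}

# Remove the application's entry and reload so monit drops it.
unregister_app() {
    rm -f "$MONIT_CONF_DIR/$1"
    $MONIT reload
}
```

Note that after the reload the new service does not necessarily appear in "monit summary" right away, which is the registration delay described below.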
So a typical application startup sequence would look like:
- verify the application is registered with monit by running the "monit summary" command and looking for the application in its output
- start the application
- tell monit to monitor the application using "monit monitor"
- return status
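The startup steps above could be scripted roughly like this. It is a minimal sketch that assumes the monit service name equals the application name and that "monit summary" prints each service name in single quotes (for example Process 'myapp' Running). start_app_process is a hypothetical placeholder for whatever really launches the application.

```shell
#!/bin/sh
# Minimal sketch of the startup sequence.
# Assumptions: service name == application name; summary output quotes
# names in single quotes. start_app_process is a hypothetical placeholder.

MONIT=${MONIT:-monit}

# Succeeds if "monit summary" lists a service named exactly $1.
is_registered() {
    $MONIT summary 2>/dev/null | grep -qF "'$1'"
}

start_app_process() {
    echo "starting $1"    # placeholder: launch the real application here
}

start_sequence() {
    app=$1
    # 1. Verify the application is registered with monit.
    if ! is_registered "$app"; then
        echo "$app: not registered with monit" >&2
        return 1
    fi
    # 2. Start the application.
    start_app_process "$app" || return 1
    # 3. Tell monit to monitor it; 4. its exit status is returned.
    $MONIT monitor "$app"
}
```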
The shutdown sequence would look like:
- verify the application is registered with monit using the "monit summary" command
- tell monit to stop monitoring the application using the "monit unmonitor" command
- stop the application
- return status
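The shutdown steps could be sketched the same way, under the same assumptions (service name equals application name, names quoted in the summary output). stop_app_process is a hypothetical placeholder. Unmonitoring before stopping matters because monit would otherwise see the process disappear and try to restart it.

```shell
#!/bin/sh
# Minimal sketch of the shutdown sequence.
# Assumptions: service name == application name; summary output quotes
# names in single quotes. stop_app_process is a hypothetical placeholder.

MONIT=${MONIT:-monit}

# Succeeds if "monit summary" lists a service named exactly $1.
is_registered() {
    $MONIT summary 2>/dev/null | grep -qF "'$1'"
}

stop_app_process() {
    echo "stopping $1"    # placeholder: stop the real application here
}

stop_sequence() {
    app=$1
    # 1. Verify the application is registered with monit.
    if ! is_registered "$app"; then
        echo "$app: not registered with monit" >&2
        return 1
    fi
    # 2. Stop monitoring first so monit does not restart the app.
    $MONIT unmonitor "$app" || return 1
    # 3. Stop the application; 4. its exit status is returned.
    stop_app_process "$app"
}
```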
On an application restart (init_xxx restart), the stop sequence above is executed, followed by the start sequence.
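During a restart, "monit unmonitor" and "monit monitor" are issued in quick succession, which may be when the "Other action already in progress" error shows up. One possible workaround is to re-run the monit command a few times while its output contains that text. This sketch assumes the rejection is transient. monit_retry, RETRIES and RETRY_DELAY are hypothetical names.

```shell
#!/bin/sh
# Sketch: retry a monit command rejected with "Other action already in
# progress". Assumption: the rejection is transient.

MONIT=${MONIT:-monit}
RETRIES=${RETRIES:-5}            # hypothetical: max attempts
RETRY_DELAY=${RETRY_DELAY:-1}    # hypothetical: seconds between attempts

monit_retry() {
    n=1
    while :; do
        out=$($MONIT "$@" 2>&1)
        rc=$?
        case $out in
            *"Other action already in progress"*) ;;   # busy: try again
            *) [ -n "$out" ] && printf '%s\n' "$out"; return $rc ;;
        esac
        # Still busy after the last attempt: report and give up.
        if [ "$n" -ge "$RETRIES" ]; then
            printf '%s\n' "$out" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$RETRY_DELAY"
    done
}
```

For example, the restart path could call "monit_retry unmonitor myapp" instead of "monit unmonitor myapp", and likewise for monitor. This treats the symptom rather than explaining it.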
I already know there is a "race" condition between the time monit is told to "register" the application and the time it actually shows up in the "summary" list, so our startup script waits until the application appears in the "summary" output before reporting it as "registered".
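That wait could look roughly like the following. It is a sketch that assumes service names appear in single quotes in the "monit summary" output. TIMEOUT and POLL (a positive number of seconds) are hypothetical settings.

```shell
#!/bin/sh
# Sketch: poll "monit summary" until the application appears or a
# timeout expires. Assumptions: names are single-quoted in the output;
# TIMEOUT and POLL are hypothetical settings (POLL must be > 0).

MONIT=${MONIT:-monit}
TIMEOUT=${TIMEOUT:-30}
POLL=${POLL:-1}

wait_until_registered() {
    waited=0
    until $MONIT summary 2>/dev/null | grep -qF "'$1'"; do
        if [ "$waited" -ge "$TIMEOUT" ]; then
            echo "$1: not in monit summary after ${TIMEOUT}s" >&2
            return 1
        fi
        sleep "$POLL"
        waited=$((waited + POLL))
    done
}
```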
Is there a similar race condition between monitor and unmonitor?
We are using the monit-5.3.2-2.el6.rf.i686 rpm on CentOS 6.2