Monitoring the absence of failure
Here’s a very basic issue that I would appreciate some feedback on.
How do you ensure that a process is run successfully at regular intervals? On one project, I use the backup gem to take a snapshot of the DB every 6 hours and upload it to S3.
Currently, I receive an email every time the backup completes successfully. If it fails, I also get an email, and if it somehow breaks silently, I notice that the emails have stopped arriving.
This is a rather primitive and annoying approach, but I trust it. I would love to find an automated alternative that I could trust equally, but so far I haven’t been able to. I know of utilities that monitor whether a process is always running (e.g. god), but what I need is something that can be configured to check, for example, the last modification time of a file and alert me (reliably!) if it’s not within specific bounds.
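To make the kind of check I mean concrete, here is a minimal sketch in Python. It is only an illustration, not something I run today: the function names are made up, and it assumes the backup job touches some marker file on success and that a grace period on top of the 6-hour schedule is acceptable.

```python
import os
import time


def seconds_since_modified(path, now=None):
    """Return how many seconds ago the file at `path` was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_stale(path, max_age_seconds, now=None):
    """Return True if the file is missing, unreadable, or too old.

    A missing file is treated as a failure, so a backup that never
    ran at all is reported the same way as one that stopped running.
    """
    try:
        return seconds_since_modified(path, now) > max_age_seconds
    except OSError:
        return True


# Assumed values for illustration: a 6-hour schedule plus 30 minutes of slack.
MAX_AGE_SECONDS = 6 * 3600 + 30 * 60
```

Something like `is_stale("/path/to/marker", MAX_AGE_SECONDS)` (the path is a placeholder) could then be called from a scheduled job that sends the alert. Of course, that only moves the problem: if the check itself stops running, nothing tells me.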
Any recommendations? And how do you ensure that alerts are delivered?