Site Reliability Engineering, Observability, and the Tradeoffs of Modern Software
This blog post defines SRE by explaining SLOs and error budgets, highlighting the innovation vs. reliability tradeoff.
Another “ops” term in the vein of DevOps is ChatOps, or conversation-based development and operations. ChatOps has grown in popularity as communication platforms such as Slack have become ingrained in our day-to-day engineering lives. A team lead once told me, “if it didn’t happen in Slack, it didn’t happen,” which shows how heavily teams lean on communication platforms as a system of record.
A common DevOps use case is alerting when hosts stop reporting metrics, also known as a deadman alert. This can be done with the monitor.deadman() Flux function. You can create a deadman (or threshold) check in the Alerts section of the InfluxDB UI, or craft a custom task that alerts as well. Check out the post on InfluxDB’s Checks and Notifications system for more details. It’s also possible to use the monitor.deadman() function directly in a dashboard cell.
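To make the idea behind a deadman alert concrete, here is a minimal Python sketch of the underlying logic: a host is flagged as "dead" when its most recent report is older than a cutoff. This is only an illustration of the concept, not how monitor.deadman() is implemented; the function name `find_dead_hosts`, the host names, and the five-minute threshold are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def find_dead_hosts(last_seen, now, threshold):
    """Return the sorted hosts whose last report is older than now - threshold.

    last_seen: dict mapping host name -> datetime of its most recent metric.
    (Hypothetical helper for illustration; not part of any InfluxDB API.)
    """
    cutoff = now - threshold
    return sorted(host for host, ts in last_seen.items() if ts < cutoff)


# Hypothetical sample data: three hosts with different last-report times.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "web-01": now - timedelta(seconds=30),
    "web-02": now - timedelta(minutes=7),
    "db-01": now - timedelta(minutes=2),
}

# Flag any host that has been silent for more than five minutes.
dead = find_dead_hosts(last_seen, now, timedelta(minutes=5))
print(dead)
```

In this sample only web-02 has been silent for longer than the threshold, so it is the only host flagged. In practice the check would run on a schedule (as a task does in InfluxDB), with the threshold tuned to how often hosts are expected to report.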