Adaptive Alerts is a new feature from Rollbar that adds to our reliable, informative and actionable alerts about unexpected issues in monitored applications and services. Adaptive Alerts uses anomaly detection to learn the standard behavior of enterprise applications, and alerts developers about atypical exception rates, reducing unwanted noise.
If you think log files are only necessary for satisfying audit and compliance requirements, or to help software engineers debug issues during development, you’re certainly not alone. Although log files may not sound like the most engaging or valuable assets, for many organizations, they are an untapped reservoir of insights that can offer significant benefits to your business.
Back in May, we announced the Kubernetes integration to help users easily monitor and alert on core Kubernetes cluster metrics using the Grafana Agent, our lightweight observability data collector optimized for sending metric, log, and trace data to Grafana Cloud. Since the original release, we’ve added new features and enhancements to help our users go even further.
In the latest episode of the Network AF podcast, your host Avi Freedman welcomes his friend and networking pro Cat Gurinski to the show. As a senior network engineer with loads of experience, Cat is most passionate about automation and troubleshooting, and especially loves to use Python and Arista’s pyeapi frameworks in her pursuits. She’s also the current chair of the NANOG Program Committee, and previously worked for companies like Best Buy, Switch and Data, and Equinix.
Last June, Tigera announced a first for Kubernetes: supporting open-source WireGuard for encrypting data in transit within your cluster. We never like to sit still, so we have been working hard on some exciting new features for this technology, the first of which is support for WireGuard on AKS using the Azure CNI. First, a short recap of what WireGuard is and how we use it in Calico.
I have a good sense of how to use traces to understand my system’s behavior within request/response cycles. What about multi-request processes? What about async tasks spawned within a request? Is there a higher-level or more holistic approach?
Today’s incident pipelines are noisy. The average enterprise deals with at least 15 different monitoring and observability tools that create thousands of alerts a day, often overwhelming and drowning their IT operations. But it’s not just their number that’s an issue.
This blog post defines SRE by explaining SLOs and error budgets, highlighting the innovation vs. reliability tradeoff.
ChatOps, or conversation-based development/operations, is one more “ops” term in the vein of DevOps. ChatOps has been growing in popularity as communication platforms such as Slack have become ingrained in our day-to-day engineering lives. A team lead once told me, “if it didn’t happen in Slack, it didn’t happen,” showing the emphasis placed on communication platforms as a system of record.