DevOps

The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Introducing Incident Types

We believe incident.io should be used across an organisation, from SRE teams to Customer Success and People Ops. Until now, the way you set up your incident response flows has relied on having one set of roles and fields for every incident, meaning you have to choose between having lots of irrelevant fields to cover every use-case, or not getting the full incident.io experience on some incidents. That’s changing today with incident types, conditional fields and roles!

Customizing your Application with Epinio

One of the best things about Kubernetes is just how absurdly flexible it is. As an admin, you can shape what gets deployed into whatever is best for your business, whether that is a basic web app with just a deployment, service and ingress, or a full set of features with sidecars and network policies wrapping the serverless service-mesh platform of the day. The power is there.

How to configure Netdata's all-new Anomaly Advisor, powered by ML, for real-time troubleshooting

Netdata's Lead Machine Learning Engineer, Andrew Maguire, walks through how to configure the all-new Anomaly Advisor. This new feature lets you troubleshoot in real time, at scale, by identifying periods with raised anomaly rates across your entire infrastructure. In this guided video, Andrew explains how to enable Netdata's ML functionality, then how to set up unsupervised anomaly detection with minimal configuration, and lastly how the Anomaly Advisor speeds up troubleshooting when an incident occurs.

Centralized application management on Kubernetes

A centralized application management approach can help you improve developer productivity, shorten application support times, and reduce toil for DevOps teams. This example shows how you can centralize all the information your teams need to support their applications, regardless of the pipeline, IaC, cluster, or GitOps tools used. All in just a few minutes.

Error Budgets: Ultimate SRE Guide For Teams

No engineered system can guarantee 100% uptime. There are bound to be unforeseen failures that cause downtime or a poor experience for customers. It is therefore best practice to allow a margin for plausible failures. An error budget is that margin: an amount of failure, communicated to the customer beforehand, that is tolerated for an agreed number of hours.
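To make the "agreed number of hours" concrete, here is a minimal sketch of the arithmetic. It assumes the common convention that the budget equals one minus the availability target, multiplied by the length of the measurement window. The teaser itself does not state a formula, and the function names and the 99.9% / 30-day figures are illustrative assumptions, not taken from the article.

```python
def error_budget_hours(slo_percent: float, period_hours: float) -> float:
    """Return the allowed downtime, in hours, for an availability target.

    Assumption: budget = (1 - target) * window length.
    """
    if not 0 <= slo_percent <= 100:
        raise ValueError("slo_percent must be between 0 and 100")
    if period_hours < 0:
        raise ValueError("period_hours must be non-negative")
    return (1 - slo_percent / 100) * period_hours


def budget_remaining(slo_percent: float, period_hours: float,
                     downtime_hours: float) -> float:
    """Return the hours of budget left; a negative value means it is overspent."""
    return error_budget_hours(slo_percent, period_hours) - downtime_hours


if __name__ == "__main__":
    window = 30 * 24  # hypothetical 30-day window, in hours
    budget = error_budget_hours(99.9, window)
    print(f"Budget: {budget:.2f} h ({budget * 60:.1f} min)")
    print(f"Left after 0.5 h of downtime: {budget_remaining(99.9, window, 0.5):.2f} h")
```

Under these assumptions, a 99.9% target over a 30-day window leaves roughly 43 minutes of tolerated downtime.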

Embedded Linux development on Ubuntu - Part I

Welcome to this three-part mini-series on embedded Linux development on Ubuntu. Throughout this series, we will discuss the key challenges of traditional software distribution mechanisms for embedded Linux devices. We will understand why legacy development and update approaches do not suit the Internet-of-Things (IoT) world and assess how Ubuntu simplifies and secures embedded Linux development.

10 Reasons You Need A Service Level Agreement & Why It's important

A Service Level Agreement (SLA) consists of many service commitments. It is an essential part of a contract to outsource software development or software support between two or more parties, specifying the duties and the quality and type of service a company would provide for a fee to a customer.