
SRE Incident Management: Overview, Techniques, and Tools

In the world of a site reliability engineer (SRE), failure is not only an option but an expectation. Systems, web applications, servers, and devices are all prone to performance issues and unexpected outages at some point; it is unavoidable. These failures can lead to large revenue losses, damaged customer trust, and, depending on the industry, possibly fines. Fortunately, SRE incident management is one of the core practices used to limit the disruption caused by unexpected issues.

Share your failures, fix them faster with shareable activities

When you’re working with a Continuous Delivery workflow, you rely on building and deploying your websites so that any improvement can be released into production at any time. Identifying and fixing failures quickly is key to enabling rapid development cycles. But what happens when you’re looking at a failed build step with no clue how to address it? You can now share links to specific lines within the activity logs.

Estimating Your Cloud Costs is EASY. Do it in Just 3 Clicks.

One of our customers recently got their first bill after moving their Linux and Windows workloads to Azure. Their bill was astronomical! They struggled to answer the question, “how much will it cost?” and their initial cost assessments were vague at best. Here’s what they did.

Monitor the Azure Cosmos DB integrated cache with Datadog

Azure Cosmos DB is a fully managed NoSQL database that scales automatically with load and supports multiple APIs. This makes it easy to incorporate into your applications and removes the need to maintain your own database servers. The Cosmos DB integrated cache, which is now in public preview, is a new offering that can help reduce costs and improve performance for Azure Cosmos DB.

Sponsored Post

To Reinvent SOAR, Automation Is Only a Feature

Security, by its very nature, is one of the most innovative fields on the planet. Every technological advancement carries with it a handful or more of new attack vectors, which in turn lead to a dizzying amount of security innovation as our industry works to mitigate risk and defend against threats. But for all this innovation, there are a few ways in which security lags far behind.

What Value Does a Cloud Data Platform Hold For Your Business?

It has been roughly two decades since cloud computing first appeared on the scene, and yet, despite overwhelming evidence of the operational productivity improvements, cost savings, and competitive advantages it provides, a significant portion of the banking industry has yet to adopt it.

How to Delete Pods from a Kubernetes Node

When administering your Kubernetes cluster, you will likely run into a situation where you need to delete pods from one of your nodes. You may need to debug issues with the node itself, upgrade the node, or simply scale down your cluster. Deleting pods from a node is not very difficult; however, there are specific steps you should take to minimize disruption to your application.
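A common sequence for those steps is to cordon the node (so no new pods are scheduled there), drain it (evict the running pods through the Eviction API, which respects PodDisruptionBudgets), perform the maintenance, and then uncordon it. The Python helper below is a minimal sketch of that sequence, not the article's own procedure. It assumes `kubectl` is installed and configured for your cluster; the function names (`drain_plan`, `uncordon_plan`, `run`) and the node name `worker-1` are hypothetical. By default it only prints the commands.

```python
import shlex
import subprocess
from typing import List


def drain_plan(node: str, delete_emptydir_data: bool = True) -> List[List[str]]:
    """Build (hypothetical helper) the kubectl commands that empty `node` before maintenance."""
    if not node:
        raise ValueError("node name must not be empty")
    drain = [
        "kubectl", "drain", node,
        # DaemonSet pods are tied to the node; drain refuses to proceed without this flag.
        "--ignore-daemonsets",
    ]
    if delete_emptydir_data:
        # Acknowledge that data in emptyDir volumes is lost when those pods are evicted.
        drain.append("--delete-emptydir-data")
    return [
        # Mark the node unschedulable. `kubectl drain` also cordons, so this is explicit, not required.
        ["kubectl", "cordon", node],
        # Evict pods via the Eviction API, honoring PodDisruptionBudgets.
        drain,
    ]


def uncordon_plan(node: str) -> List[List[str]]:
    """Make the node schedulable again once maintenance is finished."""
    return [["kubectl", "uncordon", node]]


def run(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command (dry run) or execute it, stopping at the first failure."""
    for cmd in plan:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run against a hypothetical node name; nothing is sent to the cluster.
    run(drain_plan("worker-1"))
    run(uncordon_plan("worker-1"))
```

Keeping the commands as argument lists (rather than shell strings) avoids quoting problems when passing them to `subprocess.run`, and the dry-run default lets you review the plan before touching a live cluster. Pass `dry_run=False` only after confirming the node name.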