Zenduty

Choosing the Right Metrics for Noiseless K8s Alerting

Watch Ankur Rawal and Dheeraj Reddy talk about how to choose the right metrics for noiseless K8s alerting, with insights and suggestions based on the mistakes made by hundreds of companies while implementing Prometheus Alertmanager in their production systems, and learn how much bad monitoring could be costing you. This talk was delivered at PromCon 2023 in Berlin.

The Ultimate Guide to DORA Metrics for DevOps

In the world of software delivery, organizations are under constant pressure to improve their performance and deliver high-quality software to their customers. One effective way to measure and optimize software delivery performance is to use the DORA (DevOps Research and Assessment) metrics. DORA metrics, developed by a renowned research team at DORA, provide valuable insights into the effectiveness of an organization's software delivery processes.

Journey from Junior to Senior SRE: Key Insights and Strategies

As Site Reliability Engineering (SRE) continues to grow in popularity, many professionals are looking for ways to advance from junior to senior roles. While there is no one-size-fits-all approach, the transition from junior to senior SRE is marked by a gradual increase in experience and a set of key skills. In this blog, we will explore the valuable insights and strategies shared by experienced SREs.

How to Set Up an IT War Room

IT issues can happen at any time and significantly impact an organization. Hence, it's essential to have a plan to handle these issues quickly and efficiently. And one way to do this is to create an IT war room. An IT war room is a dedicated space for teams to collaborate and resolve issues. Establishing an IT war room enhances an organization's capacity to swiftly and efficiently address IT problems, ultimately reducing their impact on the business.

Observability vs Monitoring: What's the Difference?

Observability and monitoring: These terms are often used interchangeably, but they represent different approaches to understanding and managing IT infrastructure. If you are new to these terms or are often confused between the two, this blog is for you! In this blog, we'll explore the key concepts of observability and monitoring, their evolution in IT operations, their differences and similarities, and their importance in modern infrastructure.

10 Observability Tools in 2023: Features, Market Share, and How to Choose the Right One for You

Understanding what's happening within your systems is a necessity. Have you ever wondered how experts keep an eye on systems to make sure everything's running smoothly? That's where observability tools come in! Observability tools are like helpers that give you a peek inside your tech. In this blog, we will talk about observability tools and how they can be used in different situations so it's easier for you to choose the right one for your organization.

How To Write Incident Postmortems

Writing a public postmortem regarding an outage is essential to maintaining transparency and accountability when things go wrong in a service or system. The purpose of writing a postmortem is to analyze and document an incident or event that has occurred, usually with a focus on identifying its root causes, understanding what went wrong, and outlining steps to prevent similar issues from happening in the future.

Evolution of Site Reliability - Incidentally Reliable with Manoj Sebastian

Catch Manoj Sebastian (ex-Flipkart, Amazon, Atlassian, Intuit, Yahoo) talking about the evolution of SRE over 20 years, incident response and post-incident culture at Big Tech, and the future of reliability with AI ramping up at full speed. The freshest podcast for Site Reliability Engineers, hosted by Vishwa and Shubham from Zenduty.

Understanding Blameless Postmortems

In organizations, progress is often accompanied by unforeseen challenges and mishaps. Traditionally, these setbacks led to finger-pointing, which hindered progress and created a negative work atmosphere. A "blameless postmortem" approach, however, transforms how organizations respond to failure. In this blog, we will delve into the importance of cultivating a blameless postmortem culture when faced with setbacks.

8 Tips to Incorporate the Voice of the Customer in Your Story Grooming/Sprint Planning

Creating successful products and projects goes beyond just great ideas and flexible processes. It's about truly understanding and listening to your customers. Attentively listening to their wants and needs unlocks invaluable insights that can revolutionize your story planning and project execution. In this blog, we'll look at easy but powerful tips to use the customer's input during story planning.