Zenduty

Brand new: Zenduty's On-call Schedules Revamp | Hear it from the team who built it

We're about to drop a major revamp to one of your most used Zenduty features. Get ready to experience scheduling like never before! Join our YouTube Premiere Live to see the new on-call schedules that'll make your on-call life smoother and better! P.S. Zenduty is a revolutionary incident management platform that gives you greater control and automation over the incident management lifecycle.

Bob Lee - Lead DevOps Engineer at Twingate

I was in sunny Austin this February, speaking at Civo Navigate 2024. The event was jam-packed with amazing talks, and it was great to meet so many people with long and fascinating careers in engineering and site reliability. I had the privilege of meeting Bob Lee, who currently leads DevOps at Twingate, a cloud-based service that provides secure remote access and is poised to replace VPNs.

Strategies for Scaling Systems Reliably by Bob Lee

I was in sunny Austin this February, speaking at Civo Navigate 2024. The event was jam-packed with amazing talks, and it was great to meet so many people with long and fascinating careers in engineering and site reliability. I had the privilege of meeting Bob Lee, who currently leads DevOps at Twingate, a cloud-based service that provides secure remote access and is poised to replace VPNs.

IT Incidents and the Role of Incident Response Teams (IRTs)

The digital world comes with advantages and inherent risks. IT incidents, which can include cyberattacks, system outages, and data breaches, can have a devastating impact. Beyond financial losses, they disrupt business operations, damage reputations, and erode customer trust. During an outage, a well-prepared Incident Response Team (IRT) is essential to reduce downtime and improve response times.

Trade-off Between Reliability and Feature Velocity

The pressure to constantly innovate and release new features often clashes with the need for a stable and reliable product. High feature velocity may mean temporary cutbacks in testing time, but reliability doesn't have to be an afterthought. We reached out to industry experts for their insights on ensuring reliability during phases that demand high feature velocity. Here's what they had to say.

How Do You Handle Third-Party Dependencies in Your Reliability Planning?

External dependencies and third-party services play a crucial role in powering modern applications. These components bring a wealth of benefits, ranging from access to specialized tools and resources to the ability to offload non-core tasks, allowing development teams to focus on delivering value-added features.

Site reliability truth bombs by Piyush Verma (CTO & Co-founder at Last9.io)

Dive into an in-depth conversation with Piyush on how software has become the backbone of things, and get access to extraordinary reliability nuggets.

The Show Must Go On - Incidentally Reliable with Piyush Verma (CTO at Last9)

Catch Piyush Verma, Co-Founder and CTO at Last9, in conversation with Ankur Rawal, Co-Founder and CTO at Zenduty, discussing what reliability means to the modern consumer, why SREs make excellent decision-makers, and the current state of observability. Exclusively on The Incidentally Reliable podcast, made by SREs for SREs and hosted by Zenduty.

Reduce Alert Fatigue and Improve Your Kubernetes Monitoring

Alert fatigue is a state of exhaustion caused by receiving too many alerts. It can happen when alerts are not actionable, are irrelevant, or are too frequent. Configurations that are wrong, rest on wrong assumptions, or lack service-level objectives (SLOs) have a dual impact: they lead to alert fatigue and, more alarmingly, raise the risk of overlooking critical alerts. We spoke with more than 200 teams using Prometheus Alertmanager. Many face alert fatigue from trivial, non-actionable alerts.