Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Service Reliability Engineering and related technologies.

Integrating Microsoft Teams & Squadcast - Acknowledge, Resolve & Reassign Incidents | Squadcast

Teams using MS Teams can now integrate with Squadcast and easily Acknowledge, Resolve & Reassign incidents using MS Teams. You can configure Squadcast to send a notification to the configured MS Teams channel as soon as an incident is triggered.

APImetrics + Squadcast: Routing Alerts Made Easy

APImetrics is an API Compliance, Monitoring and Security solution that lets you make and run API calls or sequences of API calls (workflows) from external, remote cloud locations using exactly the same security configurations as a typical end user would use. If you use APImetrics for API calling requirements, you can integrate it with Squadcast, an end-to-end incident response tool, to route detailed alerts from APImetrics to the right users in Squadcast.

Observability Pipelines for an SRE

In data management, numerous roles rely on and regularly use observability data. The Site Reliability Engineer is one of these roles. Site Reliability Engineers (SREs) work on the digital frontlines, ensuring performant experiences by using observability data to maintain stability and awareness of software running in various environments across organizations.

Using Squadcast's SLO Tracker | Error Budget | Setting up SLOs and configuring SLIs | Squadcast

With Squadcast, you can define and monitor Service Level Objects for your services. SLOs allow you to define and enforce an agreement between two parties regarding the delivery of a given service. A Service Level Objective (SLO) is a reliability target, measured by a Service Level Indicator (SLI), and sometimes serves as a safeguard for a Service Level Agreement (SLA). SLOs represent customer happiness and guide the development team’s velocity.

Introduction to Service Catalog | Service Ownership | Service Classification | Squadcast

To make service management a breeze, we bring to you our improved Service Catalog. The Service Catalog is designed to improve Service Classification and bring more transparency to Service Ownership within your org. This video explains how a consolidated summary of all active services from a single dashboard can help you better track your service health.
Sponsored Post

Outages ITOps professionals are thankful to avoid

As we settle into the time of year when we reflect on what we're thankful for, we tend to focus on important basics such as health, family and friends. But on a professional level, IT operations (ITOps) practitioners are thankful to avoid disastrous outages that can cause confusion, frustration, lost revenue and damaged reputations. The very last thing ITOps, network operations center (NOC) or site reliability engineering (SRE) teams want while eating their turkey and enjoying time with family is to get paged about an outage. These can be extremely costly - $12,913 per minute, in fact, and up to $1.5 million per hour for larger organizations.

Toil: Still Plaguing Engineering Teams

Our industry has always had localized expressions for work that was necessary but didn’t move the company forward. The SRE movement calls this type of work “toil.” The concept of toil is a unifying force because it provides an impartial framework for identifying — then containing — the work that takes up our time, blocks people from fulfilling their engineering potential, and doesn’t move the company forward.