February 2022

Quickly troubleshoot application errors with Error Reporting

Are you familiar with the four golden signals of Site Reliability Engineering (SRE): latency, traffic, errors, and saturation? Whether you’re a developer or an operator, you’ve likely been responsible for collecting, storing, or analyzing the data associated with these concepts. Much of this data is captured in application and infrastructure logs, which provide a rich history of what is happening behind the scenes in your workloads.

Traditional vs Modern Incident Response

An incident is an event (network outage, system failure, data breach, etc.) that can lead to loss of, or disruption to, an organization's operations, services, or functions. Incident response is an organization’s effort to detect, analyze, and correct the hazards caused by an incident. The term most commonly refers to security incidents, and incident response and incident management are sometimes used more or less interchangeably.

Shift Left Reliability Meetup February - Retooling your toolkit

Security and reliability have a lot in common. So much, in fact, that the tools used for one are often well suited for the other. The only thing you need is the right mindset. In this talk, Mika Boström will go over the principles and ideas and share real-world examples. You may realise you've been doing both already.

Shift Left Reliability Meetup February - Implementing reliability for a post-pandemic future

Steve Wade will talk about his experiences to date empowering developers and, importantly, the wider business to care about the reliability of the applications they provide to customers. He will discuss the pillars that make up reliability, along with his hypothesis and results on implementing each of them. Steve will draw on his experience working in a range of sectors, including financial services, to explain how he got companies to make changes pre-pandemic, as well as the additional challenges organisations will face post-pandemic. Steve aims to make sure attendees leave with a toolkit of ideas and lessons learnt, so you don't make the same assumptions he did.

Service Level Objectives: Where do we start?

Most of us have heard about SLOs and what they mean but have always found it hard to start adopting them across our teams. This video demystifies the journey of adopting SLOs, with examples of how several large companies like Disney adopted them. Whether you are new to the DevOps/SRE world or an experienced developer, you will learn a fresh approach to making software more reliable!

Everything you need to know about Squadcast and Microsoft Teams Integration

Microsoft Teams is one of the most versatile tools for providing collaboration and chat solutions to enterprises. We at Squadcast understand how important Microsoft Teams can be for your organization. Hence, we bring you this blog on the Squadcast-Microsoft Teams integration, which explains how it can help with improved incident management, effective collaboration, and a lot more.

Top 13 Site Reliability Engineer (SRE) Tools

The role and responsibilities of a site reliability engineer (SRE) may vary depending on the size of the organization, and so do the tools a site reliability engineer uses. For the most part, a site reliability engineer is focused on multiple tasks and projects at one time, so for most SREs, the various tools they use reflect their ever-evolving responsibilities.

Why and How SREs Can Benefit from Feature Flags

When you think of who uses feature flags, your mind most likely goes to developers. In general, feature flags are closely associated with software engineering. But Site Reliability Engineers, too, can benefit from feature flags. SREs may not be the ones to create feature flags, but they should work closely with developers to ensure that the applications their teams support include feature flags.

Cloud Complexity - Bringing Resources together in Multi-cloud Environments

The world is still getting used to operating within the cloud. Moving to the cloud is challenging for many organizations. So why do we see a rise in the adoption of multicloud strategies? In this blog, we will explore why this trend is worth considering for your organization, as well as look at the challenges that it brings.

Best Practices To Build & Manage a Strong DevOps Team

Looking to build or improve your DevOps team? We will explain the roles and responsibilities of a DevOps team within your organization, and how to start building one. What does a DevOps team do? A DevOps team is made up of professionals from development and operations who work closely together. These cross-functional teams are responsible for orchestrating the entire software development process.

How We Define SRE Work

At the time of writing this post, I have officially been at Honeycomb for one year as a site reliability engineer (SRE). I had shared my initial experiences and impressions in this post and thought it would make sense to check back in now that I’ve had the opportunity to spend time learning about the team, the culture, and the code base more in depth.

Squadcast Earns a Spot on G2's Top 50 Best Software Awards for IT Management Products 2022

We are thrilled to announce that G2 has recognized Squadcast as a High Performer in the Incident Management space and rated us as one of the Best Software for IT Management Products. Over the last three years, G2 has acknowledged our impact in the IT Incident Management space, which led to us being recognized as a Momentum Leader in the Incident Management and IT Alerting categories. Thanks to our learnings from customer feedback, we have been able to shape our product vision and grow further.

10 Website Performance Statistics Every SRE Should Know For 2022

Two major shifts are simultaneously taking place in the world of website monitoring: the acceleration of digital dependence has increased the need for high-performing websites, and the frequency (and severity) of outages continues to climb. These shifts have made it more important than ever for businesses of all sizes and industries to monitor uptime and page speed.

SRE: How the role is evolving

The growth of site reliability engineering (SRE) has demonstrated that the need for SRE implementations is here to stay for the foreseeable future. LinkedIn voted SRE jobs as the second most promising positions in the US in 2019, and now as we head into 2022, you can be sure SRE will continue to grow and expand. Below, we’ll get into what SRE is, what SREs do, and how SRE will continue to evolve into the future.

What Is a System Administrator? A Complete Guide to SysAdmin Roles and Responsibilities

System Administrators (SysAdmins) often represent the core of IT organizations. SysAdmins manage the organization's computing infrastructure, encompassing servers, virtualization, networking, and storage. For many years, the term System Administrator, or SysAdmin, was typically associated with Linux or UNIX systems.