February 2022

Quickly troubleshoot application errors with Error Reporting

Feb 28, 2022 By Eyamba Ita In Google Operations

Are you familiar with the four golden signals of Site Reliability Engineering (SRE): latency, traffic, errors, and saturation? Whether you’re a developer or an operator, you’ve likely been responsible for collecting, storing, or analyzing the data associated with these concepts. Much of this data is captured in application and infrastructure logs, which provide a rich history of what is happening behind the scenes in your workloads.

Traditional vs Modern Incident Response

Feb 24, 2022 By Kristijan Mitevski In Squadcast

An incident is an event (network outage, system failure, data breach, etc.) that can lead to loss of, or disruption to, an organization's operations, services or functions. Incident response is an organization’s effort to detect, analyze and correct the hazards caused by an incident. Most commonly, a mention of incident response relates to security incidents. Sometimes incident response and incident management are used more or less interchangeably.

SRE Tools (All of the Tools Your Team Needs)

Feb 24, 2022 By Myra Nizami In Blameless

Wondering about SRE Tools? We explain the best tools for every step of the SRE development process.

Shift Left Reliability Meetup February - Retooling your toolkit

Feb 24, 2022 By Reliably In Reliably

Security and reliability have a lot in common. So much, in fact, that the tools used for one are often well suited for the other. The only thing you need is the right mindset. In this talk, Mika Boström will go over the principles and ideas and share real-world examples. You may realise you've been doing both already.

Shift Left Reliability Meetup February - Implementing reliability for a post-pandemic future

Feb 24, 2022 By Reliably In Reliably

Steve Wade will talk about his experiences to date empowering developers and, importantly, the wider business to care about the reliability of the applications they provide to customers. He will discuss the pillars that make up reliability, along with his hypothesis and results on implementing each of them. Steve will draw on his experience working in a range of sectors, including financial services, to explain how he got companies to make changes pre-pandemic, as well as the additional challenges organisations face in a post-pandemic future. Steve aims to make sure attendees leave with a toolkit of ideas and lessons learnt, so they don't make the same assumptions he did.

Incident Management Metrics | Choosing KPIs that Matter

Feb 22, 2022 By Noor-ul-Anam Ruqayya In Blameless

Wondering about incident management metrics? We explain what incident management metrics are, how to track them, and what to do with the information.

Service Level Objectives: Where do we start?

Feb 22, 2022 By Last9 In Last9

Most of us have heard about SLOs and what they mean, but have always found it hard to start adopting them across our teams. This video aims to demystify the journey of adopting SLOs, with examples of how several large companies like Disney adopted them. Whether you are new to the DevOps/SRE world or an experienced developer, you will learn a fresh approach to making software more reliable!

Everything you need to know about Squadcast and Microsoft Teams Integration

Feb 21, 2022 By Vishal Padghan In Squadcast

Microsoft Teams is one of the most versatile tools in terms of providing collaboration and chat solutions to numerous enterprises. We at Squadcast understand how important Microsoft Teams can be for your organization. Hence, we bring you this blog on Squadcast-Microsoft Teams integration that will tell you how this integration can help in improved incident management, effective collaboration and a lot more.

Top 13 Site Reliability Engineer (SRE) Tools

Feb 20, 2022 By Jacob Hall In Dotcom-Monitor

The role and responsibilities of a site reliability engineer (SRE) may vary depending on the size of the organization, and so do the tools they use. For the most part, a site reliability engineer is focused on multiple tasks and projects at one time, so for most SREs, the various tools they use reflect their ever-evolving responsibilities.

Why and How SREs Can Benefit from Feature Flags

Feb 17, 2022 By Weihan Li In Rootly

When you think of who uses feature flags, your mind most likely goes to developers. In general, feature flags are closely associated with software engineering. But Site Reliability Engineers, too, can benefit from feature flags. SREs may not be the ones to create feature flags, but they should work closely with developers to ensure that the applications their teams support include feature flags.

Incident Response Team | Roles & Responsibilities Defined

Feb 17, 2022 By Myra Nizami In Blameless

We discuss what an incident response team does, how it is structured, and how to form the best one for your organization.

Cloud Complexity - Bringing Resources together in Multi-cloud Environments

Feb 15, 2022 By Caleb Munyasya In Squadcast

The world is still getting used to operating within the cloud. Moving to the cloud is challenging for many organizations. So why do we see a rise in the adoption of multicloud strategies? In this blog, we will explore why this trend is worth considering for your organization, as well as look at the challenges that it brings.

Best Practices To Build & Manage a Strong DevOps Team

Feb 15, 2022 By Emily Arnott In Blameless

Looking to build or improve your DevOps Team? We will explain the roles and responsibilities of a DevOps Team within your organization, and how to start building one. What does a DevOps Team do? A DevOps team is made of professionals from development and operations that work closely together. These cross-functional teams are responsible for orchestrating the entire software development process.

How We Define SRE Work

Feb 15, 2022 By Fred Hebert In Honeycomb

At the time of writing this post, I have officially been at Honeycomb for one year as a site reliability engineer (SRE). I had shared my initial experiences and impressions in this post and thought it would make sense to check back in now that I’ve had the opportunity to spend time learning about the team, the culture, and the code base more in depth.

What is a Runbook And How Can It Help My Team

Feb 11, 2022 By Myra Nizami In Blameless

Wondering what a runbook is? We explain what a runbook is, common tasks a runbook can help with, and how to create one.

Top 9 Skills for SREs from ex-Instacart SRE

Feb 10, 2022 By Quentin Rousseau In Rootly

A list of the top nine SRE skills, from incident management, to cloud computing, to networking and beyond.

Squadcast Earns a Spot on G2's Top 50 Best Software Awards for IT Management Products 2022

Feb 9, 2022 By Squadcast Community In Squadcast

We are thrilled to announce that G2 has recognized Squadcast as a High Performer in the Incident Management space and rated us as one of the Best Software for IT Management Products. Over the last three years, G2 has acknowledged our impact in the IT Incident Management space, which led to us being recognized as a Momentum Leader in the Incident Management and IT Alerting categories. Thanks to our learnings from customer feedback, we have been able to shape our product vision and grow further.

6 Software Reliability Metrics That Matter

Feb 7, 2022 By Myra Nizami In Blameless

Wondering about software reliability metrics? We explain the important metrics you need to track.

Importance of Good Incident Communication

Feb 4, 2022 By Michael Marchese - SRE Manager In Rootly

From the first alert, through the incident itself, to the post-incident phase, great communication is the key to effective incident response.

10 Website Performance Statistics Every SRE Should Know For 2022

Feb 3, 2022 By Maddie Welsh In uptime

Two major shifts are simultaneously taking place in the world of website monitoring: the acceleration of digital dependence has increased the need for high-performing websites and the frequency (and severity) of downtime outages continues to climb. These shifts have made it more important than ever for businesses of all sizes and industries to monitor uptime and page speed.

SRE: How the role is evolving

Feb 3, 2022 By Kevin Goldberg and Michael Baldani In Sumo Logic

The growth of site reliability engineering (SRE) has demonstrated that the need for SRE implementations is here to stay for the foreseeable future. LinkedIn named SRE jobs the second most promising positions in the US in 2019, and as we head into 2022, you can be sure SRE will continue to grow and expand. Below, we’ll get into what SRE is, what SREs do, and how SRE will continue to evolve.

Tanzu Talk: Salsa Reliability Engineering, or, "You Build It, You Run It" Revisited

Feb 1, 2022 By VMware Tanzu In VMware Tanzu

And, of course, VMware Tanzu's appdev stack has you covered here!

SRE Best Practices For Successful Teams

Feb 1, 2022 By Myra Nizami In Blameless

Wondering about SRE best practices? If you are trying to improve and streamline your current process, we explain best practices and tips for implementing them. What are SRE best practices?

What Is a System Administrator? A Complete Guide to SysAdmin Roles and Responsibilities

Feb 1, 2022 By Joey D'Antoni In SolarWinds

System Administrators (SysAdmins) often represent the core of IT organizations. SysAdmins manage the organization's computing infrastructure, encompassing servers, virtualization, networking, and storage. For many years, the term System Administrator, or SysAdmin, was typically associated with Linux or UNIX systems.
