Monthly Archive

New StackPod Episode: Implementing an SRE Practice with Yousef Sedky of Axiom/Hyke

Mar 31, 2022 By Annerieke Kortier In StackState

For our latest StackPod episode, we invited Yousef Sedky, Hyke’s DevOps team lead and AWS Cloud architect. Axiom Telecom is one of the largest telephone retailers in the United Arab Emirates and Saudi Arabia, and Hyke, its sister company, is a distribution platform for mobile products.

SRE vs. Platform Engineering: The Key Differences, Explained

Mar 29, 2022 By JP Cheung In Rootly

Site Reliability Engineering (SRE) teams and Platform Engineering teams share similar goals -- like maximizing automation and reducing toil -- and similar methodologies. But they have different priorities, and use somewhat different tools to achieve them. What are SREs, what are platform engineers and how is each role similar and different? This article explains.

How important is Observability for SRE?

Mar 27, 2022 By Ricardo Castro In Squadcast

Observability is what defines a strong SRE team. This blog covers the importance of observability and how SREs can leverage it to enhance their business. Observability is the practice of assessing a system's internal state by observing its external outputs. Through instrumentation, systems can provide telemetry such as metrics, traces, and logs that help organizations better understand, debug, maintain, and evolve their platforms.

Rundeck + Squadcast Integration: Simplifying Alert Routing

Mar 25, 2022 By Vishal Padghan In Squadcast

Rundeck is an automation tool that helps make existing automation, scripts, and commands more secure, auditable, and easier to run. It is a software job scheduler and runbook automation system that automates routine processes across development and production environments. It brings together task scheduling, multi-node command execution, and workflow orchestration. It also logs everything that happens in the system. Squadcast is an end-to-end incident response tool.

SolarWinds Orion + Squadcast: Alert Routing Made Easy

Mar 24, 2022 By Vishal Padghan In Squadcast

SolarWinds Orion is a scalable infrastructure monitoring and management platform. It is designed to simplify IT administration for on-premises, hybrid, and software as a service (SaaS) environments, in a single pane of glass. SolarWinds Orion ensures you do not have to struggle with numerous incompatible point monitoring products, as it consolidates the full suite of monitoring capabilities into one platform with cross-stack integrated functionality. Squadcast is an end-to-end incident response tool.

What Is Site Reliability Engineering (SRE)? The SRE Role Explained

Mar 22, 2022 By Joey D'Antoni In SolarWinds

Historically, there was a clear delineation between what system administrators (SysAdmins) do and what application developers are responsible for in IT organizations. In recent years—especially in organizations focused on software development—these worlds have come together as IT operations and development teams adopt DevOps practices. The concept of site reliability engineering (SRE) was first introduced by a much-discussed book titled Site Reliability Engineering from Google.

SRE Revisited: SLO in the age of Microservices

Mar 18, 2022 By Dotan Horovits In logz.io

Site Reliability Engineering (SRE) practice was established by Google nearly 20 years ago, and was popularized with Google’s monumental SRE Book. Everyone’s been attempting to follow that iconic path ever since.

Honeycomb + Squadcast Integration: Routing Incident Alerts Made Easy

Mar 18, 2022 By Vishal Padghan In Squadcast

Honeycomb is an application monitoring tool that helps DevOps and SRE teams to operate more efficiently by offering rich observability solutions and intuitive team collaboration. It helps understand complex relationships within your distributed systems and troubleshoot issues accordingly. Squadcast is an end-to-end incident response tool. Built with an SRE mindset, it streamlines all the incident response activities.

SRE Metrics: Four Golden Signals of Monitoring

Mar 18, 2022 By Stephen Watts In Splunk

SRE (site reliability engineering) is a discipline used by software engineering and IT teams to proactively build and maintain more reliable services. SRE is a functional way to apply software development solutions to IT operations problems. From IT monitoring to software delivery to incident response – site reliability engineers are focused on building and monitoring anything in production that improves service resiliency without harming development speed.

DevOps vs SRE - Reducing Technical Debt and Increasing Efficiency and Resiliency

Mar 18, 2022 By Ravi Lachhman In Shipa

Here is one more blog topic stemming from the weekly office hours that we hold with the field team here at Shipa. In our last office hours, we were asked, “What is the difference between DevOps Engineers and SREs?” Both professions are emerging disciplines and cultures that continue to evolve and play an important role in technology organizations. I’ve been fortunate to have written and spoken about this before, though this post takes a fresh look at what the two domains try to accomplish.

Salesforce Cloud + Squadcast Integration: Routing Detailed Incident Alerts

Mar 17, 2022 By Vishal Padghan In Squadcast

Salesforce Cloud is one of the leading cloud-based customer relationship management (CRM) solutions. It provides a shared view of your customers and their relationship with the business. With Salesforce Cloud, users can automate service processes and streamline workflows. Squadcast is an end-to-end incident response tool. Built with an SRE mindset, it streamlines all the incident response activities. Squadcast aligns your teams towards a common organizational goal of better reliability.

Observability for SRE & DevOps Engineer

Mar 16, 2022 By Amartya Gupta In Motadata

Software development moves quickly to meet client requirements, and it needs to happen safely and with appropriate precautions. DevOps engineers can help with this; however, it is not possible without observability.

How to Implement Global View and High Availability for Prometheus

Mar 11, 2022 By Ricardo Castro In Squadcast

Ensuring that systems run reliably is a critical function of a site reliability engineer. A big part of that is collecting metrics, creating alerts, and graphing data. It’s of the utmost importance to gather system metrics from several locations and services and correlate them to understand system functionality as well as to support troubleshooting.

What Does AIOps Mean for SREs? It's Complicated.

Mar 11, 2022 By JJ Tang In Rootly

If you’re an SRE, you might view AIOps with great excitement. By automating complex workflows and troubleshooting processes, AIOps could make your life as an SRE much easier. Alternatively, SREs may choose to view AIOps with disdain. They might think of AIOps as just a fancy buzzword that doesn’t live up to its promises, and that can become a distraction from the SRE tools that really matter. Which perspective is right?

AppScope 1.0: Changing the Game for SREs and Devs

Mar 8, 2022 By The AppScope Team In Cribl

SREs and Devs are used to solving problems even when an awkward or inefficient way is the only way. In AppScope 1.0, SREs and Devs have a new alternative to standard methods that the AppScope team thinks will make that problem-solving a lot more fun. We on the AppScope team constantly hear firsthand about life in the SRE trenches. For this blog, we “interview” a fictional SRE/Dev whose thoughts and comments are a mash-up of things we’ve heard from real people we know.

ServiceNow + Squadcast Integration: Automate IT Ticketing and Project Tracking

Mar 4, 2022 By Nir Sharma In Squadcast

ServiceNow is a workflow automation platform used by organizations for their IT ticketing and project management needs. In contrast, Squadcast is an end-to-end incident management and SRE platform that is used by organizations for their reliability requirements.

What SREs Can Learn from Capt. Sully: When to Follow Playbooks

Mar 4, 2022 By Andre King In Rootly

When are you smarter than your playbooks, and when are your playbooks smarter than you? That’s a question that engineers rarely step back to consider. The rational, disciplined parts of our minds tell us that the playbooks we are supposed to follow were carefully designed and tested, and that we should stick to them at all costs.

Golden Signals - Monitoring from first principles

Mar 2, 2022 By Safeer CM In Squadcast

Building a successful monitoring process for your application is essential for high availability. In the first of this three-part blog series, Safeer discusses the four key SRE Golden Signals for metrics-driven measurement and the role they play in the overall context of monitoring. Monitoring is the cornerstone of operating any software system or application effectively. The more visibility you have into the software and hardware systems, the better you are at serving your customers. It tells you whether you are on the right track and, if not, by how much you are missing the mark.

Kubernetes Health Check Using Probes

Mar 2, 2022 By Squadcast Community In Squadcast

Kubernetes is an open source container orchestration platform that significantly simplifies an application's creation and management. Distributed systems like Kubernetes can be hard to manage, as they involve many moving parts and all of them must work for the system to function. Even if a small part breaks, it needs to be detected, routed and fixed. These actions also need to be automated. Kubernetes allows us to do that with the help of readiness and liveness probes.

Site Reliability Chats (Mar 2, 2022)

Mar 2, 2022 By Gremlin In Gremlin

Welcome to the first episode of Site Reliability Chats with your hosts Jason Yee @gitbisect and Julie Gunderson @julie_gund.
