Automate insights-rich incident summaries with generative AI

Does this sound familiar? The incident has just been resolved and management is applying a lot of pressure. They want to understand what happened and why. Now. They want to make sure customers and internal stakeholders are updated about what happened and how it was resolved. ASAP. But putting together all the needed information about the why, how, when, and who can take weeks. Still, people are calling and writing. Nonstop.

Ensuring Robust Security in Office 365

In an era where digital threats are evolving rapidly, securing your Office 365 environment has never been more crucial. Office 365, a suite known for its robust productivity tools, also demands a proactive approach to security. This blog post delves into essential practices and strategies to fortify your Office 365 setup against various cyber threats.

Introducing Cortex Eng Intelligence

Engineering teams rely on certain metrics to assess their ability to deliver quality products on time. This is a useful exercise, but execution has been lacking, with metric collation often handled via a spreadsheet or a stand-alone tool. Neither approach is ideal, for two reasons: 1) How (or more specifically, where) metrics are collected silos them away from business context.

Build Operational Resilience with Generative AI and Automation

For modern enterprises aiming to innovate faster, gain efficiency, and mitigate the risk of failure, operational resilience has become a key competitive differentiator. But growing complexity, noisy systems, and siloed infrastructure have created fragility in today’s IT operations, making the task of building resilient operations increasingly challenging.

Cloud Observer: Subsea Cable Maintenance Impacts Cloud Connectivity

In this edition of the Cloud Observer, we dig into the impacts of recent submarine cable maintenance on intercontinental cloud connectivity and call for greater transparency from the submarine cable industry about incidents that cause critical cables to be taken out of service.

How to calculate the difference of a value over time with InfluxDB and Grafana

Learning about the past helps us understand the present, and even predict the future. So, whether you are monitoring CPU usage or how long your IoT device was powered on and then off, at some point you might want to know the difference of a value over time. InfluxDB is an open source database for storing and retrieving time series data. Its own query languages, Flux and InfluxQL, provide different and powerful ways to analyze data.
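To make "the difference of a value over time" concrete, here is a minimal Python sketch of the idea: subtract each reading from the one that follows it. It is not the post's own method and does not query InfluxDB. The `difference` helper and the sample CPU readings are hypothetical, chosen only to illustrate the computation that Flux's `difference()` and InfluxQL's `DIFFERENCE()` perform inside the database.

```python
from datetime import datetime, timedelta


def difference(points):
    """Return (timestamp, delta) pairs, where delta is each value minus the previous one.

    `points` is a list of (timestamp, value) tuples. The earliest point has no
    predecessor, so it produces no output row.
    """
    ordered = sorted(points, key=lambda p: p[0])  # make sure readings are in time order
    return [
        (t, v - prev_v)
        for (_, prev_v), (t, v) in zip(ordered, ordered[1:])
    ]


# Hypothetical sample data: one CPU usage reading per minute.
start = datetime(2024, 1, 1)
cpu = [
    (start + timedelta(minutes=i), value)
    for i, value in enumerate([10.0, 12.5, 11.0, 15.0])
]

deltas = difference(cpu)
for timestamp, delta in deltas:
    print(timestamp.isoformat(), delta)
```

Four readings yield three differences, each stamped with the time of the later reading. In practice this calculation would normally stay in the query itself rather than in client code.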

Engineering nits: Building a Storybook for Slack Block Kit

We care a lot about the pace of shipping at incident.io: moving fast is a fundamental part of our company culture, and out-pacing your competition is one of the best ways we know to win. In engineering teams, one way to ship fast is to invest in tools that make your team more productive. We've become good at identifying small pains and frustrations that slow us down over time and, after surfacing them to the rest of the team, finding solutions for them.

How to Implement FinOps Successfully

This is the fifth and final part of this FinOps series, The Operate Phase. If you have missed any of my previous blogs, here is a list of posts in the series:

Note: I am ex-AWS, so you will notice a lot more focus on AWS tools and services as examples here; however, we are cloud-agnostic, and all cloud providers have similar services and tools.