The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Logic App Best Practices, Tips, and Tricks: #30 How to validate if a JSON structure is an Array or a single object

In the last two posts, we covered how to validate whether a string or an array is null or empty. Today we continue with validations, and I will cover another best practice, tip, and trick to consider while designing your business processes (Logic Apps): how to validate whether a JSON structure is an array or a single object.
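The check itself is simple to picture outside of Logic Apps. The sketch below is a plain Python illustration, not the Logic Apps expression the post describes; the function name `as_list` and the sample payloads are hypothetical. It parses a JSON string and normalizes a single object into a one-item list so that downstream steps can always iterate.

```python
import json


def as_list(payload: str) -> list:
    """Parse a JSON string and always return a list of items.

    Hypothetical helper for illustration: an array is returned as-is,
    a single object is wrapped in a one-item list.
    """
    value = json.loads(payload)
    if isinstance(value, list):
        # Already an array: nothing to normalize.
        return value
    if isinstance(value, dict):
        # Single object: wrap it so callers can loop over it.
        return [value]
    # Strings, numbers, booleans, and null are neither case.
    raise ValueError(f"expected a JSON array or object, got {type(value).__name__}")


if __name__ == "__main__":
    print(as_list('[{"id": 1}, {"id": 2}]'))
    print(as_list('{"id": 1}'))
```

Normalizing to a list up front keeps the rest of the workflow free of branching on the payload shape.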

How Honeycomb Monitors Kubernetes

While Kubernetes comes with a number of benefits, it’s yet another piece of infrastructure that needs to be managed. Here, I’ll talk about three interesting ways that Honeycomb uses Honeycomb to get insight into our Kubernetes clusters. It’s worth calling out that we at Honeycomb use Amazon EKS to manage the control plane of our cluster, so this document will focus on monitoring Kubernetes as a consumer of a managed service.

SD-WAN: Monitoring Blind Spots, and What to Do About Them

The adoption of software-defined wide area network (SD-WAN) technologies continues to pick up pace. By employing SD-WAN technologies, organizations have the potential to realize a range of advantages. Teams can achieve better performance at lower cost by using commercially available technologies. For example, teams can use public internet services rather than more expensive private WAN technologies, such as MPLS.

On-call management on the go: Introducing the Grafana OnCall mobile app

We’ve all been there: Sleeping peacefully in bed over the weekend, finally getting rest after a long week at your computer making AI-generated memes, er, writing code. Then at 3 a.m., your phone makes an ungodly sound, and you wake up startled, frazzled, and confused. When you finally type in your passcode to unlock your phone (because facial recognition doesn’t register your bleary-eyed, squinty face), you see an alert, and all dreams of sleep are over.

Optimize Industrial IoT Data with InfluxDB and AWS

The modern factory’s relationship with data is experiencing a major change. Data now shapes the future rather than only telling the story of the past. Inside the factory, that change can mean higher Overall Equipment Effectiveness (OEE) as the result of a shift from preventive to predictive maintenance. It can also mean expanding business goals into a new market based on impactful data-driven decisions. A change in purpose requires an update in technology.

What is MTTR? Calculation and Reduction Strategies

In the fast-paced world of software development, every minute counts. When disruptions occur, whether minor or major system failures, organizations need to bounce back quickly to maintain seamless operations. That's where MTTR (Mean Time to Repair) steps onto the stage as a game-changing metric. Are you ready to unlock the secrets behind reducing downtime, boosting performance, and ensuring software reliability?
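As a rough illustration of the calculation, MTTR is commonly computed as total repair time divided by the number of incidents. The Python sketch below assumes each incident is recorded as a (failure detected, service restored) pair of timestamps; the function name and the sample data are hypothetical and not taken from the article.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Return the average repair duration as a timedelta.

    `incidents` is a list of (started, resolved) datetime pairs
    (an assumed record format for this sketch).
    """
    if not incidents:
        raise ValueError("at least one incident is required")
    # Sum the individual repair durations, starting from a zero timedelta.
    total = sum((resolved - started for started, resolved in incidents), timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    # Hypothetical sample data: one 30-minute and one 90-minute outage.
    sample = [
        (datetime(2023, 6, 1, 3, 0), datetime(2023, 6, 1, 3, 30)),
        (datetime(2023, 6, 8, 14, 0), datetime(2023, 6, 8, 15, 30)),
    ]
    print(mean_time_to_repair(sample))
```

Teams differ on where the clock starts and stops (detection versus acknowledgement, fix deployed versus fully verified), so the boundaries used for each pair should be agreed before comparing numbers.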

Introduction to Sysdig Monitor

Welcome to our comprehensive YouTube series on Sysdig Monitor, where we dive deep into the world of container monitoring and observability. Join us as we explore the advanced features, practical use cases, and expert insights that Sysdig Monitor brings to the table, empowering you to gain unparalleled visibility into your infrastructure and enhance your operational efficiency. Whether you're a seasoned Sysdig user or new to the platform, these videos will equip you with the knowledge and skills to maximize the potential of your monitoring strategy.

We can now notify you through PagerDuty

When we detect a problem with your site, we can notify you via email, a Slack message, a webhook, or any of our other notification channels. This is enough for most of our users, but those who work in larger teams often need more flexibility. Today, we are launching our PagerDuty integration. PagerDuty is a cloud-based incident management platform that helps organizations improve operational reliability by providing real-time alerts, on-call scheduling, and incident tracking.

Our redesigned status pages can now show uptime history

Alongside the many checks we can perform, we can also render beautiful status pages to inform your audience about the health of your service. Today, we've deployed a redesign of these status pages. In this iteration, everything is more polished. We picked a new font and colors and added some icons to make the status page a bit more visually interesting. In addition to the cosmetic upgrade, we also added a significant new feature: we can now display 60 days of uptime history for your sites.