Operations | Monitoring | ITSM | DevOps | Cloud

Alerting

Webinar Recap: Lessons learned from T-mobile Netherlands' road to zero touch

How close can CSPs come to realizing the zero touch network vision, and what are the best next steps for getting there? To discuss this question Anodot brought together a panel of experts, including Kim Larsen, CTIO of T-Mobile Netherlands; Ira Cohen, co-founder of Anodot and the company’s chief data scientist; Fernando Elizalde, analyst at GSMA Intelligence; and moderator Justin Springham.

Chapter Nine: In Which Dinesh Experiments with Chaos Engineering

Another day, another drama! This one, though, is very much of my own making. I have been wanting to try my hand at a bit of chaos engineering for some time now but C&Js just hasn’t been ready. Sarah’s been up for it too, though, at Animapanions. And now that our CIO, Charlie has seen MTTR drop across every single technology team, thanks to the rollout of Moogsoft and the new incident management system (kudos to James), it’s pilot day.

Monitoring serverless applications with AWS CloudWatch alarms

Running any application in production assumes reliable monitoring to be in place and serverless applications are no exception. As modern cloud applications get more and more distributed and complex, the challenge of monitoring availability, performance, and cost get increasingly difficult. Unfortunately there isn’t much offered right out of the box from cloud providers.

7 Ways Your Status Page Can Save You

Having a Status Page is like having a dog. A dog alerts you to an incident; sudden noise, approaching neighbor, squirrel… A dog sounds the alarm on an intruder. A dog even alerts you to maintenance by barking at every handyman, garbage truck, and gardener within sight. As a dog fetches the same stick over and over, so does a status page fetch the attention of your users – especially during a live incident – with each browser refresh they wait for the status to change.

How to Reduce Alert Fatigue: Preventing Noisy Alerts and Error Messages

Monitoring solutions are a vital component in managing an application’s environment. From the systems layer all the way up to the end user’s connection to the app, you want to find out how the platform is performing. Indicators like CPU, memory, the number of connections, and overall health help teams make informed decisions for guaranteeing uptime. Teams monitor metrics (short-term information) and logs (long-term information) mainly from a reactive perspective.

How to Notify Your Team of Errors: Email vs. Slack vs. PagerDuty

Site Reliability Engineering (SRE) and Operations (Ops) teams heavily rely on notifications. We use them to know what’s going on with application workloads and how applications are performing. Notifications are critical to ensuring SREs and Ops teams can resolve errors and reduce downtime. They’re also crucial when monitoring environments — not only when running in production but also during the dev-test or staging phase.

Making ServiceNow better with CloudFabrix RDA

The onset of ServiceNow has relieved the IT Services workforce. With CloudFabrix RDA added to it, we made it even better. Let’s face it that many IT Service transformation implementations take longer because of a lack of automation around migration and production. The efficiency of ITSM is further compromised due to the absence of data automation and enrichment. ServiceNow with Robotic Data Automation stirs a positive impact on three critical areas of data operations ITSM teams.