Dash 2021: Guide to Datadog's newest announcements

Today at Dash 2021, we announced new products and features that give your team even greater visibility into the health and performance of your code, databases, CI/CD pipelines, and more. Now, you can monitor network devices, get visibility into your services' golden signal metrics without touching a single line of code, and integrate third-party tools into our platform with Datadog Apps. We expanded RUM to include iOS error tracking, Session Replay, and Watchdog Insights.

Announcing the Gremlin Chaos Engineering Professional Certificate Program

There’s a reason why thousands of Engineers, Testers, and other Reliability specialists signed up for Gremlin’s first Gremlin Certified Chaos Engineering Practitioner (GCCEP) certificate program: Chaos Engineering is in high demand, and the market is looking for professionals who know how to wield it well.

The Nightmare Before Business: Stay Safe with Uptime.com Status Pages

We’re nearing Halloween and mischief night has stolen tricks from the holiday season. With online sales alone expected to creep up toward $3 billion before the next crescent moon, we’re offering you a solution to keep the angry mobs with pitchforks at bay by giving them a crystal ball into your real-time incident response with Uptime.com Status Pages.

VMworld 2021: "We're Proud To Announce..."

I've never seen so much news during VMworld! It became almost comical that every speaker at the opening "General Session" and the subsequent keynotes used the line "We are proud to announce." By the way, it was one of the best General Sessions I've ever seen in terms of tempo, delivery, and rhetoric. From October 15, you will be able to find all content on demand here.

How We Use Sloth to do SLO Monitoring and Alerting with Prometheus

One of the most challenging tasks for Site Reliability Engineers is aligning the reliability of their systems with business goals. There is a constant battle between delivering more features, which increase the product's value, and keeping the system reliable and maintainable. A significant ally in achieving both objectives is the Service Level Objective framework.

Differences between Site Reliability Engineer Vs. Software Engineer Vs. Cloud Engineer Vs. DevOps Engineer

The evolution of Software Engineering over the last decade has led to the emergence of numerous job roles. So how do a Software Engineer, a DevOps Engineer, a Site Reliability Engineer, and a Cloud Engineer differ from one another? In this blog, we drill down and compare these roles and their functions.

Introducing Test Insights with flaky test detection

The CircleCI Insights dashboard was designed to help you improve your delivery efficiency. We launched the dashboard a year ago to provide teams with actionable data for optimizing their pipelines. Since then, we've been listening to your feedback. By far, the most requested functionality is the ability to gain further visibility into test performance.

7 JSON Logging Tips That You Can Implement

When teams begin to analyze their logs, they almost immediately run into a problem, and they'll need some JSON logging tips to overcome it. Logs are naturally unstructured. This means that if you want to visualize or analyze your logs, you are forced to deal with many potential variations. You can eliminate this problem by logging out valid JSON and setting the foundation for log-driven observability across your applications.
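
As a rough illustration of what "logging out valid JSON" can look like, here is a minimal sketch using only Python's standard library. It is not taken from the linked article: the `JsonFormatter` class, the `build_logger` and `demo` helpers, and the convention of passing extra fields under a `context` key are all assumptions made for this example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers attach structured fields
        # with extra={"context": {...}} and they are merged in here.
        context = getattr(record, "context", None)
        if isinstance(context, dict):
            payload.update(context)
        return json.dumps(payload)


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def demo() -> dict:
    """Log one event and parse it back to show the line is valid JSON."""
    buffer = io.StringIO()
    log = build_logger("checkout", buffer)
    log.info(
        "payment accepted",
        extra={"context": {"order_id": 42, "latency_ms": 118}},
    )
    return json.loads(buffer.getvalue())
```

Because every line parses as JSON, fields such as `order_id` or `latency_ms` can be queried directly by a log backend instead of being extracted with ad hoc patterns.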

What's new in Grafana Cloud for October 2021: Machine Learning, Grafana 8.2, new integrations, and more

Here at Grafana Labs, we’re constantly shipping new features to help our users get the most out of Grafana Cloud. To help our new and existing customers learn about the latest and greatest, here’s a roundup of all the new features and improvements you should know about to make the most of Grafana Cloud.