
The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Ask a Site Reliability Engineer (SRE)

Site reliability engineering (SRE) can be complicated, and at Datadog, we’ve spent a lot of time thinking about SRE and refining how we implement it. Join Datadog’s Brandon West and Rick Mangi as they provide a brief overview of SRE and its core concepts. This video also contains a Q&A session from the live taping of this panel.

Auditing Your Automation's Access: Using More Automation

Between CI/CD pipelines, container orchestrators, and developer debugging tools, more and more automation is needed to scale your systems. But how do you know if that automation is accessing the right systems at the right time? And how do you ensure that your automation is safe from exploits by unauthorized users?

The Top 10 Open-Source Products From KubeCon North America 2022

KubeCon is a major cloud-native gathering that draws thousands of people from around the globe. The event is attended by many emerging startups and companies working on revolutionary products around Kubernetes, security, containers, and DevOps. It is a great opportunity to share insights and collaborate on various community projects.

Mapping service vulnerabilities with Mend

Mend is an automated vulnerability scanning tool that helps teams detect and resolve issues quickly. Mend can discover outdated packages and tell you if you’re relying on tools with known issues. Then, through automated remediation, Mend creates pull requests for developers with specific guidance on resolving those issues. Mend conducts static code analysis as well as package and dependency management analysis to identify weaknesses.

The Quest For Sunken Treasure: Top-Down Vs. Bottom-Up Cloud Cost Allocation

High-quality cloud cost allocation has become an existential issue for businesses. To get as much as possible out of their (mounting) cloud investments, business leaders need to know how much they’re spending in the cloud, what and whom they’re spending it on, and whether there’s a good reason for it. In its ideal form, cost allocation answers all these questions.

Prometheus vs. Zabbix

A successful business needs an effective monitoring system that covers all areas of the business and its infrastructure: servers, databases, services, overall traffic, and even revenue collected. The users of this monitoring system can be system administrators, software engineers, information engineers, and all sorts of analysts.

Building an incident management process

In this podcast, our panellists discuss the foundations that any team needs to put in place when designing their incident management process. From the basics of defining what we really mean by an incident to setting your severity levels, roles, and statuses, Chris and Pete share their tips for building solid foundations to run your incidents.

The Power of Harnessing DevOps for the Database

Why do some organizations excel in streamlining their database operations and applications development while others find it immensely challenging? Why can some database teams embrace agility while others take months to deploy even a single line of code? What secret sauce allows some database teams to work smarter (not harder), streamline database development lifecycles, get to deployment faster, and create stronger overall alignment across departments?

3 questions to ask in the build vs buy debate for incident response tooling

As a former incident responder and now as a responder advocate for FireHydrant, I’ve seen the “build vs. buy” debate play out many times. In fact, I even supported the tool that former employers used for managing incidents for years before they decided to buy (more on that in a future blog post).