Cortex

The Pocket Guide to Production Readiness (plus bonus framework!)

Faster software development cycles mean greater reward and greater risk. Organizations that lose sight of continuous alignment to standards face delayed launches, a higher risk of churn, and higher costs, not to mention unproductive and unhappy developers. Building a strong production readiness review process can help, but existing tools and frameworks haven’t made it easy to keep up with the increased velocity at which software is expected to ship.

Microservices Catalog: Definition, Use Cases & Benefits

When speed to market can make or break a business, the move from monoliths to microservices has become an obvious choice for many engineering teams. This transformation promises agility, scalability, and the ability to more closely align with business functions. It is why we see organizations moving from the rigidity and restrictions of monoliths to the flexibility and control associated with microservices architectures.

What's new in on-call best practice?

Already a quarter of the way into 2024, we’re seeing a lot of shake-up in on-call best practices. We’re excited to see AI in the mix, but we’re also seeing a renewed focus on existing and neglected best practices. In this article, we’ll review some best practices and explore the 2024 trends.

Software quality metrics developers should track (and how to do it)

It's been a decade since Marc Andreessen declared that software is eating the world, and it is still hungry. Customers expect software solutions for every need, driving digital transformation in every analogue industry. Software quality is now fundamental to company reputation, directly affecting customer satisfaction, brand and overall business success.

Beyond Microservices: Miniservices, Macroservices, and the in between

Containerized microservices have been the gold standard for cloud computing since they replaced the monolith architecture over a decade ago. The flexibility, scalability, and velocity they enable for teams make them an obvious choice. Yet, a strict interpretation of one service for one function doesn’t quite serve everyone, especially when architectures get large. We’ll discuss how flexibility in service architecture might be the way to go.

Why and how to use site reliability golden signals

Software complexity makes it harder for teams to rapidly identify and resolve issues. IT service management has evolved from an afterthought to a central part of DevOps. In microservices architectures, such issues are prone to being identified late or missed entirely. Monitoring mechanisms need to keep up with these complex infrastructures. Maintaining reliability and performance while harnessing this complexity requires a considered, data-driven approach.

Generative AI and developer experience

From its initial appearance in the dev-tools space, GenAI has had an outsized impact on how developers approach day-to-day tasks (just ask any developer about when they first started using GitHub Copilot). While the risks are still being evaluated, such as the potential for introducing anti-patterns or inadvertently running afoul of compliance requirements, many engineering teams have successfully implemented GenAI with measurable gains in collaboration and productivity.

What is developer experience?

Companies obsess over end-user experience, whether it is Amazon’s customer-centric innovation or Steve Jobs suggesting starting with the customer experience and working backwards to technology. But as our world becomes more knowledge-based and digital, we also need to consider the most important stakeholder on the payroll: software engineers.

Observability tools and Internal Developer Portals

Observability tools help engineering teams understand the health and behavior of software. But the term “health” in the context of this type of tooling is fairly narrow in scope—pertaining to real-time performance, reliability, and availability. While these are three important metrics to monitor, they’re lagging indicators of bigger issues happening upstream.