logz.io

Slack's New Logging Storage Engine Challenges Elasticsearch

Elasticsearch has long been the leading solution for log management and analytics. Cloud-native and microservices architectures, together with the surge in workload volume and diversity, have surfaced new challenges for web-scale enterprises such as Slack and Twitter. My podcast guest Suman Karumuri, a Senior Staff Software Engineer at Slack, has made a career of solving this problem. In our chat, Suman discusses publicly for the first time a new project from his team at Slack: KalDB.

Survey: Complexity and Costs Threaten Health of Strong DevOps Pulse

Oh, how quickly the times change. Over the past five years, as Logz.io has run its annual DevOps Pulse Survey and produced the accompanying report, the state of “modern” cloud engineering and monitoring practices has advanced at a torrid pace, and so has the project itself. Back in 2017, when we started polling the industry and collecting data, we sought to validate the usefulness of DevOps itself.

Slack's New Metrics Storage Engine Challenges Prometheus

Metrics storage engines must be specially engineered to accommodate the quirks of time-series metrics data. Prometheus is probably the most popular metrics storage engine today, powering numerous services, including our own Logz.io Infrastructure Monitoring. But Prometheus was not enough for Slack, given the scale of its operation. The team set out to design a new storage engine that can deliver 10x more write throughput and 3x more read throughput than Prometheus! In February 2022 Suman Karumuri, Sr.

Spring4Shell Zero-Day Vulnerability: Overview and Alert Upon Detection for CVE-2022-22965

On March 29, 2022, VMware disclosed a critical vulnerability in the Spring Java framework. Tracked as CVE-2022-22965 and commonly called “Spring4Shell” or “SpringShell”, this severe flaw is a separate vulnerability in Spring Core that leverages class injection to achieve full remote code execution (RCE).
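
To illustrate what "alert upon detection" can look like at the log level, here is a minimal sketch, assuming your web or access logs capture request query strings. Public proof-of-concept payloads for Spring4Shell bind request parameters such as class.module.classLoader.* to reach the class loader, so the sketch flags any request line that references that path after URL decoding. The indicator string, the scan and is_suspicious helpers, and the sample log lines are hypothetical and illustrative only; this is a heuristic, not a complete detection rule or the Logz.io alert itself.

```python
from urllib.parse import unquote

# Hypothetical indicator: known Spring4Shell PoC payloads bind parameters
# under class.module.classLoader.*. Matching it is a heuristic only.
INDICATOR = "class.module.classloader"


def is_suspicious(request_line: str) -> bool:
    """Flag a request line whose decoded text references class.module.classLoader."""
    # Decode twice to catch simple double URL-encoding, then compare case-insensitively.
    decoded = unquote(unquote(request_line)).lower()
    return INDICATOR in decoded


def scan(lines):
    """Yield (line_number, line) for every suspicious line, numbered from 1."""
    for number, line in enumerate(lines, start=1):
        if is_suspicious(line):
            yield number, line


# Hypothetical sample log lines.
logs = [
    "GET /index.html HTTP/1.1",
    "POST /app?class.module.classLoader.resources.context.parent.pipeline.first.pattern=x HTTP/1.1",
    "POST /app?class%2Emodule%2EclassLoader.resources=x HTTP/1.1",
]

for number, line in scan(logs):
    print(f"ALERT line {number}: {line}")
```

In a real pipeline this check would typically run as a saved search or alert rule in your log platform rather than as a standalone script. Decoding before matching matters because attackers can URL-encode the dots in the parameter name.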

Who Owns Observability In Enterprises?

It’s common sense. When a logstorm hits, you don’t want to be left scrambling to find the one engineer on each team in your organization who actually understands the logging system, and then spend even more time mapping each team’s logging format to every other team’s, all before you can begin to respond to the incident at hand. It’s a model that simply won’t scale.

Grok Pattern Examples for Log Parsing

Searching and visualizing logs is next to impossible without log parsing, an underappreciated skill that anyone working with logs needs in order to make sense of their data. Parsing structures your incoming (unstructured) logs into clear fields and values that you can search against during investigations or when setting up dashboards. The most popular log parsing language is Grok. You can use Grok plugins to parse log data in all kinds of log management and analysis tools, including the ELK Stack and Logz.io.
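
At its core, a Grok pattern is a regular expression assembled from named, reusable sub-patterns: a reference such as %{IP:client} means "match the IP pattern and store the result in a field called client". The minimal Python sketch below shows that expansion step. The small PATTERNS dictionary and the grok_to_regex and parse helpers are hypothetical simplifications for illustration; they are not the pattern library or API of Logstash or Logz.io, whose Grok implementations ship far richer definitions.

```python
import re

# Hypothetical, simplified pattern library; real Grok libraries are much larger.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "NUMBER": r"\d+(?:\.\d+)?",
    "URIPATH": r"/[^\s?]*",
}

# Matches %{NAME} or %{NAME:field}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(grok_pattern: str) -> re.Pattern:
    """Expand %{NAME:field} references into named regex groups."""
    def expand(match: re.Match) -> str:
        name, field = match.group(1), match.group(2)
        body = PATTERNS[name]
        # Named group when a field is given, otherwise a non-capturing group.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return re.compile(GROK_REF.sub(expand, grok_pattern))


def parse(grok_pattern: str, line: str) -> dict:
    """Return the extracted fields, or an empty dict if the line does not match."""
    m = grok_to_regex(grok_pattern).match(line)
    return m.groupdict() if m else {}


pattern = "%{IP:client} %{WORD:method} %{URIPATH:path} %{NUMBER:duration}"
print(parse(pattern, "55.3.244.1 GET /index.html 0.043"))
# {'client': '55.3.244.1', 'method': 'GET', 'path': '/index.html', 'duration': '0.043'}
```

Each named field becomes a searchable key, which is exactly what makes parsed logs usable in dashboards and investigations. Note that every value is still a string; production pipelines usually add a type conversion step for numeric fields such as duration.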

Partner Amplification - Logz.io Achieves AWS Security Competency

We’ve got some outstanding news to share in the arena of security partnerships: Logz.io® Cloud-based SIEM has officially achieved Amazon Web Services (AWS) Security Competency! This designation within the Logging, Monitoring, SIEM, Threat Detection, and Analytics category further demonstrates Logz.io’s proven commitment to delivering best-in-class security.

The Cost of Doing the ELK Stack on Your Own

So, you’ve decided to go with ELK to centralize, manage, and analyze your logs. Wise decision. The ELK Stack is now the world’s most popular log management platform, with millions of downloads per month. The platform’s open source foundation, scalability, speed, and high availability, as well as the huge and ever-growing community of users, are all excellent reasons for this decision.

A Monitoring Reality Check: More of the Same Won't Work

On December 7, 2021, Amazon’s cloud services suffered a major outage that affected not only Amazon’s own services, such as Amazon Alexa, Amazon Ring, and Amazon deliveries, but also many third-party services we use day to day, including Netflix and Disney+. The outage began at 7:30 am PST and lasted nearly seven hours. AWS later published a Root Cause Analysis report that detailed its causes and shed light on factors that may have contributed to the extended length of the disruption.