January 2022

How to Simplify Your Out-of-the-Box Alerting with NEW! AutoDetect

Gartner predicts that over 85% of global organizations will be running containerized applications in production by 2025, with 4 in 5 enterprises expected to move their workloads from on-premises infrastructure to the cloud. Cloud migration leaves IT admins and SREs managing an increasingly complex, hybrid IT environment and fighting an uphill battle to monitor and troubleshoot their infrastructure components and services in real time.

Logit.io Featured On eChannelNews For New Partner Program Launch

We are excited to announce that Logit.io was recently featured on e-ChannelNews, where our founder Lee Smith was interviewed about our partner program by Julian Lee, President of TechnoPlant. In this interview, Lee Smith explains how the Logit.io platform can help channel partners expand their ability to offer enterprise-ready logging and metrics analysis.

Best Practices in Java Logging for Better Application Logging

Examining Java logs is usually the quickest way to figure out why your application is experiencing trouble, so it's critical to have good logging in place. Following Java logging best practices can help you troubleshoot and address issues before they affect your users or business. In many cases, this means using a Java logging tool that automates your processes and delivers faster, more accurate results than manual logging.

What Should A Board Know About Tech In 2022? | Splunk & Accenture

With so much happening in the technology space, let alone in each individual global market, how does an organisation keep up? Which trends do they need to keep an eye on, and which ones should they invest in? We will discuss some of these issues today. Join Brian Berg, Principal Director at Accenture; Blanca Galletero, Splunk’s GVP EMEA GTM Ecosystem; and Mark Woods, Chief Technical Advisor EMEA at Splunk, as they discuss the topic ‘What should a board know about Tech in 2022?’.

What's New at observIQ

You may have noticed a few changes around here. If you explore our new website, you’ll notice new products, expansions to our open source libraries, significant contributions to our favorite open source project, OpenTelemetry, and new integrations with Google Cloud. You might think we’re taking “new year, new me” a little too seriously, but in fact we’ve been planning some of these changes for a long time. It all stems from our firm belief in open source technology.

3 Ways LogStream Can Improve Your Data Agility

Four months into my new gig at Cribl, I wish I could bottle up the “lightbulb” moment I get when walking people through how Cribl LogStream can help them gain better control of their observability data. I hope the scenario walkthroughs below capture some of that magic and shed some light on how LogStream can improve your organization’s data agility, helping you do more with your data, more quickly, and with fewer engineering resources.

A Splunk Approach to Baselines, Statistics and Likelihoods on Big Data

A common challenge I see when working with customers involves running complex statistics to describe the expected behaviour of a value and then using that description to assess the likelihood of a particular event. In short, we want something to tell us, "Is this event normal?" Sounds easy, right? Well, sometimes yes, sometimes no. Let's look at how you might answer this question and then dive into some of the issues it poses as things scale up.
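As a rough illustration of the underlying question, and not the SPL approach the post describes, a baseline can be as simple as the mean and standard deviation of past values, with a new value judged by its z-score. The sketch below is in Python; the function name `is_normal`, the default threshold of 3 standard deviations, and the sample counts are illustrative assumptions.

```python
from statistics import mean, stdev


def is_normal(history, value, threshold=3.0):
    """Return True if `value` lies within `threshold` standard deviations
    of the mean of `history` (a simple z-score baseline)."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs >= 2 points
    if sigma == 0:
        # A perfectly flat history: only the same value counts as normal.
        return value == mu
    z = (value - mu) / sigma
    return abs(z) <= threshold


# Hypothetical hourly event counts used as the baseline (mean 100, stdev 2).
history = [100, 102, 98, 101, 99, 100, 103, 97]
print(is_normal(history, 101))  # True  (z = 0.5)
print(is_normal(history, 150))  # False (z = 25.0)
```

A single global mean and standard deviation is exactly the kind of calculation that becomes expensive and misleading at scale (seasonality, many series, skewed distributions), which is where the issues the post goes on to discuss come in.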

Feature Spotlight: Centralized Log Collection

Speedscale is proud to announce its Centralized Log Collection capability. When diagnosing the source of problems in your API, more information is better. For most engineers, the diagnosis process starts with the application logs. Unfortunately, logs are usually either discarded or stored in observability systems that engineers don’t have direct access to. Compounding the problem, log information is typically not correlated with the calls that were made against the API.

Prevent Data Downtime with Anomaly Detection

A couple of months ago, a Splunk admin told us about a bad experience with data downtime. Every morning, the first thing she did was check that her company’s data pipelines hadn’t broken overnight. She would log into her Splunk dashboard and run an SPL query to get last night’s ingest volume for their main Splunk index, to make sure nothing looked out of the ordinary.
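A morning check like hers can be automated with a simple rule. The Python sketch below is a hypothetical, rule-based stand-in, not the anomaly detection the post goes on to describe: it compares last night's ingest volume with the median of previous nights and flags any change larger than a chosen fraction. The function name, the 50% default, and the sample GB figures are all assumptions.

```python
from statistics import median


def ingest_looks_anomalous(past_volumes, last_night, max_change=0.5):
    """Flag last night's ingest volume if it differs from the median of
    previous nights by more than `max_change` (0.5 means 50%)."""
    baseline = median(past_volumes)
    if baseline == 0:
        # No historical ingest: any data at all is a change worth a look.
        return last_night != 0
    change = abs(last_night - baseline) / baseline
    return change > max_change


# Hypothetical nightly ingest volumes (GB) for the main index over a week.
past_week = [120, 118, 125, 122, 119, 121, 124]
print(ingest_looks_anomalous(past_week, 115))  # False: about 5% below median
print(ingest_looks_anomalous(past_week, 40))   # True: about 67% below median
```

The median is used instead of the mean so that one broken night in the history does not drag the baseline down and mask the next failure.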

Using Oracle Cloud as a Data Lake Made Simple With Cribl LogStream

All major cloud providers, such as AWS, Azure, Google Cloud Platform, and Oracle Cloud, offer object storage solutions to economically store large volumes of data and retrieve it on demand. It’s far cheaper to store one petabyte of data in object storage than in block storage. As AWS S3 has become the standard, many on-premises storage appliance vendors have incorporated S3 APIs to store and retrieve data. Oracle wisely carried that trend into OCI (Oracle Cloud Infrastructure).

The Five Tenets of Observability

A new year is a chance for a fresh start, and a great opportunity to rethink the monitoring and observability platform you’re using for your applications. If you’ve been using a legacy monitoring system, you’ve probably heard about observability all over the ‘net and want to figure out whether it’s really something you need to care about.

Make the most of your observability data with the Data Volume app

As a DevOps, SecOps, or IT operations manager, you're surrounded by all the technology for the systems running the entire organization. This means legacy infrastructure, multi-cloud environments, services, tools, and applications. All of these components generate data—a huge amount of data—some of which you need to leverage for full-stack observability to ensure those systems supporting the business are running efficiently.

A (de)bug's life: Diagnosing and fixing performance issues in Grafana Loki's read path

Beep, beep, beeeeeeeep. Read path SLO page, again. And I’ve almost found the noisy neighbor! That was me. And will probably be me again at some point in the future. As we continue to scale up the team that builds and runs Grafana Loki at Grafana Labs, I’ve decided to record how I find and diagnose problems in Loki.

Getting Ready for a smooth, speedy migration to the Splunk Cloud Platform

This video shows you how a little preparation before you kick off your cloud migration can lead to a smooth, speedy ride. It will also help you decide on the migration strategy that best suits your environment and show you how to assess the effort required to migrate your environment to the Splunk Cloud Platform.

How Reliability and Product Teams Collaborate at Booking.com

With more than 1.5M room nights booked per day, Booking.com requires a solid infrastructure that’s constantly monitored. Indeed, Booking.com now has a footprint of 50,000+ physical servers running across four data centers and six additional points of presence. The sheer size of this server fleet makes it viable for Booking.com to have dedicated teams that specialize in looking only at the reliability of those servers.

What are CDN Logs and Why Do They Matter

A content delivery network (CDN) produces numerous log files, called CDN logs, as it delivers video across the internet to our homes and mobile devices. These logs contain crucial information about the CDN servers' performance and video streaming quality. They also amount to terabytes of data, which brings its own set of hurdles when it comes to handling them in real time and performing analytics to understand user experience and network issues.

Top 10+ Best System Monitoring Software & Tools [2022 Comparison]

It’s virtually impossible to manage today’s complex IT environments at scale without a comprehensive system monitoring solution that lets you check the health of all your applications and services from a single pane of glass. When your end users are experiencing difficulties, you need a tool in place that lets you quickly identify and remediate the root cause of the slowdown or error.

Harnessing AIOps to Improve System Security

You’ve probably seen the term AIOps appear as the subject of an article or talk recently, and there’s a reason. AIOps merges DevOps principles with artificial intelligence, big data, and machine learning. It provides visibility into performance and system data on a massive scale, automating IT operations through multi-layered platforms while delivering real-time analytics.

LogStream for InfoSec: VPC Flow Logs - Reduce or Enrich? Why Not Both?

In the last few years, many organizations I’ve worked with have significantly increased their cloud footprint. I’ve also seen a large percentage of newly launched companies go with cloud services almost exclusively, limiting their on-premises infrastructure to what cannot be done in the cloud, such as WiFi access points in offices or point of sale (POS) hardware for physical stores.

How to save on your Azure Monitor and Log Analytics Costs

Thomas Stringer has a couple of great blog posts on understanding your Azure monitoring costs and on reducing them; see Azure Monitor Log Analytics too Expensive? Part 2 – Save Some Money | Thomas Stringer (trstringer.com). In the past I’ve blogged on How to calculate the Azure Monitor and Log Analytics costs associated with AVD (not an easy task!).

Kickstart your Splunk App with @Splunk/Create

I’ve been contributing to, and creating, Splunk apps for the better part of the last 10 years, but I have never felt more excited to be a Splunk developer than right now. One of the primary reasons is build tools like @splunk/create. At Splunk, we recognize that developers are crucial to our entire ecosystem.

Monitoring AWS Spot instances using Sumo Logic

Spot worker nodes on EKS (Elastic Kubernetes Service) are a great way to save costs by allowing customers to take advantage of unused capacity. At Sumo Logic, we have experimented with and adopted spot worker nodes for some of our EKS clusters to see if we can pass along the same benefits. We decided to share some of the learnings, challenges, and caveats of using spot instances, along with our monitoring setup.

Monitoring Endpoint Logs for Stronger Security

The massive shift to remote work makes managing endpoint security more critical and challenging. Yes, people were already using their own devices for work. However, the rise in phishing attacks during the COVID pandemic shows that all endpoint devices are at a higher risk than before. Plus, more companies are moving toward zero-trust security models. For a successful implementation, you need to secure your endpoints.

Have You Forgotten About Application-Level Security?

Security is one of the fastest-changing landscapes in technology at the moment. With innovation come new threats, and it seems like every week brings news of a major organization succumbing to a cyber attack. Innovations like AI-driven threat detection and zero-trust networking continue to be a huge area of investment. However, security should never be treated as a single plane.

Patterns for better insights and troubleshooting with hybrid cloud logs

Hybrid and multi-cloud environments produce a boundless array of logs, including application and server logs, logs related to cloud services, APIs, orchestrators, gateways, and just about anything else running in the environment. Due to this high volume, logging systems can become slow and unmanageable just when you urgently need them to troubleshoot an issue, and even harder to use for gaining insights.

Eight best practices for a successful cloud migration strategy

As a result of the pandemic, we are all navigating an unpredictable mix of virtual, hybrid, and in-person conditions in our business and personal lives. This situation isn’t going away any time soon. The pandemic has prompted businesses across all industries to accelerate their digital transformation initiatives, in which the cloud is critical. On-demand, self-service environments are a compelling reason to migrate, as cloud architectures help businesses reinvent themselves and address uncertainty.

Logit.io Launch New & Improved Alerting Features

We are pleased to announce that we’ve recently launched new and improved alerting features which have been rolled out to users across all of Logit.io’s operating regions. As part of these improvements, we have sought to improve platform usability and have now included a new menu from which users can readily configure a number of popular alert types straight from our pre-configured templates.

Accelerate incident analysis by incorporating Ocean logs in any pipeline

Spot Ocean delivers container-driven autoscaling to continuously monitor and optimize your cloud environment. Positioned at a busy crossroads in the application deployment pipeline, Ocean has a critical role when shipping new containers. Given the highly dynamic nature of Kubernetes environments, events happen constantly and take shape as logs in Ocean. These can help you understand the chain of events in different scaling scenarios, from debugging cluster issues to incident analysis.

How We Implemented a Zero-Error Policy Using Coralogix

With dozens of microservices running in multiple production regions, getting to a point where any error log can be immediately identified and resolved feels like a distant dream. As an observability company, we at Coralogix are pedantic when it comes to any issue in one of our environments. That’s why we use an internal Coralogix account to monitor our development and production environments.

Collecting Metrics from Windows Kubernetes Nodes in AKS

Windows applications constitute a large portion of the services and applications that run in many organizations. When moving to a Kubernetes-based architecture, there is a need to support these as well. Up until April 2020, the lack of container support within the Windows operating system left Linux container images as the only viable option for Kubernetes container deployment.

Refined User Experience, New Executive Visibility, and Enhanced Cloud Monitoring with Splunk Enterprise Security 7.0

Just like that, another year has gone by full of remote work, virtual conferences, and lengthy Zoom calls. And although we were not able to see our fellow Splunkers in person at .conf21, that didn’t stop us from previewing the latest enhancements to Splunk Enterprise Security. Now it gives us great pleasure to announce that Enterprise Security 7.0 is available!

Comparing REST and GraphQL Monitoring Techniques

Maintaining an endpoint, especially a customer-facing one, requires constant monitoring, whether it uses REST or GraphQL. As the industry has looked for more adaptive endpoint technologies, monitoring those endpoints has become a must. GraphQL and REST are two different technologies that allow user-facing clients to connect to databases and platform logic, and each comes with its own monitoring techniques.

Configure Cribl LogStream to Avoid Data Loss With Persistent Queuing

Preventing the loss of data in motion when a downstream Destination is unreachable is a challenge that LogStream Persistent Queues (PQ) can help address. In this blog post, we’ll talk about how to configure and calculate PQ sizing to avoid disruption while the Destination is unreachable for a few minutes or a few hours. The example follows a real-world architecture, in which we have.
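At its core, PQ sizing is arithmetic: the queue has to hold whatever arrives while the Destination is down. The back-of-the-envelope Python sketch below assumes a steady ingest rate spread evenly across worker processes and ignores compression; the function name, the 20% headroom default, and the example figures are hypothetical, not Cribl's published sizing formula.

```python
def pq_size_per_worker_gb(daily_gb, outage_hours, workers=1, headroom=1.2):
    """Estimate persistent-queue disk space needed per worker process.

    Assumes data arrives at a steady rate and is spread evenly across
    worker processes. Ignores any compression, so the estimate errs high;
    `headroom` adds a safety margin on top.
    """
    hourly_gb = daily_gb / 24
    total_gb = hourly_gb * outage_hours * headroom
    return total_gb / workers


# Hypothetical: 1.2 TB/day, a 4-hour outage, 4 worker processes, 20% headroom.
print(f"{pq_size_per_worker_gb(1200, 4, workers=4):.1f} GB per worker")
```

Real traffic is rarely flat, so sizing against the peak hourly rate rather than the daily average is the more conservative choice.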

Splunk Enterprise Logs Now Available in Splunk Observability for Simplified Troubleshooting

We are excited to announce that Splunk Log Observer Connect for Splunk Enterprise, previewed at .conf21, is now generally available! Log Observer Connect is a new feature that lets observability users explore the data already being sent to existing Splunk instances with Splunk Log Observer’s intuitive no-code interface for faster troubleshooting and root-cause analysis.

Graylog Insights -- How 2021 Will Shape 2022

People may not reminisce over 2021, but as Winston Churchill once said, “Those that fail to learn from history are doomed to repeat it.” 2021 swooped in on the coattails of a major supply chain data breach, and a lot of the challenges we experienced during this past year seemed to follow suit. To celebrate the best and hopefully move away from the worst that 2021 had to offer, this look back at 2021 trends can inspire us all to learn, and most of all, show us how to move forward.

Docker Logging: How Do Logs Work With Docker Containers?

Docker containers are a great way to create lightweight, portable, and self-contained application environments. Logging is critical for every application, since it provides valuable information for troubleshooting, evaluating performance issues, and drawing an overall picture of the behavior of your architecture. This article presents a thorough tutorial covering everything you need to know to get started with Docker logging. It also provides some recommended practices for optimizing the logs of your containerized apps.
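For context on what Docker's default logging looks like on disk: the `json-file` logging driver stores each line a container writes to stdout or stderr as a JSON object with `log`, `stream`, and `time` keys, typically under `/var/lib/docker/containers/<container-id>/`. The minimal Python sketch below parses such lines; the helper name and the sample records are illustrative.

```python
import json


def read_json_file_log(lines):
    """Parse lines written by Docker's json-file logging driver.

    Each line is a JSON object with "log" (the raw output, usually ending
    in a newline), "stream" ("stdout" or "stderr"), and "time".
    Returns (time, stream, message) tuples with the trailing newline removed.
    """
    entries = []
    for line in lines:
        record = json.loads(line)
        entries.append((record["time"], record["stream"], record["log"].rstrip("\n")))
    return entries


# Hypothetical sample lines in json-file format.
sample = [
    '{"log":"Server started\\n","stream":"stdout","time":"2022-01-14T10:15:30.000000000Z"}',
    '{"log":"Connection refused\\n","stream":"stderr","time":"2022-01-14T10:15:31.000000000Z"}',
]

for ts, stream, message in read_json_file_log(sample):
    print(ts, stream, message)
```

Keeping the `stream` field makes it easy to route stderr output separately, for example to alert only on error streams.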

Five Trends Driving Australia's Data Landscape in 2022

The last couple of years have set new global benchmarks for the data and technology sector, putting companies that have considerably accelerated their digitisation processes in the front row. Our relationship with technology has also changed immensely, creating new expectations by and for all stakeholders, from consumers to enterprise technology companies and governments.

20+ Best Log Management Tools for Monitoring, Analytics & More: Pros & Cons Comparison [2022]

Whether you capture them for application security and compliance, production monitoring, performance monitoring, or troubleshooting, logs contain valuable information about the health of your apps. But it all comes down to what and how you log, which is where log management tools come into play.

Top 15 Website Speed Testing Tools [2022 Comparison]

There are many reasons why people choose one online store over another, or one streaming service over another: the type of service they get, pricing, quality and, as you’ve guessed from the title, speed. The speed I’m referring to is how quickly the website loads and responds to user input. In one of my previous articles, about network latency, I talked about how big a difference even an extra two-second delay makes.

Interview with CTO Kathleen Moriarty

For the newest instalment in our series of interviews asking leading technology specialists about their achievements in their field, we’ve welcomed Kathleen Moriarty, Chief Technology Officer at the Center for Internet Security. During her tenure in the Dell EMC Office of the CTO, Kathleen had the honour of being appointed and serving two terms as the Internet Engineering Task Force (IETF) Security Area Director and as a member of the Internet Engineering Steering Group from March 2014 to 2018.

Accelerating software delivery through observability at two very different organizations

Delivering value to customers quickly and efficiently is critical to the success of modern businesses. Understanding the process and timeframe during which an organization generates new ideas and then designs, develops and deploys them is vital to success. At the recent Illuminate User Conference, Drew Horn, Director of Business Development, held a discussion with Clara Ko, Director of Engineering at Sauce Labs, and Bryan Veselka, Director and Product Owner for Cloud and Automation at Vizient.

Mapping Statistics - What You Need to Know

When your Elasticsearch cluster ingests data, it needs to understand how the data is structured. To do this, it goes through a process called mapping, which defines the type of each field in a given document: for example, a number or a string of text. But how do you know the health of the mapping process, and why do you need to monitor it? This is where mapping statistics come in: they give you an overall view of the mapping process.
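To make per-field types concrete, the Python sketch below roughly imitates Elasticsearch's default dynamic mapping for a JSON document: integers become `long`, floating-point numbers `float`, booleans `boolean`, and strings `text` with a `keyword` sub-field, while nested objects yield dotted field paths. It is a simplification under stated assumptions (no date or numeric detection; arrays and nulls are skipped), and the function name is hypothetical.

```python
def infer_field_types(doc, prefix=""):
    """Roughly imitate Elasticsearch's default dynamic mapping for one JSON doc.

    Simplified: no date or numeric detection; arrays and nulls are skipped.
    Nested objects produce dotted field paths such as "user.id".
    """
    mapping = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, bool):  # check bool first: bool subclasses int
            mapping[name] = "boolean"
        elif isinstance(value, int):
            mapping[name] = "long"
        elif isinstance(value, float):
            mapping[name] = "float"
        elif isinstance(value, str):
            mapping[name] = "text (+ keyword sub-field)"
        elif isinstance(value, dict):
            mapping.update(infer_field_types(value, prefix=f"{name}."))
    return mapping


# Hypothetical log document.
doc = {"status": 200, "latency": 0.25, "cached": False,
       "message": "GET /health", "user": {"id": 7, "name": "ana"}}
for field, es_type in infer_field_types(doc).items():
    print(f"{field}: {es_type}")
```

Every new field name adds an entry like these to the index mapping, which is why watching field counts and mapping growth is a useful health signal.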

Log Management: Your Obvious Choice for Capacity Planning and Optimization

Recently, I wrote an article titled Life Cycle Monitoring: Why an Ounce of Prevention Is Worth a Pound of a Cure, a phrase coined by the great Benjamin Franklin. In the article, I highlighted the value, efficiency, and logic of putting more time into a proper capacity planning and optimization process for all types of IT environments. Asked how they use log management tools, most IT professionals would tell you the first thing that comes to mind is troubleshooting.

Momma Said Grok You Out: Use LogStream to Streamline Searches, Aid in Reformatting Data and Parsing
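Grok patterns such as `%{TIMESTAMP_ISO8601}`, `%{LOGLEVEL}`, `%{WORD}`, and `%{GREEDYDATA}` are named, reusable regular expressions. As a rough illustration in plain Python (not LogStream's own Grok function), the pattern `%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} \[%{WORD:component}\] %{GREEDYDATA:msg}` can be approximated with named groups; the simplified sub-patterns and the sample log line are assumptions.

```python
import re

# Simplified stand-ins for the grok patterns named in each comment.
LOG_PATTERN = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s+"  # ~ %{TIMESTAMP_ISO8601:ts} (UTC only)
    r"(?P<level>[A-Z]+)\s+"                              # ~ %{LOGLEVEL:level}
    r"\[(?P<component>\w+)\]\s+"                         # ~ \[%{WORD:component}\]
    r"(?P<msg>.*)"                                       # ~ %{GREEDYDATA:msg}
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


print(parse_line("2022-01-14T10:15:30Z ERROR [api] Connection refused"))
```

Once a raw line is split into named fields like these, searches can filter on `level` or `component` directly instead of scanning free text.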

It is commonly believed that once data is collected and ingested into a system of analysis, the most difficult part of obtaining the data is complete. However, in many cases, this is just the first step for the infrastructure and security operations teams expected to derive insights.

Learning the tricks of Grafana Loki for distributed logging at scale in a Kubernetes environment

Logging can provide immense detail when used well, or it can become a firehose and take hours to trawl through. The team supporting the Kubernetes platform at Civo needed a solution that was simple and performant and could be queried in ways that help rather than hinder them. In this talk, Civo SRE Anaïs Urlichs and Principal Engineer Alex Jones will illustrate how Loki was chosen and brought into the organization to empower engineers. Integrating with Prometheus and Grafana dashboards, Loki has allowed engineers to filter for precise information that helps them debug quicker.

Detecting and Preventing Log4J Attacks with Cribl LogStream

Shortly before the December holidays, a vulnerability in the ubiquitous Log4J library arrived like the Grinch, Scrooge, and Krampus rolled into one monstrous bundle of Christmas misery. Log4J maintainers went to work patching the exploit, and security teams scrambled to protect millions of exposed applications before they got owned. At Cribl, we put together multiple resources to help security teams detect and prevent the Log4J vulnerability using LogStream.

Five tricks for logging at scale in a Kubernetes environment with Grafana Loki

Legacy logging solutions simply couldn’t keep up with the complex, hyperconverged regional infrastructure at Civo, a Kubernetes service provider that enables users to launch k8s clusters within 90 seconds. “With our infrastructure and application deployment getting more complex and more distributed, we needed our logging solution and our entire observability stack to scale up with our needs,” said Anaïs Urlichs, Site Reliability Engineer at Civo.

Dr. Changelove: Or How I Learned to Stop Going Vendor-Specific and Love the LogStream

Here at Cribl, we have a cloud offering of our LogStream product. In building and supporting our cloud product, we have a service-based architecture. And we want to be able to gather metrics from our services, in order to monitor those services and make sure we meet our SLAs.

DevOps State of Mind Podcast Episode 6: The Future of DevSecOps with EMA

Chris Steffen is a research director for information security at Enterprise Management Associates. EMA is a leading analyst and consulting firm that prides itself on going beyond the surface to provide deep insights about the IT industry. I'm Liesse from LogDNA. Before we dive in, I just wanted to take a moment to thank all of you for tuning in to season one of DevOps State of Mind.

ELK vs Graylog: Log Management Comparison

As organisations face outages and various security threats, monitoring an entire application platform is critical in order to determine the source of the threat or the location of the outage, as well as to verify events, logs, and traces in order to understand system behaviour at the time and take proactive and corrective actions.