Operations | Monitoring | ITSM | DevOps | Cloud

November 2021

Announcing: Code-level insights with Azure Repos

How many times have you been aware of an error or performance issue, but lacked the ability to quickly uncover the root cause and determine why it happened in the first place? One of the most powerful ways to do this is by surfacing code-level insights right where you need them, alongside the context needed to take action. By using our latest integration with Azure Repos, you’ll be able to get to the ‘why’ of issues, faster – all directly within Raygun.
Sponsored Post

3 IT Workflow Automation Use Cases to Turbocharge Your Business

According to a recent survey by Gartner, business leaders anticipate a return to growth for their enterprises and industries in 2022, and a big part of their investment plans involves digital transformation. In fact, 20% of CEOs cited digital transformation as a priority for strategic investment. That is a significant shift from 2012 when Gartner found that only 2% of CEOs surveyed had made digital transformation a priority.

Observability And AIOps: Why Convergence Is The Future To Improving Uptime

On October 4, Facebook and its properties, Instagram and WhatsApp, were down for more than five hours due to configuration changes on routers in Facebook’s data centers. A five-hour outage is an eternity in our always-on digital economy, costing the company an estimated $65 million and 4.8% in stock valuation. The high-profile Facebook outage is emblematic of just how digitally intermediated our economy is becoming, and the incident renews C-level focus on preventing similar service failures.

Simpler Navigation in AppSignal: A Story About Refactoring Design

AppSignal users will immediately notice that we’ve updated our product navigation. The new navigation is simpler, cleaner, and improves usability for (power) users. Let’s dive into these changes, along with some background on our philosophy of designing for developers.

Kubernetes 1.23 - What's new?

Kubernetes 1.23 is about to be released, and it comes packed with novelties! Where do we begin? This release brings 45 enhancements, on par with the 56 in Kubernetes 1.22 and the 50 in Kubernetes 1.21. Of those 45 enhancements, 11 are graduating to Stable, a whopping 15 are existing features that keep improving, and 19 are completely new. The new features included in this version are generally small but very welcome, such as the kubectl events command, support for OpenAPI v3, and gRPC probes.

Webinar: Good monitoring in action - a real life example with allpay

One of the greatest responsibilities of managing your website and IT estate is to stay on top of your uptime, applications and performance metrics. After all, visitors today expect websites to load within a few seconds and provide a seamless experience 24/7. Having an effective monitoring strategy and the right tools in place is essential if you want to retain your customer base and protect your revenue.

3 Reasons Why Microsoft Mesh for Microsoft Teams Makes Performance Monitoring More Important Than Ever

Written by Microsoft MVP Nick Cavalancia. Earlier this month, Microsoft announced the 2022 rollout of Mesh for Microsoft Teams as the next step in online and virtual collaboration. This seemingly bold step forward into a new type of interaction between individuals is more a natural evolution, taking years of augmented reality research and applying it in a way that provides value to organizations wanting to better collaborate.

All-new Netdata Cloud Charts 2.0

Netdata excels in collecting, storing, and organizing metrics in out-of-the-box dashboards for powerful troubleshooting. We are now doubling down on this by transforming data into even more effective visualizations, helping you make the most sense out of all your metrics for increased observability. The new Netdata Charts provide a ton of useful information and we invite you to further explore our new charts from a design and development perspective.

Real User Monitoring: How to Improve Your Target Audience Reach

In the first post of this two-part series, we talked about the need to fully understand how users experience your website. Without understanding how your end users interact with your site’s pages—what’s working for them and what’s not—you’d be optimizing on a hunch without solid data to guide you.

Microsoft Teams on Citrix: Configuration and Deployment

Since early 2020, there has been a massive growth in the number of active Microsoft Teams users and organizations deploying Teams; now, there are more than 200 million monthly active users across the globe. With an increase in market share, it’s one of those applications that you either expect an organization to be already using or planning to deploy out to their environment sooner rather than later.

Cloud cost management and optimization: How to regain control over your bill

The cloud is today one of the most expensive resources for any modern organization, second only to employee salaries and overhead. According to recent research by Gartner, end-user spending on public cloud services will reach $396 billion in 2021 and grow 21.7% to reach $482 billion in 2022. By 2026, Gartner predicts public cloud spending will exceed 45% of all enterprise IT spending, up from less than 17% in 2021.

Driving Unified Visibility within Modern Digital Environments

Operational monitoring can be like looking down the wrong end of a telescope. There’s no clear picture of the horizon. Everything is blurred, indistinct, and difficult to trace. If you’re relying on traditional, domain-centric monitoring, you’re faced with a similar problem: you can see the performance of individual elements, but you don’t have any visibility into the broader picture.

Monitor Salesforce's Real-Time Events with Splunk

In 2019 Salesforce announced the general availability of Real-Time Event Monitoring (RTEM) which includes 19 different events that help monitor & secure your Salesforce data. Real-Time Event Monitoring stores events for 6 months as Salesforce Big Objects and streams events via Salesforce’s Streaming API in near real-time.

The Kids are Connected: Ensuring Connectivity in Education Networks

Connectivity is more important than ever to support our education system. BYOD, remote classes, Wi-Fi, and a variety of digital learning solutions are now just part of the equation. Education IT teams have a big task: making sure educational service delivery remains uninterrupted while balancing the constraints of limited budgets, staff, and time to support digital transformation that can keep up with modern demands.

User experience is a focus of Sumo Logic Observability innovations

Technology environments are rapidly evolving as organizations look to remain competitive, accelerate innovation and make themselves more agile. But in the process, many of the observers, i.e., stakeholders who track infrastructure and application metrics, are falling behind, unable to monitor and manage modern, cloud-native apps and multi-cloud environments due to the complexity that comes with them.

Best Practices to Monitor Node.js Performance

Built on Chrome's V8 JavaScript engine, Node.js is a very lightweight, open-source framework with minimal modules. And since it is asynchronous by default, it is faster than most other frameworks. DevOps teams still need Node.js monitoring to ensure it performs better than other frameworks. To understand how relevant Node.js still is, note that PayPal, Reddit, LinkedIn, Amazon, Netflix and other high-use, high-visibility service providers use the framework.

NiCE VMware Management Pack 5.4 released

The NiCE VMware Management Pack 5.4 is an enterprise-ready Microsoft SCOM add-on for advanced VMware vSphere and ESXi monitoring. It supports the VMware administrator in centralized vSphere and ESXi health and performance monitoring to improve user experience and business results. The new NiCE VMware Management Pack 5.4 comes with new features, such as extensive vSAN monitoring options, vCenter Service monitoring, Snapshot management, Datastore Provision monitoring, as well as Certificate tracking.

The Essential Components of Digital Transformation

The digital revolution forced every organization to reinvent itself, or at least rethink how it goes about doing business. Most large companies have invested substantial cash in what is generally labelled “digital transformation.” While those investments are projected to top $6.8 trillion by 2023, they’re often made without seeing clear benefits or ROI.

VMware Management Pack Update Release (21.11.2612.0)

Finally, it's time for a new VMware management pack release, and yes, we heard you! A while ago we asked for your input and feedback in a customer survey. We worked hard to optimize and add new features according to your requests. Some of the most significant features are described in depth below, while you will find all newly added features, changes, and improvements in the release notes.

Key Takeaways from our digital SCOM Usergroups

After almost two years of living with a pandemic that changed the way we live and, not least, the way we work, we are happy to say that we just finished the first events in quite a long while. What used to be very successful "live events" have now turned into online, digital ones, and though it's not quite the same, we are pleased that participation was high and feedback positive.

What is APM? Overview and Features (The Beginner's Guide)

Application performance monitoring (APM) extends observability beyond system availability, service performance, and response times in current, cloud-native contexts. Organizations can improve user experiences at the scale of modern computing by using automatic and intelligent observability. User experiences in software applications are monitored and managed using APM technologies.

How to build performance tests into your CI pipeline with k6, GitHub Actions, and Grafana

Performance testing is an essential component of building fast and reliable web services. Until recently, this testing typically happened later in the development process and was often performed by a separate team or even a third party. But speed is the competitive advantage for companies, and prioritizing testing during the development process can speed time to market for new applications.

Virtana Optimize | Cloud Cost Management Solution

With Virtana Optimize, you can increase your advantage by optimizing your capacity and cost in real time on an ongoing basis. Our real-time data collection and analytics identify unused resources that can be eliminated so you can stay on budget, even as conditions and options change, and avoid an end-of-month billing surprise.

HTTP/3 is Fast

HTTP/3 is here, and it’s a big deal for web performance. See just how much faster it makes websites! Wait, wait, wait, what happened to HTTP/2? Wasn’t that all the rage only a few short years ago? It sure was, but there were some problems. To address them, there’s a new version of the venerable protocol working its way through the standards track. Ok, but does HTTP/3 actually make things faster? It sure does, and we’ve got the benchmarks to prove it.

Introducing LM Concierge

At LogicMonitor, the success of our customers is our biggest priority. Through product training, guidance, and support, we strive to help you get the maximum value out of your LogicMonitor investment. That is why we are delighted to introduce LM Concierge, the most robust, exclusive support and services offering for Enterprise customers to take their LogicMonitor journey to the next level.

Troubleshoot Javascript (in real-time)

Javascript execution analysis on dev environments is easy—just use Google Developer or some other free tools. However, getting the same level of analysis while your application is being used by a real user is much harder. You can’t possibly ask the end-user to help you troubleshoot. Even if you did, the user probably wouldn’t know what to do and they definitely wouldn’t be impressed by your organization.

MSP Remote Monitoring & Management Tools

eG Innovations is an end-to-end performance monitoring solution provider with a dedicated MSP solution and an MSP partner program to allow MSPs to use our functionality to provide value-added premium services. For example, MSPs use our eG Enterprise solution to provide managed first line helpdesk support or to provide dashboards and portals to enable individual clients to monitor their IT infrastructures and applications.

OpenTelemetry Browser Instrumentation

One of the most common questions we get at Honeycomb is “What insights can you get in the browser?” Browser-based code has become orders of magnitude more complex than it used to be. There are many different patterns, and, with the rise of Single Page App frameworks, a lot of the code that is traditionally done in a backend or middle layer is now being pushed up to the browser. Instead, the question should be: What insights do frontend engineers want?

Cloud Elasticity: What Happens When You Lose Control

In an on-premises environment, you have to pay for the capacity you have regardless of whether you’re using it, and you can’t exceed that capacity without purchasing and provisioning new hardware. In the cloud, however, you have much more flexibility thanks to cloud elasticity, which is the ability to automatically provision or deprovision resources based on workload changes.
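The elasticity idea described above can be sketched in a few lines. This is a minimal illustration only, not any cloud provider's autoscaling API: the function name `desired_instances`, its parameters, and the min/max limits are hypothetical and chosen for the example.

```python
import math


def desired_instances(current_load, capacity_per_instance,
                      min_instances=1, max_instances=10):
    """Return how many instances a workload needs, within fixed limits.

    All names and default limits here are hypothetical; a real autoscaler
    would also consider cooldowns, metrics windows, and cost.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Round up so the fleet can always absorb the current load.
    needed = math.ceil(current_load / capacity_per_instance)
    # The upper limit is what keeps elasticity from running away with the bill.
    return max(min_instances, min(max_instances, needed))
```

The `max_instances` cap is the point of the sketch: scaling up and down with the workload is only safe for the budget when there is an explicit ceiling.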

Digital Wellbeing and the Overlapping Roles of HR and IT

Who oversees employee digital wellbeing? Nexthink’s Meg Donovan (Chief People Officer) and Tim Flower (Global Director of Business Transformation) recently sat down to answer this question on the minds of so many business leaders. Of course, Human Resources departments have traditionally shouldered the responsibility of managing employee wellbeing. But a recent Nexthink survey reveals that unreliable IT services and equipment are the third biggest contributor to employee turnover and burnout.

Sensu Go 6.5: Sensu Plus, High-performance Handlers, and Pipeline Resources

Sensu Go 6.5 is another feature-packed release and our first integration with the Sumo Logic Continuous Intelligence Platform. In this post, we’ll review the new features that make Sensu Go 6.5 such a banner release, and give you a sneak peek of what we have planned for future Sensu Go releases.

WFA: Visible - Microsoft Office Dashboard

The Microsoft Office Dashboard helps you to analyze usage trends and track adoption across your organization. It also allows you to speed up problem resolution by isolating application performance problems to the user’s device, the network, or the SaaS vendor’s back-end. You’ll be able to measure any performance variations when Microsoft makes app and infrastructure changes too. This provides some much-needed help to reduce finger-pointing!

GCP Integrations for Metrics with Logz.io

Logz.io has dedicated itself to encouraging and supporting cloud-native development. That has meant doubling down on support for AWS and Azure, but also increasing our tie-ins with Google Cloud Platform – GCP. Recently, our team added dozens of new integrations for metrics covering the gamut of products in the GCP ecosystem.

Product Growth KPIs Dashboard for Enterprises

There can be as many as 64 important business metrics for your company to track (according to nTask). That can sound daunting. But if your organization doesn’t have the capacity to track all of them, it should at least track the most important ones according to its business model, stage and focus areas. For example, key product metrics not only provide information to product managers, but also other relevant stakeholders across the organization.

Black Friday Requires Dynamic Integration Infrastructure Management

Life for an IT architect in retail would be so much easier if people would just spread their shopping out across the year. Since 1941, American Thanksgiving has been a holiday on the fourth Thursday of November. Most American companies and schools take the following day as a holiday too, and major retailers offer price reductions on that day to kick-start the Christmas shopping season. Since 2005 it has been the busiest shopping day of the year.

How Kambi migrated from an in-house Graphite solution to Grafana Cloud

When you’re a sports betting technology company and you realize your in-house, on-prem Graphite solution for monitoring metrics is no longer a sure-thing, what do you do? That was the dilemma at Kambi, a quickly growing business – with a passion for using open source technology – that has about 500 different micro services in production and around 200,000 incoming metrics messages per second.

Dashbird explained

Dashbird is an observability, debugging, and intelligence platform designed specifically to help serverless developers build, operate, improve, and scale their modern cloud applications on AWS quickly, securely, and with ease. It’s free to use for up to 1M invocations and doesn’t require any code changes. Dashbird fills the gaps left by CloudWatch and other traditional monitoring tools by offering enhanced out-of-the-box monitoring, operations, and actionable insights tools for architectural improvements, all in one place.

Build VS Buy: How to get the most out of your SCOM monitoring

Microsoft's System Center Operations Manager, SCOM, is one of several monitoring platforms available on the market. It is widely adopted and has great potential to help you create a stable and robust IT environment. Succeeding with that, however, is not always straightforward. To fully benefit from SCOM, the right resources need to be in place and optimized: time, competence, and money.

Dashboard Fridays: Example SQL Samples Dashboard

Join SquaredUp's Adam Kinniburgh and fellow virtualisation expert Shawn Williams as they showcase this SQL Sample dashboard. This dashboard was designed to demonstrate how the same base SQL query could be used with each of SquaredUp’s visualizations. Each SQL visualization slightly modifies how the data is presented to target user requirements. Tune in to learn how it was made, the challenges it solves, and Shawn's top tips for building it yourself.

Introduction to Alarm Templates in Foglight 6

With the release of Foglight version 6, alarm management has been simplified with the use of alarm templates. You can access Alarm Templates directly from the left-hand panel in the new UI. However, if you've upgraded from a previous version, it helps to start using the familiar alarm configuration workflow. There is a new tab for "Alarm Template Settings." This allows you to do 2 tasks.

Announcing support for Windows containers on AWS Fargate

AWS Fargate is a serverless compute engine that allows you to deploy containerized applications with services such as Amazon ECS without needing to manage the underlying virtual machines. Deploying with Fargate removes operational overhead and lowers costs by enabling your infrastructure to dynamically scale to meet demand. We are proud to partner with AWS for its launch of support for AWS Fargate on Windows containers.

Top 4 pitfalls causing poor monitoring in an IT estate

As we speak, there are likely thousands of organizations running some type of monitoring solution out there, each within unique operational structures, but the difference between those that are successful and those that are not can be attributed to 4 root causes. Not surprisingly, these 4 root causes apply to any toolset or technology used. That’s largely due to the likelihood that the best tools in the world used poorly will deliver poor results.

How to improve the customer service experience through status pages

With the 2021 holiday season right around the corner and the COVID-19 pandemic still prevalent, business is being conducted online now more than ever. The holiday rush also comes with incidents like websites going down, slow load times, and even possible hacking attempts. While planning to tackle the sudden increase in website traffic during the festive season, businesses must have an incident response plan in place to handle unexpected outages and the consequent surge in customer inquiries.

VirtualMetric's Heatmaps

Within just a couple of seconds, you can see your entire server infrastructure’s status in a visual format in VirtualMetric's Heatmaps. All hardware and performance-related issues will be highlighted in the form of visual cues in neatly organized heat maps. The information is also updated in real-time, thus allowing you to catch errors before they pass the threshold. Flexible and scalable option: VirtualMetric, being cloud-based, can quickly adapt to changing server requirements and infrastructure setups.

k6 introduces browser automation and Prometheus support in k6 OSS

While there is a lot of focus on the three pillars of observability to provide insight into application performance in production, load testing is the other side of the observability story. By using the open source load testing tool k6 — which Grafana Labs acquired earlier this year — developers can simulate real-world traffic to test the reliability and performance of software changes and new features, not to mention flag performance issues before impacting end users.

This Month in Datadog: November 2021 (Episode 6)

Datadog is constantly elevating the approach to cloud monitoring and security. This Month in Datadog updates you on our newest product features, announcements, resources, and events. This month we put the Spotlight on Network Device Monitoring, along with highlighting the many announcements and guest presentations from Dash.

Monitoring Elastic Enterprise Search performance using Elastic APM

Elastic APM is a free, open and powerful observability tool that provides intelligence into application performance for a myriad of production applications (e.g., throughput, error rates, latency, resource usage, transaction traces). You can now enable Elastic APM integration to gain deep insight into the performance behaviors of your Elastic Enterprise Search deployment!

Announcing the General Availability of Splunk RUM Custom Events

With a 70% increase in internet usage, and digital teams adopting cloud-native technologies at a rapid rate, the importance of measuring customer experience on digital properties is not just a technical problem, but a business imperative. Frontend developers and SREs use Real User Monitoring (RUM) to understand critical components of their end-user experience, like how quickly users see content, when a page becomes interactive, and a page's visual stability.

Logging into your Connected Experiences

After checking out the Get Started with Connected Experiences blog, you’ll be an expert on (1) how your Splunk data gets to mobile, (2) how to unlock mobile for your Splunk instance, and (3) how user management works. So, with this blog, I’d say it’s time to talk about the many login methods users can leverage with their Splunk instance on mobile.

Introduction to Time Series Forecasting with Tensorflow and InfluxDB

Wouldn’t it be nice to be able to perfectly predict the future? We are a long way from being able to do that, but that is basically the goal of anybody working in the data science field — take a bunch of historical data and then try to make future predictions based on that data.

Monitoring Distributed Systems

There was a time when standing up a website or application was simple and straightforward, not the complex undertaking it is today. Web developers and administrators did not have to worry about, or even consider, the complexity of today's distributed systems. The recipe was straightforward. Do you have a database? Check. Do you have a web server? Check. Great, your system was ready to be deployed.

5 Things Developers Need to Know About Kubernetes Management

Kubernetes management can be daunting for developers who don’t have specialized understanding of the orchestration technology. Learning Kubernetes takes practice and time, a precious commodity for devs who are under pressure to deliver new applications. This post provides direction on what you need to know and what you can skip to take advantage of Kubernetes. Let’s start with five things you need to know.

Four powerful Alerting workflows

Since its release last month, Alerting has quickly ingrained itself into the incident response workflow at some of the most technically advanced companies in the world. We’re here to empower your team to do the same. In this blog, we’ll run through four common alerts that you can implement today to ensure you’re maximizing the full potential of Alerting.

Announcing support for Graviton2-powered AWS Fargate deployments

AWS Fargate is a serverless compute engine that allows you to deploy containerized applications on services like Amazon ECS without needing to provision or manage compute resources. Now, Datadog is proud to be a launch partner with Amazon for their support of AWS Fargate workloads running on Graviton2, Amazon’s proprietary ARM64 processor.

How to Keep Traces for Slow and Failed Requests

Today we are introducing Local Tail-Based sampling in Kamon Telemetry! We are going to tell you all about it in a little bit, but before that, let’s take a couple of minutes to explore what sampling is, how it is used nowadays, and what motivated us to include local tail sampling in Kamon Telemetry.
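As a rough illustration of the tail-based idea named in the title (deciding whether to keep a trace only after it has finished, so slow and failed requests are always retained), here is a minimal Python sketch. It is not Kamon's API or implementation: `FinishedTrace`, `keep_trace`, the 500 ms threshold, and the 1% baseline rate are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class FinishedTrace:
    """Hypothetical summary of a completed trace (not a Kamon type)."""
    trace_id: str
    duration_ms: float
    has_error: bool


def keep_trace(trace, slow_threshold_ms=500.0, baseline_rate=0.01,
               rng=random.random):
    """Decide, after the trace has finished, whether to keep it.

    Failed and slow traces are always kept; the rest are kept at a low
    random baseline rate. Thresholds are illustrative assumptions.
    """
    if trace.has_error:
        return True
    if trace.duration_ms >= slow_threshold_ms:
        return True
    # rng is injectable so the random decision can be made deterministic.
    return rng() < baseline_rate
```

Because the decision runs at the end of the request, it can use the outcome (duration, error status) that head-based sampling cannot see when the trace starts.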

Datadog vs. Dynatrace vs. Scout

Application Performance Monitoring is undoubtedly the hottest tool to accelerate any product’s growth in the modern market. The term has grown from a simple performance tracking operation to full-fledged infrastructure, network, and application observability. This evolution has only helped bring more and more growth into products. It is essential to choose the best-suited monitoring solution for your apps since observability involves way too many moving pieces.

Monolithic (Legacy) vs. Microservices Application Development

Microservices are becoming increasingly popular and are considered to be the next flexible, scalable, and reliable approach. Without a doubt, many developers are rethinking their application development methods. However, while many have been quick to jump on the microservices bandwagon, it’s not a decision that you should make lightly.

What's new in Sysdig - November 2021

Welcome to a new update of “What’s new in Sysdig.” Happy All Saints’/Souls’ Day! Happy International Pianist Day! Happy Thanksgiving! Happy Diwali! Glad alla helgons dag. The “What’s new in Sysdig” blog has been rotated to a new team, and this month, Peter Andersson is responsible for the publishing. Thanks to Chris Kranz for an excellent job compiling these articles earlier.

Visualizing IoT security metrics with Grafana at Network to Code

As the number of connected gadgets in our homes, offices, and industrial networks continues to grow exponentially, keeping IoT devices secure has become a vital part of our everyday lives. However, our webcams, printers, and smart plugs often lack security features due to their fast time to market, making them particularly vulnerable to attack. And because security metrics themselves can be tricky to assess, tracking IoT device security is increasingly a challenge.

Save 85% in Time Spent on Root Cause Analysis with Topology-based Observability | StackState demo

During the 2021 Gartner IT Infrastructure, Operations and Cloud Strategies conference, our senior solution engineer Mark Arts showed you how you can use topology-based observability - based on our 4T Data Model - to save 85% in time spent on root cause analysis.

Time-traveling topology in observability w/ Mark Bakker and Lodewijk Bogaards | The StackPod EP #1

Welcome to this first episode of the StackPod! This is a podcast where we talk about everything related to observability and working in a tech company. For the first episode, we invited our co-founders Lodewijk Bogaards and Mark Bakker. Anthony interviews them about observability and why a time-traveling topology is crucial for that, the move to the cloud and how that affects observability, cloud costs and much more.

We're Thrilled To Share - Coralogix has Received AWS DevOps Competency

At Coralogix, we believe in giving companies the best of the best – that’s what we strive for with everything we do. With that, we are happy to share that Coralogix has received AWS DevOps Competency! Coralogix started working with AWS in 2017, and our partnership has grown immensely in the years since. So, what is our new AWS DevOps Competency status, and what does it mean for you?

Mentorship and Shared Languages of Network Engineering with Cat Gurinsky | Network AF Episode 6

On this episode of Network AF, Avi is joined by Senior Network Engineer Cat Gurinsky to share her journey through networking. Cat found a passion for automating deployments and troubleshooting and is the current chair for the NANOG Program Committee.

What Is an SSIS Package and How Do You Create One?

SQL Server Integration Services (SSIS) is a workflow orchestration service used mostly for data integration and transformation. It was first released as part of SQL Server 2005 and continues to receive updates and new features today. While SSIS can be installed and run on a server (on-premises or in the cloud), it’s also available in Azure Data Factory. While SSIS originally only ran on Windows, it has also been available for Linux since 2017.

LogStream Cloud How To: Sending Data to LogStream from Various Agents

Cribl released LogStream Cloud to the world in the Spring of 2021, making it easier than ever to stand up a functional o11y pipeline. The service is free for up to 1TB per day and can be upgraded to unlock all the features and support with paid plans starting at $0.17 per GB so you pay for only exactly what you use. In this blog post, we’ll go over how to quickly get data flowing into LogStream Cloud from a few common log sources.

File Server Management: How to and Best Practices

True work revolves around the access and modification of files, and what better way to store and distribute work files than through a file server? The central-access model for storing and managing files brings many benefits to an organization. Most importantly, everyone can access a single, accurate version of any file on the server. Easy as that sounds, this feat is only possible with proper file server management.

Cloud Monitoring Best Practices: 5 to Adopt

An exponential increase in the generation of data led to the rise of the Big Data era. Among other factors, the cost of scaling up businesses to accommodate so much data prompted many businesses to switch to virtual cloud platforms. The cloud can store, organize, and manage all the data and applications for a company in a virtual environment. Monitoring this environment is crucial, because it’s susceptible to cyberthreats, like data breaches.

What Is an AIOps Strategy and How Should You Form One?

IT operations data grows by the year. Some estimates suggest that the average IT operations team watches their operational data volume double or triple every year. The result of this flood is that IT teams are grasping for any method they can find to make sense of all this data. Many teams are landing on AIOps as their solution to parse and categorize all of these events. AIOps isn’t a perfect fit for every organization, but it is a great fit for many.

Splunking Your *.conf Files: How to Audit Configuration Changes Like a Boss

For years customers have leveraged the power of Splunk configuration files to customize their environments with flexibility and precision. And for years, we’ve enabled admins to customize things like system settings, deployment configurations, knowledge objects and saved searches to their hearts’ content. Unfortunately, a side effect of this was that multiple team members could change underlying .conf files and forget that those changes ever occurred.

Observability vs. monitoring debate: An irreverent view

In the past few years, the word “observability” has steadily gained traction in the discussions around monitoring, DevOps, and, especially, cloud-native computing. However, there is significant confusion about the overlap or difference between observability and monitoring.

DevOps State of Mind Podcast Episode 2: Giving People the Power to Participate

Sean Tierney is the DevOps lead at Athos, a company that's building better athletes through smart clothing and AI. Sean reinforces a DevOps state of mind across the organization by building empathy between hardware and software teams and putting the systems in place to allow them to move faster as a single unit.

Real User Monitoring (RUM) vs. Synthetic Monitoring Comparison

Three seconds is all it takes before your customer decides to leave. Would you imagine that! The audacity of some people! But, can you really blame them? We live in a fast-paced world. Wasting people’s time is worse than wasting their money. Developers are striving to provide value in as short a time as possible. Just as I am now, writing this tutorial. I’m adamant about not wasting your time but providing you with concrete info for you to learn something new.

How Snyk, TripAdvisor, and Citibank use Grafana to effectively scale observability

It’s one thing to set up an observability strategy. But what’s it like to introduce and scale observability effectively across an organization? In a wide-ranging conversation at ObservabilityCON 2021, three technical pros from Snyk, TripAdvisor, and Citibank joined Grafana Labs VP Global Solutions Engineering Steve Mayzak and — with more than 75 years experience between them — they shared the triumphs and turbulence in their respective observability journeys.

Partner Integration - Dynatrace with PagerDuty and Rundeck

Deliver perfect software experiences with real-time intelligence into customer satisfaction and behavior, your applications, and the performance of your hybrid multi-cloud. AI-powered root-cause analysis automatically identifies customer facing performance issues and pinpoints the root-cause within seconds. Open APIs allow ingestion of 3rd party metrics and enable complex system integrations. In this demo, Rob Jahn shares a sophisticated incident remediation workflow incorporating intelligence from Dynatrace, automation in Rundeck, and incidents in PagerDuty.

How to Make Splunk Run 100x Faster With Cribl LogStream

Enterprises leveraging Splunk for data ingestion and analytics need an observability solution that scales well with their business requirements and provides a cost-effective way to retain data long-term. Cribl LogStream is an essential part of observability, providing a pipeline that works with all tooling, keeps costs down, and scales with any business – making it the perfect complement to Splunk.

5 Best Network Inventory Tools and Software

Keeping your technology organized can be challenging. As you add new devices and phase out old ones, tracking each change can feel insurmountable. However, monitoring these updates is vital for most organizations, making the right network inventory tools indispensable. Finding your best fit might be easier than you think. By understanding your network inventory management options, you can find the most effective software to support your unique IT infrastructure.

The Ultimate Combo: Artificial intelligence and data centers

Artificial intelligence devoted to scaring us to death in iconic movies like 2001 or Terminator is a thing of the past; today it has other, much more interesting and practical purposes. For example, it plays a fundamental role in data processing and analysis. Yes, that’s the futuristic AI: increasingly fast, more efficient and, now, necessary to manage data centers.

Embracing invokedynamic to tame class loaders in Java agents

One of the nicest things about Byte Buddy is that it allows you to write a Java agent without manually having to deal with byte code. To instrument a method, agent authors can simply write the code they want to inject in pure Java. This makes writing Java agents much more accessible and avoids complicated on-boarding requirements.

Is FinOps All Talk and No Walk?

I am a big proponent of cross-functional alignment, as I reminded our ELT at a recent off-site meeting. There’s a lot of buzz about FinOps bringing financial accountability to cloud spend by eliminating procurement silos and implementing cross-functional best practices. As the CFO of a SaaS company, I fully support this practice. In fact, Virtana recently made some changes to our cloud infrastructure as part of our own evolution.

Sumo Logic extends monitoring for AWS Fargate powered by AWS Graviton2 processors

Back in 2018, AWS first released its Graviton processor, a custom-built 64-bit Arm processor, and followed that with the release of Graviton2 processors just a year later. Now customers running ECS and EKS on EC2 can choose between x86 and ARM64 depending on which processor best fits their application workload.

How to streamline Windows monitoring for better security

If you’re responsible for a significant number of Windows servers, you already understand the importance of being aware of the health and security of your environment. Unfortunately, you’re probably also aware of the tremendous amount of effort and resources required to monitor your Windows environment. Let’s take a look into why and how you should be closely monitoring your Windows server environments from a security perspective.

Galileo Cloud Compass Makes Calculating Cloud Costs EASY

Congratulations, you’ve been tasked with moving your workload to somebody else’s building! The “cloud” is just that: someone else’s data center. But their data center is pretty impressive (so there’s that). And, yes, while there are a million options available to you in AWS and Azure…a plethora of IT goodies, most of us will not use any of those bells and whistles, at least initially. We have to get there first—your basic lift and shift.

Overcoming Common Serverless Challenges in Production

Serverless has been around for a while now and is rapidly maturing day by day. Serverless Guru has been building serverless applications for clients since 2018 and we’ve learned some serious lessons firsthand. In this talk, Serverless Guru founder Ryan Jones and AWS Serverless Hero and Lumigo Developer Advocate Yan Cui will dive into some common challenges they have seen in production applications and how they’ve built solutions to overcome them.

Obfuscate user data with Session Replay default privacy settings

Session Replay enables you to replay in a video-like format how users interact with your website to help you understand behavioral patterns and save time troubleshooting. Visibility into user sessions, however, can risk exposing sensitive data and raise privacy concerns. For example, a user session may include typing a credit card or social security number into an input field.

Monitor Google Workspace with Datadog

Google Workspace (formerly G Suite) is a collection of cloud-based productivity and collaboration tools developed by Google. Today, millions of teams use Google Workspace (e.g., Gmail, Drive, Hangouts) to streamline their workflows. Monitoring Google Workspace activity is an essential part of security monitoring and audits, especially if these applications have become tightly integrated with your organization’s data.

A 3-step guide to troubleshooting and visualizing Kubernetes with Grafana Cloud

Back in May, we announced the Kubernetes integration to help users easily monitor and alert on core Kubernetes cluster metrics using the Grafana Agent, our lightweight observability data collector optimized for sending metric, log, and trace data to Grafana Cloud. Since then, we’ve made some improvements to help our customers go even further.

Top 7 ManageEngine Competitors and Alternatives in 2021

ManageEngine is Zoho Corporation's enterprise IT management software subsidiary. Zoho was founded in 1996 as AdventNet Inc. and kept that name until 2009. ManageEngine includes over 90 tools to assist you in managing all areas of your IT infrastructure, including networks, applications, servers, service desks, security, active directory, desktops, and mobile devices. They've also built products with contextual integration from the ground up to ensure that you can manage your IT together.

Tip of the Month: Experience Journey Map

When it comes to business-critical functions of your applications, sometimes the best indicator of performance is in the step-by-step user journey. Watch this ‘Tip of the Month’ video to leverage experience journey mapping in AppDynamics, and learn to interpret expected user behavior and deviations that might indicate performance issues in the most important areas of your application.

Understanding Apache Logging: How to View, Locate and Analyze Access & Error Logs

Apache – the technology that powers the web. I’m not sure if that is correct, but I think that we wouldn’t see the world wide web in its current shape without it. Launched in 1995, it has been the most popular web server around the world since April 1996. Because it handles your users’ requests, Apache serves as the front application. It is crucial to understand what your server is doing, which files users are accessing, where they came from, and much, much more.

Using HTTP Caching: 2022 Guide

The fastest website is the website that is already loaded, and that’s exactly what we can do with HTTP caching. HTTP caching lets web browsers reuse previously loaded resources, like pages, images, JavaScript, and CSS. It’s a powerful tool to improve your web performance, but misconfiguration can cause big performance problems. Here’s what you need to know to use HTTP caching without reading hundreds of pages of the HTTP caching spec.

Top 10 AWS Services Explained With Use Cases

Amazon Web Services (AWS) is one of the most comprehensive and broadly adopted cloud service providers in the industry, offering over 200 fully featured services from data centers globally. A large spectrum of clients across verticals uses AWS to lower costs, become more agile and innovate faster. A recent survey estimates that AWS is the largest cloud service provider and accounts for 32% of the worldwide cloud services market.

AWS Lambda Pricing Model Explained With Examples

In this article we’ll go through the ins and outs of the AWS Lambda pricing model, how it works, what additional charges you might be looking at and what’s in the fine print. Money makes the world go round. Unfortunately, it is a necessity in almost all spheres of life. You can live without it or with lesser amounts of it, but it makes it all harder. If you wish to have it, first, you need to give it, as always. Even AWS Lambda is not free.

Getting Started with Go and InfluxDB

Conventional databases such as PostgreSQL or MongoDB are great at safekeeping the state of your system in a tabular or document format, but what about time-dependent data: system metrics, IoT device measurements or application state changes? For those things, you need a more suitable type of database, one designed to better manage semi-structured data with a time characteristic.

7 Things to Check Before Launching a Website

Launching a website is a process that takes numerous essential steps. As the launch date comes closer, the excitement might lead to an oversight here and there. You can prevent this by deploying patience and double-checking every part of your website before it goes live. It all comes down to the type of website you're planning to launch. The more features you're planning to have, the longer the pre-launch inspection is going to take.

NECA delivers 99.9% system uptime using Applications Manager

NECA is an allied association of local telecommunications providers in the United States that was instituted by the Federal Communications Commission. It aims to provide rural consumers across the country with telecommunications and broadband services at affordable prices. Its services also include economic forecasting, trend analysis, industry database management, and rate and tariff analysis.

Monitor your Synthetic private locations with Datadog

Datadog Synthetic private locations play a key role in your organization’s test infrastructure by serving as highly customizable points of presence (e.g., data centers, geographic locations) for running synthetic tests on internal services. You can deploy private locations using the orchestrator of your choice, enabling you to seamlessly roll them out and scale them with the rest of your container fleet.

Monitor Confluent Cloud with Datadog

Confluent Cloud is a fully managed, cloud-hosted streaming data service. Enterprise customers use Confluent Cloud for real-time event streaming within cloud-scale applications. We’re excited to announce a new integration between Datadog and Confluent Cloud, which enables users to get deep visibility into their Confluent Cloud environment with just a few clicks. In this post, we’ll introduce how to set up the integration and start monitoring key metrics from your clusters.

Log Management Guide: Why You Should Track Logs?

IT experts agree that log management and monitoring is one of the most effective ways to keep IT infrastructure performing optimally. Logs play a vital role in improving performance, enhancing security, and detecting issues. But at the same time, a lot of people don’t use logs to the best of their ability. This guide will not only introduce you to log management but also reveal which logs to track and what information they are giving to you.

Windows Event Log Best Practices for Operations Teams

The Windows Event log is an essential tool for administrators to investigate and diagnose potential system issues, but gaining real value from it and separating useful log entries from noisy non-essential activity can be a daunting task. Depending on the level of logging that can be useful, Windows events can span system issues, application-specific issues, and also dive into security type issues around unauthorized access, login failures, and unusual behavior.

Development Environment Observability with Sentry

At Sentry, we’re always looking for innovative ways to dogfood our product. Over the last year we added Sentry’s error monitoring to our developer environment so that we could better understand the health of it. In this blog post I’m going to touch on how fragile local development environments can be, how we brought observability into what’s happening by introducing Sentry, and what outcomes it has driven for our engineering organization.

Challenges maintaining Prometheus LTS

In this article, we’ll cover the three main challenges you may face when maintaining your own Prometheus LTS solution. In the beginning, Prometheus claimed that it wasn’t a long-term metrics storage; the expected outcome was that somebody would eventually create that long-term storage (LTS) for Prometheus metrics. Currently, there are several open-source projects that provide long-term storage (Prometheus LTS). These community projects are ahead of the rest: Cortex, Thanos, and M3.

Introducing Logz.io Event Management: Accelerating Collaborative Threat Response

In the domain of cyber threat response, there’s a critical resource that every organization is desperately seeking to maximize: time. It’s not like today’s DevOps teams aren’t already ruthlessly focused on optimizing their work to unlock the greater potential of their human talent. Enabling your organization to identify and address production issues faster – and increase focus on innovation – is the primary reason why Logz.io and its observability platform exist.

How to Measure Packet Loss | Obkio

Packet loss is one of the core network metrics that you should be measuring when monitoring your network performance. The most accurate way to measure packet loss is by using network monitoring software, like Obkio. Frequent measurement is essential because packet loss is expressed as a percentage. For that percentage to be accurate, you need to monitor continuous volume. Easily see the percentage of packet loss anywhere in your network with updates every 500ms.

Dashboard Fridays: Sample Symantec Endpoint Protection (SEP) Dashboard

Join SquaredUp's Adam Kinniburgh and SCOM community hero Ruben Zimmermann as they showcase this example SEP Dashboard. Giving an overview of the status of the various endpoint protection systems, this dashboard is used by the IT team to keep on top of device security, and by the service desk to escalate appropriately.

Video: The new simple, scalable deployment for Grafana Loki and Grafana Enterprise Logs

With the recent release of Loki 2.4 and Grafana Enterprise Logs 1.2, we’re excited to introduce a new deployment architecture. Previously, if you wanted to scale a Loki installation, your options were: 1) run multiple instances of a single binary (not recommended!), or 2) run Loki as microservices. The first option was easy, but it led to brittle environments where a heavy query load could take down data ingestion and problems were often difficult to debug.

Simple, scalable deployment for Grafana Loki and Grafana Enterprise Logs

Loki 2.4 and GEL 1.2 introduced a hybrid deployment model that takes the simplicity of running the Loki log aggregation solution as a single binary and introduces an easy path to high availability and scalability. Particularly for organizations running on virtual machine and bare metal (non-Kubernetes) environments, this is a game-changer! Learn more in this tutorial from Grafana Labs Senior Software Engineer Trevor Whitney.

Streamline Issue Management and Communication at Scale: Power Home Remodeling and Sentry

When it comes to managing multiple applications and services, driving alignment and communication across teams can be like herding cats. Too many channels, projects, and cross-functional stakeholders can cause friction that slows down issue management and affects the overall product experience.

Maintaining Operational Sanity Across 100+ AWS Accounts | Eric Mann / Ryan Tomac (Vacasa)

At Vacasa, AWS accounts represent the unit of isolation for distinct applications & services in our software ecosystem, providing security benefits and operational autonomy for our teams as we scale. Managing accounts at this scale requires strong DevOps practices to maintain security, operational sanity, and uniform observability across the system. In this talk, we’ll cover the benefits of such an approach, the practices that make it possible, and the important role Datadog plays.

What is AIOps?

AIOps is an approach to managing the exponential growth of IT operations and the complexity of new technology through the application of artificial intelligence (AI). IT infrastructure increasingly relies on complicated deployments, multi-cloud architectures, and huge amounts of data. Traditionally, the tech industry responds to complexity by applying extra brainpower to the problem, bringing in more engineers, developers, and management.

Istio Log Analysis Guide

Istio has quickly become a cornerstone of most Kubernetes clusters. As your container orchestration platform scales, Istio embeds functionality into the fabric of your cluster that makes monitoring, observability, and flexibility much more straightforward. However, it leaves us with our next question – how do we monitor Istio? This Istio log analysis guide will help you get to the bottom of what your Istio platform is doing.

Detailed Insight, Right on Time: Introducing Scheduled Alerts

Logz.io customers, here’s some big product news that we think you’ll be excited to hear. Scheduled Alerts, an altogether new manner of alerting, is coming your way. That’s right, get ready to utilize a whole new world of alerts that weren’t previously available in the Logz.io platform.

How to Restore Databases From Native SQL Server Backups

In my previous post, Native SQL Server Backup Types and How-To Guide, I discussed the main types of native SQL Server backups and various backup options. Backups are critical to restoring databases quickly, but there isn’t much benefit to having backup files sitting around if you aren’t prepared and don’t know when and how to perform the restores.

Reverse Connect for Azure Virtual Desktops (AVD)

There’s something common between AVD and eG Enterprise. Can you take a wild guess? Listening on open TCP ports is an extremely bad practice for cloud architectures, as it exposes products and services to accepting incoming messages from malicious parties. This is something eG Innovations avoids in our own products (see details). This is also a best practice adopted by Microsoft for Azure Virtual Desktops (AVD).

Incident Review - Google Cloud Outage has Widespread Downstream Impact

Outages on the Internet always catch you by surprise, whether you are the end user or the Head of SRE or DevOps trying to keep a clear mind while you execute your incident playbook. As people in charge of ensuring reliable services for our customers, our normal experience of outages involves surfing a deluge of fire alarms and video calls as we work to solve the problem as quickly as we can. We often forget, therefore, what an outage means to the end user.

Ask Miss O11y: Mapping Out Your Observability Journey

Dear Trapped, Thanks for asking the question! Approaching observability as an all-or-nothing problem often leads to the project feeling daunting. But that’s not specific to observability—any project can be overwhelming if you think it needs to be done all at once, perfectly. Such as, erm, writing an entire book on observability! *looks around worriedly*

Tutorial: Build Serverless functions with C#

The world of cloud computing has been revolutionized by a solution called serverless computing. It has been an absolute joy for developers to use. Before this innovation, developers had to worry about the resources powering their code. With serverless computing, the developer’s focus on operating-system and hardware architecture is a thing of the past. It handles all the server management while you focus on what you do well: writing good quality code.

How Not to Break Your Network With Updates

“If it ain’t broke, don’t fix it” is a popular quote for a simple reason: Changes can lead to unexpected results. Many network engineers have learned this lesson the hard way. Stories of admins who update firmware or configurations only to have network problems begin are common. Even big names in networking have been hit by change-induced network failures.

Extend your DevOps analysis to CircleCI and GitLab data

Every company is a software company and every company wants to get better at it. That’s the reason we built Software Development Optimization or SDO. SDO helps you track siloed data across the DevOps toolchain. It normalizes and correlates data, provides you with DORA’s 4 key metrics and gives you deep insights into the velocity and quality of delivery across services and teams.

Managing the Mess of Modern IT: Log Analytics and Operations Engineering

IT is messy stuff. Enterprise applications and devices rely on a web of interdependent clouds, networks, and containers. IT operations (ITOps), development operations (DevOps), and cloud operations (CloudOps) engineers work hard to manage this mess. If they succeed, they create a stable, agile IT environment that makes their enterprise more productive. If they fail, their enterprise becomes less productive.

Notebook Sharing

It’s that season of sharing, and in the spirit of sharing, we have a new feature to share with you — notebook sharing. Now you can take your favorite InfluxDB notebooks and share them with whoever you would like. They don’t need to have an InfluxDB Cloud account. They just click on the link you share with them, and they can see the notebook that you shared, for the time range that you selected.

WFA: Visible - Digital Experience Index (DXI) Dashboard

If you’re not sure where to start when prioritizing investments that will have the greatest impact on user experience, this dashboard is for you! Teneo’s WFA: Visible DXI Dashboard allows you to be 5 times faster at prioritizing objectives, identifying what to improve, and reporting on improvement. #TeneoGrp

Top 5 ways OpManager helps streamline network performance by monitoring Windows processes

Forty percent of enterprises say an hourly network outage costs up to $1 million. Evidently, one minor oversight in monitoring the performance and availability of a network can result in costly downtime. This is why firms that heavily rely on network operations should perform dynamic network monitoring to keep their network protected from unexpected downtime.

Monitor Mobile Vitals with Datadog

After you release an Android application, you need to ensure a smooth, engaging experience for users. Poor performance and heavy resource consumption can cause your application to rank lower for prospective users in the Google Play Store, and existing users can become frustrated and even uninstall your application. All of this can spell trouble for business-related performance indicators like engagement and discoverability.

Featured Post

Black Friday: How Retailers Can Create an Optimal Online Experience

For about a year and a half now, traditional window shopping has been replaced in many places by online shopping sprees. Particularly as the coronavirus pandemic began, general shopping behavior has shifted toward e-commerce. And although most stores have now returned from lockdown to open their store doors, there is no denying the online shopping industry is still thriving. For the second year in a row, the holiday shopping season is also directly affected by this trend.

Enterprises Just Got an Integration Infrastructure Management (i2M) Upgrade, Intelligence from Their Integration

November 2021 is a good month if you’re a Fortune 500 or Global 2000 enterprise. The investments your organization has made in “integration” over the years were necessary as the organization and the IT infrastructure grew, but the Integration Infrastructure (i2) has likely been considered a necessary evil by senior management. That investment can now be leveraged in two important, new ways.

5 lessons from the October 2021 Facebook outage

On October 4, 2021, Facebook services went off the grid gradually, and then suddenly at 15:39 UTC. It took nearly six hours to restore service to normal. With over 3.5 billion users facing a lengthy downtime using one or multiple products from Facebook, Inc. (now known as Meta Platforms, Inc.) conversations flooded the internet about what caused the downtime issues on the American social networking service.

Logz.io Anomaly Detection: Shedding Light on "Unknown Unknowns"

Moving beyond traditional monitoring to embrace full stack observability offers a seemingly endless range of benefits. Beyond unifying logs, metrics, and traces in a single platform, the opportunity to enlist advanced analytics and engage a more predictive approach represents another huge step forward.

Best Practices for Cloud Monitoring

In our last episode, we covered best practices for deploying and using Cloud Operations in an enterprise environment. But we still left some questions unanswered. How should you monitor your services? How should you deal with alerts? And what about managing cost? In this episode of Engineering for Reliability, Yuri discusses best practices for setting up and using Cloud Monitoring and optimizing monitoring costs.

Introducing the AWS CloudWatch integration, Grafana Cloud's first fully managed integration

At Grafana Labs, we are continuing to build integrations that make it easier than ever to observe your systems, no matter which tools or software you choose. Today, we’re excited to talk about the latest integration available in Grafana Cloud: the AWS CloudWatch metrics integration, the first of our fully managed integrations that makes it simple to connect and visualize your data in Grafana.

AMA Responses: Icinga for Windows

00:10 Will there be any further Windows development in Icinga 2 except for the Windows agent part?

01:10 Are the Windows plugins considered to be deprecated?

02:12 Is it possible to only have the Icinga agent and the plugins without having the whole Icinga for Windows framework?

03:38 Are there plans to provide the PowerShell plugins as standalone, so one can use the plugins without the framework stuff?

Announcing Service Performance Monitoring in Early Access

Today, we’re thrilled to announce the early access of our Service Performance Monitoring capability. As today’s DevOps teams know all too well, monitoring application requests in modern microservices architectures is extremely difficult. Requests typically travel across a vast ecosystem of microservices and, as a result, it is often a significant challenge to pinpoint a specific failure in one of these underlying services.

Papertrail and Heroku

SolarWinds® Papertrail™ has supported Heroku almost from the beginning, as an add-on in the Heroku Marketplace and as a compatible standalone log management tool. Heroku’s focus on empowering developers to build and deliver applications by providing an easy-to-use platform as a service fits perfectly with the vision of Papertrail. Both developer-focused technologies can be set up typically in minutes, are easy to use, and offer powerful functions.

User-scoped API Keys

Checkly has released a change to the way API keys are created and managed. In the past, API keys were account-scoped. These account-scoped keys have full access rights to your Checkly account and offer no accountability for which user is using the key. When we originally built Checkly, we made it a tool to enable individual developers to quickly and easily set up browser and API checks. We help ensure your web applications are up and running and send alerts when something goes wrong.

5 Criteria You Need to Drive Efficient Alarm Management

As a commercial pilot landing at night on an unfamiliar runway, the last thing you want is a cockpit alarm telling you the passenger in 14A wants more ice in their soda. You need to concentrate on the job at hand. At that critical moment in flight, you only want visibility into the alarms that matter. It’s the same with your monitoring environment. Too often, you can be overwhelmed by a tsunami of alarms—thousands of monitoring alerts that all point to the same problem.

Get Started with Connected Experiences

November, the season of post-conf, is upon us. Hopefully all you Splunk admins and sc_admins are craving the release of a ton of new .conf21 Splunk features. Well, good news, because Connected Experiences is here to help you get started with everything Splunk Mobile, Augmented Reality, TV and iPad with this one handy guide. Let’s get started!

Splunk Developer Fall 2021 Update

While it’s cooling down here in California as Fall arrives, we have some really hot and exciting updates from .conf21, including the announcement of Splunk Cloud Developer Edition, the new Splunkbase user experience, detailed guidelines to help you deliver cloud-ready apps for Splunk Cloud Platform, AppInspect updates with new checks, a helpful blog about storing app secrets, updated docs for Modular Inputs and External Lookups, a summary of SDK updates, and more.

23 Minutes Lost: The Real Consequences of Digital Workplace Distractions

We’ve all been there: you’re dialed-in to a specific task, hyper-focused on completing it… and then some minor distraction pulls you away. An email notification chimes, a coworker asks you a simple question, an angry driver wails on his horn outside your window – and when you return to the task at hand, you realize that tunnel-vision focus you just had is now lost.

Monitoring Your Smart Home with InfluxDB and IFTTT

Do you have a bunch of smart home devices, such as IoT devices like smart switches, cameras, doorbells, alarm systems or appliances? Have you ever wanted to monitor and send events from those devices to InfluxDB? And wouldn’t it be amazing if you could do that with zero coding? With IFTTT Webhooks, you can! Let’s dive in.

Why you should have a website monitoring tool ready for Black Friday

It’s that time of year again – no, not Christmas, but the hugely anticipated Black Friday. When discounts hit bigger numbers than the lottery, and customers get into a bargain-hunting frenzy. But it’s not all fun and games as a company owner during the biggest sales season of the year; unfortunately, you’re more likely to suffer website issues than on an average day.

Understanding business and security risk

Even if an organization has developed a governance team, aligning integration decisions with business needs must be incorporated into the zero trust architecture. The company’s business model drives the applications chosen. The senior leadership team needs someone who can translate technology risks and apply them to business risks. For example, security might be an organization’s differentiator.

Percepio Brings Big Debugging Boost for Microsoft Azure RTOS ThreadX Developers

Percepio®, a leader in visual trace diagnostics for embedded systems and the Internet of Things (IoT), today announced improved support for Microsoft Azure and Azure RTOS ThreadX in Tracealyzer, two enhancements that will ease the development and debugging of Azure IoT systems.

A network admin's guide to network diagram software

For organizations growing larger by the day, network management becomes increasingly complex, and scaling to meet this growth can be a major headache. To battle such complexity without a graphic representation of a network is a tiresome task, which is where network diagram software comes in. Network diagram software allows a network admin to portray the network clearly and legibly through detailed graphics.

Deploying your first Serverless REST API within minutes

In this video, we'll show you how to get started with the serverless framework in minutes. To make sure your app will be running smoothly at all times, we'll also take you through how to set up observability for debugging, alerting, and troubleshooting so that you don't miss any critical errors and warning signs.

Monitoring Without Limits - Circonus' Series B led by Baird Capital

This is my fifth company as CEO and I’ve really come to enjoy working in B2B enterprise software – the proverbial “picks and shovels” end of our industry. In particular I enjoy innovating in spaces that may not be immediately obvious and typically overlooked in the mad gold rush of today’s frothy venture capital scene.

Why we created a Prometheus Agent mode from the Grafana Agent

On 2021-10-29, initial support for Prometheus Agent was merged, and it is slated for inclusion in Prometheus v2.32! This feature has a bit of a lengthy history to it: It took a little while to get to where we are today, but I’m thrilled that we were able to use the Grafana Agent code to enable agent-like functionality in the prometheus/prometheus repository.

Windows logevent and regexp modules

There are several predefined modules in Pandora FMS with which we can monitor logs of our Windows environments in a simple way. This time we are going to see how to use the logevent and regexp modules through the log collector. Remember that it is necessary to have an ElasticSearch server installed in order to use these modules from the agent plugins section.

Measure, Track, Improve: Streamlined event exploration and increased visibility into team health

For many engineering leaders, their team’s impact can be hard to quantify, and measuring it is a facepalm-worthy process, filled with searching through logs and exporting data sets to cobble together a report that most people won’t even look at twice. And let’s be honest, if you wanted to spend time making reports, you wouldn’t have become a developer.

DoD-Worthy Interoperability & Cybersecurity Standards: What Does It Mean to Our Customers?

This is the third in a series of four ScienceLogic blogs on the topic of the Department of Defense Information Network (DoDIN), including what it is, what it means to be approved under DoDIN standards, why it is important to both our federal and private industry customers, and the process for being approved for listing.

Announcing Logz.io Unified Dashboards

In today’s cloud environments, a typical observability stack might include an Elasticsearch cluster for logging, a few Prometheus servers for metrics monitoring, and an AppDynamics deployment for APM. You may run something similar – most observability stacks consist of multiple siloed tools dedicated to collecting and analyzing specific types of monitoring data.

How to Ensure Microsoft Teams Performance when Employees are Returning to their Workplace - Use Case

Martello Vantage DX helps you prioritize and resolve problems to optimize the performance of Microsoft Teams, Microsoft Office 365 and business applications delivered to your business lines and customers.

Let the Orion Platform Do the Heavy Lifting | Orion Platform Automation Made Easy: Session 1

Whether you know it or not, the Orion® Platform has some easy-to-use automation built right in. Automatically adding discovered devices, assigning custom properties, taking the toil out of building groups, and a plethora of other ways you can save precious time and reduce errors.

ScaleUP 2021: Taking the Logz.io Observability Platform to the Next Level

Today was a very exciting day for Logz.io, as we held ScaleUP 2021 – our second annual user conference – dedicated to elevating our customers’ success, discussing best practices for modern observability, and unveiling Logz.io’s latest product updates. These product advancements were presented by our Co-Founder and VP of Product Asaf Yigal, and members of the Logz.io software engineering team.

4 Best Tools to Measure and Reduce Network Latency

If your network operations are slow and inefficient, you could be having issues with your latency. Latency measures how long it takes for data to move from one place to another. There are many reasons why you could be experiencing network latency—propagation time, transmission delays, and processing delays are all common causes of latency. When latency occurs, it negatively affects the performance of your network.

Risky Business: Implementing a Redundant Networking and Multi-CDN Monitoring Strategy

Last month, we partnered with AWS to put together a webinar on the importance of implementing a comprehensive redundant networking and multi-CDN monitoring strategy. You can replay the event in full here. In this article, we’ll recap the key takeaways covered by the panel of experts who included Leo Vasiliou, Director of Product Marketing at Catchpoint, and Steve Campbell, our Chief Strategy Officer.

Remote Server Management Guide: What Is It and How It Works

Remote server management is a proven strategy used for increasing the uptime and responsiveness of your IT infrastructure. It manages the performance, health, and utilization of remote servers or back-end systems on various networks. After reading this post, you’ll understand what remote server management is, how it works, and how to implement it.

DevOps State of Mind Podcast Episode 1: Trust, tooling, and a no-blame culture with LogDNA

Tucker Callaway is the CEO of LogDNA. He has more than 20 years of experience in enterprise software with an emphasis on developer and DevOps tools. Tucker fosters a DevOps culture at LogDNA by tying technical projects to business outcomes, practicing extreme transparency, and empowering every person in the company to contribute.

Generate span-based metrics to track historical trends in application performance

Tracing has become essential for monitoring today’s increasingly distributed architectures. But complex production applications produce an extremely high volume of traces, which are prohibitively expensive to store and nearly impossible to sift through in time-sensitive situations. Most traditional tracing solutions address these operational challenges by making sampling decisions before a request even begins its path through your system (i.e., head-based sampling).
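
The head-based sampling mentioned above can be sketched in a few lines. The following Python snippet is a minimal illustration, not any vendor's implementation: the function name `head_sample`, the choice of SHA-256, and the example rate are all assumptions made for this sketch. It decides whether to keep a trace from the trace ID alone, before any span is recorded, so every service that sees the same ID reaches the same decision.

```python
import hashlib


def head_sample(trace_id: str, rate: float) -> bool:
    """Decide at the start of a request whether its trace is kept.

    Hypothetical helper for illustration: `rate` is the fraction of
    traces to keep, from 0.0 (none) to 1.0 (all).
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    # Hash the trace ID so the decision is deterministic per trace.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Keep the trace if its bucket falls below the rate threshold.
    return bucket < int(rate * 2**64)


if __name__ == "__main__":
    kept = sum(head_sample(f"trace-{i}", 0.1) for i in range(10_000))
    print(f"kept {kept} of 10000 traces at a 10% rate")
```

Because the decision is made before the request runs, a trace that later turns out to be slow or to contain an error may already have been dropped.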

Get planet-scale monitoring with Managed Service for Prometheus

Prometheus, the de facto standard for Kubernetes monitoring, works well for many basic deployments, but managing Prometheus infrastructure can become challenging at scale. As Kubernetes deployments continue to play a bigger role in enterprise IT, scaling Prometheus for a large number of metrics across a global footprint has become a pressing need for many organizations.

Logz.io Moves to Embrace OpenSearch at the Core of its Platform

As Logz.io prepares to hold its annual ScaleUP user conference tomorrow, celebrating another amazing year of customer success and continued advancement of our observability platform, we’ve got exciting news to share about our involvement with the OpenSearch project.

Real Time eCommerce Analytics: The Only Solution for the Holiday Season

Digital trade and eCommerce companies are generating transactions in more significant quantities than ever before. In 2020, eCommerce sales made up 19% of all worldwide retail transactions, representing $26.7 trillion in revenue. The cornerstone of any eCommerce company is providing a seamless, reliable experience where customers can log into a clean interface, browse products, and make purchases quickly and on-demand. Increased digitization after the pandemic has only heightened the stakes.

How to Measure and Improve Your Serverless Application's Health

This article will cover how the health of your serverless application can be measured and improved. Technology and its implementation methodology evolve with time very rapidly. Cost efficiency and productivity are the key drivers of technological evolution these days. With the advent of the cloud, infrastructure costs have been brought down significantly. Serverless technology adds icing to the cake!

11 Reasons Why You Should Migrate to Next-Generation DX APM

The current version of DX APM continues a long history of innovation for APM technology. More than two decades ago, the solution was the pioneer in byte code instrumentation. DX APM is now a next-generation solution for today’s complex and hybrid enterprise environments. Figure 1: Broadcom’s DX APM has evolved from Wily Technology’s invention of byte code instrumentation-based APM, which was introduced in 1998.

Introducing DX Unified Infrastructure Management

Today, we are launching the newest feature of Broadcom’s Enterprise Software Academy: DX Unified Infrastructure Management (DX UIM) resource pages. DX UIM is redefining infrastructure management with full-stack observability, an open architecture, modern admin and operator consoles, and zero-touch configuration.

Do it Yourself: Generic REST API-Based Monitoring

DX Unified Infrastructure Management (DX UIM) has more than 150 monitoring probes, which enables IT administrators to monitor everything from traditional mainframe servers to modern hybrid clouds running on a wide range of platforms and operating systems. Traditionally, a separate probe has been required to monitor each specific technology. That’s because the interface that retrieved monitoring metrics was either proprietary or technology specific.

Monitoring Cohesity with DX Unified Infrastructure Management

The Cohesity Data Platform consolidates backups, file shares, object stores, and data on a single web-scale data management platform. As a result, this platform helps reduce data sprawl and mass data fragmentation. The Cohesity platform allows teams to make their backup and unstructured data more productive for a range of efforts, including rapid app development, compliance, security, and analytics.

Full-Stack Infrastructure Observability with DX Unified Infrastructure Management

The IT infrastructure landscape has seen tremendous changes over the last few years due to evolving technologies, newer business models, and ever-changing market demands. Business, market, and consumer demands are pushing such IT advancements as cloud, mobility, and IoT.

How Using Annotations with OpenTelemetry Can Lower Your MTTR

When it comes to gaining control over complex distributed systems, there are many indicators of performance that we must understand. One of the secrets to understanding complicated systems is the use of additional cardinality within our metrics, which provides further information about our distributed systems’ overall health and performance. Developers rely on the telemetry captured from these distributed workloads to determine what really went wrong and place it in context.

The Importance of Application Performance Monitoring.

In this digital era, applications are used on a daily basis to make our life easier. A lot of software applications are launched every day, continuously increasing competition in the market. There are thousands of applications available for a task. But, the success of any application highly depends on its performance. A good-performing application provides a flawless user experience to the customers.

Digitalization Challenge the future with Pandora FMS

Pandora FMS unifies information on all systems, from all departments and environments in a single console, coordinating efforts and presenting useful information to the appropriate person, at the right time. Your IT infrastructure shouldn't feel out of your control when it relies on external provider services. We don't just provide a product, we integrate a solution and engage with our customers from start to finish by executing successful projects.

Dashboard Fridays: Sample UptimeRobot Dashboard

Join Adam Kinniburgh and Ashley Thompson as they showcase this example UptimeRobot Dashboard. Built in SquaredUp with the powerful WebAPI tile, this dashboard gives an overview of monitoring configured using UptimeRobot’s website monitoring features. Query UptimeRobot for information about your web tests, summarize them, and create a slick UptimeRobot dashboard in SquaredUp. This is an example of querying the web test data, though the other data sets are equally easy to pull in!

Monitoring & Observability for Sales, Marketing and Business ops teams with StackMoxie and PagerDuty

Before Stack Moxie, every business ops team needed PagerDuty, but finding and pushing errors was a manual process. With Stack Moxie + PagerDuty, every business op professional can manage their sales, marketing, HR or customer success stack with the same quality engineers bring to code.

How to set up Stanza as the log agent for your GCP?

Stanza is a robust log agent. GCP users can use Stanza for ingesting large volumes of log data. Before we dive into the configuration steps, here’s a matrix detailing the functional differences between all the common log agents used by GCP users. Stanza was built as a modernized version of FluentD, Fluentbit, and Logstash. GCP users now have the ability to install Stanza to their VMs/GKE clusters to ingest logs and route them to GCP log explorer.

Implementing SLAs, SLIs, and SLOs in an observability suite

Implementing SLAs, SLIs, and SLOs in an observability suite is now business-critical. Over time, a company’s decision-makers can add a burdensome number of KPIs that force servers and other IT assets to devote excessive processing time to business intelligence. Eventually, the burden becomes so great that employees, managers, and executives start to complain about the system’s sluggishness. Developers know that they need to strike a balance between business needs and IT processes.

Ask Miss O11y: I Don't Want to Be On Call Anymore. Am I a Monster?

First, I’d like to say that pager duty isn’t something we should treat like chronic pain or diabetes, where you just constantly manage symptoms and tend to flare-ups day and night. Being paged out of hours is as serious as a fucking heart attack. It should be RARE and taken SERIOUSLY. Resources should be mustered, product cycles should be reassigned, until the problem is fixed.

How to Monitor Your Internet Speed with Telegraf & InfluxDB Cloud

Complaining about your crappy internet speed is a tale as old as time. Given the rapid shift for so many of us to work from home, our internet speed now affects us on a daily basis. Where in my house should I avoid taking Zoom meetings because of low download speed? Does my internet speed actually get worse in the evenings, or am I just paranoid? How far away from the microwave do I really need to be to ensure that my wifi isn’t impacted?

Run UDP and WebSocket API tests to monitor latency-critical applications

Datadog Synthetic Monitoring allows you to proactively monitor your applications so that you can detect, troubleshoot, and resolve any availability or performance issues before they impact your end users. With our API test suite, you can send simulated HTTP requests to your API endpoints, check the validity of SSL certificates, verify the performance and correctness of DNS resolutions, test TCP connections, and ping endpoints to detect server connectivity issues.

Seconds Matter: Why Monitoring Website Uptime Alone isn't Enough

It takes 50 milliseconds for visitors to decide whether to bounce from your website. That’s .05 seconds, or about half the time it takes you to blink. In website monitoring we talk a lot about uptime, and while making sure your site returns 200 OK is important, if your load time isn’t instant you’ll lose traffic regardless.

Stay Alert! Building the Coralogix-Nagios Connector

Ask any DevOps engineer, and they will tell you about all the alerts they enable so they can stay informed about their code. These alerts are the first line of defense in the fight for Perfect Uptime SLA. With every good solution out there, you can find plenty of methods for alerting and monitoring events in the code. Each method has its own reasons and logic for how it works and why it’s the best option. But what can you do when you need to connect two opposing methodologies? You innovate!

Sysdig & SUSE: Security & Visibility for SUSE Rancher

Securing a cloud-native environment, such as SUSE Rancher, requires unique considerations. New abstractions like containers, plus the dynamic nature of a Kubernetes orchestrated environment can hamper visibility, especially for legacy tools that aren’t designed for containers and cloud. To help, Sysdig and SUSE have launched a SUSE One Partner Solution Stack designed to not only showcase our joint solution, but also to provide easy ways for you to get started.

Observability into Your FinOps: Taking Distributed Tracing Beyond Monitoring

Distributed tracing has been growing in popularity as a primary tool for investigating performance issues in microservices systems. Our recent DevOps Pulse survey shows a 38% increase year-over-year in organizations’ tracing use. Furthermore, 64% of those respondents who are not yet using tracing indicated plans to adopt it in the next two years. However, many organizations have yet to realize just how much potential distributed tracing holds.

Proactive Business Line Support for Microsoft Teams PSTN Calls

With Microsoft Teams usage continuing to increase, it was only a matter of time until we also saw the rise of Teams voice through PSTN. Even though this requires more expensive Microsoft 365 licenses, the return on investment can be significant purely based on the pricing structure but sizable savings can be realized when it comes to management overhead costs. However, it can be a challenge to ensure PSTN call quality for all your business lines and users.

Help your business, learn everything about the log and event correlator

The Pandora FMS alerting system allows building alerts based not only on the values received by modules, but also on received events or on the information collected by the log collector. We will see how it is possible to build simple alerts or more complex ones, based on a set of logic rules about said events or logs. We present the correlation alert system.

6 Rollbar Alternatives to Consider in 2021

Error-free apps are envied by all software teams today. But building and maintaining such an app is not as easy as it looks. You need to constantly keep a check on your app to see when it faces exceptions or errors. This is why we have so many error tracking tools in the current market. Rollbar is a popular error monitoring tool used for tracking and fixing all types of bugs and errors in modern applications. However, it has shortcomings too.

Observability - An Ultimate Guide

A developer's perspective is different. When managing the various parts of a software system, it can be difficult to monitor activity and identify the bug that is disrupting its functions. What if you could spot the error beforehand and resolve it at the earliest? The strategies that we focus on and implement are the ones that help us manage our tasks effectively. That is possible by knowing about Observability. Let's learn about it in detail through this blog.

Logs and tracing: not just for production, local development too

We're a small team of engineers right now, but each engineer has experience working at companies who invested heavily in observability. While we can't afford months of time dedicated to our tooling, we want to come as close as possible to what we know is good, while running as little as we can, ideally buying, not building. Even with these constraints, we've been surprised at just how good we've managed to get our setup.

Incident Review - Rolling Comcast Outage Disrupts Work from Home for Millions of Users Across the U.S.

The rolling Comcast outage on Monday, November 8th and Tuesday, November 9th affected customers across the U.S., knocking users offline around the country. The first wave took place Monday evening in the San Francisco Bay area. The second, which had a wider geographic impact, occurred Tuesday morning, primarily affecting broad swathes of the Midwest, Southeast, and East Coast.

Serverless Observability: It's easier than you think!

Observability is a measure of how well the internal state of a system can be inferred from its external outputs. It helps us understand what is happening in our application and troubleshoot problems when they arise. It’s an essential part of running production workloads and providing a reliable service that attracts and retains satisfied customers.

Why and how to monitor AWS EKS

Amazon Elastic Kubernetes Service, or EKS, is a managed Kubernetes service. That means that Amazon Web Services (AWS) handles some of the deployment and management tasks for users. But the fact that EKS is a managed service doesn’t mean that AWS manages all administrative tasks. One key management task that isn’t fully covered as part of EKS is monitoring.

Website Performance Monitoring and Optimization Best Practices

Websites are a must-have for any business that wants to survive in a highly competitive environment. Many people mistakenly think that only e-commerce projects need a website, but this is not the case. Absolutely every company needs website performance monitoring and virtually every initiative should be armed with its own webpage. But this article is not about why you need a website, but about how to track and manage its performance.

InfluxDB and GeoData - Emergency Generators

With the widespread use of LTE (Long Term Evolution), we are seeing more IoT devices come online in remote regions of our planet. Picture this scenario: A country is currently experiencing a national emergency due to an electrical grid failure. To mitigate the power shortage the government has deployed generators in the remote regions of their country to power the most remote villages. The problem? The villages are still reporting outages due to the emergency generators running out of fuel.

Discover VirtualMetric's Dashboard Playlists

VirtualMetric Real-time Monitoring Suite is supporting Dashboard Playlists, providing full visibility over your infrastructure and virtualization monitoring for your whole team. Monitoring a complex infrastructure is a hard task. We make it simple thanks to our Dashboard and a bunch of functionalities focused on your user experience, time-saving and improving the efficiency of your team.

Monitor Azure Government with Datadog

Azure Government is a dedicated cloud for public sector organizations that want to leverage Azure’s suite of services in their highly regulated environments. As these organizations migrate their applications to Azure Government, they need to ensure that they can maintain visibility into the status and health of their entire infrastructure.

An overview of key .NET 6 features

.NET 6 is finally here, giving us a new long term stable version of .NET Core. .NET 6 succeeds .NET 5, which was generally seen as a “skip version” by most of us, getting limited use compared to .NET Core 3.1. With this release, we get updates to both the runtime and the C# language. In this post, we’re taking a closer look at what we see as three of the most useful .NET 6 features.

Building a unified analytics platform for ServiceDesk Plus, Desktop Central and OpManager

It's no secret that data-driven IT management can put your organization light years ahead of its peers. The challenge is, how can you bring in data from several IT applications and services into one console for analysis? Analytics Plus, ManageEngine's flagship IT analytics software, has introduced blended analytics that enables users to automatically blend data from ServiceDesk Plus, Desktop Central, and OpManager, and gain a unified view of the IT environment.

Synthetic Testing and Real User Monitoring

Synthetic Testing and Real User Monitoring are the most important tools in your performance toolbox. But they do different things and are useful at different times. Many developers only spend time mastering one of these tools and so only see a part of their performance problems, like trying to hammer in a screw. Let’s look at these tools, what they measure, and when to use them.

Loki 2.4 is easier to run with a new simplified deployment model

Loki 2.4 is here! It comes with a very long list of cool new features, but there are a couple things I really want to focus on here. Be sure to check out the full release notes and of course the upgrade guide to get all the latest info about upgrading Loki. Also check out our ObservabilityCON 2021 session Why Loki is easier to use and operate than ever before.

Grafana Tempo 1.2 released: New features make monitoring traces 2x more efficient

Grafana Tempo 1.2 has been released! Among other things, we are proud to present both our first version to support search and the most performant version of Tempo ever released. There are also some minor breaking changes so make sure to check those out below. If you want ALL the details you can always check out the v1.2 changelog, but if that’s too much, this post will cover all the big ticket items.

Enabling SRE best practices: new contextual traces in Cloud Logging

The need for relevant and contextual telemetry data to support online services has grown in the last decade as businesses undergo digital transformation. These data are typically the difference between proactively remediating application performance issues or costly service downtime. Distributed tracing is a key capability for improving application performance and reliability, as noted in SRE best practices.

Network AF, Episode 5: Building relationships as an internet analyst with Doug Madory

Network AF welcomes Doug Madory to the podcast. Doug is a veteran, a researcher, a writer and Kentik’s director of internet analysis. With his start in the U.S. Air Force within its Information War Center, Doug has now been working in the networking industry for 12 years. After the Air Force, Doug went on to work for Renesys, which was acquired by Dyn, which was later acquired by Oracle.

Icinga Customer Story: Deutsche Telekom IT

We are proud of our many customers and users around the globe that trust Icinga for critical IT infrastructure monitoring. That’s why we’re now showcasing some of these enterprises with their success stories. These are stories from companies or organizations just like yours, of any size and from different kinds of industries. Some of them are our long-standing customers, others have just recently profited from migrating from another solution to Icinga.

Epsagon-to-Lumigo: a step-by-step migration guide

At Lumigo, we believe in serverless technology, and our mission is to make serverless development easy and fast. For the past few months, we’ve been extending our observability and debugging capabilities, making it a breeze for developers to understand the end-to-end story of every request that goes through the system, find the root causes of issues and be able to easily address them.

Incident Resolution: Do You Remember, the Twenty Fires of September?

From September to early October, Honeycomb declared five public incidents. Internally, the whole month was part of a broader operational burden, where over 20 different issues interrupted normal work. A fraction of them had noticeable public impact, but most of the operational work was invisible. Because we’re all about helping everyone learn from our experiences, we decided to share the behind-the-scenes look of what happened.

Smarter IT Operations Through Actionable Insight

Business leaders talk excitedly about "digital transformation" and "innovative customer experience," but it falls on the shoulders of IT operations to make sure everything actually works. As transformation takes hold, IT teams manage increasingly complex, hybrid, and distributed environments – often comprising traditional on-premises systems and modern infrastructures made up of containers, multiple clouds, and virtualized networks.

Unlock Value from Your Data Anywhere: Connected Experiences .conf21 Highlights

Are you ready to unlock value from your Splunk data, anywhere at any time? You might be itching to do this after seeing the amazing announcements made by the Splunk Connected Experiences team at .conf21. For those that might have missed it — or those that are hoping to learn more — we’ve rounded up the highlights below. Across each of the products, the takeaway is clear: we’re continuing to make it easier than ever to access your Splunk data in new and innovative ways.

Cloud Infrastructure Without the Headaches

Cloud infrastructures have introduced increasing levels of complexity—you have to manage workloads across on-premises, private, and multiple public cloud environments. This requires you to migrate efficiently, optimize effectively, and stay rightsized on an ongoing basis, all while meeting evolving business requirements. With so many moving parts, it can be a massive challenge with lots of pitfalls that can cost you time and money and even put your business results in jeopardy.

TL;DR InfluxDB Tech Tips: API Invokable Scripts in InfluxDB Cloud

If you’re familiar with InfluxDB Cloud, then you’re probably familiar with Flux already. Flux enables you to transform your data in any way you need and write custom tasks, checks, and notification rules. But what you might not know is that InfluxDB Cloud now supports API Invokable Scripts in Flux.

What Is Server Monitoring?

Your IT infrastructure runs on servers, which makes them vital to the performance of your entire IT environment. Therefore, it is essential to monitor your servers to ensure there isn’t any disruption in performance and uptime. Servers are devices or applications that can store, process and deploy resources to other devices, applications or users. Now that you know how important servers are to an IT environment, what happens if a server stops working?

Introducing Grafana Enterprise Traces, joining metrics and logs in the Grafana Enterprise Stack observability solution

Today, we are launching a new Grafana Labs product, Grafana Enterprise Traces. Powered by Grafana Tempo, our open source distributed tracing backend, and built by the maintainers of the project, this offering is an exciting addition to our growing self-managed observability stack tailored for enterprises.

Announcing Grafana OnCall, the easiest way to do on-call management

A critical part of managing modern software development is setting up and running an on-call rotation. But that often involves significant toil, in part because many of the existing tools are cumbersome and not developer-friendly. That’s why we’re excited to announce Grafana OnCall, an easy-to-use on-call management tool that will help reduce toil in on-call management through simpler workflows and interfaces tailored for devs.

ObservabilityCON 2021: Your guide to the newest announcements from Grafana Labs

This morning during the ObservabilityCON keynote, we announced some of the exciting projects and feature enhancements we’ve been working on for our customers and community. And it doesn’t end there. Throughout the week, we’ll continue to unveil new features, go deeper with live demos, and share our plans to shape the future of observability. With so many new announcements and features to check out, we want to make sure you know where to get more details about these developments.

Goliath Technologies Announces Enhancements to Goliath Performance Monitor

Philadelphia – November 9, 2021 – Goliath Technologies, a leader in end-user experience monitoring and troubleshooting software for hybrid cloud environments, announced today the release of Goliath Performance Monitor 11.9.2. New features include a Citrix NetScaler Module and industry-only VMware Horizon End-User Experience Scorecard.

New Relic vs. Sentry vs. Scout

In the digital economy, software applications have become a primary product for a large number of companies. On top of that, customers expect a flawless user experience from the applications as they evolve. To provide such a great experience, companies need to have powerful performance monitoring across their applications. We will discuss APM tools that are popular in the market right now and compare them across different aspects. Feel free to use these links to navigate the guide.

Limit Coralogix usage per account using Azure Functions

At Payoneer, we use Coralogix to collect logs from all our environments from QA to PROD. Each environment has its own account in Coralogix and thus its own limit. Coralogix price modules are calculated per account. We as a company have our budget per account and we know how much we pay per each one. In case you exceed the number of logs assigned per account you will pay for the “extra” logs. You can see the exact calculation in this link.

What's Wrong With Observability Pricing?

There’s something wrong with the pricing of observability services. Not just because it costs a lot – it certainly does – but also because it’s almost impossible to discern, in many cases, exactly how the costs are calculated. The service itself, the number of users, the number of sources, the analytics, the retention period, and extended data retention, and the engineers on staff who maintain the whole system are all relevant factors that feed into the final expense.

Building Relationships as an Internet Analyst With Doug Madory | Network AF Episode 5

In this week's episode, you'll hear our host Avi and guest Doug Madory's conversation around internet analysis. Doug is the Director of Internet Analysis here at Kentik, with previous experience at Oracle and Dyn in the same role. Today he shares how he got into technology and his career in the Air Force. Doug later dives into what it's like building relationships with the press and working with them as an internet analyst. You'll get to hear about some of the essential stories highlighted throughout his career, as well. Listen now!

Web Performance of the Top 50 E-commerce/Retail Sites in 2021

Many factors have led to the massive increase of online retail sales in 2020 and 2021, including the increasing reliance on online activities due to COVID-19. To put the growth into perspective: in 2017, when we last reported on web performance for the world’s leading retail sites, the industry generated around $2.2 trillion in revenue. In 2021, global e-commerce sales are expected to hit $4.921 trillion.

AWS Database Types - Aurora vs DynamoDB vs RDS - How do they compare?

In this blog, we describe the different types of AWS’ managed databases and their various features and merits. By the end of the blog, you should have better information to choose the right AWS database that would match your application’s needs.

How to choose between AWS RDS and EC2 Hosted Database?

You’ve decided to migrate your applications from on-premises to AWS and are considering what cloud services are available that suit your needs the best. When you are migrating an application that uses a relational database backend (RDBMS) such as Oracle, MySQL or SQL Server to the cloud, the question of RDS vs EC2 will inevitably surface.

Icinga for Windows Releasing v1.7.0 - Start your contribution

Today we are happy to announce that Icinga for Windows v1.7.0 has been released! While this release includes lots of bugfixes for the Framework itself including the basic plugins, our main goal was to increase usability and make access for developers a lot easier.

Microsoft Office 365, Azure, and How to Use Them Together

Office 365 and Azure are two important cloud services with many features and functions. Although Microsoft mainly designed them to work separately, when used in combination, they offer an excellent way to increase efficiency in the workplace with minimal IT administration. This post will focus on the several ways organizations can benefit by using Office 365 and Azure together. It will also discuss critical considerations for administration, best practices, and pitfalls while using them together.

Network Management Services: An Intro Guide With Examples

Connectivity is more valuable to today’s businesses than ever. Partly, this is because many business-critical operations are happening online. Employees are connecting using collaborative software. Customers are seeking support and placing orders online. At the same time, suppliers and partners are transmitting data online. All their success depends on network capacity and reliability.

Game Launches Should Be Exciting for Your Players, Not for Your LiveOps Team

The moment of launching something new at a game studio (titles, experiences, features, subscriptions) is a blockbuster moment that hangs in the balance. The architecture—distributed and complex, designed by a multitude of teams, to be played across a variety of devices in every corner of the world—is about to meet a frenzy of audience anticipation, along with the sky-high expectations of players, executives, and investors.

Anodot Acquires Pileus to Transform the Cloud Cost Optimization Space

We couldn’t be more excited here at Anodot at the announcement of the acquisition of Pileus. Acquiring a company is a very special event, a moment that is the culmination of months of thought and deliberation. Is there a strong synergy between the two entities? Do we share the same DNA and culture? Is the additional product aligned with our long-term vision?

How Sumo Logic monitors unit economics to improve cloud cost-efficiency

An often overlooked aspect of a company’s journey to the cloud is cost visibility. While the single number delivered by the cloud provider on a monthly invoice is straightforward, understanding where this number comes from is often more tricky. Fortunately, this task can be facilitated through the usage of various cost monitoring tools available on the market, coming from both third-party companies and the cloud providers themselves.

Introducing Log Data Restoration on LogDNA

If you’re reading this, I’m pretty sure I don’t need to do much to convince you of the importance of logs. They are the core atomic unit for understanding your environments and provide the insights required to troubleshoot, debug, and more. The fact of the matter is that everyone in your organization needs logs to perform critical functions of their job.

NiCE Linux PowerPC Management Pack 1.20 released

The NiCE Linux PowerPC Management Pack 1.20 is an enterprise-ready Microsoft SCOM add-on for advanced IBM PowerPC on Linux monitoring. It supports Linux PowerPC administrators in centralized health and performance monitoring to improve user experience and business results. The Management Pack provides clear and precise performance indicators and timely alerts enriched by pinpointing problem identification and troubleshooting information.

SCOM 2022: What to expect and what you need to know

Microsoft System Center Operations Management team has been busy with new features and updates for SCOM. The most recent version, SCOM 2019, came out on March 14th, 2019. The release date for the upcoming SCOM 2022 has been announced to be globally available in Q1 2022. This blog post will provide an overview of some of the features you can expect to see in SCOM 2022 and how those changes may affect your experience of SCOM monitoring as a user or customer.

7 Tips for Hyper-V Monitoring that Will Boost Your VM Performance

Hyper-V is one of the most popular virtualization platforms, especially for Windows systems and servers. However, no software or tool can be optimized to your advantage without proper monitoring. Now, you’re probably already monitoring your Hyper-V environments, but are you doing it the best way? This post will reveal seven important tips that can help reinforce your Hyper-V monitoring efforts, especially cluster monitoring, which is a hard task.

The 18 most popular data source plugins for Grafana in 2021

As a composable solution, Grafana allows you to bring your data into dashboards natively without having to extract it, load it, or transform it. We believe in a “big tent” philosophy, which allows you to choose the tools that best suit your observability strategy, and with our plugins, Grafana is interoperable with more than 100 data sources.

New Apps for PagerDuty's Datadog Integration

Status Dashboard by PagerDuty and Incidents by PagerDuty are new apps available now in Datadog. See a live, shared view of system health to improve awareness of operational issues with Status Dashboard by PagerDuty. Acknowledge, troubleshoot, and resolve incidents with PagerDuty actions embedded directly in the Datadog interface to limit context switching among tools. Julia Nasser and Hadijah Creary join the stream to show off this powerful enhanced integration.

Vue.JS Live 2021 Workshop: A Different Vue into Web Performance

Solving your front-end performance problems can be hard, but identifying where you have performance problems in the first place can be even harder. In this workshop held live at Vue.JS 2021, Abhijeet Prasad, software engineer at Sentry.io, dives deep into UX research, browser performance APIs, and developer tools to help show you the reasons why your Vue applications may be slow. He'll help answer questions like, "What does it mean to have a fast website?" and "How do I know if my performance problem is really a problem?". By walking through different example apps, you'll be able to learn how to use and leverage core web vitals, navigation-timing APIs, and distributed tracing to better understand your performance problems.

React Advanced 2021 Workshop: "Building a Sentry: 7 years of open-source React"

Sentry’s Evan Purkhiser and David Wang will walk through Sentry’s 2000+ file Typescript/React codebase at a special workshop that took place at React Advanced 2021. They’ll tell war stories of the good, the bad, and the ugly. Gaze in wonder at their modern usage of Typescript, react hooks, and styled components. Recoil in disgust at their legacy Reflux usage and perplexing Webpack configuration. Step away from the workshop with a working knowledge of techniques to keep a large-scale frontend codebase modern and maintainable.

What is Kubernetes Lens?

As a DevOps Engineer, one day you’re performing magic in the terminal, settling clusters, and feeling like a god. On some other days, you feel like a total fraud and scam. Errors and bugs appear from everywhere, you don’t know where to start, and you don’t know where to look. Sadly, days like this come far too often. To be more specific, what often causes these bad days is none other than Kubernetes itself.

A Comprehensive Guide Of Website Navigation: How You Can Improve Your Site?

Do you want to give your visitors excellent navigation so they can land on their desired page correctly? Easy navigation around a website is of the utmost importance when it comes to website performance. In this guide, we will show you in detail how to build organized navigation for your site.

How to determine the source of SaaS latency

One of the positive things that came out of events in 2020 was that many of us started working from home. At first, it was kind of weird. But once we realized that what we needed was available online, it became easier. All we had to do was figure out a few new apps, like Slack, Asana and Google Docs. Then, after a couple of weeks of working from home, many of us started having thoughts like, “I wonder if I could wear shorts and my favorite slippers?

Proximity Bias is a Serious Concern for Young Workers - How Can You Avoid It?

We’ve all experienced a bit of FOMO at one time or another, whether we stayed home sick the night of a party or failed to score tickets to a big concert. It stings to miss out on the fun, but we get over it. In the era of remote work, however, ‘fear of missing out’ has taken on a more consequential meaning – one that is troubling the minds of many young professionals.

How to Optimize Your Cyber Security and Performance Monitoring Tools Using Load Balancing

The capacity to scale and process high data traffic by monitoring appliances is a critical requirement for organizations aiming to enhance or improve their security and protection from external threats. Excessive incoming traffic demands high-monitoring capabilities as it overwhelms the monitoring tools and places computational bounds that increase exponentially.

Observability in Practice

After years of helping developers monitor and debug their production systems, we couldn’t help but notice a pattern across many of them: they roughly know that metrics and traces should help them get the answers they need, but they are unfamiliar with how metrics and traces work, and how they fit into the bigger observability world. This post is an introduction to how we see observability in practice, and a loose roadmap for exploring observability concepts in the posts to come.

Seven Critical Capabilities to Look for in an AIOps Tool

In 2017, McAfee found that an average enterprise uses 464 custom applications. A large enterprise — a company with over 50,000 employees — uses 788 custom apps! The more applications you have, the more complex your application environment is. This means that you are more susceptible to outages. So, the tolerance for downtime is impossibly low. Mission-critical applications must be available at all times.

The Persistent Threat of Downtime in Banking and How to Solve it

At 8:54 pm on November 1, 2020, a customer of HDFC bank complained on Twitter that the bank’s services like internet banking and ATMs were down. More customers started raising similar issues over the next couple of hours, saying that UPI, credit card, and debit card transactions weren’t working either. Finally, at 11:55 pm, the bank confirmed that one of their data centers faced an outage. “Restoration shouldn’t take long,” they promised.

Strengthen Your Cloud Ops with Preventive Healing

The cloud is driving enterprise digital transformation. Gartner predicts that by 2026, public cloud spending will exceed 45% of all enterprise IT spending, a 2.5x growth from 2021. Enterprises globally are accelerating application modernization, embracing the cloud. This is giving rise to a few key trends. Software-as-a-Service (SaaS) adoption is on the rise. So, organizations are using applications whose implementation/infrastructure they have little or no control over.

Monitoring network security with Aruba Clearpass, Grafana and Graphite

In this article, we will explore why it is imperative to constantly monitor network security metrics, what Aruba Clearpass is, and how it helps us manage network security. Then we will look at what Graphite and Grafana are and how to analyze metrics with their help. Finally, we will learn how MetricFire can make it easier for us to work with Graphite and Grafana.

Dashboard Fridays: Sample JIRA Health dashboard

Join SquaredUp's Adam Kinniburgh and Tim Wheeler as they showcase this example Jira Health Status dashboard. Offering a quick view of the state of your next release, alongside the health of the Jira project as a whole, this dashboard uses SquaredUp’s Web API and PowerShell tiles to improve on Jira’s native dashboarding capabilities. Tune in to learn how it was made, the challenges it solves, and our top tips for building it yourself.

AWS CloudWatch Alarms - A Startup Guide

While more businesses are moving their apps to the cloud, they must also ensure that cloud-based services such as Amazon Web Services (AWS) and other resources remain available. So, how can you make sure these cloud services aren't turned off? You can accomplish this using a tool like Amazon CloudWatch, which monitors applications. This article explains what AWS CloudWatch is.

AMA Responses: Icinga Web and Modules

00:10 Why are there some issues and PRs that have not been looked at for some time?

01:34 Are there plans to increase the number of people working on the Director?

01:51 Why is there such a discrepancy between the HA functionality in Icinga 2 versus Icinga Web 2 and its modules? And will this improve in the future?

03:17 Will it be possible to tunnel module traffic with the Icinga traffic? Is something planned for managing for example x509 in a distributed setup?

3 Ways Ops Teams Benefit From LM Logs

Sifting through logs in real-time or post-mortem to pinpoint the problem can take hours – and is often like trying to find the needle in the alert/log haystack. Further, keeping the troubleshooting process efficient can be a challenge due to context switching and relying on manual interpretation of events and technology-specific knowledge.

Innovations in cloud network security

Learn about innovations in cloud network security over a global network. This includes Google Cloud innovations released this year from DDoS and Web Application Firewall (WAF), Google Cloud Armor, Google Cloud firewalls, and Google Cloud IDS - the newest network based intrusion detection solution.

The eCommerce Holiday Calendar for DevOps

Seasonal spikes in consumer activity are expected, if not depended on, by online retailers throughout the calendar year. However, as shoppers rush to compete over door-buster deals and order holiday must-haves, web traffic escalates to levels standard resource allocation cannot easily sustain. This spike in traffic can lead to unresponsive checkouts, lost or abandoned carts, and slow-loading pages, ultimately resulting in thousands of dollars in lost revenue.

Updated ELK Stack Guide For 2022 (Installation, Tutorials & More)

The ELK Stack has millions of users globally due to its effectiveness for log management, SIEM, alerting, data analytics, e-commerce site search and data visualisation. In this extensive guide (updated for 2021) we cover all of the essential basics you need to know to get started with installing ELK, exploring its most popular use cases and the leading integrations you’ll want to start ingesting your logs and metrics data from.

AWS Fargate Monitoring

How do you perform AWS Fargate monitoring? Today, we’ll discuss the background of AWS Fargate and using Retrace to monitor your code. As companies evolve from a monolithic architecture to microservice architectures, some common challenges often surface that companies must address during the journey. In this post, we’ll discuss one of these challenges: observability and how to do it in AWS Fargate.

Mario vs. Steve: What Video Games Can Teach Us about Monitoring vs. Observability

What is monitoring? What is observability? Monitoring shows you how a Kubernetes environment and all of its layers are operating. Observability, on the other hand, is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.

We've been named a Customers' Choice for the third year in a row!

The market has spoken! ManageEngine has been named a Customers’ Choice in the 2021 Gartner Peer Insights ‘Voice of Customer’: Application Performance Monitoring report for the third time in a row. It’s an honor to be trusted and loved by customers all around the globe.

Datadog acquires Ozcode

At Datadog, we believe that having visibility into production is crucial to building better software, especially as modern environments become more and more complex. Bugs that occur in production are often difficult to reproduce locally, which leaves developers guessing about what went wrong. To solve this problem, teams need the same depth of visibility into their production environments as they do into their local environments.

Top key metrics for monitoring MySQL

Monitoring MySQL with Prometheus is easy to do thanks to the MySQL Prometheus Exporter. MySQL doesn’t need an introduction – it’s one of the most used relational databases in the world, and it’s also open-source! Being such a popular database means that the community behind it is also huge. So don’t worry: you won’t be alone.

A Look at the 10 Best Error Monitoring Tools in 2021

Due to the growing usage of online methods for almost everything globally, technology has become a part of our daily lives. Most people around the world are using mobile apps and websites. Most of the companies are already adopting brand-new tech stacks faster. They are trying hard to implement all the features and make their system robust and powerful. But it is not easy to make an application error-free. It is impossible to track all the issues, reproduce them, and fix them before shipping code.

Sensitive Data Management - Working With Sensitive Data in Regulated Industries

Let’s face it—the harsh reality, seen often in news stories today, is that security breaches are going to happen. In today’s world of cybercriminals and nation-state attackers, it’s not a question of “if” but “when” your organization is attacked. If you don’t take appropriate steps to ensure sensitive data is protected properly, it could devastate the organization financially.

Icinga DB Housekeeping

We all know that history data is important in monitoring. But this history data becomes obsolete over time, and those records become garbage that only fills up space. So it is important to remove obsolete history records to free up space. We call this process housekeeping. It needs to be performed periodically to clean up history records whenever they exceed their maximum age and become obsolete.
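
As a rough illustration of age-based housekeeping in general, the sketch below drops records older than a maximum age. It is not Icinga DB's implementation: the record layout, the `housekeep` helper, and the 365-day retention are all assumptions made up for this example.

```python
from datetime import datetime, timedelta, timezone


def housekeep(records, max_age, now=None):
    """Return only the records that are not older than max_age.

    `records` is a list of dicts with a timezone-aware "timestamp" key
    (a hypothetical layout, chosen only for this sketch).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    return [r for r in records if r["timestamp"] >= cutoff]


# Fixed reference time so the example is reproducible.
now = datetime(2021, 11, 30, tzinfo=timezone.utc)
history = [
    {"id": 1, "timestamp": now - timedelta(days=400)},  # past the maximum age
    {"id": 2, "timestamp": now - timedelta(days=30)},   # still within it
]

kept = housekeep(history, max_age=timedelta(days=365), now=now)
print([r["id"] for r in kept])  # [2]
```

In a real system the same cutoff comparison would typically run as a periodic delete against the history tables rather than as an in-memory filter.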

Eight Reasons Enterprise Automation is Vital to Cloud Strategy Success

The cloud and Electric Vehicles (EVs) have a lot in common. Both are modern, fast, and agile. Both are also in great demand. Every street seems to have an EV parked somewhere. It’s the same with the cloud, which is fast becoming the platform of choice to power enterprise applications. Whether it is public, private, or hybrid, the cloud offers flexibility, security, and low total cost of ownership.

Announcing New Splunk Infrastructure Monitoring Capabilities

This year at .conf21, we announced exciting new features in Splunk Infrastructure Monitoring, our real-time streaming metrics-based monitoring platform. Our innovations help SRE and cloud operations teams detect and resolve performance issues even more quickly and efficiently while maintaining enterprise-grade security and compliance posture. In this roundup blog, we cover, in detail, all the product features we unveiled at .conf21.

The Lights Never Go Out For Financial Services

The financial sector is in a constant state of change, and institutions must continually adjust to the evolving needs of their customers and the global marketplace. And with availability requirements like the 99.5% specified by the European Banking Authority (EBA), financial institutions are driven to remain on the bleeding edge of responsiveness and convenience. Banking outages lead to lost consumer confidence and damaged reputations in a sector where trust is the cornerstone of doing business.
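
To put the 99.5% availability figure in concrete terms, here is a rough sketch of the downtime budget it implies. The measurement periods (a 365-day year and a 30-day month) are assumptions for illustration only; the excerpt does not say how the requirement is measured.

```python
def allowed_downtime_hours(availability_pct, period_hours):
    """Hours of downtime permitted in a period at a given availability."""
    return period_hours * (1 - availability_pct / 100)


HOURS_PER_YEAR = 365 * 24   # assumed: non-leap year
HOURS_PER_MONTH = 30 * 24   # assumed: 30-day month

yearly = allowed_downtime_hours(99.5, HOURS_PER_YEAR)
monthly = allowed_downtime_hours(99.5, HOURS_PER_MONTH)

print(round(yearly, 1))   # 43.8
print(round(monthly, 1))  # 3.6
```

Under these assumptions, a 99.5% target leaves fewer than two days of total downtime per year.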

Build a modern data compliance strategy with Datadog's Sensitive Data Scanner

Within distributed applications, data moves across many loosely connected endpoints, microservices, and teams, making it difficult to know when services are storing—or inadvertently leaking—sensitive data. This is especially true for governance, risk management, and compliance (GRC) or other security teams working for enterprises in highly regulated industries, such as healthcare, banking, insurance, and financial services.

Ecommerce monitoring strategy for Black Friday and Cyber Monday 2021

It’s nearly here. The annual mad rush at the wee hours of the morning. The stampede into retail stores to claim really deep discounts on the latest toys, electronics, and gadgets makes headline news every year. It begins the day after Thanksgiving and is usually two of the biggest shopping days of the year. Yes, we’re talking about Black Friday and Cyber Monday.

What can you learn from IoT with i2M - Part 3

In the last 2 installments (Part 1 & Part 2), we discussed the basics of IoT and an example of how the components can be connected and used to provide basic automation and alerting. These seemingly simple steps can build up to provide very advanced controls of all aspects of the physical world. The challenge can become managing situations that were not expected.

Top 5 Challenges of Monitoring Complex IT Infrastructures

When monitoring a large IT infrastructure, there are multiple aspects you need to keep under control. Doing things manually and relying on people to ensure the infrastructure reliability can be a wrong decision and mislead you when resolving issues or troubleshooting problems. All these complexities faced while managing a large ecosystem can seem hard to overcome, but in reality, they can be handled.

Best practices for Cloud Operations in the enterprise

How can you get the most value out of Cloud Operations, especially as your Cloud footprint grows? In this episode of Engineering for Reliability, we look at the enterprise best practices for setting up and using Cloud Operations. Watch to learn how to improve the security of your services, better manage capacity, and keep your users happy!

NGINX Monitoring: Best Tools and Key Metrics You Should Know About

NGINX is a popular web server featuring a wide range of capabilities, including reverse proxy, mail proxy, HTTP cache, and load balancing. It offers TLS offloading and a health check of the backends and supports gRPC, WebSocket, and HTTP/2. In short, NGINX is a one-stop solution for most of your web server needs. When using NGINX, monitoring its metrics is crucial for tackling issues.

Rollbar Pro Tips: Slack Best Practices

Deliver a better alerting experience for you and your team by utilizing Slack best practices and customizing your alerts. Rollbar is the leading continuous code improvement platform that proactively discovers, predicts, and remediates errors with real-time AI-assisted workflows. With Rollbar, developers continually improve their code and constantly innovate rather than spending time monitoring, investigating, and debugging.

Rollbar Pro Tips: UUIDs

The Rollbar API supports including a UUID with occurrence reports which can later be used to look up that exact occurrence. Learn how you can create a browser bookmark to quickly open an error sent to Rollbar, and how to configure Rollbar to display the error UUID back to the web page. Rollbar is the leading continuous code improvement platform that proactively discovers, predicts, and remediates errors with real-time AI-assisted workflows. With Rollbar, developers continually improve their code and constantly innovate rather than spending time monitoring, investigating, and debugging.

Crossed 5000+ GitHub stars, metrics generation from spans - SigNal 06

Welcome back to SigNal #6. Every month, SigNal from SigNoz gives you an update on what we've been building, shipping, and iterating. In the midst of sprint plannings, feature releases, and bug fixes, time just flew by, and here we are, with another monthly product update! Also, we crossed 5000+ GitHub stars this month.

Evolving from Splunk IT Essentials to Predictive IT Management with Splunk IT Service Intelligence

This video describes the value Splunk IT Essentials Work provides organizations that are just getting started with using Splunk for common IT tasks and use cases, as well as the benefits of quickly evolving to predictive IT and service management with Splunk IT Service Intelligence when you’re ready.

How sparse histograms can improve efficiency, precision, and mergeability in Prometheus TSDB

Grafana Labs recently hosted its first company-wide hackathon, and we joined forces with Björn “Beorn” Rabenstein to bring sparse high-resolution histograms in Prometheus TSDB into a working prototype. The Prometheus TSDB has gained experimental support to store and retrieve these new sparse high-resolution histograms. At PromCon 2021, we presented our exciting, fresh-off-the-presses results from the ongoing project.

What's New in OpenTelemetry: Community, Distributions, and Roadmap

I am honored to be able to talk about Splunk’s investment and commitment to the OpenTelemetry project. I would like to take this opportunity to talk about the latest in the OpenTelemetry community, as well as the instrumentation and data collection distributions available from Splunk. Be sure to read through the whole post, as you will find some roadmap information too!

Who needs CMMC certification (Resource Guide for 2022)

If your company works with the US Department of Defense (DoD) as a contractor or subcontractor, you will need to prepare to meet CMMC requirements in order to successfully bid on and win contracts. This recent development has been a significant adjustment for small organisations who wish to work with or continue working with the DoD.

How to Get Started with JavaScript and InfluxDB

This article was written by Nicolas Bohorquez and was originally published in The New Stack. Scroll below for the author’s picture and bio. Telegraf is the preferred way to collect data for InfluxDB. Though in some use cases, client libraries are better, such as when parsing a stream of server-side events. In this tutorial, you’ll learn how to read a data stream, store it as a time series into InfluxDB and run queries over the data using InfluxDB’s JavaScript client library.

A Study in Graylog's Versatility

Recently, I explored the case for Graylog as an outstanding means of aggregating the specialized training data needed to build a successful, customized artificial intelligence (AI) project. Well, that’s true, of course. My larger point, though, was that Graylog is a powerful and flexible solution applicable to a very broad range of use cases (of which AI development is just one).

Understanding why businesses require network topology software

It is crucial for network admins to fully understand their network topology. Even basic troubleshooting can be needlessly complicated without a network topology diagram, which is vital for building and maintaining a network. A network topology diagram shows how the various components work together: it presents the devices, connections, and pathways of a network visually so you can figure out how devices interact and communicate with one another.

Sponsored Post

Hybrid Multi-Cloud Demands Holistic Observability

As I said before, Speed is King. Business requirements for applications and architecture change all the time, driven by changes in customer needs, competition, and innovation, and this only seems to be accelerating. Application developers must not be the blocker to business. We need business changes at the speed of life, not at the speed of software development.

Sponsored Post

Alerting has landed: Never miss another mission-critical issue again

Time is of the essence when identifying and resolving issues in your software. The longer it takes for a fix to be deployed, the greater the consequences for your customers. Visibility and speed are core to what makes Raygun powerful, which is why today we're excited to continue this journey with our latest feature - Alerting.

Sponsored Post

Africa: DNS Provider Performance Checkup

In our last blog, we spotlighted the Oceania region and showed you how you can use DNSPerf and PerfOps to compare providers. Today, we’ll be focusing on Africa, the top providers there, and how you can use this data to make better DNS business decisions for your domain. This will give you greater insight into your current or potential provider’s performance. You should have the fastest, highest-quality DNS for regions you cater to most.

Introducing Log Observability for Microservices

Two popular deployment architectures exist in software: the out-of-favor monolithic architecture and the newly popular microservices architecture. Monolithic architectures were quite popular in the past, with almost all companies adopting them. As time went on, the drawbacks of these systems drove companies to rework entire systems to use microservices instead.

6 Bugsnag Alternatives to Consider in 2021

Modern customers demand that their applications are as seamless and error-free as possible. However, building such apps is a herculean task in itself. You need to constantly look out for incoming exceptions and warnings in your app in production. Effective error monitoring is key to resolving such issues before they are discovered by your users and cause a disruption in the quality of your services.

REST vs CRUD

CRUD and REST are two of the most popular concepts in the Application Programming Interface (API) industry. REST was created to standardize the HTTP interface between clients and servers and is one of the most widely used design styles for web APIs. CRUD, on the other hand, is an acronym for the four basic operations executed on database applications. Because both deal with manipulating data in databases, it’s easy to see why people confuse them.
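To make the overlap concrete, here is a minimal sketch of how the four CRUD operations (create, read, update, delete) are conventionally paired with SQL statements and HTTP methods. The `/articles` resource path and the helper name `http_method_for` are hypothetical and used only for illustration; they do not come from the article.

```python
# Conventional pairing of CRUD operations with SQL statements and HTTP methods.
# The /articles resource path is a hypothetical example, not from the article.
CRUD_MAP = {
    "create": {"sql": "INSERT", "http": "POST", "example": "POST /articles"},
    "read": {"sql": "SELECT", "http": "GET", "example": "GET /articles/42"},
    "update": {"sql": "UPDATE", "http": "PUT", "example": "PUT /articles/42"},
    "delete": {"sql": "DELETE", "http": "DELETE", "example": "DELETE /articles/42"},
}


def http_method_for(operation: str) -> str:
    """Return the HTTP method conventionally used for a CRUD operation."""
    return CRUD_MAP[operation.lower()]["http"]


if __name__ == "__main__":
    # Print one line per operation showing both pairings side by side.
    for op, row in CRUD_MAP.items():
        print(f"{op:<6} -> SQL {row['sql']:<6} | {row['example']}")
```

The pairing is a convention rather than a rule: many APIs use PATCH instead of PUT for partial updates, and a REST resource does not have to correspond to a single database table.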

Introducing Google Cloud Managed Service for Prometheus

Prometheus is an open-source monitoring system which helps you collect, store, query, and get alerts on metrics that are important to your applications and infrastructure. In this video, we introduce Google Cloud Managed Service for Prometheus which is designed to help you scale your monitoring. Watch to learn how you can configure and manage Prometheus to keep up with the metrics from all of your successful services!

Open Source for Better Observability

Monitoring cloud-native systems is hard. You’ve got highly distributed apps spanning tens and hundreds of nodes, services and instances. You’ve got additional layers and dimensions—not just bare metal and OS, but also node, pod, namespace, deployment version, Kubernetes’ control plane and more. To make things more interesting, any typical system these days uses many third-party frameworks, whether open source or cloud services.

Perfect Your Cloud's Deployment with Logz.io & AWS CloudFormation Public Registry

AWS CloudFormation is a service that enables you to create and provision AWS infrastructure deployments predictably and repeatedly. This helps you leverage AWS products such as EC2 instances, Amazon Elastic Block Store, Amazon SNS, Elastic Load Balancing, and Auto Scaling to build highly reliable, highly scalable, cost-effective applications in the cloud – without worrying about creating and configuring the underlying AWS infrastructure.

Introducing new integrations to make it easier to monitor Vault with Grafana

HashiCorp Vault is an increasingly popular multi-cloud security tool that allows users to authenticate and access different clouds, systems, and endpoints, and centrally store, access, and deploy secrets. At Grafana Labs, we’re always looking for ways to make it easy for our community to get started monitoring important parts of their systems. So we’re happy to share some new integrations that will help our users get the most out of Grafana + Vault.

8 Months of OnlineOrNot: From 7 Day MVP to Stable Product

September and October were relatively quiet, so I thought I would write a single article for both months. While I'd normally try to write at least one useful article per month for OnlineOrNot's audience (as well as an update on how the business is going), I wrote no articles, and no code, actually. Instead, I packed up my life in Sydney, Australia, escaped lockdown, and relocated to France with my wife, and just enjoyed living for a while.

Change Happens - Get Alerted

To give you enough notice to fix an issue before it escalates, we’re evolving our alerts and making them more proactive with Change and Crash Rate Alerts. So when your application experiences a change from the norm or a dip in crash-free sessions, Sentry will (smartly) alert you via Slack, Teams, PagerDuty, or old-fashioned email.

What can you learn from IoT with i2M - Part 2

In the first part, I outlined some of the terms associated with the delivery of IoT. Next, let’s look at how this gets complex. You will need to read the state of each sensor (through its appropriate API and its vendor-supplied hub), create logic to determine what actions must be taken when certain conditions are met, deliver these as a workflow to each responder, and then confirm through data collected from the sensors that the requested change was implemented.

How Data Types and Query Tuning Can Improve Application Performance

One of the easier ways to improve the performance of your SQL Server and Azure SQL database queries is to ensure that you choose the right data types for your data, and that the data types in your application’s code match the ones in your stored procedures and queries. Choosing the right data type conserves space: using a variable character type for data of fixed, regular length, like a phone number or national ID number, is wasteful.
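A rough back-of-the-envelope sketch of the space argument, assuming a single-byte character encoding and the documented SQL Server rule that a `varchar` value is stored as its actual length plus 2 bytes, while `char(n)` always occupies n bytes. The function names and the one-million-row table are hypothetical, and the estimate ignores row headers, indexes, and compression.

```python
# Assumption: single-byte encoding; varchar storage = actual data length + 2 bytes.
VARCHAR_OVERHEAD_BYTES = 2


def char_bytes(declared_length: int) -> int:
    """Bytes used by a char(n) column value: always the declared length."""
    return declared_length


def varchar_bytes(actual_length: int) -> int:
    """Bytes used by a varchar value: actual length plus the fixed overhead."""
    return actual_length + VARCHAR_OVERHEAD_BYTES


def wasted_bytes(value_length: int, rows: int) -> int:
    """Extra bytes spent when fixed-length data is stored as varchar instead of char."""
    return (varchar_bytes(value_length) - char_bytes(value_length)) * rows


if __name__ == "__main__":
    # Hypothetical table: one million rows, each holding a 10-character phone number.
    print(wasted_bytes(value_length=10, rows=1_000_000))
```

Under these assumptions the overhead is a fixed 2 bytes per value, so the waste grows linearly with the row count and applies again to every index that includes the column.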

We raised $29 million in new funding. Here's what we're going to do with it

Today we are announcing an additional $29 million in funding to help Lumigo grow and provide the same powerful observability capabilities we brought to serverless to other cloud-native technologies, including containers and Kubernetes. Lumigo was founded by Aviad Mor and me a few years ago because we believed the world would be rapidly moving to cloud-native architectures and that these technologies are transformative. Our goal was to create the tools that help developers realize this vision.

Office 365 Monitoring: An Introductory Guide

The cloud has transformed the IT world. It’s cost-efficient, scalable, secure, and provides many other benefits. According to techjury, 81% of organizations have at least one application running on the cloud. With such a high number of organizations using the cloud and more joining this list every day, the cloud has become an integral part of many organizations. Cloud typically provides three types of services.

How Secure Tenancy Keeps Your Secrets Secret

The best way to be sure that you keep a secret is not to know it in the first place. Managing secrets is a notoriously difficult engineering problem. Across our industry, secrets are stored in a bewildering variety of secure (and sometimes notoriously insecure) systems of varying complexity. Engineers are often trying to balance the least worst set of tradeoffs. At Honeycomb, we asked: What if we didn’t need to know your secrets to begin with?

New: Optimize Slow Queries with Enhanced Database Visibility in Splunk Observability

Databases have always been the backbone of applications – both web and enterprise. Now, more than ever before, you need to know not just overall statistics about your database, but also how database performance interacts with the network, operating system, servers, configuration, and even third-party dependencies.

Dashboard Studio: New Features Highlighted At .conf21

I am very excited that this year’s .conf21 was the first .conf where we got to showcase Dashboard Studio, which has come built-in with every Splunk Enterprise and Splunk Cloud Platform release since 8.2 and 8.1.2103, respectively. I am even more excited to share a packed list of new features in the 8.2.2109 release, which coincides with .conf21! This blog post will highlight a few capability areas we've been heavily focused on that will help you do even more with your dashboards.

Datadog Cloudsmith Integration

Cloudsmith is happy to announce an integration with Datadog to help our customers monitor their Cloudsmith account. Datadog is an observability service for cloud-scale apps, providing monitoring of servers, databases, tools, and services through a SaaS based data analytics platform. At Cloudsmith we are big fans of Datadog and use it to monitor and visualize how our system is performing across a range of services and tools.

The World After Covid-19: How Jobs, Bosses, & Firms May Improve

In 1993 the management guru Peter Drucker argued that “commuting to office work is obsolete.” As of last year, his vision hadn’t quite come true: nearly half of global companies in one survey still prohibited remote working. Then the pandemic hit. Suddenly millions of people started doing their jobs from home. Work will never be the same.

5 Sustainable IT Practices for Your SaaS Applications

With web browser-accessed applications reaching record levels, employees are now spending most of their productive work time inside a cavern of business web applications. These may be custom applications built by a company for specific business purposes, or commercial SaaS applications for important functions such as collaboration, workflow management, scheduling, communication, transactional business, single sign-on, development, service desk, CRM, HR, and others.

How to Parse JSON with Telegraf into InfluxDB Cloud

In Telegraf 1.19 we released a new JSON parser (json_v2). The original parser suffered from an inflexible configuration, and there were a handful of pretty common cases where data could not be parsed. While a lot of edge cases for parsing can be resolved using the Starlark processor, it is still a more advanced approach that requires writing scripts. We have made a lot of enhancements in the new JSON parser that can help you easily read your JSON data into InfluxDB.

Wi-Fi monitor: 5 ways it helps you get wireless network management right

The increasing need for mobility has accelerated many organizations’ shift towards wireless networks, commonly known as Wi-Fi networks. The high bit rate and bandwidth offered by wireless networks enable a better networking experience than their wired counterparts. In an ideal network, once you set up your Wi-Fi components, your end users should be able to connect and access your network with ease.

Sponsored Post

6 Reasons You Need a Digital Experience Monitoring Strategy

A Digital Experience Monitoring (DEM) strategy unlocks the key to understanding how end-users interact with web and desktop applications. If you have landed at this post, perhaps you are looking for a Digital Experience Monitoring solution. Correct? But before that, let's take a step back in understanding why it's critical to invest in a DEM tool. To provide a better technology experience, operation teams need modern tools to monitor and collect remote worker application insights. And because of that, businesses are adapting their digital transformation strategy to grow, survive, and respond to disruptions caused by the pandemic.

Goliath Technologies Achieves Record First-Half 2021 Growth

Philadelphia, PA – November 1, 2021 – Goliath Technologies, a leader in end-user experience monitoring and troubleshooting software for hybrid cloud environments, announced today that they achieved record revenue and customer growth during the first half of 2021.

My goals as a newly elected OpenTelemetry Governance Committee member

I joined Grafana Labs as a software engineer in October to help build out a team focused on OpenTelemetry, and within a few weeks, I was promptly encouraged to run for a seat on the OpenTelemetry board. Every year, the OpenTelemetry community holds elections for a few seats on the Governance Committee board, which oversees the project at large. The results of this year’s elections are now available, and I am glad to share that I have been elected to serve on the board!

Department of Defense Information Network (DoDIN) Approved Products List (APL): What is it and Why it Matters

This is the second of a four-part security blog series covering why ScienceLogic is listed in the DoDIN APL catalog, what this means for monitoring critical IT infrastructure, and why APL certification is relevant for all organizations. Part two is about what the DoDIN APL is and why it matters to both government and non-government organizations.

Observable Web Applications

Users don’t see your distributed services, cloud architecture, or instrumentation—they only see how the web app is working. Understanding their experience on the client side is the first step towards understanding the rest of the system. We’ll explore how to make your client-side applications more observable through error tracking, web performance, and usage analytics. With a better understanding of real-user experience, you’ll better understand the real behavior of your systems.

Splunk RUM troubleshoots customer-facing issues faster to deliver better user experiences

With Splunk RUM you can quickly navigate from high-level page performance to the source of an issue itself, across your entire architecture for full-stack, end-to-end visibility. A critical component in the Splunk Observability suite, Splunk RUM offers NoSample™, full-fidelity data capture with support for OpenTelemetry - meaning you’ll find root cause without switching monitoring tools, with more context across your distributed systems.

Activating Connected Experiences in 60s

Looking to get started with Splunk Mobile and Connected Experiences? You’ve come to the right place. This video will show you all the steps you need to take to deploy Connected Experiences to your entire organization. If you’re using SAML, no worries. MDM, it works! Come one, come all and learn to unlock the power of Splunk Connected Experiences in your own organization.

New Integrations from Logz.io: November 2021

It’s been a busy couple of months at Logz.io. We’ve added new features, made critical updates, and added a slew of integrations. Those integrations run the gamut from observability and security services, to cloud tools and container orchestration. Let’s take a quick look at what’s new and what’s coming up at Logz.io.

Datadog on Building Responsive UX

Datadog product designers and frontend developers have been working together to create a new, better UX for creating dashboards, which is one of the most important parts of using Datadog. A central part of this effort was building a new layout engine. Working on this project was a bit different from the usual feature work, so the collaboration cycle between our developers and designers had to change for us to more closely and quickly design, build, and test constraints and new ideas in the browser.

What can you learn from IoT with i2M - Part 1

The Internet of Things (IoT) is a wonderful marketing term given to devices that are connected to the internet. Today everything from light switches and air conditioners to door locks has the option of being internet-connected. Now that multiple companies have created “tags” that you can add to anything from keys to cars and packages, anything can be tracked. Across business, industry, and retail, almost every physical component has the option of being internet-connected.

How Freshly is Scaling Business Metrics Observability with AI

Anodot recently took part in the 2021 Data Agility Day, an event dedicated to examining how organizations are extracting value from data. CEO and Co-Founder David Drai was joined by David Ashirov, VP of Data at Freshly, where he has worked to build a data stack that departments across the company could leverage to drive business. Ashirov is a senior executive with two decades of experience in data engineering, business intelligence, and marketing.

Why Use OpenTelemetry Processors to Change Collected Backend Data

When managing distributed environments, we find ourselves challenged with looking for different ways to understand performance better. Telemetry data is critical for solving such a challenge and helping DevOps and IT groups understand these systems’ behavior and performance. To get the most from telemetry data, it has to be captured and analyzed, then tagged to add relevant context, all while being sure to maintain the security and efficiency of user and business data.

More Clouds, More Tools, More Problems

Organizations need tools to manage their infrastructure, which today is expanding beyond the data center to include multiple public clouds. In fact, in a recent survey of hybrid cloud decision makers, we found that the vast majority of respondents (88%) have placed more than one-quarter of their workloads in the public cloud, and 44% indicated that they’re running more than half of their workloads in the public cloud.