November 2022

Your guide to Kubernetes Dashboards

As a developer, it can become challenging to manage Kubernetes and develop applications simultaneously. That’s why we put together this guide to show you how the Kubernetes Dashboard can help developers overcome this problem and get an overview of the cluster and its workloads. With it, developers can focus more on application development while stressing less about cluster management.

EC2 Instance Types 101: The Definitive Guide For 2022

Yet, the same flexibility that makes EC2 so appealing can also make it complex, confusing, and unnecessarily costly. A good way to understand the compute service is to familiarize yourself with EC2 instance types, and what the best use cases are for each. This guide will cover that and more.

Database Backups - Cloud 66 Demo

Cloud 66 supports two types of backups: managed and unmanaged. Using managed backups has several benefits, and the 100 most recent managed backups are kept by default. Unmanaged backups are stored on your local server and are available under `/var/cloud66/backups`; the 10 most recent unmanaged backups are kept by default. We don’t charge for unmanaged backups. What is Cloud 66? Cloud 66 gives you everything you need to build, deploy and maintain your applications on any cloud, without the headache of dealing with “server stuff”.
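The "keep the N most recent backups" retention described above can be illustrated with a minimal sketch. This is not Cloud 66's implementation: the `split_by_retention` helper, the `(name, created_at)` tuple shape and the file names are all hypothetical, chosen only to show how a most-recent-N policy separates backups to keep from backups to prune.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical record shape: (file name, creation time).
Backup = Tuple[str, datetime]


def split_by_retention(backups: List[Backup], keep: int) -> Tuple[List[Backup], List[Backup]]:
    """Return (kept, pruned), where kept holds the `keep` most recent backups."""
    # Sort newest first, then cut the list at the retention limit.
    ordered = sorted(backups, key=lambda b: b[1], reverse=True)
    return ordered[:keep], ordered[keep:]


# Twelve made-up daily backups, with a retention limit of 10
# (the default quoted above for unmanaged backups).
start = datetime(2022, 11, 1)
backups = [(f"backup_{i:02d}.tar", start + timedelta(days=i)) for i in range(12)]
kept, pruned = split_by_retention(backups, keep=10)
print("kept:", len(kept), "pruned:", [name for name, _ in pruned])
```

With 12 backups and a limit of 10, the two oldest files end up in the pruned list.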

Database Replications - Cloud 66 Demo

Database replication involves configuring a master and replica database architecture, whereby the replica is an exact copy of the master at all times. This feature is supported for MySQL, PostgreSQL, Redis and MongoDB. Database replication can be set up for a single application, between applications, or between database groups.

Preview Deployments - Cloud 66 Demo

What are Preview Deployments? Preview Deployments automatically build and deploy a (private) preview version of your application whenever you commit changes to your repo. The preview runs alongside your active application and helps you to quickly test changes to your code without having to deploy to a separate environment.

Upgrading Databases - Cloud 66 Demo

Cloud 66 will not do in-place database upgrades, because this process may cause your application to stop working or may not be possible automatically. To upgrade your database through Cloud 66, we recommend that you create a new application (at which point Cloud 66 will deploy the newer database version). Once the new application is created, you can migrate data from your old application to your new application.

Traffic Rules - Cloud 66 Demo

Traffic rules consist of one or more functions that run in a sequence. They run from top to bottom and affect the route that each web request to your application takes. They can also block traffic based on different conditions. If traffic is blocked, it will not run through the rest of the rules that come after the current one.
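The top-to-bottom evaluation with early exit on a block can be sketched as follows. This is only an illustration of the general pattern, not the Cloud 66 rule syntax or API: the `Request` class and the `block_ip`, `route_prefix` and `apply_rules` helpers are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Request:
    """Hypothetical, simplified web request."""
    path: str
    ip: str
    route: str = "default"
    blocked: bool = False


# A rule is any function that inspects and possibly modifies a request.
Rule = Callable[[Request], None]


def block_ip(banned: Iterable[str]) -> Rule:
    """Build a rule that blocks requests from the given addresses."""
    banned_set = set(banned)

    def rule(req: Request) -> None:
        if req.ip in banned_set:
            req.blocked = True

    return rule


def route_prefix(prefix: str, target: str) -> Rule:
    """Build a rule that reroutes requests whose path starts with `prefix`."""
    def rule(req: Request) -> None:
        if req.path.startswith(prefix):
            req.route = target

    return rule


def apply_rules(rules: List[Rule], req: Request) -> Request:
    """Run rules in order; stop as soon as one of them blocks the request."""
    for rule in rules:
        rule(req)
        if req.blocked:
            break  # later rules never see blocked traffic
    return req


rules = [block_ip({"10.0.0.9"}), route_prefix("/api", "api-servers")]
print(apply_rules(rules, Request("/api/users", "10.0.0.9")))
print(apply_rules(rules, Request("/api/users", "192.0.2.1")))
```

In this sketch the first request is blocked by the first rule and keeps its default route, because the routing rule below it is never reached. The second request passes the block rule and is rerouted.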

Team Features - Cloud 66 Demo

Cloud 66 Team Access Control features: Organizations. Organizations are the foundation of team access control. They allow you to manage distinct groups of users, teams, applications and other components. They are useful if you need complete separation between different sets of applications and/or users. You can be a member or owner of several Organizations and switch between them using the top-right menu in the web dashboard or the `--org` option of the Cloud 66 Toolbelt (CLI).

Why You Need Self-Service Infrastructure

Engineering teams’ autonomy and agility are vital in achieving efficient software development. However, manual infrastructure provisioning is a major source of inefficiency and bottlenecks. While developers wait for Ops teams to provision complex infrastructure, they cannot bring the creativity, speed, and agility expected of them. This is the reason successful companies are quickly adopting self-service infrastructure.

Need help choosing an embedded Linux distribution? Get guidance here

Enterprises are looking to capitalise on the new wave of small form-factor computing and navigate the shift to the edge. Device manufacturers across the world are racing to build embedded, connected devices that will deliver on the promise of the fourth industrial revolution. Many of them are looking to explore data-driven value-chain optimisations, predictive maintenance and/or new digital customer experiences.

How to Help Teams Create Optimal Infrastructure for Availability

Teams are locked into a cycle of suffering characterized by the feeling that they are sprinting just to stay still. This morale and productivity-destroying state is caused by an inability to find time to save time. Our new research, The State of Availability Report 2022, discovered that teams know what they want to do—harness cloud and DevOps practices and tools to advance digital transformation—but something’s getting in the way.

What Is Developer Experience?

Recently I started hearing more and more about developer experience, which to me was a bit new-ish, so seriously what is developer experience? Another buzz word? A real thing? It’s been a while since I actually developed code … yes yes … you can laugh … a sales person that used to be a developer 😃, and A LOT changed since then. The number of tools a developer should work with on a daily basis is just crazy.

Get in front of delivery risks by managing work in progress

Sleuth’s product team is pleased to announce an exciting new feature that provides early and actionable visibility into emerging work-in-progress risk! With this release, Sleuth provides customers even more actionable visibility into their engineering efficiency. It extends Sleuth's deploy-centric tracking capabilities upstream in the developer workflow to provide real-time visibility into in-flight work and its emerging risks. Here's how it works.

Testing React components with Cypress

Components are reusable bits of code that, most of the time, work and function independently. If you want to be confident that components are working properly, you need to test them. Conveniently, Cypress.io has designed their testing framework to include component testing. This tutorial illustrates the differences between end-to-end (E2E) and component testing, and what to consider when using these methods. Then, you will learn how to use Cypress for component testing.

Enabling digital transformation - and data modernization - with DevOps

I don’t need to highlight the impact of the last few years on the world and its businesses. Companies that once completely dismissed the idea of remote working now embrace online offices, with many now operating fully remotely. Externally, the marketplace is shifting too, and opportunities for creating and realizing value can be found in new, less familiar places.

Kubernetes Health-Check: The Most Critical Health Conditions To Monitor

Kubernetes can generate so many types of new metrics (millions every day) that one of the most complex aspects of monitoring your cluster’s health is filtering through these metrics to decide which ones are important to pay attention to. In fact, in a survey that Circonus conducted of Kubernetes operators, uncertainties around which metrics to collect was one of the top challenges to monitoring that operators face.

Are You One of the 76% Failing in the Cloud? How to Do Kubernetes Right

A recent Wall Street Journal article cited a KPMG survey that showed that roughly 67% of 1,000 senior technology leaders at U.S. firms across industries said they have yet to see a significant return on cloud investments. The most common issues preventing a better return on cloud spending were insufficient skills of tech teams, additional security and compliance requirements, and a misalignment with expected outcomes, said Barry Brunsman, a principal in KPMG’s CIO Advisory group.

Release 1.37.0: Infinite scalability, database tiering, and much more

Another release of the Netdata Monitoring solution is here! We focused on these key areas: IMPORTANT NOTICE This release fixes two security issues, one in streaming authorization and another at the execution of alarm notification commands. All users are advised to update to this version or any later!

What is the Real ROI of an ITIM Tool?

It may seem a bit strange, but recently I’ve been asked about Return on Investment (ROI) calculators a fair amount. Generally the discussion is if a given vendor should (or should not) have an ROI calculator on their website. Some reasons for justifying an ROI calculator include making it easier for potential clients to understand the value they will receive from a solution and the basic cost involved.

Hybrid cloud needs a strategic mindset and flexibility

Hybrid cloud is an increasingly popular strategy for organisations of every hue. According to a recent survey, 72% of respondents put their cloud strategy as being hybrid first. Hybrid cloud offers businesses the best of both worlds: the security, data control and reliability of the private cloud combined with the flexibility, elasticity, cost-effectiveness, and scalability of the public cloud.

Golden signals in seconds with Universal Service Monitoring

Whether you are a site reliability engineer, DevOps engineer, or application developer, you need visibility into the health and performance of every service you run or support. But in complex, dynamic environments, it can be difficult to ensure that all services are accounted for.

ITOM as Key to DevOps Acceleration

In an increasingly fast-paced and complex business world, continuous disruption from evolving digital channels and mobile devices, along with growing competition, means organizations must adapt to survive. For example, technology leaders must wrap their heads around DevOps acceleration to gain an advantage and expand their software market opportunities.

Slash MTTR, avoid costly downtime with improved cross-team Collaboration.

Every second counts when IT teams are called upon to resolve business impacting issues. In modern enterprises, poor communication, fragmented toolchains and spiralling IT complexity can conspire to slow down incident response, putting service availability and ultimately customer satisfaction in peril.

Just Maintaining Availability? Try Building Stability

Today’s customers see availability as a given. What do they really want? Bigger, better technology with new features and faster platforms. But, according to our recently released Moogsoft State of Availability Report, teams burn their time, money and energy on incident management. In fact, engineers overwhelmingly report that incident management takes up most of their time.

Monitor your mobile apps with Embrace's offering in the Datadog Marketplace

Embrace is a mobile application monitoring solution that helps you track and troubleshoot mobile app performance by combining data analytics, real user monitoring, network performance monitoring, and hardware monitoring in a single platform. We’re pleased to partner with Embrace to offer an out-of-the-box Embrace Datadog app and software license in the Datadog Marketplace.

Announcing TISAX-compliant observability for the automotive industry and its suppliers

Many organizations face complex regulatory requirements when it comes to monitoring the health and performance of their service and application infrastructure. As part of our ongoing commitment to providing a comprehensive monitoring solution for all customers, we’re pleased to announce that Datadog has achieved TISAX Assessment Level 2 (AL2) certification.

Improve your EC2 rightsizing recommendations with Datadog and AWS Compute Optimizer

While cloud solutions can give you greater flexibility as you scale your infrastructure, limited visibility into resource utilization makes provisioning the right amount of compute resources challenging. To ensure that every workload is fully supported, many organizations may opt to over-provision, which leads to overspending. Or, in an attempt to maximize cost savings, organizations may under-provision, leaving workloads unsupported and risking serious performance impacts.

Make use of your service data with the Query Builder

The service catalog is an indispensable component of a team’s software development infrastructure. Anything you need to know about your microservice architecture - whether it is knowing who owns a particular service or what another service’s dependencies are - lives inside this repository. Its potential, however, is not limited to being a storehouse for all the data about your microservices.

What is Azure Blob?

Today, data is invaluable. Many businesses store large amounts of it for everyday operations. Some information is stored in a hierarchical form, but companies must often hold significant amounts of data without any organization. The solution? Microsoft Azure’s Blob Storage. In this article, we’ll consider the key elements of Blob Storage, how it works, and why it’s important.

Fleet Introduces OCI Support for Helm Charts

Rancher, the open source container management platform, uses Fleet to enable its continuous deployment features. Fleet brings GitOps functionality to Rancher. Fleet in Rancher 2.7.0 can fetch Helm charts from OCI registries. Using OCI registries to store Helm charts is an increasingly popular storage method. It allows storing your charts in a registry alongside your container images. This unifies the storage options for charts and reduces friction. Using a chart in an OCI registry is fairly simple.

Installing and Running Kubewarden In Air-Gapped Environments

We are excited to announce that deploying Kubewarden in air-gapped environments has been simplified and documented! For that, you will need a private OCI registry accessible by your Kubernetes cluster. If you’re unfamiliar with Kubewarden, it’s a policy engine for Kubernetes. Its mission is to simplify the adoption of policy-as-code. Kubewarden policies are WebAssembly modules; therefore they can be stored inside an OCI-compliant registry as OCI artifacts.

Everything I Wanted To Know About Kubernetes Autoscaling

Kubernetes is today the most well-known container scheduler used by thousands of companies. Being able to quickly and automatically scale your application is something standard nowadays. However, knowing how to do it well is another topic. In this article, we'll cover how pod autoscaling works, how it can be used, when it's exciting and not, and finally, we'll cover it with a Qovery usage we have internally.

SQL Server Monitoring: What metrics to track

SQL Server Monitoring has become an essential part of modern-day applications since a major chunk of these applications rely heavily on a database. It is therefore important to monitor your metrics and make the best out of your database services. SQL Server Monitoring offers plenty of metrics to choose from. We will be breaking down the five key categories that an SQL server provides for a comprehensive view of their functionality.

The 4 Most Exciting Features Coming For January 2023

What a year. 2022 was crazy for our product team. So many features have been released, like our new open-source web console, our Terraform Provider, RBAC, and Container Deployment... I can't even list all the things we have delivered. You should check out our changelogs, updated every 2 weeks, to see how crazy the pace was. Now, it's the perfect time to announce the most expected and exciting features coming for January 2023. Let's go!

Takeaways from the Kubernetes state of play 2022 Report

As Kubernetes becomes increasingly integrated across IT environments, organizations are growing more ambitious in how they use the technology, building established use cases like infrastructure management and microservices into new and ambitious fields like machine learning and edge computing. Is Kubernetes ready for this new era? What obstacles still lie in the way that risk slowing growth? Our Kubernetes State of Play for 2022 sought to answer these questions and more.

Multi-cloud trends in the healthcare sector

In the last few years, driven in no small part by the impact of the pandemic, cloud technology has had a profound effect on healthcare. Despite being one of the last sectors to go all-in on public cloud, the maturity of offerings seems to be finally winning the industry over, opening up significant opportunities for telemedicine and virtual care to adapt to evolving patient and workforce needs.

How Finance Can Instill An ROI Mindset In Engineering

Conventional wisdom states that SaaS engineers don’t care about costs. They care about building an optimal product, regardless of the dollar signs associated with it. As a result, the finance team feels they must corral the engineers’ efforts and constrain them to work within an agreed budget. In fact, the FinOps Foundation’s annual survey consistently ranks “getting engineers to take action on cost optimization” as the number one challenge experienced by FinOps specialists.

4 Most Common Website Security Threats (2023) + Solutions

For infrastructure administrators tasked with ensuring the reliable operation of their applications, the thought of a lurking cyberattack can be one to lose sleep over. An attack on your system and the services you provide could render your applications unresponsive, resulting in a security breach, or loss of data.

If Modified Henry Ford Were Your CIO, You'd Be Crushing It in the Cloud

I was hesitant to write a piece on Henry Ford because of his unsavory political views that have tarnished his reputation. However, if we focus on Ford only as a businessman and innovator we can draw some compelling parallels to today’s industrial and business landscape. Thus, for this article we will postulate a modified Henry Ford, a virtual character who possesses only Henry Ford’s positive qualities.

Track and triage errors in your logs with Datadog Error Tracking

Reducing noise in your error logs is critical for quickly identifying bugs in your code and determining which to prioritize for remediation. To help you spot and investigate the issues causing error logs in your environments, we’re pleased to announce that Datadog Error Tracking is now available for Log Management in open beta.

Are your websites ready to handle traffic peaks this holiday season?

The final 4 months of the year are an incredibly busy time for retailers, as holiday sales are in full swing from early October and continue all the way into the new year. That’s a whole lot of time for increased holiday web traffic to cause performance issues, and during the busiest sales period of the year - there’s no time to lose.

What I learned from developing a GitLab support feature for CircleCI

Earlier this year, CircleCI added GitLab as the third version control system that we support, in addition to GitHub and Bitbucket. At CircleCI, it’s vital that we meet our users where they are, and many of our users are on GitLab. We were happy to make it possible for our users to build, test, and deploy via the GitLab platform.

Build a CI powered RESTful API with Laravel

When it comes to building RESTful APIs, PHP’s open source Laravel framework remains a top 5 backend framework for web development. Laravel also makes testing your API endpoints a breeze by providing an easy-to-use testing suite. In this post, we will build a token-based authentication API with Laravel, write tests for the endpoints, and automate the build and testing process with CircleCI.

How to Choose the Best Remote IT Support Software

Remote IT support software is an essential tool for supporting customers remotely from anywhere in the world. There are a number of factors that contribute to being able to choose the best solution to provide on-demand support. A good place to start is identifying your main purpose for the remote support solution. For instance, you can pinpoint if you need to remotely control an unattended device or support your customers who require privacy and guidance without directly controlling their devices.

Sponsored Post

Is AIOps Bad for Your Business?

With advances in the field of IT, the amount of data needed to manage IT Operations has grown. In particular with more complex environments, such as the SaaS world, the amounts of data and raw data needed to manage operations have grown exponentially. Managing data manually has become a waste of professionals' skill sets, which could be better used in analyzing and applying the conclusions drawn from the raw data, and not dealing with basic issues that may arise.

Seeing vs. Understanding - The Power of Trace Visualization

It’s common in our everyday language to conflate seeing and understanding when the two are actually very different things. For example, if every day for the last few years we spoke briefly and wrote down the total number of Covid cases in the world, it would be easy to see some trends in the data—you would see the data. But if we present the same data drawn as a chart, it’s easy to understand where the spikes and dips are and when the situation got really bad.

5 Best Practices to Implement Cloud-Native DevOps

As the world transitions to cloud-native offerings as an industry norm, DevOps is gaining traction for its critical role supporting more efficient IT infrastructure. DevOps is designed to boost collaboration and communication by streamlining the automation process to expedite the creation and deployment of applications. Implementing cloud-native DevOps requires organizations to make a massive cultural shift.

Postmark + Squadcast Integration: Simplifying Alert Routing

Postmark is a simple email delivery system used to send transactional and marketing emails and it ensures getting them delivered to the inbox on time, every time. It also helps in reducing email delivery time considerably. If you use Postmark for your email delivery requirements, you can integrate it with Squadcast, an end-to-end incident response tool, to route detailed alerts from Postmark to the right users in Squadcast. The below steps will help you set up Postmark and Squadcast integration.

Using ClickHouse with MetricFire

In the analytics domain, fast and reliable storage is an important aspect for businesses to handle a large amount of data. There are different types of data storage including RDBMS, NoSQL, data lake, data warehouse, and graph database. Among these, the most widely used is RDBMS that powers various systems and applications of companies of all sizes. RDBMS is easy to use and straightforward to understand thanks to its table-based (or column-based) data format.

StackState Named Market Leader by Research in Action

Earlier this year, StackState was named a Market Leader in the “2022 Research in Action (RIA) Vendor Selection Matrix (VSM) for Observability.” This is great recognition of the innovative path that we are on. We have focused on topology-powered observability, supported by our unique 4T® Data Model.

Sponsored Post

Unify Your Incident Management Process With the Fundamentals

In a perfect world, technology stays on and runs flawlessly. But we all know this isn't the case. Like any organization, xMatters sometimes experiences unplanned incidents. What we can control is how we respond to them. To resolve incidents quickly, it's important to coordinate an organized response.

Qovery ranks as a G2 High Performer 3x in a row

Qovery has been named a High Performer in the Fall 2022 G2 Reports for the third consecutive time. Today, we’re excited to share that Qovery maintains its High Performer status in its category. We are also grateful for our users’ continued support, which has resulted in us receiving 4.8/5 stars and winning three G2 awards in a row.

AI/ML in retail: how the shopping experience has changed

AI/ML is reinventing the reality of many industries, including retail. From brick-and-mortar stores to online marketplaces, retail companies are all increasing their investments in artificial intelligence, in order to gain a competitive advantage, better understand their customers and solve some of their long-lasting problems.

What's new in Rancher 2.7

The Rancher Team are excited to announce the general availability of Rancher v2.7. Rancher v2.7 is a monumental milestone in the lifecycle of Rancher and introduces the ability to be a truly interoperable, extensible platform through the concept of extensions. Extensions make it possible for users to build on top of Rancher with complete autonomy.

Remote IT Assistance Software: How to Balance Support with Security

While conventional remote IT assistance software is super efficient for instant IT support, it can also come with data privacy and security concerns. So how do you create a balance between providing effective tech support and maintaining privacy and security? If you have customers in highly regulated industries such as healthcare, surveillance security, or industries that involve personal data or information this is an even more crucial question.

Making the Most of CloudWatch Log Insights: 7 Best Practices

Amazon CloudWatch provides Log Insights, a feature that can help you search and analyze your log data. CloudWatch Log Insights uses a proprietary query language with several basic commands. It provides sample queries for common AWS service log types, as well as query auto-completion. Learn more about CloudWatch Log Insights capabilities and how to use them.

Ephemeral Environments: The Modern Approach for Better and Faster Testing

Tech companies are gradually adopting the modern CI/CD flow that facilitates rapid releases and fast collaboration between team members. Traditional staging environments are being replaced with ephemeral environments because the shared staging environments do not support the culture of fast-release cycles mentioned earlier. As the traditional staging or testing environments are shared, one developer’s feature can cause bugs, making the whole environment unusable.

Integration testing with GitLab CI and Docker

GitLab is a complete DevOps platform that enables enterprises and organizations to deliver software to markets smoothly while ensuring high product quality. Software engineering practices use many testing techniques, from unit tests to integration tests. This article helps you understand software testing with unit and integration tests. It highlights the fundamental differences between unit and integration tests and demonstrates how to write integration tests for your applications.

Grafana Worldmap Panel

Grafana Worldmap is a free-of-cost panel used to display time-series metrics over a world map. Users can choose to visualize their data based on cities, states, countries, or any other segregation they like as long as they have a coordinate for each data point. Each data point comes in the form of circles that vary in size depending on the value of data and can get color-coded as per thresholds.

Unified Observability: The Role of Metrics, Logs, and Traces

There is significant momentum around observability, as detailed in VMware’s 2022 State of Observability report, with almost all respondents stating that observability would benefit their organization. This is further validated by Gartner including observability in their Magic Quadrant for Application Performance Monitoring and Observability report for the first time this year.

Access Bitbucket Cloud repositories more securely with resource-scoped access tokens.

We understand there is a constant tension between the need to keep source code secure, while also enabling tools to integrate with your Source Code Management solution. In line with this, Bitbucket Cloud is introducing the first step in a range of new API security capabilities, designed to give customers fine-grained control over access to their Repositories, Projects, and Workspaces.

How DevOps outsourcing could help you save on basic costs

Outsourcing DevOps can make a night-and-day difference in business operations. However, DevOps outsourcing doesn't happen overnight: several things must happen first. Still, there are many situations in which outsourcing DevOps development is a great move. Let's start with the basics.

Open source and cybersecurity: from prevention to recovery

So you have just installed the latest antivirus and turned on your shiny new firewall. Now your organisation is fully secure, right? The reality is that all the security products in the world will never be able to fully protect your data centre or your business from security threats. Because of the asymmetry between attackers and enterprises, cybersecurity is a problem that can never be solved and is never going away.

HAProxyConf 2022 Recap

Earlier this month, I packed my bag and grabbed my lanyard to attend HAProxyConf in the beautiful city of Paris. I was far from alone: nearly 200 customers, colleagues, and community enthusiasts had the same idea. We overcame jetlag, stagefright, and introverted personality types for the joy of meeting and learning from some of the brightest minds in the industry. From load balancing to cybersecurity to endless cups of strong coffee: this is HAProxyConf 2022!

Redis: Open Source vs. Enterprise

Are you curious about the difference between open-source Redis and Redis Enterprise? Of course, Redis Enterprise is a hosted service that runs Redis db on behalf of its customers, while open-source Redis is available for anyone to use. However, there's also a key difference between open source and enterprise in how the clusters are implemented. In order to understand the difference, we need to know what Redis Clusters are.

What is a bucket in GCP? GCP buckets explained

Google Cloud Platform (GCP) Storage uses buckets to store data. In GCP Storage, you can manage files and folders using the same tools and APIs you use to manage files in a standard container. Using GCP buckets, you can store any type of file, photo, video, or even projects. Essentially, GCP buckets are logical containers for your data. There is no limit to the number of buckets you can create; each bucket can hold any amount of data.

The ultimate generation of our Dedicated offering is here!

We are delighted to announce the general availability of Dedicated Generation 3 - the latest evolution of our Dedicated offering. We rewrote it from scratch with the objective of making the Dedicated experience as incredible as on the Grid, while providing the same strong guarantees regarding uptime, compliance, and SLAs. And, not to humblebrag, but we succeeded!

JavaScript immediately invoked function expressions

JavaScript Immediately Invoked Function Expressions (IIFEs) are functions that are executed when they are initialized. An IIFE (pronounced “iffy”) can be initialized or defined to achieve a certain purpose. In this tutorial, you will learn about use cases for IIFEs and the benefits of using them over traditional functions. You will also write tests for your functions and integrate CI/CD for these tests.

2022 Intellyx Digital Innovator Award

Intellyx, the first and only analyst firm dedicated to digital transformation, today announced that Speedscale has won the 2022 Digital Innovator Award. This is Speedscale’s second year winning the award. As an industry analyst firm that focuses on enterprise digital transformation and the leading edge vendors that are driving it, Intellyx interacts with numerous innovators in the enterprise IT marketplace.

Empowering developers in financial services with desktop as a service

The pandemic has accelerated the trend toward remote working environments but it also pushed governance and security issues to the top of the priority list for IT departments within financial institutions. Employees, and developers in particular, need the technological agility to work remotely given the hybrid workplace model being adopted by the majority of organisations.

Grafana vs. Chronograf and InfluxDB

How can you judge Grafana vs. Chronograf and InfluxDB? Monitoring various systems is a crucial component of continuous maintenance. You can look at different parameters of the monitored system and take corresponding actions for certain conditions. For example, engineers can prevent server failure when they see the load on the server approaching its critical point. If the number of processed transactions (or registered users) exceeds the expected level, you can celebrate your success.

Datadog acquires Cloudcraft

A well-designed cloud architecture is essential to ensure that the underlying infrastructure stays operational, within budget, and compliant over time. These days, organizations are rapidly spreading their infrastructure across a broad, complex mesh of interconnected resources and services. It can be difficult to make high-level decisions about the design and management of these systems. This is why many organizations are now turning to cloud infrastructure modeling tools.

Red Hat OpenShift monitoring with Splunk's OpenTelemetry Operator

Do you have an instant view of all the full-stack automated operations in your OpenShift environment? Would you like to monitor your self-service provisioning as code, to better understand health and performance? Have you been struggling to resolve service issues and reduce the time taken for troubleshooting across all your Kubernetes deployments? We’ve got you covered!

All Together Now: FinOps, Kubernetes, and Platform Engineering

Teams and organizations are leveraging Kubernetes to build platforms supporting their digital transformational efforts. A Kubernetes-based platform provides cloud-native architecture benefits such as automation, elasticity, resilience, and abstraction of the underlying infrastructure.

3 ways IT can secure open source software

Critics of open source software have long argued that giving everyone access to a project’s source code creates security issues. But in active projects with highly engaged communities, the opposite is actually true: Open source software helps organizations become more secure. That said, any piece of technology can be exploited.

Jaeger Tracing: Pros, Cons, Alternatives and Best Practices

OpenTelemetry (OTel) is an open source CNCF (Cloud Native Computing Foundation) project that provides tools, APIs and SDKs for observability data collection (i.e., logs, metrics and traces) from cloud-native applications. Developers can use the data collected from OTel to monitor and analyze application health and performance. To leverage the data and its insights, you can export the data to external solutions, like APMs, open source Jaeger and Zipkin, Helios, and others.

Cloud Costs From A Cloud Product Manager's Perspective

As cloud product managers, much of our job centers around creating KPIs (or OKRs, if you prefer) such as sales revenue, freemium conversions, or customer stickiness, and encouraging development teams to hit product performance goals. But what happens after those goals have been met? Do we create new KPIs and push forward, or do we take a step back and evaluate whether each indicator actually translates to higher profit for the business?

Global bank transforms incident alert management & communications

One of the top 10 largest financial services companies in the world, with 200,000+ employees worldwide, serving tens of millions of customers. With operations in more than 60 countries, the Interlink Incident Alert Management app serves an audience of thousands of service owners and business stakeholders across 20+ global markets.

Pixellot Chooses the MoovingON.ai Platform to Power its Cloud Operations

Pixellot’s automated sports production solutions revolutionize traditional video capture, production and distribution processes, enabling professional and amateur sports organizations to affordably cover and monetize their events. Pixellot’s patented technology streamlines the production workflow by deploying an unmanned multi-camera system in a single fixed rig (with additional angles as required), covering the entire field and delivering a stitched panoramic image.

Optimizing Your Kubernetes Load Testing with Speedscale

One of the major factors that come into play when deciding on a load testing tool is whether it can perform as you expect it to. There are many ways to measure how well a load testing tool performs, with the number of requests per second undoubtedly being one of the main ways. Speedscale creates load tests from recorded traffic, so generating load is at the core of the tool.

Installing the HG Heroku Monitoring & Dashboards Add-on

HG or HostedGraphite provides a complete infrastructure and application monitoring platform from a suite of open-source monitoring tools. Depending on the setup, you can choose Hosted Graphite as your data source and view all required metrics on beautiful Grafana dashboards in real time. Hosted Graphite offers a wide range of tools, add-ons, and plugins that make it possible to measure, analyze, and visualize large amounts of data about your applications with ease.

Sunbird Named a Top 10 Most Innovative Data Center Company to Watch by CIO Insights

We are proud to share that Sunbird was recognized by CIO Insights as one of their 10 Most Innovative Data Center Companies to Watch. "We will present the leading data center companies in the world and their contribution to making the world more digital-friendly," said Richard Thomas, Editor, CIO Insights. "They are constantly innovating and disrupting the data center space with their ability to look at the future.

The Start of Sleuth: Filling a Gap in Software Delivery Performance

Sleuth's founders -- Dylan Etkin, Michael Knighten, and Don Brown -- talk about where the idea for Sleuth came from, how it's filling a need in measuring and improving on software delivery performance, and how it helps teams unlock their ability to experiment. Give Sleuth a try and see why it's a deploy-based Accelerate / DORA metrics tracker both managers and developers love.

Configuring notifications for your CI builds with Slack and Twilio

CircleCI notification orbs were built to deliver messages to the appropriate channels when a build is successful or when it fails. This helps everyone involved in a project stay up-to-date with the status of the latest build. In this tutorial, you will explore and implement notifications sent to a Slack channel and also sent via SMS. To accomplish this task, you will make use of the Slack and Twilio orbs from the CircleCI orb registry.

Kubernetes Monitoring: Metrics, Tools & Best Practices

Monitoring any type of resource can be challenging. But Kubernetes monitoring is a special kind of challenge. Not only are there a variety of different Kubernetes layers and resource types to monitor, but collecting monitoring data from Kubernetes can be difficult if you use a managed Kubernetes service that limits your access to the underlying infrastructure. For all of these reasons, Kubernetes monitoring requires a different approach.

Season 2 finale: How to grow from failure in 2023 + Rob's worst failure

In the finale episode of season 2, our podcast producer Julia McClellan turns the tables on Rob to see what he's really learned from interviewing 18 top tech leaders in 2022. Rob reflects on how to grow, communicate, and change our ways from experiencing failure. Tune in today to hear Rob share his most catastrophic failure and catch a special preview to season 3.

Robotic Process Automation better monitored with Serverless360 BAM

The critical use case for BAM is to provide a simplified business-friendly view of the integrated processes that are key to your business. In business today, Robotic Process Automation (RPA) is a trending technology. While it has been around for quite a long time, it has gained a significant boost in popularity alongside the popularity of citizen developer and maker use cases.

This year's major trends in cloud migration

It is no secret that companies are shifting large infrastructures to the cloud. With more and more companies undergoing the digital transformation of their services, we have been witnessing cloud adoption as a software growth and maintenance strategy for a few years now. The nature of this movement has changed over time, so it is important to ask yourself what cloud adoption looks like today.

The Platform.sh CLI is ready to Go(lang)

The developer experience just got so much better with the latest Platform.sh CLI release. Designed and engineered to help developers manage their daily work environments more efficiently, this incredible tool is ready to Go for our entire developer community, becoming language independent with no need to install PHP, and embracing the distribution standards. With the Platform.sh CLI, developers can easily use and manage their projects directly from their terminal.

Infrastructure as Code: A Peek into Power of Terraform

Today, large enterprises are using multiple cloud providers and technologies. Enterprises dealing with distributed loads on their on-prem infrastructure, private clouds, and public clouds need teams that are versed in writing different CLI commands and scripts in multiple languages and require specialists in these fields. Unfortunately, all of this bears a hidden cost since the company has to train its employees in multiple technology stacks to manage different infrastructures.

You Build It, You Run It?

We’re all used to spicy social media debates producing more heat than light. But occasionally, the script is flipped and something useful is illuminated. Such is the case with a recent debate about the state of DevOps. It was started in the comments on a post by Leon Wright titled, “No one should write Terraform.” That spawned threads on Twitter (Sid Palas), with more conversation on Reddit. The bottom line?

Cloudify | Introduction to Backstage

Many web-scale companies such as Spotify, Airbnb, and Twilio are facing the speed paradox: “The faster you grow, the more fragmented and complex your software ecosystem becomes. And then everything slows down again.” To deal with that challenge, they chose to develop their own internal development platform aimed specifically at increasing their development productivity by taking a more opinionated approach in which developers get to use infrastructure resources. Spotify Backstage is an open-source project led by Spotify that provides a set of tools and a framework.

Hybrid work - Part tech and part culture

Prior to the pandemic, the notion of hybrid work was foreign and in most cases was for the “exceptional situation”. There used to be a small group of remote workers who weren’t near the office, and the road-warriors who were rarely in the office. On average though, working in the office was the norm. Beginning in 2020, there were headlines, articles, research, and statistics on how hybrid work is here to stay.

What is SD-WAN technology, and how does it work?

In recent years, Software-Defined WAN Technology (SD-WAN) has changed the way networking professionals secure, manage, and optimize connectivity. As organizations continue to implement cloud applications, conventional backhaul traffic processes are now inefficient and can cause security concerns. SD-WAN is a virtual architecture that enables organizations to use different combinations of transport services that can connect users to applications.

What is Supply Chain Choreography, and Why Should You Care?

The path to production has long been a space of custom pipelines, continuous integration (CI) sprawl, manual intervention, and tribal knowledge. Surely, there must be a better way? Something loosely coupled, more flexible, and less error-prone, that doesn’t need deep integration with the tooling it controls. These goals motivated us to create Cartographer, our open source supply chain choreographer.

Is a multicloud strategy right for your organization?

For the last few years, many development teams have replaced traditional data centers with cloud-hosted infrastructure. Cloud adoption continues to grow, and teams are updating applications to leverage cloud-based services. But for many organizations, adopting a single cloud provider to host all their applications and data can put their business at risk. To reduce these risks, some organizations are distributing resources across multiple cloud providers in a specifically designed way.

Integrations on Rails: How we build and deploy integrations at FireHydrant

Implementing integrations without a mountain of technical debt can be challenging. But it doesn’t have to be all bugs, burnout, and outages when shipping integrations at a high volume. We’ve unlocked a pattern at FireHydrant to rapidly build and release integrations without swiping the technical debt credit card each time, and that gave us a fast lane to building premier integrations.

Day in the life of an SRE

We spoke with two members from the SRE team, Alex Blyth and Zulhilmi Zainudin, to learn more about their role at Civo. Through this series, we aim to provide you with an overview of the different roles we have at Civo and what advice our team has. You can discover more about our team in our “day in the life of a Go Dev” and “day in the life of an Intern” blog.

Introducing Cycle's Infrastructure Abstraction Layer (IAL)

Before I dive into the launch of Cycle’s latest feature (and it’s a big one!) I want to share some context about how we got here. Let’s rewind back to 2015: containers, at least in their modern form, had just begun to take the developer ecosystem by storm. At the same time, we at Cycle were watching everything unfold: from Docker’s meteoric rise to the first few releases of tools like Kubernetes, Rancher, and so on.

CircleCI + Squadcast Integration: Alert Routing Made Easy

CircleCI is a continuous integration and continuous delivery (CI/CD) platform that helps in implementing DevOps practices. It is used to build, test, and deploy projects, by automating pipelines with jobs. If you use CircleCI for implementing your DevOps practices, you can now integrate it with Squadcast to route detailed alerts to the right users in Squadcast. The below steps will help you set up CircleCI and Squadcast integration.

Demystifying Availability KPIs - and What Most Companies Miss

Most engineering teams are no strangers to key performance indicators (KPIs), those metrics tracking progress toward critical goals and targets. Ideally, tech leaders design KPIs to focus teams on what matters and prove their contribution to the company’s overall performance. Of course, KPI data should also uncover critical information that guides informed decision-making. For engineering teams tasked with managing the customer experience, KPIs often track availability.

Where Financial Services businesses should focus their digital transformation efforts in 2023

Like every business sector, Financial Services has been on a rollercoaster ride over the past couple of years. The pandemic forced a change in the way businesses work, and the way products and services are delivered to customers. Deloitte summed it up beautifully in the introduction to its ‘Finance 2025 Revisited’ report: “COVID-19 has sped up business innovation and stress-tested the concept of 100% remote work.”

Amazon EC2 Pricing Explained: An EC2 Cost Guide For 2023

A good chunk of your Amazon Web Services (AWS) public cloud spending goes to the Amazon Elastic Compute Cloud (Amazon EC2) service. Because it is the default compute service on AWS, Amazon EC2 is key to building, running, and scaling your AWS-based applications. That also means Amazon EC2 pricing has a tremendous impact on your AWS budget. Understanding how the EC2 billing model works will help you control and optimize your AWS spending.

Docker vs Kubernetes

Docker is a PaaS product, developed by Docker, Inc. to containerize applications. It does so by combining app source code with the OS libraries and dependencies required to run that code in any environment. Kubernetes is a similar tool developed by Google, which scales up the containerized application after deployment. If one builds the containers and the other essentially helps scale them, why is there so much buzz around these two?

Apache Kafka service design for low latency and no data loss

Designing a production service environment around Apache Kafka that delivers low latency and zero-data loss at scale is non-trivial. Indeed, it’s the holy grail of messaging systems. In this blog post, I’ll outline some of the fundamental service design considerations that you’ll need to take into account in order to get your service architecture to measure up. Let’s start with the basics.

Grafana vs. Splunk

Are you trying to choose between Grafana and Splunk, but can't find enough information about their capabilities? In this blog, we highlight the details of why a user should select Grafana or Splunk as part of their monitoring stack and what the user benefits of each are. Also, you can check out what it's like to make your own Grafana dashboard using our MetricFire free trial. Get onto the product in minutes and see if you prefer Grafana over Splunk.

Cloud Cost Takes Centerstage: How Airbnb, Netflix, And Twitter Plan To Optimize

Amid the first bear market in over a decade, the world’s largest companies are facing intense pressure to cut back. Layoffs have made headlines, but cutting workforce is not a silver-bullet solution to surviving in a down market. True, companies tend to spend the most on personnel, but using layoffs as a first line of recession defense has myriad negative consequences for survivors, including reduced job satisfaction, reduced organizational commitment, and declining job performance.

Canonical announces new enterprise-grade Ubuntu images designed for Intel IoT platforms

15 November 2022: Canonical announced today the availability of new enterprise-grade Ubuntu images designed for next-gen Intel IoT platforms. Purpose-built for industrial environments and use cases, the latest Ubuntu images on Intel hardware deliver the performance, safety, and end-to-end security enterprises expect from the most widely used Operating System (OS) among professional developers with latest Intel technologies pre-enabled and available.

How integrating AWS into Cortex augments visibility into your infrastructure

With AWS re:Invent right around the corner, infrastructure has been top of mind at Cortex. Earlier this year, we launched our revolutionary Resource Catalog, which integrates with AWS accounts to automatically ingest all infrastructure components, from S3 buckets to Lambdas. Through this process, Cortex allows you to track everything in a single place, while augmenting the information that already exists in AWS. The Resource Catalog surfaces live information about your infrastructure assets.

Reducing MTTR for DevOps and SREs with PagerDuty Process Automation and InfluxDB

Mean time to resolution (MTTR) is a metric that transcends industry and technology. It’s a measure of how quickly, on average, support teams identify, act, and resolve IT issues and incidents. Because MTTR directly relates to service quality, maintaining a low MTTR is a critical goal for DevOps and SRE teams. These teams have a vested interest in resolving issues quickly because escalating incidents to higher levels of the support team increases response and resolution times.

Relational Database vs. Non-Relational Database

Relational database or non-relational database: which should you use for your projects? It’s a common question. When choosing the database type that’s right for your requirements, it’s important to understand the differences between the two. Both database types are practical in different situations and use cases and have commonalities.

Generate RUM-based metrics to track historical trends in customer experience

Datadog Real User Monitoring (RUM) provides end-to-end visibility into the user experience and performance of your browser and mobile applications. RUM allows you to capture and retain complete user sessions for 30 days. This means you can pinpoint bugs, prioritize issues, and determine fixes with data collected across an entire quarter.

How to implement a mature incident response strategy

In 2021, the Biden administration issued an executive order outlining that the government and private sector need to work together to combat cyberthreats and improve the nation’s collective cybersecurity stance. As cyberattacks become more common and more costly, the United States — like other nation-states — needs to do everything it can to prevent attacks and rapidly respond to them when they occur, which requires modernizing its approach to incident response.

VMware Tanzu Operations Manager 3.0 Now Generally Available

VMware Tanzu Operations Manager is a software appliance designed to make platform operators' use of BOSH, the infrastructure-as-code automation powerhouse, a much more pleasant and straightforward experience. BOSH can provision and deploy software over hundreds of virtual machines, and it also performs monitoring, failure recovery, and software updates with zero-to-minimal downtime.

Hybrid Kubernetes Environments with Submariner

Submariner enables direct networking between pods and services in different Kubernetes clusters, either on-premise or in the cloud. Why Submariner? As Kubernetes gains adoption, teams are finding they must deploy and manage multiple clusters to facilitate features like geo-redundancy, scale, and fault isolation for their applications. With Submariner, your applications and services can span multiple cloud providers, data centers, and regions.

Data Center Power Chains: AC vs. DC

In a data center, the power chain is the sequence of infrastructure equipment that distributes power from its source all the way to the IT devices. Most data centers use alternating current (AC) power, though telecommunications companies typically use direct current (DC) power. There are pros and cons to each, and they require different equipment.

Deploy Django apps to AWS Elastic Beanstalk

Your software development team has an enormous number of tools available to them. Some older tools are being used in new ways, which has inspired the creation of more new tools to choose from. For example, JavaScript has grown from a language used to add interactivity on websites to a full-stack language for both frontend and backend needs. JavaScript has paved the way for Express, Nest.js, and many others.

Cloud 66 Celebrates 10-year Anniversary

Cloud 66 turned ten this year! While this is a big deal for us, I understand that it is not important to you. After all, who cares if a company turns 10, right? So, why am I writing about it, and why do I think you might also be interested in this? Since we started Cloud 66 in 2012, we and the world around us have changed significantly. We are no longer a scrappy startup with only big dreams to keep us going. Today, hundreds of customers rely on us daily for critical parts of their business.

Remotely Manage Every"Thing" with IoT Device Management

The Internet of Things has been raising lots of discussions and debates (of course, for the benefits it offers). According to IoT Analytics, the number of connected IoT devices crossed 12 billion in 2020, which was 2 billion more than the estimated number of devices. Recent trends indicate that this technological transformation will not just be restricted to random “things” but will be ubiquitous, or what a few are calling the Internet of Everything.

3 Best Ways to Transfer Photos from iPhone to Android Wirelessly without Quality Loss

Still having trouble transferring photos from iPhone to Android? It's very easy if you use our 3 best ways to transfer photos & files from iPhone to Android wirelessly without quality loss. AirDroid Personal also supports transferring files from Android to iPhone, and even Windows & Mac. Cross-platform file transfer has never been easier.

How to Run Serverless Containers in AWS EKS Using Fargate

This blog discusses running serverless containers in AWS EKS with Fargate: why and how you can use this configuration, with a working example. Recently, a customer reached out with an interesting request. They wanted us to run containers in serverless mode with AWS EKS. Their intention was to use Kubernetes features and run containers in serverless mode. Side note: in EKS you normally have to manage NodeGroups and pay for them.

How can the financial services sector tackle cloud concentration risk?

The use of cloud computing by financial institutions has significantly increased in the last few years, a trend that was further accelerated by the COVID-19 pandemic. In the next few years, financial institutions will need to continuously balance the pressure to innovate quickly while managing risk and combating financial crime.

Are We There Yet? How To Know When You've Got Deep Enough Cloud Cost Metrics

In Part I of this two-part series, I talked about the key benefits of top-down cost allocation: It starts at the provider level, incorporates every penny of your cloud spend, and lets you break it down at as granular a level as is useful for your business.

Edge Computing Explained

Data is becoming increasingly essential to businesses globally, allowing for insights to be gathered around critical processes and operations. Over time, the traditional systems put in place to hold our data have become unsuitable for modern-day needs due to the continuous growth of data. Edge computing has emerged to reshape the current computing environment and allow data to be processed closer to where it’s being generated.

SAIC Shares Military-Grade Kubernetes Best Practices for Digital Transformation

Science Applications International Corporation (SAIC), a major system integrator and solution provider to government agencies, chose the D2iQ Kubernetes Platform (DKP) as the foundation for providing Kubernetes solutions for its customers.

Announcing New CircleCI + Honeycomb Integration Guide

If you’re writing software today, then you likely use a CI/CD pipeline to build and test your code before deploying it to production. Having a fast and efficient build pipeline saves you development time, shortens feedback loops, and helps you ship features faster. Conversely, slow and unreliable build pipelines are full of lost productivity and sadness.

3 Challenges of Kubernetes Monitoring (With Solutions)

Kubernetes monitoring is complicated. Knowing metrics on cluster health, identifying issues, and figuring out how to remediate problems are common obstacles organizations face, making it difficult to fully realize the benefits and value of their Kubernetes deployment. Understanding how to best approach monitoring Kubernetes health and performance requires first knowing why Kubernetes observability is uniquely challenging.

Replaying flows and troubleshooting issues in mobile app development using OpenTelemetry

iOS and Android apps are often a common component of distributed applications, forming a key part of the software architecture. These mobile apps provide another way to access data and perform actions on various services, requiring tight integration between the apps and the components which serve the data and control it.

MTTD: An In-Depth Overview About What It Is and How to Improve It

In this post, we'll learn all about the incident metric mean time to detect (MTTD). We'll see how to measure it and look at its relationship with other incident metrics like MTTR (mean time to recover). Both metrics give useful insights into your incident recovery ability.
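As a rough illustration of how these two metrics can be measured, here is a minimal Python sketch. The incident timestamps are hypothetical, and it assumes both metrics are counted from the moment the incident started (some teams measure recovery time from detection instead).

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (started, detected, recovered).
incidents = [
    (datetime(2022, 11, 1, 9, 0), datetime(2022, 11, 1, 9, 10), datetime(2022, 11, 1, 10, 0)),
    (datetime(2022, 11, 8, 14, 0), datetime(2022, 11, 8, 14, 20), datetime(2022, 11, 8, 14, 50)),
]

def mean_duration(durations):
    """Return the arithmetic mean of a list of timedeltas."""
    return sum(durations, timedelta()) / len(durations)

# MTTD: average time from incident start to detection.
mttd = mean_duration([detected - started for started, detected, _ in incidents])

# MTTR: average time from incident start to recovery (assumed convention).
mttr = mean_duration([recovered - started for started, _, recovered in incidents])

print(f"MTTD: {mttd}, MTTR: {mttr}")
```

Because recovery cannot begin before detection, a long MTTD puts a floor under MTTR when both are measured from the same starting point.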

Where is data center architecture headed to?

You know what data centers* are; we’ve told you a lot about them on this blog. Today, however, it is time to check out a particular aspect: the singleness of their architecture**, along with the role they play in the present and the one they will play in the future. * Physical facility that organizations use to host their information, applications, critical data… **There’s a good example of alliteration, a great rhetorical figure. So let’s go!

Tableau Review: Tableau vs MetricFire

Every day, businesses monitor system resources for performance, security, and workflows. Otherwise, they jeopardize day-to-day operations when issues go unnoticed. Tableau presents itself as a data-driven monitoring tool that enhances data analysis of physical and virtual server environments. But just how good is it?

SolarWinds Review: SolarWinds vs. MetricFire

SolarWinds is a network and application monitoring solution, but primarily a network monitoring solution. Founded in 1999, the company has built an online community of 150,000 registered users. However, monitoring has come a long way since the early 2000s. How does SolarWinds stack up against MetricFire in terms of features and pricing? In this article, we break down the comparison into easily digestible, unbiased information to help you make an informed decision.

ELK Review: ELK vs. MetricFire

CPU, memory use, latency, network bandwidth. These are just some of the monitoring metrics businesses analyze for security and performance. But successful data-driven organizations delve deeper than this. These companies probe millions of real-time metrics for unexpected insights and predict outcomes weeks, months, and years into the future. ELK helps them do this. It's a data analytics platform from open-source developer Elastic.

The Enterprise Roadmap to Reducing Costs While Increasing Value

Every enterprise is now facing the same challenge: to do more with less in a demanding economic environment, releasing value to customers sooner while at the same time controlling the costs of their infrastructures, retaining skilled staff, and reducing risk. With technology now central to the way many enterprises offer, package and deliver their products and services, it has a big part to play here. Microsoft CEO Satya Nadella summed it up perfectly in the keynote to Microsoft Inspire 2022 when he said.

Stress test your Kubernetes application with Speedscale's offering in the Datadog Marketplace

Properly testing a service’s APIs to ensure that it can handle production traffic presents many challenges for engineers—SREs need to guarantee the resiliency of their application, while developers must ensure that their features perform well at any given scale. Speedscale is a testing framework built for Kubernetes applications that enables you to load test with real-world production scenarios by replaying actual API traffic that your application has experienced.

How to win the war for developer talent

Though organizations are projected to pump $4.4 trillion into IT spending this year, the supply of developer talent has struggled to keep pace, and many companies are having a hard time hiring workers. This is due to the fact that today’s top-performing developers understand their value and are increasingly asking for more money and better benefits when considering job offers.

Can Shift Left Go Too Far? Why Testing in DevOps May Never Be the Same

Testing is commonly understood to be an essential and fundamental part of software development, but when and how to test is open to a wider variety of opinions. DevOps practitioners often advocate for performance and quality testing early in the development and deployment process. This is known as a “shift left” approach.

Leveraging failure to achieve scale ft. Shailesh Kumar, SVP of Engineering at ClickUp

How do you not take failure personally, but instead use it as a tool for growth? Shailesh Kumar, SVP of Engineering at ClickUp, sits down with Rob to share his experiences being a part of numerous organizations on the brink of major growth. As head of engineering, Shailesh has had to manage his teams through difficult circumstances like platform stability challenges with a healthy dose of trial and error.

Grafana vs. Tableau

When it comes to visualization tools, there are various options, all designed for different kinds of data. Some of the most recognized among them include Grafana and Tableau. If you’re not sure which one to use, this article should give you a better idea of what kind of purpose each one has and which one will suit your needs best. One great way to find out what tool works best for you is to try it out! Try out Grafana in seconds on MetricFire's Hosted Grafana free trial.

7 Essential Factors When Choosing Platform Engineering Solution

Platform Engineering, which analysts and industry experts call one of the most disruptive philosophies of the moment, is gaining momentum. But regardless of experts’ predictions and assumptions, what matters for organizations today is understanding what adopting an approach such as Platform Engineering actually entails, what a successful solution looks like, and how to adopt best practices for its implementation. That's what this article is about.

Effective vulnerability management for your microservices

Vulnerabilities are part and parcel of the software development life cycle. If left untreated, they can expose your application to malicious attacks, which can be detrimental to its functioning and reliability. To avoid severe damage and complications that arise from having the vulnerabilities exposed, it is good practice to set up a vulnerability management system. Vulnerability management is a practice that teams should integrate into the larger development process as it helps keep the software secure.

Grafana alerting

A lot of organizations are using Grafana to visualize information and get notified about events happening within their infrastructure or data. In this article, we will show how to create and configure Grafana Alert rules. To get started, log in to the MetricFire free trial, where you can send metrics and make Grafana dashboards right on our platform.

Expanded Datadog Lambda extension capabilities with the AWS Lambda Telemetry API

In 2021, we partnered with AWS to develop the Datadog Lambda extension which provides a simple, cost-effective way for teams to collect traces, logs, custom metrics, and enhanced metrics from Lambda functions and submit them to Datadog.

What is a Single-Line Diagram and What is It Used For?

A single-line diagram (also known as an SLD or one-line diagram) is a simplified representation of an electrical system. Symbols and lines are used to represent the nodes and connections in the system, and electrical characteristics may be included as well. In a data center, a single-line diagram is used to visualize the power distribution system to improve planning and troubleshooting, ensure redundancy, and reduce potential outages.

Enforcing Policy for Self-Service Environments with Cloudify and OPA

The Open Policy Agent is emerging as a standard framework for policy decisions in cloud-native environments. Running OPA on Kubernetes is a common method to provide Kubernetes admission control. OPA has become so popular that even Terraform recently announced beta integration of OPA support in Terraform Cloud.

What is Logging as a Service (LaaS)?

Logging as a Service, or LaaS, is a proven approach to managing and monitoring high-volume log data in modern dynamic environments. LaaS allows companies to manage log data regardless of whether it comes from applications, servers, or devices. With LaaS, companies can more easily aggregate and collate data, scale and manage storage requirements, set up notifications and alerts, and analyze data and trends. It also allows teams to customize dashboards, reports, and visualizations.

How reliability testing and load testing are complementary

How can you tell if your systems are reliable when under load? A common answer is to open your observability dashboards, wait for a high-traffic event (like Black Friday), and cross your fingers. While this approach is certainly effective, it's far from ideal. Without proactive reliability and load testing, we have no idea if a system will hold up to real-world usage patterns, which could mean a production outage at the worst possible time.

Auto-scaling of Intel FlexRAN components based on MicroK8s and Ubuntu real-time kernel support

RAN has incrementally evolved with every generation of mobile telecommunications, thus enabling faster data transfers between user devices and core networks. The amount of data has increased more than ever with an increase in the number of interlinked devices. With existing network architectures, challenges lie in handling increasing workloads with the ability to process, analyse and transfer data faster. The 5G ecosystem requires virtual implementations of RAN.

A practical guide to capturing production traffic with eBPF

Monitoring HTTP sessions offers a potentially powerful way to gain visibility into your web servers, but in practice, doing so can be complex and resource-intensive. Extended Berkeley Packet Filter (eBPF) technology allows you to overcome these challenges, giving you a simple and efficient way to process application-layer traffic for your troubleshooting needs.

How to Create K6 Load Tests from API Recordings

Load testing is one of the most common ways to test the resiliency of your applications. In this blog we show how recording production data with Speedscale and exporting it to K6 load tests gives you the best of both worlds. Whether or not it’s important for your organization, there are clear benefits to be had from implementing these types of tests. By doing so, you can: When it comes to load testing, two of the most modern tools are Speedscale and K6.

VMware Tanzu Service Mesh Advanced to Improve Multi-Cloud Operations for Developers and DevOps Teams

The VMware Tanzu Service Mesh team is showing previews of upcoming multi-cloud operations capabilities focused on improving productivity for developers and operation teams. Here's a sneak peek of the features that were showcased this week at VMware Explore 2022 Europe.

Sponsored Post

How to Test Autoscaling in Kubernetes

In an ideal world, you want to have precisely the capacity to manage the requests of your users, from peak periods to off-peak hours. If you need three servers to attend to all the requests at peak periods and just one server at off-peak hours, running three servers all the time is going to drive up expenses, and running just one server all the time is going to mean that during peak periods, your systems will be overwhelmed and some clients will be denied service.
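
The trade-off described above can be put into rough numbers. The following is a minimal sketch, not taken from the article: the server counts come from the text, while the number of peak hours per day and the function name `daily_server_hours` are assumptions chosen purely for illustration.

```python
# Server counts from the text; the peak window length is a hypothetical assumption.
PEAK_SERVERS = 3         # servers needed during peak periods
OFF_PEAK_SERVERS = 1     # servers needed during off-peak hours
PEAK_HOURS_PER_DAY = 6   # assumed length of the daily peak window


def daily_server_hours(peak_hours: int) -> dict:
    """Return server-hours per day for three capacity strategies."""
    off_peak_hours = 24 - peak_hours
    return {
        # Run peak capacity all day: never overloaded, but idle capacity is paid for.
        "always_peak": PEAK_SERVERS * 24,
        # Run off-peak capacity all day: cheapest, but overloaded during peaks.
        "always_off_peak": OFF_PEAK_SERVERS * 24,
        # Scale with demand: peak capacity only during the peak window.
        "autoscaled": PEAK_SERVERS * peak_hours + OFF_PEAK_SERVERS * off_peak_hours,
    }


hours = daily_server_hours(PEAK_HOURS_PER_DAY)
for strategy, value in hours.items():
    print(f"{strategy}: {value} server-hours/day")
```

Under these assumed numbers, scaling with demand uses half the server-hours of running peak capacity all day while still covering the peak window. The real saving depends on how long peaks actually last and how quickly capacity can be added.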

Cluster Monitoring with Prometheus and Rancher

In this article, we present an overview of cluster monitoring using Rancher and Prometheus as well as provide some brief setup tutorials for both tools. We further introduce a metric visualization tool called Grafana that transforms your Prometheus time-series data into graphs and visualizations. MetricFire specializes in monitoring systems. You can use this product with minimal configuration to gain in-depth insight into your environment.

How to Monitor Redis Performance

In this article, we are going to look at how to monitor Redis performance using Prometheus. This will allow Redis Administrators to centrally manage all of their Redis clusters without setting up any additional infrastructure for monitoring. To follow the steps in this blog, sign up for the MetricFire free trial, where you can use Graphite and Grafana directly on our platform.

How to monitor NGINX web servers?

Web servers are among the most important components in modern IT infrastructures. They host the websites, web services, and web applications that we use on a daily basis. Social networking, media streaming, software as a service (SaaS), and other activities wouldn’t be possible without the use of web servers. And with the advent of cloud computing and the movement of more services online, web servers and their monitoring are only becoming more important.

How to monitor and troubleshoot Apache web servers

The Apache HTTP Server (Apache HTTPd) is one of the most popular open source web servers available. HTTPd was also the first project developed by the Apache Software Foundation, which now supports hundreds of well-known projects including Kafka, Cassandra and Hadoop. Netdata has a public demo space where you can explore different monitoring use-cases. Check out the Apache demo room to explore and interact with the charts and metrics described here.

Blameless culture drives incident learning and other key insights from Catchpoint's 2022 SRE Report

SRE is a constantly evolving field, responding to the challenges of increasing reliance on tech and the opportunities of its evolving abilities. Reliability has to remain a step ahead of the cutting edge, whether it’s navigating remote work, implementing AI assistance, or optimizing internal processes. But how do we know that SRE is keeping up? We’re proud and excited to announce the results of the SRE Survey we ran in partnership with Catchpoint.

Trending topics at KubeCon + CNC NA 2022

Throughout KubeCon + CloudNativeCon NA 2022, our team was able to speak to over 100 people from the cloud-native community to learn more about their thoughts and experience of the event. This blog will explore what the community thought was the hot topic of discussion at KubeCon + CNC NA 2022, which includes topics such as security, cost, and developer experience. Check out the full video below.

Managing a Slew of Monitoring Tools? Here's How to Make Them Talk.

Engineering teams use a lot of single-domain monitoring tools. In fact, the average team manages and maintains 16 monitoring tools — and up to 40 — according to Moogsoft’s State of Availability Report. While IT leaders select and implement these tools to save teams time, our research finds they do quite the opposite. Engineers spend far and away more time on monitoring than they do on any other task — innovative, value-creating tasks included.

Azure pricing explained

So, you’ve decided to use Azure as your primary cloud platform and want to calculate your infrastructure costs. You estimate them based on listed prices, and rest assured that your startup/project will meet its budget. And then, suddenly, at the end of the month, you receive an invoice from Azure for an amount two times higher than you originally expected.

New feature announcement: Introducing Automated Backups

You asked and we delivered! The days of configuring cron jobs to take backups for each of your environments are over. Soon, for every grid project, you will have your backups automated according to the plan of your choice. That means you can focus on deploying today without worrying about whether you can roll back to yesterday - this new feature has you covered. We thought we’d give you the heads up so you can get in there early and choose your desired backup plan.

Integrate with AppDynamics | AppDynamics Demo with Moogsoft | Moogsoft Product Videos & How-Tos

After watching this video, you will be able to set up a template in AppDynamics to send data to Moogsoft, configure a JSON payload to map AppDynamics data to Moogsoft event fields, and define an AppDynamics policy to forward health rule violations and other issues to Moogsoft.

Cycle.io @ KubeCon 2022: Bringing a K8s alternative to the masses!

Detroit, known by its nickname “Motor City”, is a bustling and beautiful city filled with dazzling architecture, food, history, and of course, people. This year, it was home to KubeCon 2022. The city is close to home for Jake and me, 45 minutes from where we started Cycle; it was wonderful seeing how the city has grown the last few years. The art deco style buildings loomed overhead, and the smell of freshly cooked food wafted through the downtown area just outside the venue.

Mobile Cloud Computing: Overview, Challenges and Scope

The process of delivering mobile apps utilizing cloud technology is known as mobile cloud computing (MCC). Complex mobile apps today carry out activities including authentication, location-aware features and providing users with customized communication and content. As long as your device is online, mobile cloud computing enables you to store and access data anywhere. This makes it possible to send data without difficulty whenever required.

Announcing Linux Shell Runners in Bitbucket Pipelines

We are happy to announce that Bitbucket Pipelines now supports non-containerized Linux Shell Self-Hosted Runners. We have moved from beta to an official release. You can now create a self-hosted runner and run it on your Linux infrastructure without container restrictions. Since it is your infrastructure, you will not be charged for the build minutes used by your self-hosted runner.

Configuring Fargate custom application metrics in CloudWatch using Prometheus

Over the past few months, Helios has experienced rapid growth resulting in our user base increasing, our services multiplying, and our system ingesting more data. Like all tech companies that need to scale, we wanted to avoid our performance becoming sluggish over time.

Keeping Track of Kubernetes Deprecated Resources

It’s a fact of life: as the Kubernetes API evolves, it’s periodically reorganized or upgraded. This means some Kubernetes resources can be deprecated and later removed. We deserve to keep track of those deprecations and removals easily. For that, we have just released the new deprecated-api-versions policy for Kubewarden, our efficient Kubernetes policy engine that runs policies compiled to Wasm.

Protecting Your VoIP Infrastructure From DDoS Attacks

Distributed denial of service (DDoS) attacks are an ongoing issue for communications service providers, putting critical systems at risk, undercutting service level agreements, and bringing unwanted headlines. In the first half of 2022, 6 million of these attacks were reported. Some metrics of DDoS attacks in 1H2022 compared to 2H2021.

Early stage data teams: a balancing act

Most well-established data teams have a clear remit and a well-defined structure for what they work on and when: from the scope of their role (from engineer to analyst) to which part of the business they work with. At incident.io, we have a two-person data team (soon to be three), with both of us being Product Analysts.

A hands-on guide to work with MindSpore on Kubeflow

Looking at the report that Gartner did in 2022 regarding top technology trends, AI engineering represents an important pillar in the near future. It is composed of three core technologies: DataOps, MLOps and DevOps. The discipline’s main purpose is to develop AI models that can quickly and continuously provide business value. For instance, models that enable cross-functional collaboration, automation, data analysis, and machine learning.

Charmed Kubeflow now integrates with MindSpore

On 8 November 2022, at Open Source Experience Paris, Canonical announced that Charmed Kubeflow, Canonical’s enterprise-ready Kubeflow distribution, now integrates with MindSpore, a deep learning framework open-sourced by Huawei. Charmed Kubeflow is an end-to-end MLOps platform with optimised complex model training capabilities designed for use with Kubernetes.

Ask a Site Reliability Engineer (SRE)

Site reliability engineering (SRE) can be complicated, and at Datadog, we’ve spent a lot of time thinking about SRE and refining how we implement it. Join Datadog’s Brandon West and Rick Mangi as they provide a brief overview of SRE and its core concepts. This video also contains a Q&A session from the live taping of this panel.

Auditing Your Automation's Access: Using More Automation

Between CI/CD pipelines, container orchestrators, and developer debugging tools, more and more automation is needed to scale your systems. But how do you know if that automation is accessing the right systems at the right time? And how do you ensure that your automation is safe from exploits by unauthorized users?

The Top 10 Open-Source Products From KubeCon North America 2022

KubeCon is the major cloud-native gathering of thousands of people from around the globe. The event is attended by many emerging startups and companies working on revolutionary products around Kubernetes, security, containers, and DevOps. It is a great opportunity to share insights and collaborate on various community projects.

Mapping service vulnerabilities with Mend

Mend is an automated vulnerability scanning tool that helps teams detect and resolve issues quickly. Mend can discover outdated packages and tell you if you’re relying on tools with known issues. Then, through automated remediation, Mend creates pull requests for developers with specific guidance on resolving those issues. Mend conducts static code analysis as well as package and dependency management analysis to identify weaknesses.

ServiceNow for DevOps Engineering Automation

When I reference ServiceNow in discussions with DevOps and platform engineers they often look at me and quickly roll their eyes – with that bored look on their faces saying “next…” Why is that? At Cloudify, our DevOps automation platform primarily targets DevOps engineers, who were also locked in this mindset. Over the past year and a half, we started a deeper integration between Cloudify and ServiceNow, originally as a way to address a specific customer environment.

The Quest For Sunken Treasure: Top-Down Vs. Bottom-Up Cloud Cost Allocation

High-quality cloud cost allocation has become an existential issue for businesses. In order to get as much out of their (mounting) cloud investments as possible, business leaders need to know how much they’re spending in the cloud, what/who they’re spending it on, and whether there’s a good reason for it. In its ideal form, cost allocation answers all these questions.

Develop and Deploy a Python API with Kubernetes and Docker

Docker is one of the most popular containerization technologies. It is a simple-to-use, developer-friendly tool, and has advantages over other similar technologies that make using it smooth and easy. Since its first open-source release in March 2013, Docker has gained attention from developers and ops engineers. According to Docker Inc., Docker users have downloaded over 105 billion containers and 'dockerized' 5.8 million containers on Docker Hub. The project has over 32K stars on GitHub.

Develop and Deploy a Python API with Kubernetes and Docker - part II

In part I of this tutorial, we developed a Python API, then used Docker and Docker Compose to containerize the application and create a development environment. In part II, we are going to discover some other details about Docker and Docker Compose as well as how to deploy the same app to a GKE cluster.

Prometheus vs. Zabbix

For a successful business, you need to introduce an effective monitoring system covering all areas of your business and infrastructure - servers, databases, services, overall traffic, and even revenue collected. The users of this monitoring system can be system administrators, software engineers, information engineers, as well as all sorts of analysts.

Building an incident management process

In this podcast, our panellists discuss the foundations that any team needs to put in place when designing their incident management process. Starting from the basics of defining what we really mean by an incident, to how to set your severity levels, roles and statuses, Chris and Pete share their tips for building solid foundations to run your incidents.

The Power of Harnessing DevOps for the Database

Why do some organizations excel in streamlining their database operations and applications development while others find it immensely challenging? Why can some database teams embrace agility while others take months of cycles to deploy even a single line of code? What secret sauce can allow some database teams to work smarter (not harder), streamline database development lifecycles better, get to deployment faster, and create an overall stronger alignment across departments?

3 questions to ask in the build vs buy debate for incident response tooling

As a former incident responder and now as a responder advocate for FireHydrant, I’ve seen the “build vs. buy” debate play out many times. In fact, I even supported the tool that former employers used for managing incidents for years before they decided to buy (more on that in a future blog post).

What Is the State of IT Automation Going into 2023?

History will look back on this period of the 21st century as a pioneering, resilient, and excitingly disruptive time. We’re deep into a dynamic era as the cloud, Artificial Intelligence (AI), IT automation, and digital transformation converge to drive challenges and dazzling opportunities. The sheer force and potential of AI, coupled with unprecedented security risks and ongoing infrastructure advances, will shape enterprises for years to come.

How to build your DevOps team with Agile culture

DevOps is the modern convergence of people, processes and tools to create a continuous software delivery stream. Much like the code it produces, the concept of DevOps itself is continuously adapting to encompass new ideas, methods, and technologies. So what can we expect from the DevOps cultural shift in the future? The delivery stream of yesterday was a segmented process, often separating key teams and concepts into silos where people focused on their individual tasks.

What's New with VMware Tanzu: A Sneak Peek of Announcements at VMware Explore 2022 Europe

Our first ever VMware Explore this past August was an action-packed event full of announcements, learning, and hands-on experiences. We had powerful conversations with customers and partners about how modern enterprises are making efforts to evolve out of “cloud chaos” and into a “cloud smart” world. We learned so much at the first VMware Explore event that we’re doing it again. The digital transformation journey continues, and our next stop is in Barcelona!

VMware Tanzu Expands 'Kubernetes Everywhere' to Sovereign Cloud

VMware continues to expand its VMware Tanzu Kubernetes portfolio to include new infrastructure types, providing customers with the right application components, tools, processes, and platforms to enable them to achieve a consistent developer experience, operations, and security. Today, the VMware team is pleased to announce that we are furthering that vision with availability of VMware Tanzu products on sovereign clouds.

VMware Tanzu Kubernetes Grid 2.1 Enhances the Multi-cloud Experience

It’s apparent that Kubernetes has become a mainstream technology in recent years. Although 99 percent of organizations are recognizing clear benefits from leveraging a containerized platform, according to the latest State of Kubernetes report, new technologies do not come without challenges and learning curves.

How Major League Baseball Scales Kubernetes Monitoring

Millions of global baseball fans tuned into the World Series last week, and we at Circonus were proud to help our customer, Major League Baseball, ensure they provided those fans with seamless viewing experiences. To celebrate our partnership, we’re rolling the replay on how MLB overcame Kubernetes observability challenges with Circonus as the league quickly scaled its Kubernetes deployment.

Qovery Demo Day Summary - November 2022

After a long summer break, the Qovery Demo day is back. 🌞 Our last Qovery Demo Day was live on Thursday, the 3rd of November. This event aims to give you insights into what we did during the past month and what’s next and showcase some of our new features. During this demo day, Romaric (CEO at Qovery) and Alessandro (Lead Product Manager at Qovery) joined me to talk about RBAC, Containers and Deploying Jobs and here is the recap.

Civo Update - November 2022

At the start of October, Dinesh Majrekar, CTO, and Mark Boost, CEO at Civo spoke at KubeCrash, about application deployments of old, cloud-native processes of today and edge native deployments of the future. Watch their session below to learn about the challenges we are about to face with an edge-first architecture and what we can do today to be ready. We then took to Detroit for KubeCon + CNC NA 2022 where we hosted an array of talks, workshops, and events.

The real cost of latency

In networking, latency - sometimes referred to as 'lag' - is the delay between a client request and the service provider's response. In a cloud environment this could be a developer or end-user client request, and the cloud service provider’s response. Or for multi-cloud, it could be one application in a cloud instance, talking to another application in another cloud instance. But no matter which type, latency can have a real impact on an organisation.

The Unit Economics Journey: Cost Considerations At Each Venture Stage

Just as there are a handful of stages to describe the financial journey from startup to established company — pre-seed and seed funding, followed by series A, B, and C funding — there are also stages to the process of achieving a healthy understanding of your costs.

15 Best Linux Networking Commands and Scripts You Should Know

Both servers and software development use Linux. Today, Linux distributions are used by the vast majority of electronics and embedded systems. Worldwide, Linux servers make up about 90% of all internet servers. Additionally, the Linux kernel is used by around 80% of all smartphones. Today, every system in the world is linked via a network. Information exchange across systems requires network connectivity. Computer networking refers to communication over the internet as well as within a network.

How to identify and map service dependencies

Modern applications are a web of interdependent services. As applications grow in size and complexity, and as more engineering teams adopt service-based architectures like microservices, this web becomes deeper and denser. Eventually, keeping track of the interdependencies between services becomes a complex and time-consuming task in and of itself. In addition, if any of these dependencies fails, it can have cascading impacts on the rest of your services and on the application as a whole.

Latest DZone Kubernetes in the Enterprise Survey Highlights Key Trends

To keep pace with the accelerating digital landscape, today’s organizations are adopting containers and Kubernetes to enable agility and faster time-to-value. Given Kubernetes’ widespread adoption, it’s no surprise there are so many new and emerging trends and best practices.

Launching increased transparency and control for workspace invitations

We are excited to announce some big changes for Bitbucket Cloud invitations. Over the next week we will begin progressively rolling out a new, more intuitive way to invite new members to your workspace with increased management controls and transparency. As part of Atlassian's cloud-first strategy, Bitbucket Cloud is investing in more and more enterprise capabilities to ensure a seamless experience for customers migrating from Server to Cloud.

3 Best Practices When Using Qovery

Qovery provides fast implementation and maintenance of your cloud infrastructure while taking care of end-to-end DevOps tasks. It even manages your Kubernetes clusters for you. It gives developers autonomy because it is effortless and does not need a vast DevOps workforce. With a few clicks, a developer can create a replica of the production environment and deploy their code easily, but where should you start, and with what?

Cloud Native Mastodon powered by Civo

In technology, nothing is static. We need to be open to experimenting with new platforms and avoid getting locked into any one single entity or technological solution. With all the recent events, many people are looking for alternatives where they can post microblogging content like they did on Twitter. Mastodon has recently become significantly more popular due to its decentralised nature and the power of enabling different communities to define themselves.

Securing the Usage of volumeMounts with Kubewarden

Securing a Kubernetes cluster is far from a simple task. How do you know if you have correctly configured volumeMounts in your in-cluster containers? And what about all those workload resources, such as Deployments, Jobs, Pods, etc? Luckily, you can use Kubewarden, an efficient Kubernetes policy engine that runs policies compiled to Wasm. This means you can run powerful, specifically written policies, or reuse existing Rego policies, for example.

Introducing a more complete logs forwarding experience

One of the key attributes of DevOps and SRE engineers is their ability to meticulously observe and monitor all of their applications, a task that can be achieved more efficiently by sending all generated logs to a central endpoint. By centralizing logging, engineers can, at any time, have an accurate overview of all events which take place across their applications, from just one place. Storing logs in an external system also allows companies to ensure compliance with many certifications.

For incident management, should you build or buy?

Is your incident response held together by a thread? Are you manually recording incident updates in a shared doc? Do you struggle to juggle the incident management workload with your other responsibilities? Does everyone on-call report data the same way? These are all common problems faced by DevOps teams still relying on homegrown incident management tooling.

Building an automated unit testing pipeline for serverless applications

The Serverless framework is an open-source framework written in Node.js that simplifies the development and deployment of AWS Lambda functions. It frees you from worrying about how to package and deploy the application to the cloud, so you can focus on your application logic. Serverless applications are distributed by design, so good code coverage is vital, and should include unit testing.

What Can OpenTelemetry Distributed Tracing Architecture Do for Frontend Developers?

When developers talk about the options OpenTelemetry opens up to them, one of the most powerful use cases is troubleshooting distributed architectures. With OTel data and insights, developers can identify bugs and solve a wide range of issues across various types of architecture and flows. These include asynchronous flows, flows with Lambda functions, and many more.

Azure Migration Toolbox - Quick-start Guide to Moving Workloads to Azure

As more and more enterprises move to Microsoft Azure for their cloud services, there are many considerations and challenges they'll need to assess. These include, for example, moving applications to a new region or seeking better coverage for highly available (HA) deployment. This easy-to-read guide gives you both tips on overcoming the challenges ahead and a list of resources to help you get started.

5 Types of Git WorkFlow & Explanation of each Flow

As you might be aware, each team has its own unique workflow based on the project type, size of the company, team preferences, and a number of other factors. The larger the team, the more difficult it is to keep things under control: disputes become more regular, delivery deadlines may slip, priorities always change - the list may go on and on. Adopting Git is the first step in resolving these challenges, as it can be used in almost any workflow.

Future NetWings relies on ManageEngine MSP platform to simplify their multi-client IT management.

Debasis Meta, the director of Future NetWings SecureNet, shares his thoughts on the MSP market and how the ManageEngine MSP ecosystem allows him to easily manage his multi-client IT, with distinct data segregation and comprehensive data security. He explains how critical customer experience is from an MSP's point of view and how ManageEngine helps him deliver exceptional customer service.

Secure Cookies Using HAProxy Enterprise

My colleague Baptiste previously published an article on how to protect cookies while offloading SSL. I recently encountered a customer who wanted to achieve a very similar goal but using a more recent HAProxy Enterprise version. This post will explain the best practices for how to secure your cookies using HAProxy Enterprise.

Accelerate IT / OT convergence in Industry 4.0 [Part III]

Welcome to the concluding blog of this mini-series on tapping into the fourth industrial revolution. In Part I, we introduced and assessed the current status in the IT and OT domains. In Part II, we discussed the automation pyramid of modern factories and the need to adopt a more holistic approach toward closing the divide.

Kubernetes Best Practices For 2023 (To Implement ASAP)

Kubernetes (K8s) packs a ton of benefits as a container orchestration platform. For instance, K8s is big on automation. This includes automating workload discovery, self-healing, and scaling containerized applications. Yet, Kubernetes isn't always production-ready after a few tweaks. This guide shares crucial Kubernetes best practices you'll want to start using immediately to improve your K8s security, performance, and costs. Let’s get to it!

Are You Getting Everything You Can From Your ITIM and ITSM Integration?

IT Service Management (ITSM) tools are for many organizations the lifeblood of the help desk and possibly the entire IT department. Some would argue these tools are the lifeblood of the entire organization, as well. Many IT departments live and die by their IT Service Management tools, using them to track everything from support tickets to change control requests, provisioning and de-provisioning of resources and more.

Explore Azure costs for multiple subscriptions with cost analysis

In the fast-growing Azure space, it is essential to scale your business as Azure scales up. Azure is cost-efficient thanks to its pay-as-you-go model, but enterprise users still need to perform cost analysis to keep their budget in check. Consider a scenario: you’re a manager in your organization, and your team has been using Azure for the last several months. It has created multiple resources that cost money.

Download Azure Service Bus messages using Serverless360

From our experience handling Azure Service Bus messages, one frequent suggestion we get from support staff is to download messages from Azure Service Bus entities like Queues and Topic Subscriptions. By downloading the messages as a local copy, it becomes easy for them to debug the messages and use them at a later point in time. Basic knowledge of Azure Service Bus messaging entities is a prerequisite for understanding this feature.

Don't forget, it's the hardware that makes the cloud

The main issue we see with clients and cloud implementations is that it can be very difficult for them to get a clear idea of what they are buying and how well it will perform. While the consumption and billing models are clear, it can still be hard to know how much you will pay each month. What is even harder is predicting exactly what level of performance you will get. Some of this is inevitable.

The Top 10 Products From KubeCon North America 2022

The KubeCon event is a major cloud-native gathering that brings together thousands of people and hundreds of vendors for three days. Technology enthusiasts and adopters from leading cloud-native and open-source communities gather and discuss innovative ideas at KubeCon. It provides a forum where you can exchange relevant information and insights on the latest trends in Kubernetes and the container world.

How to mute alerts during maintenance windows or scheduled backups?

The health management APIs in Netdata allow teams to eliminate unnecessary alerting during scheduled maintenance, testing, auto-scaling events, and instance reboots. For all SREs, it is crucial to filter out expected events during maintenance windows so that critical issues in the infrastructure can be pinpointed quickly. Every minute counts when troubleshooting, and any distractions that may hijack the process should be suppressed.
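The idea of filtering out expected events during a maintenance window can be sketched in a few lines of Python. This is a generic illustration only, not Netdata's health management API: the `MaintenanceWindow` class, the `is_muted` function, and the host names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MaintenanceWindow:
    """Hypothetical record of a scheduled maintenance window."""
    start: datetime
    end: datetime
    hosts: frozenset  # host names covered by this window


def is_muted(host, fired_at, windows):
    """Return True if an alert from `host` at `fired_at` falls inside any window.

    The window is half-open: an alert fired exactly at `end` is not muted.
    """
    return any(
        host in w.hosts and w.start <= fired_at < w.end
        for w in windows
    )


# Example: a nightly backup on db-1 between 02:00 and 03:00 UTC.
backup = MaintenanceWindow(
    start=datetime(2022, 11, 1, 2, 0, tzinfo=timezone.utc),
    end=datetime(2022, 11, 1, 3, 0, tzinfo=timezone.utc),
    hosts=frozenset({"db-1"}),
)
```

With this sketch, an alert from `db-1` at 02:30 UTC would be suppressed, while the same alert from `web-1`, or from `db-1` after the window closes, would still reach the on-call engineer.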

Service Level Management Process Explained (with Examples)

Service Level Management, or SLM, is defined as the process of negotiating Service Level Agreements and ensuring that they are met. Service Level Management is a fundamental part of SRE and DevOps. It encompasses the expectations and perceptions that both the business and the customer have about the service and its performance. Service level management will include existing and new services as they are added, with the service level agreements (SLAs) being modified accordingly.

Accelerating Digital Transformation: The Role of DevOps and Data

I was recently joined for a live webinar by Tony Maddonna (Microsoft Platform Lead, Enterprise Architect & Operations Manager at BMW) and Hamish Watson (DevOps Alchemist at Morph iT) to discuss their experiences with digital transformation and the impact it had on them, their teams and the wider organization.

The underappreciated power of technical project managers

Imagine you’re part of a software development team that’s working on an important new project. Everyone is excited about the work, but you’re running into trouble. The work wasn’t clearly divided up, so some of the engineers unintentionally did overlapping work. Meanwhile, neither the PM nor the engineers realized that they would eventually need sign-off from an external stakeholder, who doesn’t agree with all of the project requirements.

How release process documentation helps you ship software faster

Your release management process is one of the most critical processes in your organization’s toolkit. An excellent release management process can accelerate software release workflows, allowing your team to deliver software consistently and predictably and ensuring that your customers have an optimal brand experience. But release processes can have weaknesses that slow your team down and can even introduce risk to every release.

New GKE dashboards and metrics provide deeper visibility into your environment

Google Kubernetes Engine (GKE) is a managed Kubernetes service that enables users to deploy and orchestrate containerized applications on Google’s infrastructure. Datadog’s GKE integration, when paired with our Kubernetes integration, has always provided deep visibility into the health and performance of your clusters at the node, pod, container, and application levels.

To "SRE" Or Not To "SRE"? That is (No Longer) the Question

The DevOps world is going through rapid changes and rightfully so. In a world where everything is cloud or cloud-native-based, scale is becoming one of the most critical parameters for business efficiency. In fact, velocity is no longer measured by the number of code lines a developer produces, but rather by the time it takes the team to release a feature. Focusing on a feature release rather than the number of code lines forces businesses to switch to a more sophisticated delivery mechanism.

Platform Engineering: DevOps Evolution or a Fancy Re-name?

Everyone’s talking about Platform Engineering these days. Even Gartner recently featured it in its Hype Cycle for Software Engineering 2022. But what is Platform Engineering really about? Is it the next stage in the evolution of DevOps? Is it just a fancy rebrand for DevOps or SRE? As a veteran of the PaaS (Platform as a Service) discipline about a decade ago, and a DevOps enthusiast at present, I decided to delve into this topic, peel off the hype, and see what it’s about in practice.

Distributed tracing for Azure - Spot failures in the message flow

Serverless360 is a cloud management platform engineered for Microsoft Azure that brings enterprise-grade monitoring, tracing, remediation & governance under one roof. Everything you need to empower your Azure operations teams with more meaningful features and deliver effortless support.

Low-Code/No-Code: The Past & Future King of Application Development

Business organizations that want to save money and be competitive take into consideration the time costs associated with investments in new technologies. Will the efficiency gains translate to a rapid return on investment? Will users embrace the change and be more productive? Or will those investments be a hassle to employees and result in time-wasting workarounds and a fallback to inefficient, manual processes?

6 Examples Of FinOps KPIs That Will Improve Your Margins

Setting FinOps KPIs helps keep your whole organization aligned toward the same financial goals. However, it takes more than simply setting a broad, company-wide financial goal and turning every employee loose to work on that goal without more specific directions. It’s far better to come up with realistic and achievable goals tailored toward each person or team that will be responsible for them. That’s because KPIs should ideally be focused around the typical persona of each team.

State of Data on Kubernetes 2022 Survey Shows Big Payoffs for Kubernetes

Data is a modern company’s greatest asset, if used effectively. After all, in our always-connected economy, the most valuable business applications are data-driven. Customers expect real-time interactions powered by millions of end-points and massive amounts of data. To remain competitive in the market, organizations are adopting fast data applications to create new business models and transform industries, and Kubernetes is increasing the velocity with which they can be deployed.

Supply Chain Security Workshop

More and more attacks are aimed at the entire supply chain, which means that we developers are increasingly targeted by the attackers. Attacks like the SolarWinds hack show us that making sure you don’t use vulnerable dependencies isn’t enough. The attackers have their sights set on the entire development process with its components. In this workshop, we will look at the first steps and try them out in practice which will enable you to integrate the topic of security into your everyday life as a developer.

Go Beyond the Status Quo with Puppet Enterprise

From the largest physics laboratory in the world to a telescope network scattered across the planet to a design firm building roads and parks around the globe to an air navigation service provider that guides 1.2 million flights each year, 40,000 companies rely on Puppet to automate their infrastructure – with security and compliance baked right in. Puppet’s IT automation suite incorporates security, compliance, and innovation from day one to power Day 2 operations (and beyond), so your teams can spend less time managing IT infrastructure and more time changing the world.

Top 8 Open Source Dashboards

Before exploring open-source dashboard tools, we first need to learn about Dashboards and how they can be useful. A dashboard is a data visualization and management tool that visually tracks and analyzes the Key Performance Indicators (KPIs), business analytics metrics, infrastructure health and status, and data points for an organization, team, or process. It can be used to present operational and analytical business data with interactive data visualizations to your team.

Pandora's Flask: Monitoring a Python web app with Prometheus

We eat lots of our own dog food at MetricFire, monitoring our services with a dedicated cluster running the same software. This has worked out really well for us over the years: as our own customer, we quickly spot issues in our various ingestion, storage, and rendering services. It also drives the service status transparency our customers love. Our customers include large multinational coffee brewers, game companies, and other data science/SaaS companies.

Getting started with severity levels

An incident can take many forms. It can look like a small issue that locks a few customers out of their accounts or a huge catastrophe that brings down your entire product for a full day. How you respond to the incident should vary based on the impact of the incident. And that’s where severity comes into play. Defined severity levels are crucial to any good incident management program.
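As a rough illustration of how defined severity levels tie the response to the impact, the following Python sketch maps two impact signals to a level. The three-level scale, the thresholds, and the `classify` function are assumptions made for this example, not a standard or the article's own scheme.

```python
from enum import Enum


class Severity(Enum):
    """Hypothetical three-level scale; SEV1 is the most severe."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


def classify(percent_customers_affected, core_feature_down):
    """Map incident impact to a severity level (illustrative thresholds)."""
    if core_feature_down and percent_customers_affected >= 50:
        return Severity.SEV1  # e.g. the whole product is down
    if core_feature_down or percent_customers_affected >= 10:
        return Severity.SEV2  # significant, but partial, impact
    return Severity.SEV3      # e.g. a few customers locked out
```

A full-day outage of the entire product would land in `SEV1` under these assumed thresholds, while a small issue locking out a few customers would be `SEV3`.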

Bridging The Gap Between Applications & Cloud Environments | KubeCon Detroit 2022

What does “Bridging the gap between applications & cloud environments” really mean? It means: having access to a Marketplace with over 100 out-of-the-box Certified Environments such as clouds, tools, and K8s; using a Composer that reduces design and configuration time; tapping into a self-service catalog that delivers an easy ‘app-store’ experience for environments, apps, service setup, and management; and getting real-time visibility, so you can visually track the execution of all tasks.

9 tips to master the art of software installation

IT admins face pressure from all sides. Besides tracking and securing data across devices, they must also manage a changing inventory of physical and digital assets while adhering to an executive directive to make technology an enabler for growth. Software deployment can be a particularly daunting task. Deploying software through an automated solution is just a click of a button, but what are the don’ts to keep in mind?

How to monitor Nginx

Are you interested in learning how to monitor Nginx? In this post, we'll show you how Nginx works and how you can use Hosted Graphite to monitor it. First, we'll look at what Nginx monitoring is all about and how it can work together with Prometheus. Nginx, pronounced like “engine-ex”, is an open-source web server that, since its initial success as a web server, is now also used as a reverse proxy, HTTP cache, and load balancer.

Integrating Heroku Metrics with Amazon CloudWatch Metrics

Application monitoring plays a critical role in the success of your digital products. As you monitor various performance metrics such as usage of CPU, memory, network traffic, and more, you can swiftly take pre-emptive actions before things develop into a larger problem. In spite of the importance of monitoring, the task can become challenging when your infrastructure exists across multiple cloud platforms including AWS and Heroku.

Visualizing GraphQL Traces in Microservices

One of the things that most excites me about what we at Helios are doing differently than anyone else is trace visualizations. While there are many ways to troubleshoot microservice architectures, a good visual overview goes a really long way to speeding up understanding and therefore accelerating time to a resolution. When your manager asks, “Why did that break down?” with Helios you can answer quickly with accurate data—this is the value of the Helios platform.

Public cloud for telco - Part 3: Microsoft Azure

This is the third blog from a series focusing on how public clouds meet telecommunication operators’ business demands. In the previous two blogs, we talked about how Amazon Web Services (AWS) and Google Cloud Platform have enabled telcos to run critical workloads on public clouds. In this last part of our series, you’ll hear about Microsoft Azure cloud and why it’s a trusted platform for the telecommunication industry to host their workloads.

OpsRamp Patch 2.0 - Solving Your OS Patching Challenges

OpsRamp’s Operating System Patch Management module is a flexible, yet powerful capability provided to all OpsRamp platform customers or licensed separately. With our SaaS-based OS Patching solution for Windows and Linux endpoints, you can automate the entire patch management process from identification of missing OS patches to the process of patch installation.

Containers vs. Virtual Machines: Rivals or Friends?

Containers have been the buzz among developers in recent years with the adoption of cloud-native orchestration tools like Kubernetes and DevOps workflows centered around containers. At the same time, virtual machines (VMs) still power many enterprise workloads, whether they’re running in a public cloud provider like Azure or an on-premises data center running VMware. In one of my early jobs, we built a private cloud—in 2012. This was a ground-breaking project at the time.

How reporting enables informed decision-making

For software development teams to make meaningful progress, they must invest in efficient monitoring, reporting practices, and tooling. This is because only by keeping track of select metrics, such as those pertaining to application performance, will you know whether you are on the right track. Without knowledge of whether the software is functioning and performing as it is supposed to, there is no way of knowing what, if any, changes need to be made.

Interlink Software Achieves Cyber Essentials Certification

Cyber Essentials is a UK government backed scheme, developed by the National Cyber Security Centre. Since its inception the scheme has become the benchmark for IT security, helping organizations to deploy technical controls to guard against the common types of cyber-attacks and improve data security.

Current DevOps Problems & How Scout APM Solves Them

Most software companies rely on DevOps at some scale to aid their software development and deployment processes. DevOps has recently seen a major increase in popularity due to the advent of cloud-based tools and automation possibilities. DevOps can help you completely forget the woes of deploying software and focus better on building better apps and providing a holistic experience for your end user. However, just like other things in tech, DevOps is not perfect.

Predictive autoscaling - enhanced forecasting for cloud workloads

Elastigroup predictive autoscaling uses a machine learning algorithm to accurately predict the CPU utilization pattern of your workloads and increase the number of instances based on the projected CPU utilization. Predictive autoscaling also helps in cases where your instances or applications take a long time to boot up. It can scale the instances in advance of the actual traffic, saving up to 30 minutes of startup time.
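The general idea of scaling ahead of a predicted load can be sketched as follows. This is not Elastigroup's algorithm: the example swaps the machine learning model for a simple seasonal average, and the function names, the 60% CPU target, and the sample data are all assumptions for illustration.

```python
import math
from statistics import mean


def forecast_cpu(history, period, steps_ahead=1):
    """Predict a future CPU % sample as the mean of past samples at the same phase.

    history: CPU utilization samples taken at a fixed interval.
    period:  number of samples in one repeating cycle (e.g. 24 for hourly data).
    """
    future_index = len(history) + steps_ahead - 1
    same_phase = [history[i] for i in range(future_index % period, len(history), period)]
    if not same_phase:
        raise ValueError("not enough history to cover this phase of the cycle")
    return mean(same_phase)


def instances_needed(current_instances, predicted_cpu, target_cpu=60.0):
    """Size the group so the predicted load lands at the target utilization."""
    return max(1, math.ceil(current_instances * predicted_cpu / target_cpu))


# Two cycles of three samples each (low, medium, high load).
history = [20, 50, 80, 30, 60, 90]
predicted_peak = forecast_cpu(history, period=3, steps_ahead=3)
```

Because the forecast is available before the peak arrives, the group can be resized early enough to absorb slow instance boot times instead of reacting after CPU has already spiked.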

IDC Report: A Best Practice Blueprint from Customers on Successful Cloud Migration

The events of the past two and a half years have radically changed the world. Digital transformation initiatives went from being a priority to an urgent imperative. Enterprises that had not already migrated data to the cloud rushed to do so during the pandemic, and others accelerated their on-premises-to-cloud shifts. Hybrid and multi-cloud environments became the new reality for many organizations that want to get the most out of their on-premises and cloud investments.

A more flexible and hassle-free approach to digital transformation

Businesses need a new, more flexible solution to make digital transformation far easier, says Daniel Blackwell, Product Manager – Networks & Security, Pulsant. Digital transformation can provide businesses with a more flexible approach to infrastructure that simplifies the delivery of services and applications. However, this journey to digital transformation can introduce its own complexities that require a new way of thinking.

How to Assemble the Ultimate Network Management Toolset

Enterprise Management Associates (EMA) research determined that network infrastructure teams are challenged to monitor infrastructures that include hybrid public and private clouds. Their findings are that teams need to be more strategic in building their network management toolset. Based on this latest research from EMA, this white paper offers a guide to building the ultimate network management toolset.