Operations | Monitoring | ITSM | DevOps | Cloud

August 2022

Equip any user to monitor Kubernetes with the Overview Page

Many organizations use Kubernetes to orchestrate their containerized applications. But because Kubernetes is complex, application developers may take some time to ramp up on the intricacies of monitoring a Kubernetes environment. This means that teams often need to create internal documentation and offer hands-on training to bridge the knowledge gap.

Using incidents to level up your teams

I joined GoCardless as a junior engineer. It was one of my first coding jobs, and in my time there I progressed to senior much faster than I had expected. When I reflect on how this happened, one pattern stands out to me: the big step changes in my understanding, and in my ability to solve larger and more complex engineering problems, came as a result of incidents.

Kubernetes operators - the top 5 things to watch for

Software operators are steadily revolutionising how we deploy and run complex distributed systems. They offer the promise of low-intervention, self-driving software – ideally leading to service reliability gains and better uptime. For an introduction to Kubernetes operators, check out our introductory webinar or download our guide to Kubernetes operators.

The 15 Best GCP Cost Optimization Tools For Google Cloud

Cost management in Google Cloud Platform (GCP) can be complex. It's a lot like costs in Azure or AWS. Often, companies don't know where to begin when they receive 20-page invoices. GCP billing tools can be unfamiliar, especially to users coming from a different cloud provider. But the platform continues to simplify its cost monitoring and billing tools.

Emergence of cloud desktops

The world of consumption has changed to everything as a service, from the way we consume movies and music to our office productivity tools, personal data storage, health and wellness, and even our day-to-day shopping for consumer goods. The evolution of technology played a huge role in this. People no longer download songs or movies, nor do they buy CDs and DVDs for personal entertainment. Rather, we are now in a world where OTT (Over the Top) has become the default.

Improve, enforce, and ensure microservice quality

Is microservice quality a focus for your team this year? Perhaps you’re looking to improve your service performance. Maybe you feel like your organization needs better ways to enforce best practices or ensure that service quality doesn’t degrade over time. Let’s take a look at these three areas and share some concrete recommendations for strengthening your services and creating a long-term culture of quality.

Surveys Show Kubernetes Crossing the Chasm

The adoption and maturation of Kubernetes have been rapid, driven by organizations moving to cloud-native infrastructures to modernize and gain agility. The Kubernetes growth trajectory reached a climax in 2021 as the Cloud Native Computing Foundation (CNCF) declared it “the year Kubernetes crossed the chasm.” The CNCF Annual Survey 2021 found that 96% of organizations were using or evaluating Kubernetes.

System Administrator vs. Network Administrator: Which One Are You?

There’s an abundance of career opportunities in IT. It’s hard for newcomers to find their footing amid complicated terminology and job roles that are similar but not identical. One of these difficulties is the system administrator vs. network administrator issue. Are they two completely different careers? Today, we’ll help dispel any confusion so you can make an informed choice. The Bureau of Labor Statistics projects 5% growth for network and system administrators through 2026.

FIPS certified vs compliant: what's safer?

Encryption is key to protecting sensitive data. There are several methodologies using different cryptographic algorithms to convert plain text into cipher text. Navigating multiple methodologies and algorithms creates a complex, labour-intensive process for teams evaluating the cryptographic services offered within software components.

Round Robin Escalation: An Efficient Way to Distribute On-Call Responsibilities

Nowadays, organizations address a high volume of incidents every day. With so much happening, responders can be overwhelmed by the volume of incidents and may end up de-prioritizing certain important incidents. Hence, it is important to have an efficient on-call scheduling and escalation process in place. In this blog, we will explore how Round Robin Escalations can help distribute on-call load and set up efficient on-call schedules. This blog covers the following pointers.

Bridging the gap between Engineering and Customer Support during incidents

Customer trust and satisfaction are the most important currency your business can own. No matter how brilliant your product, without happy customers your business will struggle. When everything is running smoothly, it’s easy to feel that heady dose of customer love. It’s when things break during an incident that these relationships are really put to the test.

What is reliability engineering?

Reliability engineering focuses on the ability of systems to perform as they are intended to and to function without failure in a specified environment for the required duration. Reliability engineering can be applied across the entire lifecycle of software development. It is designed to increase the dependability of a product by detecting potential reliability issues early in the software development cycle and correcting the causes of failures that do occur.

Where can IT process automation make a real impact? | Resolve

Is your IT department buried in service desk requests or incident response tickets? Is your IT Operations team scrambling to assess and isolate a network application problem? In this video Resolve CEO Vijay Kurkal explains how automation can help clean up the mundane service desk tasks and communication while freeing up your ITOps resources to do more meaningful work by providing automation solutions for auto-remediation, provisioning, observability and much more.

Introduction to various Kubernetes dashboards by @Saiyam Pathak

In this session, Saiyam Pathak, Director of Technical Evangelism at Civo, will discuss various Kubernetes dashboards that can be used to view the cluster and its related metrics, along with the benefits of each. Saiyam will also look into the Kubernetes dashboard, Skooner, Headlamp, Lens, and Octant.

Ribbon Connect for Operator Connect

The Fastest Way to Enable Microsoft Operator Connect. Operator Connect uses an API-centric integration model to enable service providers (operators) to deliver telecom services to Microsoft Teams customers. Ribbon Connect for Operator Connect takes the complexity out of that integration process, accelerates time to market, and eliminates the need for providers to initiate significant IT programs. Ribbon Connect also provides web portals and automated workflows that make it easier for businesses to get engaged with a provider and deploy telecom services.

Sneak Peek into the Next Major Release for VMware Tanzu Application Service

VMware Tanzu Application Service is a modern application platform for enterprises that want to continuously deliver and run microservices across clouds. Tanzu Application Service provides application development teams an automated path to production for custom code, and a secure, highly available runtime that scales to support the most demanding operations teams.

Preview Lifecycle Management of Amazon EKS Clusters through Tanzu Mission Control

Camille Crowell and Corey Dinkens contributed to this blog post. The VMware Tanzu team continues to work with our cloud provider partners to offer a streamlined, vendor-agnostic Kubernetes management platform. In our ongoing commitment to support customers in their multi-cloud application modernization efforts, the VMware Tanzu Mission Control team will be introducing a preview of lifecycle management for Amazon Elastic Kubernetes Service (Amazon EKS) clusters in the coming months.

Introducing Unified Observability Platform by VMware Aria Operations for Applications

At VMware, we are on a mission to build a comprehensive, extensible, and intelligent monitoring and observability platform to help businesses run seamlessly. Over the past few years, we have evolved our platform to deliver invaluable end-to-end observability across applications and infrastructure.

What's New in VMware Tanzu: A Preview of Tanzu Announcements at VMware Explore 2022

Danielle Burrow and Munjal Munshi contributed to this blog post. Every modern enterprise is building applications to generate revenue, connect people and systems, and automate processes. Modern applications, architected to take advantage of the flexibility and efficiencies of the cloud, are projected by IDC to surpass traditional apps by 2024.

Argo CD Application Dependencies

If you are using Argo CD, you may already be familiar with how the Application CRD (Custom Resource Definition) object helps you logically group together your Kubernetes manifests. The Application object is the atomic unit of work in Argo CD, and you should think of all the Kubernetes objects in an Application as a single entity. Applications are also autonomous, meaning that, by design, one Application doesn’t know about the status or health of another Application.

High Scale Postman Load Testing for Kubernetes

In this Postman load testing tutorial, you’ll learn how to run a large scale load test in Kubernetes using your existing Postman collections. Because HTTP services don’t have a graphical user interface, it’s common to build collections of requests using Postman during the development process. These collections are useful for running quick functionality tests as you develop each endpoint.

Anomaly Detection and AIOps - Your On-Call Assistant for Intelligent Alerting and Root Cause Analysis

In this blog, we examine how anomaly detection helps by setting up healthy alerts and providing efficient root cause analysis. Anomaly detection, part of AIOps, guides your attention to the places and times where remarkable things occurred. It reduces information overload, thereby speeding up RCA investigation.

Are Code Freezes Still Needed?

A code freeze means no code can be altered or modified during the frozen time, and developers will not make any additional changes. Developers can only modify the code in the event of critical flaws and to the extent required to correct those vital problems. Primarily developers observe a code freeze during the final phase of software development when the software product has reached the delivery state.

Monitor your Microsoft Azure VMs featuring Ampere Altra Arm-based CPUs with Datadog

As organizations continue to expand their cloud footprint, managing costs without risking application performance is a priority. Because of this, Arm processors have become popular for their efficient, cost-effective processing power. Microsoft Azure’s new series of Azure Virtual Machines are powered by Ampere Altra Arm-based processors, which provide excellent price performance for scale-out and cloud-native workloads.

Simplify IT Management with Hyperconverged Infrastructure

HCI (Hyperconverged Infrastructure) is changing the way businesses work. For starters, management and deployment become much simpler with HCI solutions. But that is only the start: proper resource utilization, on-demand scalability, exceptional data protection, and more. The benefits associated with HCI solutions are numerous. But what are HCI solutions? And how does hyperconverged infrastructure simplify IT management? Find out the answers to these and many more questions related to HCI solutions in this no-bullshit guide.

mRemoteNG: Using the Remote Connections Manager for Windows

mRemoteNG is a simple tool popular among IT professionals for efficiently managing multiple connections. The popularity of password managers has shown how much people don’t like typing in passwords every time they want to connect to a service. And IT professionals are all too familiar with the annoyance of manually typing in credentials every time they want to connect to a remote service. mRemoteNG solves this problem for SysAdmins and IT teams, and in this article, we’ll explore how.

JFrog Providers Support the Terraform Community

If you’re reading this blog, you’re probably at least somewhat familiar with HashiCorp Terraform and the value it brings to managing the deployment and provisioning of infrastructure resources at scale. We’re big fans and users of it ourselves here at JFrog (see how in our recent webinar!).

How to leverage Kubernetes to modernize your legacy enterprise application infrastructure

The biggest question facing many businesses today is how to improve the efficiency and agility of their application delivery processes. Along the journey to infrastructure modernization, do they continue to maintain applications as-is, or do they need to be migrated, upgraded or replaced? This blog attempts to answer some of these questions. Take any popular enterprise app, like your Oracle business suite, or even a custom-built application, and it will typically follow a three-tier architecture.

What to Look for in a Colocation Data Center

As the costs of managing and maintaining owner-operated data centers rise, enterprises are reconsidering their infrastructure and attempting to minimize on-premises data center space. Colocation data center providers are an attractive and cost-effective solution, offering physical space as well as power, cooling, network, and security services for their customers.

9 Cloud Cost Mistakes You Don't Know You're Making

When you’re operating in the cloud, making the right decisions is not always easy because there’s a lot of ground to cover, especially with regard to cost. The elastic nature of cloud infrastructure means your costs could quickly spiral out of control if you don’t have guardrails in place to help keep costs down. Properly managing your cloud costs is important, especially for SaaS companies. Cloud spend impacts your COGS, which in turn affects revenue and valuation.

Kubernetes Upgrades at LogicMonitor

Managing Kubernetes version upgrades can be a formidable undertaking. API versions are graduated, new features are added, and existing behaviors are deprecated. Version upgrades may also impact applications needed by the Kubernetes platform, affecting services like pod networking or DNS resolution. At LogicMonitor, we ensure our Kubernetes version upgrades are successful through a process designed to instill collaboration and due diligence in a repeatable fashion.

Eastman Discusses Their Voice Network Modernization Journey and Decisions

Founded in 1920, Eastman is a global specialty materials company that produces a broad range of products found in items people use every day. The company's innovation-driven growth model takes advantage of world-class technology platforms, deep customer engagement, and differentiated application development to grow its leading positions in attractive end-markets such as transportation, building and construction, and consumables. As a globally inclusive and diverse company, Eastman employs approximately 14,000 people around the world and serves customers in more than 100 countries.

Deploying a Web App in any cloud using Terraform and Multy

In this tutorial, we'll deploy a simple web app to the cloud of your choice - composed of a database and a virtual machine where the frontend code will run. So that the configuration is reusable and consistent, we'll write it in Terraform. Usually Terraform configurations are cloud-specific, and changing clouds requires a complete rewrite. In this case, so that you can reuse the same configuration across clouds, we'll be using Multy.

Your CI GitFlow is Broken

One of the great things about GitFlow is that it makes parallel development very easy by isolating new development from finished work. New development, such as features, is done in feature branches and is only merged back into the main body of code when developers have validated the feature and the code is ready for release. For most development teams, feature validation happens in a staging branch coupled with a single testing environment.

Ephemeral Environments: Explained, Benefits, and How to Get Started?

Ephemeral environments are getting popular among companies that need to scale their business efficiently. Traditionally, the deployment environments usually included development, QA, staging, UAT, and production. However, the bottlenecks with shared QA and staging environments have hindered the efficient workflow of IT teams. The first issue you will face is environment drift because it is a shared environment, and you cannot afford a separate static environment for each developer.

Qoddi for open source projects

At Qoddi, we are strong advocates and sponsors of open-source projects, and with the recent decision from Heroku to stop offering free plans (used by a lot of open-source developers), we felt the importance of having an alternative ready to keep those projects running. Qoddi's infrastructure is compatible with most of Heroku's buildpacks and we wrote a guide to migrate a project from Heroku.

Monitor Akamai Datastream 2 with Datadog

Akamai is one of the world’s largest CDN solution providers, helping companies greatly accelerate the secure delivery of content to their users all across the globe. Akamai provides this content delivery through its Intelligent Edge Platform, which is made up of hundreds of thousands of edge servers distributed around the planet.

Continuous integration for Android projects

CircleCI is popular among Android developers for several reasons: it’s quick to get started, it executes your builds fast with high parallelism (whether native, cross-, or multi-platform), and it even supports running Android emulators right from CircleCI with our Android machine images. This article will show you how to build and test Android applications for an example project on the CircleCI platform.

Qovery is a G2 High Performer for Summer 2022

We at Qovery are excited to announce that G2 has named Qovery as a Summer 2022 High Performer in the following categories: Continuous Deployment / Environment as a Service. G2 is the world’s largest tech marketplace. The High Performer Award recognizes companies with high customer satisfaction ratings relative to their market presence.

SRE vs. DevOps: Differences and Similarities

Organizations scramble to adopt new frameworks and methodologies to make the software more scalable. Plus, they need to do it in a reliable way that doesn’t cause more problems. Enter Site Reliability Engineering (SRE), a set of practices introduced by a Google engineer. But how does it stack up to frameworks like DevOps? DevOps and SRE both enhance the software development and product release cycle.

Healthchecks + Squadcast Integration: Routing Alerts Made Easy

Healthchecks is a cron job monitoring service which listens to HTTP requests and email messages ("pings") from your cron jobs and scheduled tasks ("checks"). It lets you update your job to send an HTTP request to the ping URL every time the job runs. When your job does not ping Healthchecks.io on time, then you will receive an alert! If you use Healthchecks for your monitoring needs, you can now integrate it with Squadcast to route detailed alerts from Healthchecks to the right users in Squadcast.
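
The "alert when the ping does not arrive on time" idea described above can be sketched in a few lines. This is only an illustration of the general heartbeat pattern, not Healthchecks' actual implementation: the `Check` class, its `period` and `grace` fields, and the `is_late` method are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Check:
    """Hypothetical heartbeat check: a job is expected to ping once per period."""
    name: str
    period: timedelta              # how often the job is expected to run
    grace: timedelta               # extra slack before the job counts as late
    last_ping: Optional[datetime] = None

    def ping(self, at: datetime) -> None:
        # Record that the job reported in (the equivalent of hitting a ping URL).
        self.last_ping = at

    def is_late(self, now: datetime) -> bool:
        # A check that has never pinged has no baseline yet, so it is not late.
        if self.last_ping is None:
            return False
        # Late once more than period + grace has elapsed since the last ping.
        return now - self.last_ping > self.period + self.grace


if __name__ == "__main__":
    backup = Check("nightly-backup", period=timedelta(hours=24), grace=timedelta(hours=1))
    backup.ping(datetime(2022, 8, 1, 2, 0))
    print(backup.is_late(datetime(2022, 8, 2, 2, 30)))  # still inside the grace window
    print(backup.is_late(datetime(2022, 8, 2, 4, 0)))   # past period + grace
```

A monitoring service following this pattern would evaluate `is_late` on a schedule and route an alert for every check that returns true.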

Level Up Your DevOps Strategy with Intelligent Alerting

In the world of DevOps, every second counts. Problems need to be fixed fast, but alerts should go out only for a legitimate reason, when something is actually wrong. Continuous monitoring helps with automation and setting up the right kinds of alerts. If the system is going haywire, every moment of inaction can make things worse. That’s why intelligent alerting is critical for enabling observability and continuous, effective monitoring.

15 Essential Container Orchestration Tools For 2022

Managing containerized applications or microservices can be difficult. It is even more demanding and error-prone if you do it manually. So, what’s the alternative? Container orchestration. Container orchestration is an automation technology that helps engineers coordinate when containers start and stop, schedule and execute tasks, manage failovers, and perform recovery processes. The technology helps automate these tasks throughout a container's lifecycle.

Introduction to Service Catalog | Service Ownership | Service Classification Squadcast

To make service management a breeze, we bring to you our improved Service Catalog. The Service Catalog is designed to improve Service Classification and bring more transparency to Service Ownership within your org. This video explains how a consolidated summary of all active services from a single dashboard can help you better track your service health.

Introduction to Cloud Native Application Architecture

Today, it is crucial that the scalability of an organization’s application matches its pace of growth. If you want your client’s app to be robust and easy to scale, you have to make the right architectural decisions. Cloud native applications have proven more efficient than their traditional counterparts and much easier to scale, thanks to containerization and running in the cloud.

Schneider Electric consolidates monitoring tools by 83% with LogicMonitor

Schneider Electric consolidated its monitoring tools by 83% after onboarding LogicMonitor's observability platform. Schneider Electric, one of the most sustainable companies on the planet, is always striving to make energy better. This is done with the help of unified observability.
Sponsored Post

What are Runbooks? And why are they needed?

Imagine being an Ops engineer in a team just struck by tragedy. Alarms start ringing, and incident response is in full force. It may sound like the situation is in control. WRONG! There's panic everywhere. The on-call team is scrambling for the heavenly door to redemption. But the only thing that doesn't stop: stakeholder inquiries. This situation is bad. But it could be worse. Now imagine being a less-experienced Ops engineer in a relatively small on-call team struck by tragedy. If you don't have sufficient guidance, let alone moral support, you're toast.

Stop Using TCP Health Checks for Kubernetes Applications

As developers, one of the most important things we can consider when designing and building applications is the ability to know if our application is running in an ideal operating condition, or said another way: the ability to know whether or not your application is healthy. This is particularly important when deploying your application to Kubernetes. Kubernetes has the concept of container probes that, when used, can help ensure the health and availability of your application.

Open-source storage for beginners with Ceph

Modern organisations have become reliant on their IT capabilities, and at the heart of that infrastructure is a growing need to store data, be it transactional databases, file shares, or burgeoning data lakes for business analytics. Traditionally, storage needs have been catered to by big iron hardware vendors, but over the last decade, more and more organisations have turned to open-source solutions such as Ceph running on commodity hardware.

Using StatusPage at Squadcast | SRE Best practices | Squadcast

Let your customers know how your Services are doing, without them having to ask you about it. One of the core principles of SRE is Transparency and Status Pages help you communicate the status of your Services to your customers at all times, as opposed to you getting to know the status of your Services through support tickets logged by your customers.

What are Canary Deployments and Why are they Important?

Every modification to software comes with the potential for production problems. Application failures often have serious consequences which can result in a loss of revenue and a poor customer experience. Additionally, organizations constantly try to improve their services for a better customer experience. How can you minimize the chance of error and update your application with confidence?

Ansible Key Terms: Getting Started

If you’re a systems administrator, there’s a good chance you’ve heard of Ansible. But if you’re not familiar with the tool or just getting started with it, there are some key terms and concepts you need to know. Here we will give you an overview of Ansible, from its origins to the latest features. We’ll also cover some of the key terminology associated with Ansible so you can start using it effectively immediately.

incident.io + Indent - on-demand system access

At incident.io, we empower teams to run incidents quickly and effectively from start to finish. One of the ways we help is by taking the manual admin out of your incidents. More often than not, folks are spending too much time thinking about the process, when the time would be better spent focusing on fixing. Our automated workflows, nudges and prompts help to embed best practices and unlock time for more impactful work.

Mattermost Playbooks How-to: OKR Management

Creating, managing, and tracking high level goals can be incredibly burdensome and complex for organizations with numerous stakeholders and cross-functional collaboration. Team leads and executives manage multitudes of reporting tools and departments while contributors often have little visibility into the process of creating goals or the progress towards achieving those goals.

Heroku discontinued their free tier - what are the top 3 best alternatives?

Today is a sad day for thousands of developers who were able to host their applications for free on Heroku. Heroku (acquired by Salesforce) announced that they are discontinuing their free tier to focus on "mission critical" businesses. Heroku has undoubtedly changed how developers deploy their applications in the cloud, but now, what are the valid alternatives? Here is a list of the top 3 best Heroku alternatives.

Secure Your Software Supply Chain Using Observability Webinar

Frequent software supply chain attacks are becoming the new normal for developers and security professionals everywhere. Even though it’s still relatively new, observability has continued to gain momentum as a way to identify software supply chain issues before they become a major disruption. Having access to the right data at the right time is necessary to make decisions about priorities. We’ve assembled a panel of experts from software, security, and data to talk about observability and what it means to your software supply chain security.

Deploying a dockerized .NET Core app to an Azure container instance

In this tutorial, you will learn how to build a custom ASP.NET Core container with Docker and host the container image on Azure Container Registry, a platform owned by Microsoft that allows you to build, store, and manage container images in a private registry. At the end of this tutorial, you will be able to apply the knowledge gained here to link your container image on the Microsoft Azure registry with a web app service and launch your application.

Performing Postmortems & Postmortem Templates at Squadcast | SRE Best practices | Squadcast

Postmortems are a way to summarize the resolution of an incident once it is resolved. They are also a way for you to create a knowledge base of failures and fixes that can be shared across your team to help build a culture of shared learning and learning from failures.

Monitor your Edgecast CDN with Datadog

Edgecast is a global network platform that provides a content delivery network (CDN) and other solutions for edge computing, application security, and over-the-top video streaming. Using Edgecast’s JavaScript-based CDN, teams can improve web performance by caching static and dynamic content with low latency and minimal overhead.

Feeling zen, finding DORA, and the policy police

We’ve had a bumper month here at incident.io HQ. We’ve welcomed 3 new joiners, celebrated two 1 year incident.io anniversaries (congrats Lisa and Lawrence!), released a whole load of exciting new features and (for those of you wondering what’s been causing the recent heatwave) we’ve redesigned our website and it is on fire 🔥 😎 Here’s a round-up of some of this month's highlights…

Updating our data stack

It’s been over 6 months since Lawrence’s excellent blog post on our data stack here at incident.io, and we thought it was about time for an update. This post runs through the tweaks we’ve made to our setup over the past 2 months and challenges we’ve found as we’ve scaled from a company of 10 people to 30, now with a 2 person data team (soon to be 3 - we’re hiring)!

20+ SysAdmin Tools You Can't Live Without

Being a system administrator is a high-level and demanding profession. Yes, we’re talking long hours (not counting overtime!), unforeseen events requiring attention, and so much troubleshooting. But not everything about SysAdmin life has to be more challenging than it needs to be. That’s why we put together this list of must-have SysAdmin tools so you can optimize your workflow and focus on critical tasks.

Why APM distributed tracing is not enough for developers

Distributed tracing is a method of tracking requests as they propagate through a distributed system. A trace is built from spans. Each span represents an interaction, like an HTTP request, a DB query, a serverless function invocation, etc. A trace is essentially a tree of spans. Based on the collected span data, a distributed tracing platform can capture all the interactions between the different architectural components and tie them together with a trace ID.
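
To make the "tree of spans tied together with a trace ID" idea concrete, here is a minimal sketch that rebuilds that tree from a flat list of collected spans. The `Span` fields and the `build_trace_trees` and `render` helpers are hypothetical names for illustration and do not reflect any particular tracing platform's data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """Hypothetical span record: one interaction within a trace."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]       # None for the root span of a trace
    name: str
    children: List["Span"] = field(default_factory=list)


def build_trace_trees(spans: List[Span]) -> Dict[str, List[Span]]:
    """Link each span to its parent and return the root spans per trace ID."""
    by_id = {(s.trace_id, s.span_id): s for s in spans}
    roots: Dict[str, List[Span]] = {}
    for s in spans:
        parent = by_id.get((s.trace_id, s.parent_id)) if s.parent_id else None
        if parent is not None:
            parent.children.append(s)
        else:
            # No parent ID, or the parent span was never collected: treat as a root.
            roots.setdefault(s.trace_id, []).append(s)
    return roots


def render(span: Span, depth: int = 0) -> List[str]:
    """Return the span tree as indented lines, one span per line."""
    lines = ["  " * depth + span.name]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    collected = [
        Span("t1", "a", None, "HTTP GET /checkout"),
        Span("t1", "b", "a", "DB query: orders"),
        Span("t1", "c", "a", "invoke: send-receipt"),
    ]
    for root in build_trace_trees(collected)["t1"]:
        print("\n".join(render(root)))
```

Keying the lookup on both trace ID and span ID keeps spans from different traces apart even if their span IDs collide.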

How Zenoss Is Key to Efficient Operations in 4G and 5G

As mobile networks evolved to 4G, network elements transitioned from hardware-based, often with proprietary chassis, to virtualized network elements. Communication service providers often implemented such technologies through private cloud infrastructures based on different flavors of OpenStack. On the other hand, 5G introduced microservices-based deployments with a service-based architecture.

How to Explain Zero Trust to Your Tech Leadership: Gartner Report

Does it seem like everyone’s talking about Zero Trust? Maybe you know everything there is to know about Zero Trust, especially Zero Trust for container security. But if your Zero Trust initiatives are being met with brick walls or blank stares, maybe you need some help from Gartner®. And they’ve got just the thing to help you explain the value of Zero Trust to your leadership: it’s called Quick Answer: How to Explain Zero Trust to Technology Executives.

Terraform vs Pulumi

Terraform and Pulumi are both Infrastructure as Code (IaC) tools. They allow you to manage, provision, and configure your infrastructure using code, which makes it easy to automate your infrastructure deployments and manage them in a version control system. Terraform is an open source tool developed by HashiCorp. It’s popular among developers because it’s easy to use and has a wide range of community-developed plugins and integrations.

Site Reliability Engineering, Site Reliability Engineers and SRE Practices: State of Adoption

Site reliability engineering (SRE) is what you get when you treat operations as if it’s a software problem. The mission of an SRE practice is to protect, provide for and progress the software and systems offered and managed by an organization with an ever-watchful eye on their availability, latency, performance and capacity.

How to Measure Data Center Sustainability

Sustainability is one of the biggest topics, if not the biggest, in the data center industry today. A sustainable data center is a facility that can maintain operations at a high level of efficiency over time. It is important for data centers to be as sustainable as possible because they use a lot of resources, which makes reducing their environmental impact and carbon footprint a top priority. It is also important because these facilities need to comply with corporate sustainability initiatives.

Native vs cross-platform mobile app development

In just a decade, smartphones have become ubiquitous. They facilitate communication via texting and calling, provide entertainment, enable administration, and offer utilities for their users in the form of applications. Users access these mobile applications through their app store, whether it is Apple’s App Store or the Google Play Store. Developers construct them with the smartphone’s operating system in mind. The two mainstream operating systems that are targeted are Android and iOS.

Find the root cause faster with Datadog and Zebrium

When troubleshooting an incident, DevOps teams often get bogged down searching for errors and unexpected events in an ever-increasing volume of logs. The painstaking nature of this work can result in teams struggling to resolve issues before new incidents appear, potentially leading to an incident backlog, longer MTTR, and a degraded end-user experience.

PagerDuty Service Standards helps organizations better configure services at scale

Service ownership, a DevOps best practice, is a method that many companies are pivoting towards. The benefits of service ownership are varied and include boons such as bringing development teams much closer to their customers, the business, and the value being delivered. The “build it, own it” model has tangible effects on customer experience, as developers are incentivized to innovate and drive customer-facing features that delight.

Easy Kubernetes metrics and cluster management with K9s, Kubectx, and Kubens

Efficiently retrieving Kubernetes metrics is the beginning of cluster performance optimization. Cluster metrics such as logs and resource descriptions are crucial as they map the health of your cluster. There’s a Kubernetes tool called K9s that lets you access your Kubernetes cluster metrics swiftly using short keys. K9s prioritizes aesthetics and performance as it displays its contents and functions in an aesthetically pleasing UI.

How To Put Cloud Nimble to Work to Segment Dev/Test from Production

In every workplace, most work gets done at the most cluttered desks. Yet the business also requires an orderly front office to run efficiently. It’s much the same with your DevOps pipeline environments, as the rough and tumble process of innovating code must ultimately produce cleanly released applications. Continuous integration means that developers perform many builds each day, but few of those builds will advance to production repositories.

Overcoming data chaos ft. Thomas Hazel, founder of ChaosSearch

In this episode, Rob is joined by Thomas Hazel, founder and CTO of ChaosSearch. Every software company has tons of data to manage. Have we set ourselves up for failure? How do we recover from a data mess? Learn how Thomas embraces chaos to tackle big data problems by taking risks and embracing failure.

Partner Talk: Chris Free, CEO of Chromatic

For the last 15 years, Chromatic CEO Chris Free and his team have built high-value websites and provided services for both clients and other agency partners alike. Beyond technology and design, Chromatic has built and nurtured a remote, global company culture that focuses on the individual, engendering staff loyalty and longevity. Get an unfiltered view from Chromatic about agency life, learnings, and the role Platform.sh plays in helping to deliver client value in this Platform.sh Partner Talk.

Top 5 Debugging Tips for Kubernetes DaemonSet

Kubernetes is the most popular container orchestration tool for cloud-based web development. According to Statista, more than 50% of organizations used Kubernetes in 2021. This may not surprise you, as the orchestration tool provides some fantastic features to attract developers. DaemonSet is one of the highlighted features of Kubernetes, and it helps developers to improve cluster performance and reliability.

Behind the Scenes with the Kubernetes 1.25 Release

Join us live on August 23 as SUSE's Robert Sirchia hosts Rey Lejano, Emeritus Adviser of the Kubernetes 1.25 release team, during the Kubernetes 1.25 release. Rey will guide you through the Kubernetes Release Process and Release Cycle and showcase the highlights of this release. This is your chance to ask questions directly to the people involved and look behind the scenes of the release process.
Sponsored Post

Site Reliability Engineering: Definition, Principles & How It Differs From DevOps

Site crashes and outages can cost hundreds of thousands in lost revenue and inconvenience users. Site Reliability Engineering helps build highly reliable and scalable systems, particularly important for companies that depend on their software to support their customers performing critical operations. Hiring a Site Reliability Engineer is the best way to ensure a software system stays up and running at all times. Not only will they help manage infrastructure and applications, but they'll also be able to advise on how to scale a business as it grows - keeping downtime and incidents at a minimum!

The ABC of the Pathping Command

The pathping command is one of the most popular network troubleshooting tools in Windows. Initially released in 2000 along with Windows 2000, it has since been a part of every version. Despite its simplicity, it has become prevalent among network admins and is one of the first tools they turn to when things are not going right. This command combines the functionality of tracert and ping. In this article, we’ll be exploring what a pathping command is, what it does, and how to use it.

Understanding monitoring and observability

Roaming in the world of cloud technology not only helps you take a glance at the realm of cutting-edge technology but also helps you get familiar with concepts such as monitoring and observability. This article will cover an introduction to monitoring and the need for monitoring applications. From here, we will look at how you can utilize the data received when monitoring an application. This will allow us to understand how the concept of observability fits in with monitoring.

Zero Trust: The New Security Model for Cloud Native Applications and Infrastructure

Zero Trust security is gaining attention and momentum as a security approach or mindset that can improve the security posture of enterprises as they continue to battle hackers. Because of this widespread attention on Zero Trust, every software security vendor seems to be jumping on the Zero Trust bandwagon. However, Zero Trust is not a product or service. No single product or vendor can sell you Zero Trust security.

Path-based Routing with HAProxy

If you host dozens of web services that reside at various subdomains, TCP ports, and paths, then migrating them to live under a single address could simplify how clients access them and make your job of managing access easier. It would mean moving from a hodgepodge of address schemes to a single address wherein services are designated by the URL’s path. The good news is that you don’t need to rearrange your entire network to make this happen.
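
To make the idea concrete, here is a minimal Python sketch of the matching logic behind path-based routing. It is not HAProxy configuration and not taken from the article; the path prefixes, backend addresses, and the `choose_backend` helper are all hypothetical and serve only to illustrate how a single address can fan out to several services by URL path.

```python
# Hypothetical routing table: URL path prefix -> backend address.
# These prefixes and addresses are illustrative assumptions only.
ROUTES = {
    "/api": "10.0.0.10:8080",
    "/blog": "10.0.0.20:8080",
    "/shop": "10.0.0.30:8080",
}
DEFAULT_BACKEND = "10.0.0.1:8080"


def choose_backend(path: str) -> str:
    """Return the backend for the longest matching path prefix."""
    best = ""
    for prefix in ROUTES:
        # Match on whole path segments so "/apiary" does not hit "/api".
        if path == prefix or path.startswith(prefix + "/"):
            if len(prefix) > len(best):
                best = prefix
    # No prefix matched: fall back to the default backend.
    return ROUTES.get(best, DEFAULT_BACKEND)
```

A request for `/api/users` would go to the `/api` backend, while an unmatched path such as `/` falls through to the default. A real proxy would express the same rule declaratively in its own configuration language.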

5 Things A Successful VPE And CTO Should Do Every Day

Great leaders know how to think big. As a VPE or CTO, your leadership role puts you in a position to make important changes and guide policy. But as a technical leader of your company, you’re almost always incredibly busy. It’s impossible to handle every single demand on any given day, which makes prioritization of tasks an important part of your daily decision-making. How do you know you’re making the right choices and working on the items that will make the largest impact?

Mean Time to Recovery (MTTR) explained

It's Friday afternoon, and you have mail. Apparently, a user received a 500 error when attempting to sign in. She contacted Customer Service. They didn't know what to do, so they forwarded the email to your engineering team. A close look at the email thread reveals that Customer Service received it... on Tuesday. And they sat on it until today. Hopefully, it was just this one user. You open your browser, navigate to the web application, and attempt to sign in. You also get a 500 error.

Considerations When You Mock APIs Inside of Kubernetes

Today it’s not unusual to see organizations that have implemented mocking in their daily workflow, as mock APIs allow developers to speed up their development and not rely on external services. For those reasons and others, many engineers are looking to learn more about mock APIs and how they can best be implemented in their organization.

DEV Environment using GitHub Actions

In this video, you'll learn from Ahmad Musa, Software Engineer, how to create a development environment using GitHub Actions. At Cloudify, we provide a set of GitHub Actions out of the box that you can combine into composite actions to achieve a goal: an environment managed as code, with a Terraform-Atlantis-like experience for your entire environment. Learn how to trigger a deployment (using a comment) and update the EaaS environment from previous steps.

Discover the business impact of digital customer experience from E-Commerce and DevOps leaders

As the world continues to adapt to the new normal, the shift to digital customer experience has accelerated. And with this change comes an urgency to bolster and secure networks, systems and digital capabilities. More and more customers are moving to digital-first experiences. With about two-thirds of the buyer’s journey taking place in the digital realm, it’s imperative that businesses invest in their digital capabilities.

My Feedback about Nixpacks - an alternative to Buildpacks

Two months ago, I discovered Nixpacks - an alternative to Buildpacks to build a final container image that simply works. I’ve tried it with multiple projects, and it is very promising. I feel it is less black magic than Buildpacks and easily extensible. In this article, I will share my feedback on the pros and the cons of this emerging project. Let's go!

The 15 Best Azure Cost Management Tools In 2022

Toward the end of Q1, 2022, survey findings reported that Microsoft's Azure cloud computing services had, for the first time, eclipsed Amazon Web Services (AWS) in some enterprise categories. According to the respondents, more enterprises preferred Azure because it integrates well with the many Microsoft products they already use. A second reason was that Azure is suitable for running on-premises and at the edge. Also, some organizations use Microsoft Azure to avoid vendor lock-in to AWS.

New Pricing that will give you peace of mind

Since the start of Qovery in 2019, a lot has evolved: the team got bigger, the product grew, and a lot of new features were developed. It’s now time for the pricing to change as well. But don’t worry, I’m not about to tell you that we are increasing our prices because of inflation. We’re just changing our approach to the pricing model, so let me explain how this switch is going to remove plenty of stress from your shoulders!

Charmed Kubeflow 1.6 Beta #datascience #kubeflow #machinelearning

Kubeflow 1.6 is almost here! 🎉🎉🎉 The open source MLOps platform of choice keeps evolving year over year, growing in popularity and available features. Get the latest news about the changes that it came with from two of the engineers who were part of the upstream release team. We will be talking about pipelines, Katib and the news about the scheduler.

A new channel per incident - helpful or harmful?

I caught the tail-end of a Twitter thread the other day which centred around the use of Slack channels for incidents, and whether creating a new channel for each new incident is helpful or harmful. It turns out this is a much more evocative subject than I thought, and since I have opinions I thought I’d share them!

Uptime + Squadcast Integration: Routing Alerts Made Easy

Uptime is a site monitoring solution used to reach various endpoints & notify users via push notifications when downtime is detected. It collects and stores downtime & response time data, which is then made available as reports to the users. If you use Uptime for your monitoring needs, you can now integrate it with Squadcast to route detailed alerts from Uptime to the right users in Squadcast. The steps below will help you set up the Uptime and Squadcast integration.

See the big picture with the Service Dependency Graph

Understanding the impact and scope of an incident when degradation occurs is critical for bringing your service back online. This requires modeling the many downstream and upstream relationships between your services. Our new Service Dependency Graph provides a shortcut – a way to surface dependencies quickly, understand the relationship between services, and determine the scope or impact of an incident.

geeks+gurus: Rise of SRE - Survey Insights

Site Reliability Engineering (SRE) continues to rise in adoption. Teams that leverage SRE “good” practices are benefitting, individuals are excited about their jobs and IT and the business are collaborating more efficiently. Sounds interesting? We hope so, as there are a few key insights which you should know. Join us to learn more about the exciting journey of SRE. We have partnered with DevOps Institute (DOI) to conduct their inaugural 2022 Global SRE Pulse Survey, and we are excited to share the pulse on SRE.

Top 5 IoT challenges and how to solve them

There are a number of challenges to surmount for enterprises in the IoT sector, including having a short time to market, airtight security, a versatile update mechanism for hardware and software and mastering device management. The more planning and practical steps that are taken to address key considerations, the faster an IoT project can get to market and make an impact on the world.

What is Chaos Engineering? A Guide on Its History, Key Principles, and Benefits

Many organizations invest in high availability and disaster recovery for their key applications. Too many of these organizations, however, forego the most important aspect of this process—testing the failover process regularly. Whether gripped by the fear of downtime or dreaded DNS problems, development teams are frequently hesitant to test out what they’ve built in the real world.

Kubernetes Multi-Cluster: Why and When To Use Them

Application containerization has disrupted the way software applications have been built and deployed. Over the years, Kubernetes has stood out as one of the best platforms for container orchestration. It has helped many companies achieve scalability, resilience, portability, and better resource utilization in their products. However, managing Kubernetes is still complex. The first question which comes to mind is whether we should use a single cluster or a multi-cluster for Kubernetes.

Epinio and Crossplane: the Perfect Kubernetes Fit

One of the greatest challenges that operators and developers face is infrastructure provisioning: it should be resilient, reliable, reproducible and even audited. This is where Infrastructure as Code (IaC) comes in. In the last few years, we have seen many tools that tried to solve this problem, sometimes offered by the cloud providers (AWS CloudFormation) or vendor-agnostic solutions like Terraform and Pulumi.

Introducing the CircleCI visual config editor

The CircleCI visual config editor (VCE) is now generally available as an open source project. Development teams can now create and modify CircleCI config files in a visual drag-and-drop, low-code environment. The VCE is a node-graph editor that you can use to modify CircleCI config elements and generate config files. It provides a frictionless way to build CI/CD pipelines and interact with CircleCI’s platform in an efficient, user-friendly visual interface.

The 2022 Managed Kubernetes Showdown: GKE vs AKS vs EKS

Kubernetes may provide an abundance of benefits, but those who are using it may be well aware that it often requires quite a bit (or even a lot!) of effort and skill to run the platform independently. So – rather than having to put up with it on their own, organizations are able to pay for a managed Kubernetes service instead. This is where Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS), and Amazon Elastic Kubernetes Service (EKS) come in.

5 Signs Your Kubernetes Deployment Is at Risk

In this webinar you will learn how to identify the warning signs that your Kubernetes deployment is headed down the wrong path. You will learn how to avoid failure or save a failing Kubernetes deployment. The webinar covers the 5 signs your Kubernetes deployment is at risk. As a Kubernetes solution provider with more than a decade of experience in designing and deploying containerized environments, we know first-hand why organizations fail and succeed in their Kubernetes deployments.

How Helios integrates with Cypress to provide backend visibility into your UI testing

Testing web applications from the user interface (UI) is a must for every customer-facing product, from e-commerce portals to cyber security dashboards. Often, a broken or inefficient UI experience can make or break whether end users adopt a product quickly and trouble-free. This is why developers have embraced UI testing as a critical part of their development process.

Video: Cloud Native Traffic Replay

With the introduction of new application platforms like Kubernetes, oftentimes the DevOps tooling around it needs to evolve. Cloud Native technology is powerful but complex. This 5 minute demo video shows how Speedscale provides production simulation capabilities so you can check for resiliency, quality and scalability in your Kubernetes clusters. You can record data and traffic in production and replay sanitized traffic on the fly against a new cluster.

Charmed Kubeflow 1.6 Beta is out: try it today!

We are happy to announce that Charmed Kubeflow 1.6 is now available in Beta. Kubeflow has evolved into an end-to-end MLOps platform for optimised complex model training. We’re looking for data scientists, ML engineers and developers to take the Beta release for a drive and share their feedback! Read on to learn more.

Tanzu Mission Control Expands Kubernetes Data Protection with Cross-Cluster Restore

To avoid application downtime and data loss during a Kubernetes cluster outage, platform and application operators need to utilize backups for recovery. These backups should contain both the application’s persistent data and its configurations, which can be restored to the same or a different cluster to get back into production more quickly.

VS Code integration with Ocean for Apache Spark

Ocean for Apache Spark has featured support for integration with Jupyter notebooks for quite some time – for details, please see our documentation. However, many developers would like to have this interactive notebook within their familiar IDE, such as VS Code, so that they can benefit from other IDE built-in features including Git integration. This article describes how to use VS Code to run Jupyter notebooks, while the code executes on an Ocean for Apache Spark cluster.

AIOps: Hype vs. Reality

What is AIOps? How does an AIOps platform help your observability practice? AIOps platforms analyze telemetry and events, and identify meaningful patterns that provide insights to support proactive responses. AIOps platforms have five characteristics. The above is Gartner’s definition and is part of the Gartner® “Market Guide for AIOps Platforms.” The Gartner definition is also aligned with our view.

Your Cloud Provider Will Fail You Eventually

Cloud has become the de facto way to build infrastructure, meaning cloud providers end up in charge of a significant amount of the apps we use every day. From the likes of Netflix, Slack, Ring and DoorDash running on AWS to PayPal, Twitter and HSBC on GCP, it's easy to see how impactful a failure of any type can be. Let's look at some of the issues that have happened recently that have led businesses to consider how dependent they are on a single provider.

Introducing OpUtils' IP Request tool

Are multiple IT operators accessing, utilizing, or managing your network address space? If so, then you might have noticed that one of the time-consuming network management tasks you undertake regularly is allocating IP addresses to the IT operators. This is an inevitable task since, as your network scales with new physical components or technology implementations, your operators require new IPs to enable network connectivity.

Distributed tracing and correlation through Service Bus messaging

Over the last few years, Microsoft has built excellent tooling around different technologies. Today, everything is available and achievable through the Azure Portal, which helps manage complex solutions. However, this brings in challenges when it comes to managing the distributed resources. The struggle to keep track of messages or know the flow of messages through these distributed resources is growing day by day.

What is Ansible: The DevOps Tool to Automate IT Tasks

Red Hat Ansible is one of the most popular automation platforms in the IT industry. Since its launch, it has earned its place among some of the best DevOps tools. Powered by an open source community of more than 3550 contributors, it occupies more than 50% of the market share in the configuration management segment. This article will explore Ansible and how IT professionals and engineers use it to power their workflows. Read on to learn more about one of the most powerful automation platforms.

ESG research: leveraging observability data for DevSecOps

There’s a call throughout the industry to shift security left in the software development lifecycle, expanding the DevOps methodologies that have been growing in adoption for more than a decade. DevSecOps is based on the idea that security is not an afterthought. Rather, it is a collaborative process that must be integrated from the start of the development process.

Microsoft and Canonical announce native .NET availability in Ubuntu 22.04 hosts and containers

Canonical is proud to welcome the .NET development platform, one of Microsoft’s earliest contributions to open source projects, as a native experience on Ubuntu hosts and container images, starting in Ubuntu 22.04 LTS. .NET developers will be able to start their Linux journey with Ubuntu, benefiting from timely security patches and new releases. .NET 6 users and developers can now install the .NET 6 packages on Ubuntu with a simple apt install dotnet6 command.

How DORA will impact incident management at financial entities

The Digital Finance Strategy is a European directive that aims to support and develop digital finance in Europe whilst maintaining financial stability and consumer protection. The package has three main components. In this blog post, we’ll attempt to summarise the 113-page DORA proposal, highlighting how it will apply to incident management at financial entities.

Zero To One: Github Actions On Cycle

GitHub’s “GitHub Actions” CI/CD tooling is becoming an increasingly popular option for developers. I've personally worked with several Cycle users who are adopting GitHub Actions as part of their deployment solution, and we've found that pairing GitHub Actions with Cycle makes for a simple yet powerful combination. Today we’ll stroll through a basic example of using GitHub Actions alongside the Cycle platform.

Automating testing for FeathersJS applications

This is one of a two-part series. You can also learn how to automate the deployment of FeathersJS apps to Heroku. In the software development lifecycle, testing offers benefits that reach far beyond the code itself. Testing assures all parties (developers, clients, project managers, etc) that, while the application may not be completely bug-free, it does what is expected, as expected.

Manage Service Catalog entries efficiently with the Service Definition JSON Schema

The Datadog Service Catalog helps you centralize knowledge about your organization’s services, giving you a single source of truth to improve collaboration, service governance, and incident response. Datadog automatically detects your APM-instrumented services and writes their metadata to a service definition before adding them to the catalog.

Essential tips for automating DevOps workflows

Implemented well, automation can be a powerful tool for accelerating and scaling DevOps processes to keep your team building and shipping code quickly. But knowing what and how to automate DevOps workflows can be challenging; every organization’s DevOps practices are unique, and there’s no one “right” way to approach automation. Let’s look at a few tips for approaching DevOps workflow automation to help your team move faster.

How To Handle Untaggable And Untagged Cloud Spend

Let’s imagine, for a moment, that we live in a perfect world. In that world, you could check your company’s cloud bills and financial reports and find cleanly organized categories of spending that help you instantly understand where your money is going and why. Your engineers would meticulously label every spend item with useful metadata tags so you can clearly see which costs have increased and which are most affecting your bottom line.

Announcing the Open Beta for Linux Shell Runners in Bitbucket Pipelines

We are happy to announce that Bitbucket Pipelines now supports non-containerized Linux Shell self-hosted runners. This is currently in beta. You can now create a self-hosted runner and run it on your Linux infrastructure without container restrictions. Since it is your infrastructure, you will not be charged for the build minutes used by your self-hosted runner.

Self-hosted versus cloud-based mobile app testing

Testing is a vital part of the mobile app development process. Your team can use testing to evaluate the quality, security, and reliability of mobile apps before releasing them to your users, who expect their applications to be highly performant and intuitive. There are two ways DevOps teams can perform testing for mobile apps: on-premises (also called self-hosted) or in the cloud. But which of these is the best option for your team?

Assessing your apps for migration using the Cloud Migration Assistants | Atlassian

In this demo, you’ll learn how to assess your Server or Data Center apps in preparation for a migration to Cloud, using the Jira and Confluence Cloud Migration Assistants. We recommend that migrating customers assess their apps early on in migration planning.

Resiliency As the Next Step in the DevOps Transformation

We’ve reached the point in the DevOps transformation where efficiency and automation are no longer the highest objectives. The next step is engineering past automation and towards fully autonomous, self-healing systems. If you aren’t conversing about building this type of resilience into your systems and applications, there’s never been a better time than now to start.

Strategies to Align AI Data Collection and Management with DevOps Practices

DevOps is characterized by the acceleration of processes to ensure continuous delivery without compromising high software quality. Balancing speed and quality is quite a challenging task, though. Data issues are among the most significant problems encountered by DevOps teams. These can be worse in the context of AI development, where massive amounts of data play a crucial role in machine learning.

SRE Signals: 3 Types of Metrics for Site Reliability Engineering

Site Reliability Engineering, or SRE, is a widely-used set of interdisciplinary practices that help increase the efficiency of software development. But, aside from that, its purpose is to create scalable, connected, reliable, communicated systems that keep providing better, more reliable results. SRE leads to more connected, efficient organizations that can build resilient, iterable, and scalable software. To do this, SRE engineers leverage their coding expertise.

Restrict API Access with Client Certificates (mTLS)

An application programming interface (API) provides access to the features of a business application, but with the visual elements stripped away. By using APIs, devices like tablets, self-service kiosks, point-of-sale terminals, and robotic sensors can connect up to apps running on servers in a datacenter or in the cloud. Because they give access to the heart of your business applications, it should come as no surprise that there are some APIs that the general public should not have access to.

What is Swap Space?

For a machine to run and store the loaded applications, every processor needs data capacity. Storage is a serious issue if you work in IT, since you will have bundles of software packages to run a single application. When RAM is nearly exhausted, the Linux swap function can help you. Using swap space instead of RAM in Linux systems can slow down the system’s performance. At the same time, there are more benefits when swap space is enabled.

Automate AWS Lambda function deployments to AWS CDK

When you build a cloud-based application, you can choose to deploy the resources using the GUI (Graphical User Interface) or CLI (Command Line Interface) provided by the cloud provider. This approach can work well with just a handful of resources, but as the complexity of your application increases, it can become difficult to manage the infrastructure manually.

10 Ways You Can Improve Service Reliability

Software reliability can be defined as the probability of a failure-free operation of a computer system over a specified period, under a set of specific conditions. It is an important factor in determining software quality. Site reliability engineering (SRE) is a software approach to IT operations that helps organizations to improve the reliability of their systems.
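
As a rough illustration of that definition, the sketch below computes the probability of failure-free operation over a given period in Python. It assumes a constant failure rate (the exponential model), which is a common simplification and not something the article specifies; the `reliability` function and its parameter names are hypothetical.

```python
import math


def reliability(period_hours: float, mtbf_hours: float) -> float:
    """Probability of failure-free operation over period_hours.

    Assumption: failures occur at a constant rate of 1 / mtbf_hours
    (exponential model), so R(t) = exp(-t / MTBF).
    """
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    if period_hours < 0:
        raise ValueError("period_hours must be non-negative")
    return math.exp(-period_hours / mtbf_hours)
```

Under this assumption, a service whose mean time between failures equals the period of interest has only about a 37% chance of getting through that period without a failure, which is one reason reliability targets are usually stated for a specific time window.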

Track your test coverage with Datadog RUM and Synthetic Monitoring

The modern standards of the web demand that user-facing applications be highly usable and satisfying. When deploying frontends, it’s important to implement a comprehensive testing strategy to ensure your customers are getting the best possible user experience. It can be difficult, however, to gauge the effectiveness of your test suite. For instance, all of your tests may be passing, but they might not cover a specific UI element that is crucial to a critical workflow.

Connecting to incident.io with Zapier

At incident.io, we believe that incidents are for everyone. As part of enabling that mission, we think it’s essential to ensure that all users can create, configure, and maintain business processes related to an incident. Today, we have two approaches to support different people, products, and organisational structures. We’re excited to announce that we’re taking this further and adding Zapier to our growing list of options to automate your processes (and focus on fixing)!

Quick Beginners Tutorial to PowerShell ISE

Since its introduction, Windows PowerShell ISE has become one of the must-have system administration tools for IT professionals. Its robust GUI has served as a close companion to Windows PowerShell, allowing users to test and debug scripts for automation and task management. We hope this article serves as a small intro to the PowerShell ISE. We’ll be covering what PowerShell ISE is and how you can make the best use of it. Dig in to learn the basics of PowerShell ISE.

Upgrade your desktop: Ubuntu 22.04.1 LTS is now available

Whether you’re a first-time Linux user, experienced developer, academic researcher or enterprise administrator, Ubuntu 22.04 LTS is the best way to upgrade your creativity, productivity and downtime. Check out our new video to learn more! The release of Ubuntu 22.04.1 LTS represents the consolidation of fixes and improvements identified during the initial launch of Ubuntu 22.04 LTS and is the first major milestone in our Long Term Support (LTS) commitment to our users.

Tools and Tutorials for Learning Powershell

Do you have repetitive tasks you’d like to automate or manage, but not the time or know-how to code a solution from scratch? PowerShell might be just what you’re looking for. Designed to help automate activities, it is a cross-platform shell and scripting language developed by Microsoft. With PowerShell, you can use the command line or scripting language to have point-and-click activities executed and managed. It’s fairly easy to use and yet extremely powerful. So, where do you begin?

EvaluScale Finds D2iQ a Container Management Platform Leader

D2iQ continues to win not only customers in the public and private sectors, but also recognition from industry analysts who are affirming the unique value offered by the D2iQ Kubernetes Platform (DKP). Most recently, the 2022 EvaluScale Insights report named D2iQ a Leader in Container Management Platforms for the IT enterprise in two major categories, Container-Forward and Fast-Track Win.

How to retrieve Azure Key Vault Secrets using Azure Functions (Part II)

In my previous post, How to retrieve Azure Key Vault Secrets using Azure Functions, I explained how a function can take the Key Vault URL and the Secret name to retrieve a secret generically, so that you can use it inside all your Logic App Standard workflows, whether they use the same Key Vault resource or different ones.

Changelog 11th August 2022

We have rolled out our new "operation log viewer." This feature is not just a facelift of our logging interface but a whole new way to view and search logs for actions taken by you and your team on your applications. The new viewer supports historical deployment logs, allowing you to see logs from previous actions. In addition, you can see logs for background operations like renewing your SSL certificates.

IT Tool Rationalization and You

If you ask the experts, they will tell you that companies have too many IT tools for monitoring their environments. Monitoring tools for network, infrastructure, application, wireless, endpoint, cloud, etc. proliferate across all organizations. According to research by Gartner, more than a third of organizations surveyed have more than 30 monitoring tools. More than half of organizations surveyed have at least 11 tools. Sounds like a good argument for IT tool rationalization. But is this really the case?

Why devops needs a better approach to cloud networking

A full-stack networking platform with machine learning, autonomous capabilities, and multicloud support allows devops engineers to focus on what matters most—building applications. The promise of digital transformation is enabling businesses to magnify competitive advantages, create new revenue streams, and improve customer experiences.

Migrating from VMware to an open-source private cloud in financial services

This is part one of a two-part blog series on open-source-based private cloud for financial services. This blog describes the need for a cost-effective private cloud to execute a successful hybrid cloud strategy. It also shares a comparison between proprietary and open-source-based private cloud platforms.

Why You Should Choose Argo CD for GitOps

Many organizations that have already implemented a DevOps culture are looking to further accelerate their development process by adopting GitOps practices in their environments. There is a lot to take into consideration when planning out your GitOps strategy, and you can read more about it at this Codefresh learning center about adopting GitOps.

3 Must-Use Strategies To Make Better SaaS Pricing Decisions

You work hard to deliver a great product to your target market. Yet, when it’s time to price your worth, it’s challenging to set a fair pricing strategy, model, or amount. This anxiety is understandable. If you charge too much, you could lose potential and existing customers. If you are a start-up, this bad first impression can be detrimental to your growth. For larger businesses, some customers may feel you are losing touch and switch to newer or veteran competitors.

Comparing Hyperconverged Infrastructure Solutions: Harvester and OpenStack

Managing resources effectively in a secure and agile way is a challenge today. There are several solutions, like OpenStack and Harvester, which handle your hardware infrastructure as on-premise cloud infrastructure. This allows the management of storage, compute, and networking resources to be more flexible than deploying applications on single hardware only. Both OpenStack and Harvester have their own use cases.

Cycle Podcast | EP 15 | Darren Shepherd | The State of Running Containers in the Wild

In this episode, Jake Warner chats with Darren Shepherd, co-founder of Rancher Labs, and more recently, Acorn.io. Together, Darren and Jake discuss the current ecosystem around container orchestration and dive into some of the flaws that exist with how applications are packaged and deployed today. Darren has spent his career writing orchestration systems, first in the IaaS space and then Docker and Kubernetes. He is best known for co-founding Rancher Labs and creating such projects as Rancher, Longhorn, k3s, and many others.

How to Install VMware Tanzu Application Platform with Transport Layer Security and Azure AD

VMware Tanzu Application Platform is a modular, application-aware platform that provides a rich set of developer tooling and a pre-paved path to production, enabling developers to build and deploy software quickly and securely on any compliant public cloud or on-premises Kubernetes cluster.

Testing the mettle: All you need to know about evaluating data solutions for large-scale applications

Imagine your organization encounters a project where you have to switch storage vendors... What would you do? To begin with, you will need to evaluate and test the performance of the storage providers on your servers. At Civo, we faced a similar project, allowing us to test several storage providers on our bare metal servers. In this article, we will discuss what you should look for while migrating to a different provider and the ways you can test these providers.

How VMware Tanzu Application Platform Has Improved with Frequent Longevity Testing

VMware Tanzu Application Platform is a modular, application-aware platform that provides a rich set of developer tooling and a pre-paved path to production, enabling developers to build and deploy software quickly and securely on any compliant public cloud or on-premises Kubernetes cluster.

How Sleuth measures Change Lead Time

Change Lead Time can be considered the most insightful of the four DORA metrics. But how do you measure it most accurately? In this video, Don Brown shows you how Sleuth measures Change Lead Time for code changes and how Sleuth breaks down that time into multiple buckets for the most detailed insight on what's slowing your team down. Check out these videos on how Sleuth measures other DORA metrics.

How Sleuth measures Change Failure Rate

Before you can measure the DORA metric for Change Failure Rate, you need to define what failure means. In this video, Sleuth's CTO Don Brown explains how Sleuth defines and measures Change Failure Rate, and how it ties failure back to deployments. Check out these videos on how Sleuth measures other DORA metrics. Give Sleuth a try and see why it's a deploy-based Accelerate / DORA metrics tracker both managers and developers love.

How Sleuth measures Mean Time to Recovery (MTTR)

The DORA metric Mean Time to Recovery (MTTR) tracks how long on average your failure spans are. In this video, Sleuth CTO Don Brown explains how Sleuth calculates this measurement, which gives you insight on how quickly your team can respond to and recover from failure. Check out these videos on how Sleuth measures other DORA metrics. Give Sleuth a try and see why it's a deploy-based Accelerate / DORA metrics tracker both managers and developers love.
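As a rough illustration of "how long on average your failure spans are", the sketch below averages a list of (start, end) failure spans. It is not Sleuth's calculation; the function name `mean_time_to_recovery` and the sample timestamps are hypothetical and chosen only for this example.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(failure_spans):
    """Return the average duration of (start, end) failure spans.

    Hypothetical helper for illustration; returns None when there are no spans.
    """
    if not failure_spans:
        return None
    # Sum the duration of every span, starting from a zero timedelta.
    total = sum((end - start for start, end in failure_spans), timedelta())
    return total / len(failure_spans)


# Made-up sample data: one 30-minute failure and one 90-minute failure.
spans = [
    (datetime(2022, 8, 1, 10, 0), datetime(2022, 8, 1, 10, 30)),
    (datetime(2022, 8, 3, 14, 0), datetime(2022, 8, 3, 15, 30)),
]

mttr = mean_time_to_recovery(spans)
print(mttr)  # 1:00:00
```

A real tracker also has to decide what counts as the start and end of a failure, which this sketch takes as given.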

Why a big bang approach is the wrong cloud strategy

Despite all the hype from the big cloud providers, the truth is that most organisations rely on hybrid infrastructures now and will do so for the foreseeable future. Typically, this includes on-premises infrastructure and at least two public cloud providers. This is not a step on a journey to being 100 per cent cloud; it is the strategic destination many have chosen.

Autoscale your Kubernetes workloads with any Datadog metric

Editor’s note: This post was updated on August 9, 2022, to include a demonstration of how to enable highly available support for HPA. It was also updated on November 12, 2020, to include a demonstration of how to autoscale Kubernetes workloads based on custom Datadog queries using the new DatadogMetric CRD.

Monitoring Rails applications with Datadog

Rails is a Ruby framework for developing web applications. It favors the Model-View-Controller (MVC) architecture and includes generators that create the files needed for each MVC component. Rails applications consist of a database, an application server for running application code, and a web server for processing requests. Rails provides multiple integrations for its supporting database (e.g., MySQL and PostgreSQL) and web server (e.g., Apache and NGINX).

Tales from the Toil: Taking the pulse of SRE

Site Reliability Engineering (SRE) is a growing practice essential for enterprises to ensure service delivery, reliability, and access for users. Many companies only choose to invest in SRE when they have a raging operational fire on their hands. As a result, SREs often start out as firefighters, desperately trying to keep the service online for one more day.

Pre- and post-deployment testing methodologies for CI/CD

Your team has worked hard on a software product for months, and it’s finally ready to release to your users! But then the worst-case scenario happens: a wide release soon indicates that the software is plagued with bugs and performance issues, resulting in poor reviews and widespread user dissatisfaction.

Laying the foundations for a healthier digital future in the NHS

At the end of 2021, we published a blog post about the Autumn budget in the UK, what it meant for IT teams in the NHS, and why data management should be prioritised. We looked specifically at four key areas for sharing, monitoring, protecting, and accessing data that we believe are crucial elements of the digital transformation journey. Digital transformation is part of the NHS Long Term Plan, a wide-ranging programme to upgrade technology and digitally-enabled care across the NHS.

DevOps 101: The role of automation in Database DevOps

This is the fifth part in the DevOps 101 series and it’s time to talk about automation. Before we get into it, I just want to recap what DevOps is. Microsoft’s Donovan Brown sums it up nicely in a single sentence: DevOps is the union of people, process, and products to enable continuous delivery of value to our end users. The important thing to remember here is the order in which he talks: people, process, and products. That’s the way DevOps works.

Using Squadcast's SLO Tracker | Error Budget | Setting up SLOs and configuring SLIs | Squadcast

With Squadcast, you can define and monitor Service Level Objectives for your services. SLOs allow you to define and enforce an agreement between two parties regarding the delivery of a given service. A Service Level Objective (SLO) is a reliability target, measured by a Service Level Indicator (SLI), and sometimes serves as a safeguard for a Service Level Agreement (SLA). SLOs represent customer happiness and guide the development team’s velocity.
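To make the relationship between an SLO target, its SLI, and the error budget concrete, here is a minimal sketch of one common way to compute the remaining budget from event counts. It does not reflect Squadcast's SLO Tracker; the function name `error_budget_remaining`, its parameters, and the sample numbers are assumptions made for illustration.

```python
def error_budget_remaining(slo_target, total_events, bad_events):
    """Return the fraction of the error budget still unspent.

    slo_target   -- reliability target as a fraction, e.g. 0.999 for 99.9%
    total_events -- all events counted by the SLI in the window
    bad_events   -- events that violated the SLI in the window
    A negative result means the budget has been overspent.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        # No traffic in the window, so none of the budget has been used.
        return 1.0
    # The budget is the number of bad events the target tolerates.
    allowed_bad = (1 - slo_target) * total_events
    return 1 - bad_events / allowed_bad


# Made-up numbers: a 99.9% target over 1,000,000 requests tolerates
# about 1,000 failures, and 250 have occurred so far.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"{remaining:.0%}")  # 75%
```

Counting events is only one option; time-based SLIs such as minutes of downtime follow the same arithmetic with durations in place of counts.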

Interrupts in software teams: using unplanned work to your advantage

Interrupts are often seen as a problem that eats away at your team’s productivity, and gets in the way of shipping important things for your customers. It’s often consciously accrued from the tech debt we accept to ship features sooner. However, when a team doesn’t have a good strategy for dealing with the consequences of those decisions, the pain is felt much more acutely and much sooner.

The regulation driving multi-cloud adoption

Cloud computing can bring many benefits to financial services companies such as increased speed and agility, easier innovation and scalability. It is no wonder then that cloud adoption is set to continue increasing with 54% of financial services companies expected to have more than half of their entire IT footprint in public clouds in the next five years. However, despite the benefits that this can bring for financial services, it also brings a new set of challenges for financial market stability.

Improving DevOps Performance with DORA Metrics

Everyone in the software industry is in a race to become more agile. We all want to improve the performance of our software development lifecycle (SDLC). But how do you actually do that? If you want to improve your performance, first determine what KPI you’d like to improve. DORA metrics offer a good set of KPIs to track and improve. It started as research by the DevOps Research and Assessment (DORA) team and Google Cloud (which later acquired DORA), to understand what makes high performing teams.

How to launch Confidential VMs on Azure

Canonical is happy to announce the general availability of Ubuntu Confidential VMs (CVMs) on Microsoft Azure. Ubuntu 20.04 is the first and only Linux distribution to support Confidential VMs on Azure! Ubuntu CVMs use the latest security extensions of the third generation of AMD CPUs, Secure Encrypted Virtualization-Secure Nested Paging (SEV-SNP). As such, they bring about a fundamental shift in the traditional threat model of public clouds. They are part of the Microsoft Azure DCasv5/ECasv5 series, and only take a few clicks to enable and use!

Tensu: An Open Source Text UI for Sensu Go

A Two Sigma engineer explains why we built Tensu, an open source TUI (text user interface)-based program for interacting with Sensu Go’s observability pipeline and backend API. In this article we will be putting a spotlight on Tensu, an open source terminal-based dashboard for interacting with and responding to events from the Sensu Go observability pipeline and backend API.

What is Infrastructure as Code?

Cloud services were born at the beginning of the 2000s with companies such as Salesforce and Amazon paving the way. Simple Queue Service (SQS) was the first service to be launched by Amazon Web Services in November 2004. It was offered as a distributed queuing service and it is still one of the most popular services in AWS. By 2006 more and more services were added to the offering list.

How Datadog's Technical Solutions team uses RUM, Session Replay, and Error Tracking to resolve customer issues

Organizations across a wide range of industries share a common goal: deploy stable applications that support their customers’ needs. Many of these organizations rely on the Datadog platform to get complete visibility into the health and performance of their applications, and we understand how important it is that our services are reliable. That’s why we leverage our own products to ensure that the platform works as expected.

Sponsored Post

Datadog & Speedscale: Improve Kubernetes App Performance

By combining traffic replay capabilities from Speedscale with observability from Datadog, SRE Teams can deploy with confidence. It makes sense to centralize your monitoring data into as few silos as possible. With this integration, Speedscale will push the results of various traffic replay conditions into Datadog so it can be combined with the other observability data. Being able to preview application performance by simulating production conditions allows better release decisions. Moreover, a baseline to compare production metrics can provide even earlier signals on degradation and scale problems. Speedscale joined the Datadog Marketplace so customers can shift-left the discovery of performance issues.

The relationships between ITOM and DevOps

As explained in the earlier “What’s ITOM?” blog, there are various definitions of IT Operations Management (ITOM). This variation has a knock-on effect for everything related to ITOM, including the relationship between ITOM and DevOps. This issue is further complicated when people and organizations have different views of what DevOps is and isn’t. Hence, we must start with a single definition for ITOM and DevOps.

Civo Update - August 2022

In July, we released two new enterprise offerings, CivoStack and Edge! CivoStack is an enterprise-ready hyperconverged infrastructure (HCI) solution built for security, performance, and scale. To find out more visit our product page! Edge is our purpose-built Kubernetes hosted infrastructure, designed for on-location edge computing. Created with security, compliance and low latency in mind, find out more here.

How to Become a Site Reliability Engineer: Job Description, Roles & Responsibilities

Site Reliability Engineering (SRE) is still going strong in the world of software development. As a bridge between development and operations, it’s a necessary part of any organization that wants to work like a well-oiled machine. Simply put, SRE tries to fix a widespread problem in organizations: siloing. But not much is known about the job requirements of becoming a site reliability engineer.

How to Use the Ping Command for Network and Troubleshooting

The ping command is one of network admins' most commonly used tools. It has served and continues to serve network admins as one of the best network troubleshooting tools since it was released almost 39 years ago. In this article, we cover what the ping command does, how to use it, and more. Read on to learn the basics about this simple but powerful networking tool that IT teams can’t live without.

Electrical and electronic vehicle architecture trends: an exploration

Vehicles are becoming more complex every day. Customers expect safe, autonomous, connected, electrified and shared vehicles and these features are achieved via software. Although there is a clear change in focus from hardware to software, the advent of software-defined vehicles will rely heavily on optimised Electrical / Electronic (E/E) vehicle architectures. To make way for this changing paradigm, big hardware changes need to take place.

A Tale of Two Pluggables: Ribbon's Choice for Optimizing Metro Optical Networks

In a Light Reading article earlier this year, Scott Wilkinson, Lead Optical Component Analyst at Cignal AI, said, "The transition to 400GbE is well underway, and pluggable coherent 400Gbps technology is revolutionizing the design of the optical networks that connect data centers.

Cloud Modernization Best Practices

Cloud services have revolutionized the technical industry, and services and tools of all kinds have been created to help organizations migrate to the cloud and become more scalable in the process. This migration is often referred to as cloud modernization. To successfully implement cloud modernization, you must adapt your existing processes for future feature releases.

Why Your Legacy Cloud Cost Tools Aren't Cutting It

In the beginning of cloud computing, before the earliest cost tools came along to give companies a glimpse into their spending, most businesses found it hard to determine where their cloud budget was going. The money disappeared into the black hole of the cloud service provider, and in exchange, the business received cloud services. Achieving any sort of granularity beyond that was next to impossible.

Open DevOps toolchain page: installing, connecting, and configuring integrations to Jira Software

Dave McCormick, Engineering Manager - Open DevOps, walks through a new Open DevOps capability: toolchain page. Toolchain page is an admin experience for discovering, connecting, and visualizing your toolchain, all within Jira Software. Toolchain page is available to all Jira Software cloud users.

Application Platform with Crossplane and Shipa

Crossplane is an open-source project that lets you turn a Kubernetes cluster into a control plane. Crossplane lets you interact with your cloud provider API from a Kubernetes cluster, enabling you to create cloud resources required by your applications, such as databases or other resources supported by Crossplane for different cloud providers.

We've made it even easier to manage your FireHydrant configuration with Terraform

Many of our customers use FireHydrant’s verified Terraform provider to track configuration changes, ensure consistency, and automate repetitive configuration tasks. Back in March we streamlined our Terraform provider support for service catalog configuration. Today we are releasing extensive Terraform provider improvements for configuring runbooks, task lists, service dependencies, incident roles, and more.

Why it makes sense to invest in a hybrid cloud future

A cloud-first strategy is increasingly seen as the standard way to achieve efficient business operations. As the favoured approach of new start-ups and expanding businesses wanting to benefit from the flexibility and resilience of the cloud, it’s little wonder that foundational cloud services saw a revenue growth of 38.5% in 2021 according to IDC. It looks like a fantastic package from the outside.

Technical debt does not disappear

Cloud is currently seen by some of the clients I speak with as the answer to associated technical debt. Technical debt is like any other debt – growing year-on-year, month-by-month. Simply relocating a workload from your dated compute stack to someone else’s system or service is not a panacea to every business-led postponement.

17 Best DevOps Tools to Use in 2022 for Infrastructure Automation and Monitoring

You must adopt proper infrastructure automation if you want to enable your teams to achieve faster application delivery while eliminating human errors. Automation of servers, deployment environments, configuration management, and deployments plays a vital role in getting a competitive advantage for your product. Monitoring both the infrastructure and application is equally important as well. In this article, we will discuss top tools for infrastructure automation and monitoring.

Cloudability Pricing: How Much Does Cloudability Cost?

Apptio announced on May 31, 2019, that it had acquired Cloudability, hoping to strengthen its mission through its new product. Cloudability is a multi-cloud financial management platform for enterprises looking for a more unified way to view, understand, and act on cloud cost insight. Cloudability lets enterprises analyze cloud costs across Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Apptio Cloudability provides FinOps capabilities as Software-as-a-Service (SaaS).

A Hundred Organizations Trust Us: Why and How they Use Qovery

Since our start in 2019, we are proud to say that a hundred engineering teams from fast-growing organizations trust us to deploy their Production, Staging, and Development Environments on AWS in a few seconds. But why do they use Qovery, and how? That’s what you’re about to discover today!

Setting up a Multi-Architecture Kubernetes Cluster

In the last post we covered the industry shift towards ARM machines for both local and production software engineering. Last time we learned how to create Docker images that would work on multiple architectures for dev machines. Now we want to take this portability and leverage it for cost savings in production. You may be able to transition some of your services into multi-architecture builds.

What pressures are forcing rapid IT transformation? | Resolve

The business landscape is changing. IT resources are dealing with growing pressures from both inside and outside their organization, hindering them from keeping up with demand. In this video, Resolve CEO Vijay Kurkal describes in more detail the challenges that they are facing today and in the future.

Securing software supply chain without panicking ft. Chainguard co-founder Kim Lewandowski

Chainguard co-founder Kim Lewandowski joins Rob to discuss the ways she presses forward in the fear-driven world of software supply chain security. In any kind of mistake or failure, security breaches have to be something that we can learn from. On the other hand, particularly during investigation, there are often walls of trust and other factors affecting fully transparent communication. Does this impact our ability to learn? Is there something we have to do differently to get better at it?

KPIs Drive Behavior: So Get Them Right!

Congratulations! You just put your 100th application into the cloud, marking the end of a highly successful project. As the senior leader tasked with driving transformation in your company, you are celebrating with the team. Your project manager buys the first round of drinks—they can afford it because their bonus was directly tied to hitting the target of 100 apps in the cloud and they are happy to spread the cheer.

Using BAM from Azure Synapse Pipelines

Many organizations are investing heavily in the data space, and Azure Synapse is one of the technologies in the Microsoft stack which is very popular. Within Synapse, Data Flows are the component used to orchestrate the movement of data into and out of your Data Lake and for orchestrating jobs within your Data Platform. The business heavily depends on your data platform for movements of data within the organization and the analytics derived from data via Synapse.

Datadog acquires Seekret

APIs are integral to the success of modern enterprises across a wide range of industries, such as finance, logistics, and manufacturing. They not only enable developers to build powerful business solutions by integrating with external applications, but also facilitate communication between internal services. This means that the ability to build reliable, highly-performant APIs—and govern their behavior and performance—is more important than ever.

The Importance of Automated Regression Testing with Social Integration in DevOps

Delivering business-critical applications and code relies on two key factors: functionality and efficiency. Mock and Unit tests are a few industry standards that aim to ensure the correct functionality of your code, catching potential bugs and issues before deployment. These tests are vital to workflows, CI/CD pipelines, and the overall build and deployment process. While functionality may be sound, one key aspect that is oft-forgotten is the efficiency and performance of your code.

Driving a customer-focused incident response process

Deep into an incident, Slack firing, up to your ears in decisions, not sure where to turn next? It’s easy for external communication with your customers to fall far down the list of priorities in these moments. However, these are the exact situations where comms are vital, and where underestimating their importance can have damaging and lasting effects on your organisation.

System Administrators: Everything You Need to Know to Become a SysAdmin

Having a reliable IT infrastructure is crucial to the health of your company or organization. And while this is generally true, it’s doubly so in our current times, where practically everything is digital. So, who is the lucky one to make general decisions about how to run a system efficiently — and keep it going? That would be the System Administrator. But what does a SysAdmin actually do? That's what we'll answer here!

Application Deployment to Civo with a Terraform Template

Join our CTO, Dinesh Majrekar, as he deploys the infrastructure needed to host any application on Civo. This session will use a new Terraform template repo to deploy a cluster and install an ingress and a Let's Encrypt Helm chart, all from Terraform. Once these basics are in place, the cluster can be used in GitOps pipelines by other teams to actually deploy their code.

Why it's important to upgrade your Mattermost Server

Upgrading your Mattermost server involves a bit of research, preparation, and downtime. The pressure to keep your Mattermost instance healthy and reduce downtime for a core system within your organization can be intimidating. Recently, we worked with a handful of customers who were experiencing issues upgrading from Mattermost v5.37 and v5.39 to v6.x. Unfortunately, migration scripts were required to make significant database changes, and there was an issue in product performance.

VMware Image Builder Helps Verify Customized, Secure Software for Any Platform on Any Cloud

With the emergence of new programming languages, libraries, packaging systems, and dependencies, the open source landscape has become more diverse. At the same time, companies are finding it more and more complex to package and deliver open source software. This creates a massive challenge for independent software vendors (ISVs), large enterprises, and other organizations that need to control their software supply chain lifecycles while adhering to industry standards and best practices.

SRE: From Theory to Practice | What's difficult about tech debt?

In episode 3 of From Theory to Practice, Blameless’s Matt Davis and Kurt Andersen were joined by Liz Fong-Jones of Honeycomb.io and Jean Clermont of Flatiron to discuss two words dreaded by every engineer: technical debt. So what is technical debt? Even if you haven’t heard the term, I’m sure you’ve experienced it: parts of your system that are left unfixed or not quite up to par, but no one seems to have the time to work on.

GigaOm Radar: Spot Leads the Way in FinOps Tools

FinOps, or cloud financial operations, is a method that helps organizations bring financial accountability to their cloud’s operational expenses. To see success, the cross-functional teams leading FinOps efforts — including finance, IT, and business professionals — need the right tools to manage and optimize their cloud costs.

Code signing: securing against supply chain vulnerabilities

When creating an application, developers often rely on many different tools, programs, and people. This collection of agents and actors involved in the software development lifecycle (SDLC) is called the software supply chain. The software supply chain refers to anything that touches or influences applications during development, production, and deployment — including developers, dependencies, network interfaces, and DevOps practices.

The importance of IT's role in digital transformation. | Resolve

It's no mystery how important automation is when it comes to augmenting the digital transformation initiatives for any company. However, if you do not understand IT's role in those efforts, you risk a fragile and siloed infrastructure that may bring more problems than benefits as you try to scale your business. In this video, Resolve CEO Vijay Kurkal explains why IT is the foundation of your business.

AIOps is Dead! Long Live AIOps!

Artificial Intelligence (AI) is all the rage these days. Everywhere you look, companies are promising to solve your ills by applying AI to whatever problem you’re trying to solve. It doesn’t seem to matter what area you are in: medical, research, education, technology, software or anything else. Someone, somewhere is offering an AI-based tool that will solve all your problems.

Kubernetes Master Class: GitOps, Rancher and ArgoCD with Codefresh

Join Robert Sirchia and Dan Garfield for this session. Configuration drift is a common problem software developers face. Picture this: two environments are supposed to be similar but are not. Nobody knows exactly what is deployed in that environment/server/cluster, so people are afraid to touch it. It’s declared “off-limits” because nobody can reconstruct it if it breaks down. People do hot-fixes or ad-hoc changes without recording them, and then those developers change teams or companies.

Kubernetes is not the only one. Overview of AWS ECS

In part two, I will cover: Microservices Architecture Overview: New Challenges for Monolithic Architecture. As an application grows, so does the amount of code written. This can quickly overwhelm the development environment every time it needs to be opened and run. As you must deploy everything in one place, this approach means that the transition to another programming language or other technologies becomes a big problem.

Expedite infrastructure investigations with Kubernetes Anomalies

Modern Kubernetes environments are becoming increasingly complex. In 2021, Datadog analyzed real-world usage data from more than 1.5 billion containers and found that the average number of pods per organization had doubled over the course of two years. Organizations running containers also tend to deploy more monitors than companies that don’t leverage containers, pointing to the increased need for monitoring in these environments.

EKS Pricing 101: A Guide To Understanding EKS Costs

AWS did not intend to build Amazon EKS; it simply had to. Kubernetes adoption beamed light years ahead of AWS' own managed container orchestration service. This forced AWS to develop a managed service to accommodate customers who wanted to use upstream Kubernetes but did not want to do the management themselves. As soon as AWS got around to it, it knocked the Kubernetes-based container management service out of the park.

Log Forwarding with HAProxy and Syslog

Developing a strategy for collecting application-level logs necessitates stepping back and looking at the big picture. Engineers developing the applications may only see logging at its ground level: the code that writes the event to the log—for example a function that captures Warning: An interesting event has occurred! But where does that message go from there? What path does it travel to get to its destination?

Need a GPU For Your Next App? Cycle Has Your Back!

After months in development, we are thrilled to announce the launch of Cycle.io's support for NVIDIA GPUs (Beta). Combined with an already powerful platform that enables developers to focus on building, rather than managing, the addition of GPUs will further empower the development of accelerated applications which require a higher level of compute power.

Continuous integration for progressive web apps

Web and browser technology continues to advance, narrowing the gap between the performance of web and native applications. Features that were once exclusive to native applications can be implemented in web applications. This is due in part to the emergence of progressive web applications (PWAs). Web applications can now be installed, receive push notifications, and even work offline.

Episode 6: Mooving to... Real release strategies with Jake Laverty

Every product or application needs a release strategy. It’s how you can double check that everything in your deployment is appropriately tested, validated and verified. Having a standardized release strategy in place allows your team to follow a protocol and reduce the number of unknowns they must face in the product life cycle. However, there are a few considerations to make this critical process run smoothly.

Migration Guide - Jenkins to Razorops

Organizations are becoming more aware of the advantages of upgrading their continuous integration and continuous delivery (CI/CD) pipelines and moving them to the cloud, from lowering infrastructure costs to eliminating security threats. But carrying out a transfer is challenging, especially when a variety of platforms and technologies are involved.

Automate incident response workflows with Eventarc and Datadog

Eventarc is a Google Cloud offering that ingests and routes events between GCP products, such as Cloud Run, Cloud Functions, and Pub/Sub, making it easy to build automated, event-driven workflows in complex environments. By taking care of event ingestion, delivery, authorization, and error handling, Eventarc reduces the development overhead that is required to build and maintain these workflows and helps you improve application resilience.

Simplify microservice governance with the Datadog Service Catalog

Moving from a monolith to microservices lets you simplify code deployments, improve the reliability of your applications, and give teams autonomy to work independently in their preferred languages and tooling. But adopting a microservices architecture can bring increased complexity that leads to gaps in your team members’ knowledge about how your services work, what dependencies they have, and which teams own them.

Tell the story of your incident with timeline curation

It isn’t the first time you’ve heard us say this and it won’t be the last: getting your post-incident process right is a game-changer. Being able to run effective debriefs and create useful postmortems helps us learn from our mistakes, respond better to future incidents and identify how we can build resilience in our product and teams. In short, it’s the thing that shifts the dial from just “fixing” to actually improving.

Linux "find" command - A Complete Guide

Linux users cannot rely on the GUI alone to perform various tasks on their system. Rather, they need a good knowledge of the various commands available. One of the most useful commands for any Linux user is the find command. This command is used to locate files in one or more directories. Using this command, Linux users can set a specific search area, filter or find the files in that area or directory, and perform actions on the files that match the search.
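
The three steps described above (choose a search area, filter the files, act on the matches) can be sketched in Python. This is only an illustrative analogue, not the find command itself: the helper names `find` and `find_and_apply` are hypothetical, and the behavior is roughly in the spirit of `find <dir> -name '<pattern>'` restricted to non-directory entries.

```python
import fnmatch
import os
from typing import Any, Callable, Iterator, List


def find(root: str, name_pattern: str = "*") -> Iterator[str]:
    """Yield paths of non-directory entries under root whose names match the glob."""
    # os.walk visits root and every subdirectory (the "search area").
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            # Case-sensitive glob match on the file name only (the "filter").
            if fnmatch.fnmatchcase(filename, name_pattern):
                yield os.path.join(dirpath, filename)


def find_and_apply(root: str, name_pattern: str, action: Callable[[str], Any]) -> List[Any]:
    """Run action on every matching path and collect the results (the "action")."""
    return [action(path) for path in find(root, name_pattern)]
```

For example, `find_and_apply(".", "*.log", os.path.getsize)` would return the sizes in bytes of the `.log` files below the current directory.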

Delivering effortless hyperautomation to build and drive high-performing 5G networks

5G is seeing rapid adoption due to its promise of high speed, seamless delivery and low latency, which enable it to deliver new and exciting revenue-generating services. Studies reveal that there will be 1.2 billion 5G connections by 2025, covering a third of the world’s population. 5G will play a critical role in enabling the next generation of applications and services, like gaming, trade and Industry 4.0, in general.

Are your applications secured end-to-end?

Kubernetes has grown immensely, and its use within organizations is maturing. While Kubernetes’ growth is exciting, security concerns around applications deployed on Kubernetes are mounting. Red Hat performed a survey with hundreds of DevOps professionals, and it showed that 55% delayed application releases due to security issues.

Anti-patterns in Incident Response that you should unlearn

It is important to invest time and effort in understanding why a system performs the way it does and how we can improve it. Companies continue with practices that yield successful results, but ignoring anti-patterns can be far worse than choosing rigid processes. In this blog we will explore anti-patterns in incident response and why you should unlearn them.

Escape from VMware Tanzu!

On May 26, 2022, Broadcom announced its intent to acquire VMware for $61 billion. This raises significant questions for VMware Tanzu customers going forward. It is not clear whether VMware will operate as a separate entity or what will happen with VMware Tanzu. There are VMware employee and customer concerns as well. Based on the current results of both Broadcom and VMware, a number of analysts are anticipating either a price increase or reduction in the headcount, or both.

New improvement: global settings in console

At Platform.sh, we are committed to making your development and deployment experience as smooth and seamless as possible. As part of this effort, we regularly listen to your feedback to see how we can improve things. One of those things is how complex it can be to configure environment or project settings from the Console. That is why we decided to regroup all settings pages together so that they’re easily accessible from a single location.

Publish New Cloudify Blueprints to ServiceNow ITSM Service Catalog via Cloudify "Create Environment"

Newly created blueprints can be made available in a ServiceNow ITSM Service Catalog request. The new blueprints appear as options that can be selected from a pull-down menu of Certified Environments inside a Cloudify request type in ServiceNow ITSM Service Catalog. It is also possible to expose new blueprints as selectable items directly in the ServiceNow ITSM Service Catalog.
Sponsored Post

Performance Benchmarking as Part of your CI/CD Pipeline

Continuous Integration and Continuous Delivery (CI/CD) is perhaps best represented by the infinity symbol. It is constantly ongoing: new integrations are rolled out without interrupting the flow of information that is already running, because stopping systems in order to update them can be costly and inefficient. To ensure that you can successfully implement the latest builds into your system, it is important to know how well they will run alongside the components that are already installed and where there may be bottlenecks.

Monitor your GitHub Actions workflows with Datadog CI Visibility

GitHub Actions provides tooling to automate and manage custom CI/CD workflows straight from your repositories, so you can build, test, and deliver application code at high velocity. Using Actions, any webhook can serve as an event trigger, allowing you, for example, to automatically build and test code for each pull request. Datadog CI Visibility now provides end-to-end visibility into your GitHub Actions pipelines, helping you maintain their health and performance.

To require or not require (fields): that is the question

Required fields have been a hot topic at FireHydrant. Choose too many (or the wrong ones), and you unnecessarily annoy your team during an incident or encourage sloppy data entry that someone has to come back and clean up manually. Don't use them at all, and you risk having insufficient data to efficiently propel an incident toward resolution.

Everything You Need To Know About Cloud Egress Charges

Whenever you move data into or out of a cloud, the traffic crosses one or more networks, potentially resulting in transfer charges. These are known as ingress (moving data into the cloud) and egress (moving data out of the cloud) charges, and there are incentives in most of the pricing models of cloud service providers (CSPs) to encourage an organisation to use a direct connection to transfer data, rather than go via the public internet.

Analytics in Squadcast | Incident Management | On-call | SRE | Squadcast

Analyzing incident data plays a key role in doing SRE better. Squadcast's Analytics Dashboard helps you analyze the performance of your Organization/Team for a given time period. It also gives you more insight into past outages that affected your systems.

Integrating Squadcast with Jira (Cloud & Server) - Create tickets & bidirectional sync | Squadcast

You can use this integration guide to install and configure the Squadcast extension in Jira Cloud & Jira Server to create issues in Jira projects when there is an incident in Squadcast. Also learn to automatically or manually sync the status bidirectionally.

Integrating Slack & Squadcast - Trigger, Acknowledge, Resolve & Reassign incidents from Slack channel

You can integrate Squadcast and Slack to collaborate efficiently with your team while working on incidents. Squadcast sends a notification to the configured Slack Channel as soon as an incident is triggered.

Integrating Microsoft Teams & Squadcast - Acknowledge, Resolve & Reassign Incidents | Squadcast

Teams using MS Teams can now integrate with Squadcast and easily Acknowledge, Resolve & Reassign incidents using MS Teams. You can configure Squadcast to send a notification to the configured MS Teams channel as soon as an incident is triggered.

Tagging & Routing at Squadcast | Incident Management | Squadcast

Event Tagging is a rule-based auto-tagging system with which you can define customized tags, based on incident payloads, that are automatically assigned to incidents when they are triggered. Auto-add relevant information like priority, severity or alert type to make incoming incidents context-rich. Route alerts to the right responder(s) based on the tags they carry.
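
To make the idea of rule-based tagging and tag-based routing concrete, here is a minimal generic sketch in Python. It is not Squadcast's API or configuration format: `TagRule`, `apply_tags`, `route`, the payload fields, and the responder names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Payload = Dict[str, Any]


@dataclass
class TagRule:
    """Hypothetical rule: if condition(payload) holds, attach these tags."""
    condition: Callable[[Payload], bool]
    tags: Dict[str, str]


def apply_tags(payload: Payload, rules: List[TagRule]) -> Dict[str, str]:
    """Collect the tags of every rule whose condition matches the payload."""
    tags: Dict[str, str] = {}
    for rule in rules:
        if rule.condition(payload):
            tags.update(rule.tags)  # later rules win if two rules set the same key
    return tags


def route(tags: Dict[str, str], routes: Dict[str, str], default: str) -> str:
    """Pick a responder from the 'severity' tag, falling back to a default."""
    return routes.get(tags.get("severity", ""), default)


# Assumed example rules and incident payload.
rules = [
    TagRule(lambda p: "timeout" in p.get("message", "").lower(), {"alert_type": "latency"}),
    TagRule(lambda p: p.get("error_rate", 0) > 0.5, {"severity": "critical", "priority": "P1"}),
]
incident = {"message": "Upstream timeout on checkout", "error_rate": 0.7}
tags = apply_tags(incident, rules)
responder = route(tags, {"critical": "primary-on-call"}, default="triage-queue")
```

In this sketch both rules match the sample incident, so it carries the `alert_type`, `severity` and `priority` tags and is routed to the responder mapped to the `critical` severity.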

Alert Suppression Rules in Squadcast to prevent Alert fatigue | Squadcast

Alert suppression can help you avoid alert fatigue by suppressing notifications for non-actionable alerts. Squadcast will suppress the incidents that match any of the Suppression Rules you create for your Services. These incidents will go into the Suppressed state and you will not get any notifications for them.

4 Ways FP&A Can Partner Successfully With Engineering

The job of a financial planning and analysis (FP&A) professional is to oversee the building of financial models, track budgets, and partner with stakeholders across their organization to make better decisions. For FP&A professionals in charge of partnering with engineering and product organizations, this means understanding how investments in innovation, from software purchases to headcount, are driving business outcomes. Like any team in an organization, this comes with its own unique set of challenges.

Persistent, Distributed Kubernetes Storage with Longhorn

Kubernetes is an open source container orchestration system that enables applications to run on a cluster of hosts. It’s a critical part of cloud native architecture because it can work on public or private clouds and on-premises environments. With an orchestration layer on top of traditional infrastructure, Kubernetes allows the automated deployment, scaling, and management of containerized applications.

What Is Helm in Kubernetes?

Helm is a deployment tool that simplifies installing, configuring, and managing applications on Kubernetes clusters. Anyone familiar with writing Kubernetes manifests knows how tedious it is to create multiple manifest files using YAML. Even the most basic application has at least 3 manifest files. As the cluster grows, the configuration becomes more unwieldy. Helm is one of the most useful tools in a developer’s tool belt for managing Kubernetes clusters.

Get better visibility into DevOps performance in one place with Atlassian integrations

Every company is a software company and every company wants to get better at it. That’s why Sumo Logic built a set of integrations with Atlassian DevOps solutions. Leveraging data from Atlassian, Sumo Logic now enables you to visualize the key, actionable insights behind the DevOps Research and Assessment (DORA) metrics to continuously improve your software delivery performance. Sumo Logic’s observability platform presenting Atlassian data brings the following benefits, to name a few.

The Secret Ingredient for Faster Deployment: Use On-demand Environments

If you are part of a DevOps or Cloud engineering team struggling to set up and maintain proper environments, then this article is for you. We will discuss “On-demand” environments in detail. We will go through their benefits, how development teams can take advantage of them, and how Qovery’s on-demand environments give you a competitive advantage in the business.

Raising the bar on cybersecurity with Ubuntu for AWS GovCloud

US government agencies and organizations in government-regulated industries face a huge challenge in complying with regulations and standards in order to run their workloads. AWS GovCloud and Ubuntu make it much easier to deploy missions securely in the cloud whilst remaining compatible with compliance requirements.