
Dashbird

Why Serverless Apps Fail and How to Design Resilient Architectures

We’ve been monitoring hundreds of thousands of serverless backend components for more than two years at Dashbird. In our experience, serverless infrastructure failures boil down to a few kinds of isolated faults. These faults become causes of failure due to dependencies in our cloud architectures (ref. Difference of Fault vs. Failure). If a serverless Lambda function relies on a database that is under stress, the entire API may start returning 5XX errors.
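As a rough illustration of how a fault in a dependency can be kept from becoming a failure of the whole API, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `DatabaseUnavailable` exception, the `query_db` callable, and the in-memory cache are hypothetical stand-ins, not Dashbird or AWS APIs. Only the `handler(event, context)` shape follows the usual Lambda convention.

```python
import json
from typing import Callable, Dict, Optional


class DatabaseUnavailable(Exception):
    """Hypothetical error raised when the database is under stress."""


# Hypothetical in-memory store of the last successful result per user.
_last_known_good: Dict[str, dict] = {}


def make_handler(query_db: Callable[[str], dict]):
    """Build a Lambda-style handler around a (hypothetical) database query."""

    def handler(event: dict, context: Optional[object] = None) -> dict:
        user_id = event["user_id"]
        try:
            record = query_db(user_id)
            _last_known_good[user_id] = record  # remember the good result
            return {"statusCode": 200, "body": json.dumps(record)}
        except DatabaseUnavailable:
            # The fault is contained: serve stale data if we have any.
            cached = _last_known_good.get(user_id)
            if cached is not None:
                body = {**cached, "stale": True}
                return {"statusCode": 200, "body": json.dumps(body)}
            # No fallback available: the fault surfaces as a 5XX failure.
            return {"statusCode": 503,
                    "body": json.dumps({"error": "database unavailable"})}

    return handler
```

With a cached result, a stressed database produces a degraded but successful response. Without one, the caller still sees a 5XX error, which is the dependency-driven failure described above.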

Serverless monitoring startup Dashbird raises $2.1m and releases new features for serverless monitoring

Dashbird, a platform for serverless application monitoring, has raised $2.1 million in a seed round. The investment was led by Paladin Capital Group, with participation from Passion Capital, Icebreaker.vc and Lemonade Stand.

Early Detection of Potential Sources of Failure in Serverless

We recently wrote about why serverless applications fail and how to design resilient architectures. Being able to detect early-stage failure indicators can be invaluable. With proper monitoring, developers move from waiting for the system to crash to a more proactive attitude, managing resource allocation and architecture design to avoid bottlenecks and performance degradation.

Four immediate benefits you will gain from a modern monitoring platform

Cloud applications don’t just run flawlessly by way of magic. Many things can go wrong, and rest assured some will go wrong at some point. For small teams, this can be cumbersome and take a toll on development speed. A monitoring system will detect these issues on behalf of the development team, so that they can act accordingly. At Dashbird, we think there’s much more to it, though, than just detecting issues and alerting on them, especially for small teams of developers.

How Professional Serverless Teams Manage Software Issues

No matter how careful developers are or how comprehensive the tests applied before deployment are, there will always be some level of issues to deal with in production. When it comes to managing issues and ensuring application quality, two main metrics should be on our radar: time to discover and time to resolve issues.
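To make the two metrics concrete, here is a minimal sketch of how they could be computed from issue timestamps. The `Issue` record and its field names are hypothetical. The sketch also assumes that time to discover runs from occurrence to discovery and time to resolve from discovery to resolution; teams may define these intervals differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List


@dataclass
class Issue:
    """Hypothetical issue record with the three timestamps we need."""
    occurred_at: datetime
    discovered_at: datetime
    resolved_at: datetime


def mean_time_to_discover(issues: List[Issue]) -> timedelta:
    """Average delay between an issue occurring and being noticed."""
    seconds = mean(
        (i.discovered_at - i.occurred_at).total_seconds() for i in issues
    )
    return timedelta(seconds=seconds)


def mean_time_to_resolve(issues: List[Issue]) -> timedelta:
    """Average delay between an issue being noticed and being fixed."""
    seconds = mean(
        (i.resolved_at - i.discovered_at).total_seconds() for i in issues
    )
    return timedelta(seconds=seconds)
```

Tracking both averages over time shows whether alerting (discovery) or debugging (resolution) is the slower part of the process.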

What is the ideal retention period for application logs

That is a common question I see among developers. Most of the time, nobody cares about system logs. But when things go south, we absolutely need them. Like water in the desert, sometimes! At Dashbird, we have a list of criteria compiled to determine a reasonable retention policy for application logs. There is no one-size-fits-all, though. The analytical dimensions below will give a relative notion of how long the retention period should be.

When Dedicated DevOps is Not Available

With the rise of cloud computing and modern distributed systems, we also witnessed the rise of a new practice area: DevOps. Despite being fundamental for smooth cloud operations, a dedicated DevOps practitioner is a luxury most teams can’t afford. Salaries average $130K in San Francisco, for example. When a dedicated DevOps practitioner is not available in our team, what should we do? The answer involves a multitude of aspects.