From QAos to Chaos Engineering

From The Cloudcast

Length:

27 minutes

Released:

Oct 19, 2022

Format:

Podcast episode

Description

Benjamin Wilms (@MrBWilms, co-founder/CEO of @Steadybit) talks about the importance of resilience for SREs, DevOps, and developers through chaos engineering platforms.

SHOW: 661

CLOUD NEWS OF THE WEEK - http://bit.ly/cloudcast-cnotw

CHECK OUT OUR NEW PODCAST - "CLOUDCAST BASICS"

SHOW SPONSORS:

Datadog Synthetic Monitoring: Frontend and Backend Modern Monitoring
Ensure frontend issues don’t impair user experience by detecting user-facing issues with API and browser tests with a free 14-day Datadog trial. Listeners of The Cloudcast will also receive a free Datadog T-shirt.

Granulate, an Intel company - Autonomous, continuous, workload optimization

gProfiler from Granulate - Production profiling, made easy

CDN77 - Content Delivery Network Optimized for Video
85% of users stop watching a video because of stalling and rebuffering. Rely on CDN77 to deliver a seamless online experience to your audience. Ask for a free trial with no duration or traffic limits.

SHOW NOTES:

Steadybit (homepage)
Steadybit wants developers involved in Chaos engineering before production (TechCrunch)

Topic 1 - Benjamin, give everyone a quick introduction.

Topic 2 - Let’s start with the concept of chaos engineering. In its simplest form, chaos engineering intentionally takes down parts of a test or production environment (typically after software has shipped) randomly so teams, typically SREs/ops/dev, are forced to make the applications more resilient over time. It’s not a matter of if systems will go down, it’s a matter of when. This makes the systems better over time. Benjamin, you have a consulting background in this area that ultimately led to founding Steadybit. What were the limitations to this approach?

Topic 3 - What you’re talking about is a more proactive approach to downtime. I’ll call this resilience engineering and it requires a shift in mindset in an organization. How do you get developers on board to embrace the need? Are we asking developers to share responsibility for outages with the SRE organization?

Topic 4 - On the surface, the obvious benefit is reduced downtime. That can be hard to quantify in business value. Outages can be measured; a lack of outages is harder to quantify. Does this become an issue in convincing an organization to embrace this methodology?

Topic 5 - When you say we are going to move chaos engineering into the CI/CD pipeline, what does that mean? Is this code that is added? Testing simulations that have to be passed? Real-time failures of databases or nodes, or simulated ones? What are the common use cases?

FEEDBACK?

Email: show at the cloudcast dot net
Twitter: @thecloudcastnet
