36 min listen
From QAos to Chaos Engineering
FromThe Cloudcast
ratings:
Length:
27 minutes
Released:
Oct 19, 2022
Format:
Podcast episode
Description
Benjamin Wilms (@MrBWilms, co-founder/CEO of @Steadybit) talks about the importance of resilience for SREs, DevOps, and developers through chaos engineering platformsSHOW: 661CLOUD NEWS OF THE WEEK - http://bit.ly/cloudcast-cnotwCHECK OUT OUR NEW PODCAST - "CLOUDCAST BASICS"SHOW SPONSORS:Datadog Synthetic Monitoring: Frontend and Backend Modern MonitoringEnsure frontend issues don’t impair user experience by detecting user-facing issues with API and browser tests with a free 14 day Datadog trial. Listeners of The Cloudcast will also receive a free Datadog T-shirt. Granulate, an Intel company - Autonomous, continuous, workload optimizationgProfiler from Granulate - Production profiling, made easyCDN77 - Content Delivery Network Optimized for Video85% of users stop watching a video because of stalling and rebuffering. Rely on CDN77 to deliver a seamless online experience to your audience. Ask for a free trial with no duration or traffic limits.SHOW NOTES:Steadybit (homepage)Steadybit wants developers involved in Chaos engineering before production (TechCrunch)Topic 1 - Benjamin, give everyone a quick introduction.Topic 2 - Let’s start with the concept of chaos engineering. In its simplest form, chaos engineering intentionally takes down parts of a test or production environment (typically after software has shipped) randomly so teams, typically SRE’s/ops/dev, are forced to make the applications more resilient over time. It’s not a matter of if systems will go down, it’s a matter of when. This makes the systems better over time. Benjamin, you have a consulting background in this area that ultimately led to founding Steadybit. What were the limitations to this approach?Topic 3 - What you’re talking about is a more proactive approach to downtime. I’ll call this resilience engineering and it requires a shift in mindset in an organization. How do you get developers onboard to embrace the need? Are we asking developers to share responsibility for outages with the SRE organization?Topic 4 - On the surface, the obvious benefit is reduced downtime. That can be hard to quantify in business value. Outages can be measured, a lack of outages is harder to quantify. Does this become an issue in convincing an organization to embrace this methodology?Topic 5 - When you say we are going to move chaos engineering into the CI/CD pipeline, what does that mean? Is this code that is added? Testing simulations that have to be passed? Real time failures of databases or nodes or simulated? What are the common use cases?FEEDBACK?Email: show at the cloudcast dot netTwitter: @thecloudcastnet
Released:
Oct 19, 2022
Format:
Podcast episode
Titles in the series (100)
The Cloudcast (.net) #6 - Is Cloud Computing Fashionable for IT?: Aaron & Brian talk with Industry Analyst Vanessa Alvarez about defining cloud, plumbing and infrastructure, the changing role of IT, industry certifications & some predictions for the next year by The Cloudcast