In this Twitch Engineering Blog article, I share how my former team at Twitch achieved four nines (99.99%) of availability on AWS for a .NET microservice on the world's largest video game streaming platform. The story uses medieval castle defenses as an analogy for availability defenses.
My time at Twitch gave me first-hand insights into operational excellence, in a way my former career did not. My consulting experience had equipped me with a working knowledge of proper architecture for high availability at scale, but not the experience of actually managing operations.
I had previously thought of operations management as a necessary but somewhat boring area of IT. I couldn't have been more wrong. One of my proudest achievements was leading an initiative to raise availability for a critical .NET microservice at Twitch. We exceeded expectations because we treated it as a moonshot project:
- We treated availability like security, using techniques like threat modeling.
- We used a defense-in-depth approach to designing our availability defenses.
- We innovated. Our systems engineer designed brilliant CDN behaviors that shielded customers from the impact of service failures.
- We optimized. Our senior software engineer attained an astounding 50-fold increase in service performance through pioneering work with response compression techniques.
If you're running .NET microservices on AWS, rest assured that four nines of availability at mammoth scale is both achievable and sustainable. Read my article on the Twitch Engineering Blog for a blow-by-blow account of what we learned and the principles we followed.
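To put four nines in perspective, here is a minimal sketch of the downtime budget that 99.99% availability leaves. The function name `downtime_budget_minutes` and the 365-day and 30-day periods are my own assumptions for illustration, not figures from the Twitch article. Under those assumptions, four nines allows roughly 52.6 minutes of downtime per year, or about 4.3 minutes in a 30-day month.

```python
def downtime_budget_minutes(nines: int, period_days: float = 365.0) -> float:
    """Return the allowed downtime, in minutes, for an availability target.

    `nines` is the number of nines in the target (4 means 99.99%).
    `period_days` is the measurement window; 365 days is an assumed default.
    """
    unavailability = 10.0 ** (-nines)      # e.g. 4 nines -> 0.0001
    minutes_in_period = period_days * 24 * 60
    return minutes_in_period * unavailability


if __name__ == "__main__":
    # Compare a few availability targets over a year and a 30-day month.
    for n in (2, 3, 4, 5):
        per_year = downtime_budget_minutes(n)
        per_month = downtime_budget_minutes(n, period_days=30)
        print(f"{n} nines: {per_year:.2f} min/year, {per_month:.2f} min/30 days")
```

Each additional nine cuts the budget by a factor of ten, which is why defenses that shield customers from a failing service matter more than fast manual recovery at this level.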