#20: Chaos engineering

27 Oct 2020 | Podcast | programming | IT

We tend to focus on testing happy paths and expected edge cases. But how do you make sure that your system can survive minor infrastructure and network failures, as well as application bugs? Especially in a microservice or serverless environment, where there are tons of moving parts. Too many times I've seen systems fail miserably because some minor dependency was malfunctioning. For example, you have a tiny service that displays a small social widget on your website. When that service is down, the rest of the website should work. But without proper care and testing, you may end up with a global HTTP 503 failure. Code reviews and unit tests are fine, but the ultimate test is... turning off that service in production. And making sure the rest actually works. This is called chaos engineering.
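The widget scenario can be sketched in a few lines of Python. This is only an illustration: `fetch_social_widget`, `render_page`, and `ServiceUnavailable` are hypothetical names, and the "outage" is simulated with a flag instead of a real network call.

```python
class ServiceUnavailable(Exception):
    """Raised when a dependency cannot be reached (hypothetical error type)."""


def fetch_social_widget(service_up: bool) -> str:
    # Stand-in for a remote call to the small widget service.
    if not service_up:
        raise ServiceUnavailable("social widget service is down")
    return "<div>42 likes</div>"


def render_page(service_up: bool) -> tuple[int, str]:
    """Render the page; the widget is optional, the main content is not."""
    main_content = "<h1>Welcome</h1>"
    try:
        widget = fetch_social_widget(service_up)
    except ServiceUnavailable:
        widget = ""  # degrade: drop the widget instead of failing with 503
    return 200, main_content + widget


# The "chaos" check: turn the dependency off and verify the page survives.
status, body = render_page(service_up=False)
assert status == 200 and "Welcome" in body
```

Without the `try`/`except`, the exception from the widget call would propagate and take the whole page down with it, which is exactly the failure mode described above.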

Believe it or not, many organizations do practice deliberately injecting faults into production. Now, turning off a service instance in production is probably the easiest test you can conduct. The client must catch an exception and handle the failure gracefully. Sometimes by retrying, hoping to reach another healthy instance. Sometimes by returning a fallback value that's less relevant or up-to-date. Ideally, the end user should not realize one of the services is down. Of course, that would mean the failed service is not needed at all and can be shut down forever. So in practice, we expect a visible but insignificant degradation in service quality.
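A rough sketch of the two client-side strategies mentioned here, retrying against another instance and returning a fallback value. All names (`call_with_retry_and_fallback`, `dead_instance`, `healthy_instance`) are made up for the example, and each instance is modelled as a plain function rather than a real network client.

```python
from typing import Callable, Sequence


def call_with_retry_and_fallback(
    instances: Sequence[Callable[[], str]],
    fallback: str,
) -> str:
    """Try each instance in turn; return the fallback if all of them fail."""
    for call_instance in instances:
        try:
            return call_instance()
        except ConnectionError:
            continue  # this instance is down, try the next one
    return fallback  # less relevant or stale, but better than an error


def dead_instance() -> str:
    raise ConnectionError("instance was shut down")  # injected fault


def healthy_instance() -> str:
    return "fresh recommendations"


# One instance is off: the retry reaches the healthy one.
print(call_with_retry_and_fallback([dead_instance, healthy_instance], "cached recommendations"))
# All instances are off: the caller gets the fallback value.
print(call_with_retry_and_fallback([dead_instance, dead_instance], "cached recommendations"))
```

In a real client the retry would usually also need a timeout and a limit on attempts, so that a slow dependency does not turn into a slow caller.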

Read more: https://256.nurkiewicz.com/20

Get the new episode straight to your mailbox: https://256.nurkiewicz.com/newsletter


Around IT In 256 Seconds
