There will be failures – On systems that live through difficulties instead of turning them into a catastrophy

Our systems always depend on other systems and services and thus may and will be subject to failures – network glitches, dropped connections, load spikes, deadlocks, slow or crashed subsystems. We will explore how to create robust systems that can sustain blows from its users, interconnecting networks, and supposedly allied systems yet carry on as well as possible, recovering quickly – instead of aggreviating these difficulties and turning them into an extended outage and potentially substiantial financial loss. In systems not designed for robustness, even a minor and transient failure tends to cause a chain reaction of failures, spreading destruction far and wide. Here you will learn how to avoid that with a few crucial yet simple stability patterns and the main antipatterns to be aware of. Based primarily on the book Release It! and Hystrix. (Presented at Iterate winter conference 2015.)

Continue reading