Today’s application software runs in a complex environment of interdependent services connected over a network. Users don’t even think about it, and many software developers give networks little more consideration than they give electricity. Meanwhile…

Processes, servers, NICs, switches, and local and wide area networks can all fail, with real economic consequences. Network outages can suddenly occur in systems that have been stable for months at a time, during routine upgrades, or as a result of emergency maintenance. The consequences of these outages range from increased latency and temporary unavailability to inconsistency, corruption, and data loss. Split-brain is not an academic concern: it happens to all kinds of systems—sometimes for days on end. Partitions deserve serious consideration.

Source: The Network is Reliable – ACM Queue, by Peter Bailis (UC Berkeley) and Kyle Kingsbury (Jepsen Networks).

Click the link above to learn from real-world outages at Google, Amazon, HP, Microsoft, MongoDB, Yahoo!, and more.

Image by Hikmet Gümüş.