Redundancy in the Cloud


Traditionally, large corporations have either maintained their own data centres, or rented rack space in a data centre managed by another company. Redundancy, in this context, has to be maintained on several levels:


  • Power

    • Two or more independent supplies, each capable of maintaining the load in the event of a supply failure
    • Uninterruptible power supplies capable of carrying the load until power is restored or backup generators are brought online

  • Cooling

    • Two or more Heating, Ventilation and Air Conditioning (HVAC) systems, with enough spare capacity to cover the data centre’s requirements should one system fail

  • System hardware

    • Multiple internal power supply units
    • Multiple independent network connections
    • Multiple independent storage connections

  • Application

    • Multiple systems, each capable of maintaining access to the relevant application subsystem in the event of a single failure elsewhere

  • Geographical

    • Spreading the redundancy across multiple sites, such that if the main site goes down, another site can be brought online quickly

The advent of cloud service providers changes the equation significantly. The provisioning of power, cooling, and system hardware redundancy to meet contracted service level agreements is now the sole responsibility of the cloud service provider. As a result, companies leveraging the cloud need to architect redundancy through networking and geographical measures.

When designing an application for deployment in the cloud, two factors, often in tension with each other, determine how much redundancy to build in:

1. The criticality of the application to the business
2. The cost of increased redundancy

For an application that is business critical, the cost of a high degree of redundancy is justified by the cost to the business if the application becomes unavailable for an extended period. For a low-priority application, a reduced level of redundancy is reasonable.
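This trade-off can be framed as a simple expected-cost comparison. The sketch below is purely illustrative; all figures are hypothetical assumptions, not data from any particular deployment.

```python
# Illustrative only: compare the annual cost of two redundancy levels
# against the expected cost of downtime. All figures are hypothetical.

def expected_annual_cost(redundancy_cost, expected_downtime_hours, cost_per_hour_down):
    """Yearly total: what redundancy costs us plus what downtime costs us."""
    return redundancy_cost + expected_downtime_hours * cost_per_hour_down

# A business-critical application where downtime is expensive:
basic = expected_annual_cost(redundancy_cost=10_000,
                             expected_downtime_hours=20,
                             cost_per_hour_down=5_000)
multi_zone = expected_annual_cost(redundancy_cost=40_000,
                                  expected_downtime_hours=1,
                                  cost_per_hour_down=5_000)

print(f"basic: {basic}, multi-zone: {multi_zone}")
# basic: 110000, multi-zone: 45000
```

Under these assumed numbers, the higher redundancy spend is more than paid for by the avoided downtime; with a low hourly cost of downtime, the comparison would tip the other way.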

The exact details of how the redundancy might be provided can vary from application to application. For some, having spare instances that can be brought up when necessary will suffice. For others, maintaining redundant instances running in parallel across multiple sites (AWS or Azure availability zones, for example) is reasonable. And in some cases, redundancy across a large geographical distance (AWS and Azure regions) will be the way to go. Having the underlying redundancy provided by the cloud provider means that your business can focus on the important matter: ensuring availability of your applications in accordance with your business needs, rather than on the infrastructure and plumbing necessary to make it happen.
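A common pattern behind the options above is client-side failover: try redundant deployments in order of preference (nearest zone first, then another zone, then a remote region) until one responds. The sketch below illustrates the idea; the endpoint names and health check are hypothetical stand-ins, not any specific provider's API.

```python
# Sketch of client-side failover across redundant deployments.
# Endpoints are listed in order of preference; names are hypothetical.

def first_healthy(endpoints, is_healthy):
    """Return the first endpoint whose health check passes, else None."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None

deployments = [
    "app.zone-a.example.com",        # nearest availability zone
    "app.zone-b.example.com",        # second zone, same region
    "app.remote-region.example.com", # distant region, for large-scale outages
]

# Simulated health check: pretend zone-a is currently down.
down = {"app.zone-a.example.com"}
chosen = first_healthy(deployments, lambda e: e not in down)
print(chosen)  # traffic fails over to the next zone in the list
```

In practice the health check would be a real probe (or delegated to a provider's load balancer or DNS failover service), but the ordering logic is the same.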

Alternatively, if the application in question is not developed in-house, it may be possible to purchase it via a service subscription. In this scenario, the business pays the service provider to run and maintain the application. Responsibility for redundancy rests entirely with the service provider; as long as they hold up their end of the bargain, the business only needs to pay the bills and maintain a reliable Internet connection.

This does, however, bring its own set of downsides: upgrades and maintenance are done on the service provider’s schedule rather than the business’s, with the consequence that outages can occur at inconvenient times. For common applications, the savings from not having to maintain the application internally may well be worth these inconveniences, especially if the service provider commits to giving adequate advance warning of maintenance outages, and/or scheduling them for weekends or other periods of low demand. As always, the analysis of which approach will work best needs to be done on a case-by-case basis; there is no “one size fits all” solution to these problems.