Cloud Migration and Multi-Region Failover

2018-03-01
Component      Detail
Project Title  Hippo Azure Migration
Company        Hippo CMMS
Role           Director of Engineering
Timeline       ~6 months (Q4 2017 – Q1 2018)
Tech Stack     C#, .NET, SQL Server, Azure
Architecture   Tenant Isolation, RLS, Multi-Region DR

S - Situation: The Challenge

The Hippo SaaS application carried a critical availability risk. The entire platform ran as a single process on a single physical machine in each of its Rackspace and Peer1 data centers. While this setup maintained data sovereignty, it created a single point of failure in every region. There was no disaster recovery (DR) or redundancy, so a simple hardware fault, such as one failed RAM module, could take the entire product offline for all customers in that region.


T - Task: The Objectives

The primary objective was to architect and execute a complete migration from the physical data centers to the Microsoft Azure cloud platform. The key goals were to:

  • Implement high availability and multi-region failover to eliminate the single point of failure.
  • Maintain strict data sovereignty rules for all tenants.
  • Leverage modern cloud services for scalability, including Managed VMs, Azure Elastic Pool Databases, and Blob Storage.
  • Critically, complete the entire migration with zero unscheduled downtime for the 800+ client companies relying on the service.

A - Action: The Strategy & Execution

The migration was executed using C# and .NET Core, with the database backend moving from SQL Server to Azure SQL Database.

Architecture (High Availability & Routing):

https://learn.microsoft.com/en-us/azure/traffic-manager/traffic-manager-routing-methods#geographic-traffic-routing-method

  • Azure Front Door and Azure Traffic Manager were implemented at the edge. This routed users to their geographically closest region and, more importantly, handled automatic failover. If a primary region (e.g., USA) failed, traffic was automatically rerouted to a healthy region (e.g., Canada).
  • A Redis cache was used to store and quickly look up database locations, ensuring users connected to the correct data source even after a failover event.
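
The routing and lookup behaviour described above can be sketched roughly as follows. This is a minimal Python illustration, not the production C#/.NET code: `Region`, `pick_region`, `DatabaseLocator`, and all sample names are hypothetical, a plain dict stands in for the Redis cache, and health status is passed in rather than probed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Region:
    name: str       # hypothetical region label, e.g. "usa"
    priority: int   # lower number = preferred region
    healthy: bool   # outcome of the most recent health probe (assumed input)


def pick_region(regions: List[Region]) -> Optional[Region]:
    """Return the preferred healthy region, or None if every region is down."""
    healthy = [r for r in regions if r.healthy]
    return min(healthy, key=lambda r: r.priority) if healthy else None


class DatabaseLocator:
    """Cache-aside lookup of a tenant's database location.

    The lookup is keyed by tenant, not by the region serving the request,
    so a user rerouted to another region still reaches the right database.
    """

    def __init__(self, catalog: Dict[str, str]) -> None:
        self._catalog = catalog          # stand-in for the authoritative store
        self._cache: Dict[str, str] = {}  # stand-in for Redis

    def locate(self, tenant_id: str) -> str:
        if tenant_id not in self._cache:
            # Cache miss: read the authoritative record and remember it.
            self._cache[tenant_id] = self._catalog[tenant_id]
        return self._cache[tenant_id]

    def invalidate(self, tenant_id: str) -> None:
        """Drop a cached entry, e.g. after a tenant's database is moved."""
        self._cache.pop(tenant_id, None)


if __name__ == "__main__":
    regions = [Region("usa", 1, healthy=False), Region("canada", 2, healthy=True)]
    serving = pick_region(regions)
    locator = DatabaseLocator({"tenant-a": "us-pool-1"})
    print(serving.name, locator.locate("tenant-a"))
```

In this sketch the serving region changes on failover while the tenant's database location does not, which is the property the cache lookup is meant to preserve.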

Architecture (Database & Data Sovereignty):

https://learn.microsoft.com/en-us/azure/azure-sql/database/saas-tenancy-app-design-patterns?view=azuresql&source=recommendations#g-multitenant-app-with-sharded-multitenant-databases

  • A database sharding pattern by tenant was implemented using Azure Elastic Pool Databases. This was a multi-faceted solution that:
    1. Improved Cost-Efficiency: Allowed for merging multiple tenants into shared elastic database pools instead of paying for hundreds of individual instances.
    2. Maintained Isolation: Retained the ability to move specific tenants to fully isolated databases if required for compliance or performance.
    3. Guaranteed Sovereignty: Ensured tenant data was physically hosted in the required region (Canada, USA, EU, etc.) to meet data laws.
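
The three properties listed above can be illustrated with a small tenant-to-shard catalog. This is a hedged Python sketch under stated assumptions, not the actual implementation: `Shard`, `ShardCatalog`, the capacity figure, and the naming scheme are all hypothetical, and real placement would also provision databases in an elastic pool.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Shard:
    name: str
    region: str                 # where the data physically lives
    capacity: int               # max tenants sharing this database (assumed)
    tenants: List[str] = field(default_factory=list)
    dedicated: bool = False     # True for a single-tenant database


class ShardCatalog:
    """Tracks which shard holds each tenant."""

    def __init__(self, shards: Iterable[Shard]) -> None:
        self.shards: List[Shard] = list(shards)
        self.tenant_to_shard: Dict[str, Shard] = {}

    def place(self, tenant_id: str, region: str) -> Shard:
        """Put a tenant on a shared shard in its required region only."""
        for shard in self.shards:
            has_room = len(shard.tenants) < shard.capacity
            if shard.region == region and not shard.dedicated and has_room:
                shard.tenants.append(tenant_id)
                self.tenant_to_shard[tenant_id] = shard
                return shard
        # Never fall back to another region: that would break sovereignty.
        raise LookupError(f"no shared shard with capacity in region {region!r}")

    def isolate(self, tenant_id: str) -> Shard:
        """Move a tenant to its own database, staying in the same region."""
        current = self.tenant_to_shard[tenant_id]
        current.tenants.remove(tenant_id)
        dedicated = Shard(
            name=f"{tenant_id}-dedicated",
            region=current.region,
            capacity=1,
            tenants=[tenant_id],
            dedicated=True,
        )
        self.shards.append(dedicated)
        self.tenant_to_shard[tenant_id] = dedicated
        return dedicated
```

The design choice worth noting is that `place` raises rather than spilling into another region, so a capacity shortfall surfaces as an error instead of a silent sovereignty violation.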

Application & Security:

  • The application processes were moved from the single physical machines to managed Azure compute (such as Azure virtual machines and Azure Container Instances) for scalability and redundancy.
  • Azure Blob Storage was used for secure, scalable file storage.
  • To prevent data leakage between tenants in the new sharded model, row-level security (RLS) was enabled, with a tenant predicate tied to the database connection and enforced on all queries.
  • End-to-end encryption was maintained for all communication.
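
One common way to tie an RLS predicate to the connection in Azure SQL Database is a security policy that compares each row's tenant column with a value stored in `SESSION_CONTEXT`. The Python sketch below only renders the T-SQL for that pattern; whether the project used this exact mechanism is an assumption, and the `Security` schema, `fn_TenantPredicate`, `TenantPolicy`, the `TenantId` column, and the table names are hypothetical.

```python
from typing import List


def render_rls_policy(tables: List[str], tenant_column: str = "TenantId") -> str:
    """Render T-SQL that filters and blocks rows by the session's tenant.

    Assumes the Security schema already exists and that `tables` and
    `tenant_column` are trusted identifiers, not user input.
    """
    predicates = []
    for table in tables:
        # FILTER hides other tenants' rows on reads.
        predicates.append(
            f"    ADD FILTER PREDICATE Security.fn_TenantPredicate({tenant_column}) ON {table}"
        )
        # BLOCK rejects writes that would touch another tenant's rows.
        predicates.append(
            f"    ADD BLOCK PREDICATE Security.fn_TenantPredicate({tenant_column}) ON {table}"
        )
    return (
        "CREATE FUNCTION Security.fn_TenantPredicate(@TenantId int)\n"
        "RETURNS TABLE WITH SCHEMABINDING AS\n"
        "RETURN SELECT 1 AS allowed\n"
        "WHERE @TenantId = CAST(SESSION_CONTEXT(N'TenantId') AS int);\n"
        "GO\n"  # batch separator for sqlcmd/SSMS, not a T-SQL statement
        "CREATE SECURITY POLICY Security.TenantPolicy\n"
        + ",\n".join(predicates)
        + "\nWITH (STATE = ON);\n"
    )


def render_session_setup(tenant_id: int) -> str:
    """Render the statement run once per connection before any query."""
    return (
        "EXEC sp_set_session_context "
        f"@key = N'TenantId', @value = {int(tenant_id)}, @read_only = 1;"
    )


if __name__ == "__main__":
    print(render_rls_policy(["dbo.WorkOrders", "dbo.Assets"]))
    print(render_session_setup(42))
```

Setting `@read_only = 1` prevents later statements on the same connection from changing the tenant value, which matters when connections are pooled and reused.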

R - Result: Outcomes and Impact

The entire platform migration was successfully completed in under 6 months.

The primary business and technical objectives were fully met. The migration was achieved with zero unscheduled downtime for all 800+ client companies, meaning the transition was seamless to the user base.

The new Azure architecture eliminated the critical single point of failure by introducing full multi-region failover and redundancy. The database sharding model also immediately improved cost-efficiency and scalability while enforcing all data sovereignty requirements.