[ADR] Decouple stateful and stateless resources in the Service Resources group
Problem Identified
The Service Resources group contains multiple stateful resources such as Storage Account, Redis cache and Postgres DB. Recreating these resources is often unacceptable, especially when they contain production data.
Impact
Upgrading any resource in the Service Resources group requires us to bring down part or all of the resource group, which results in downtime.
Proposed Solution
The following approach lets us upgrade and scale resources in the Service Resources group as many times as required without incurring downtime.
The idea is to decouple stateless and stateful resources.
This is a migration strategy, so it is carried out in multiple phases.
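A minimal sketch of the split is a classification of the resources named in this ADR into the two groups. The `Kind` enum, the `SERVICE_RESOURCES` map and the `split_by_kind` helper are hypothetical illustrations, not part of the existing Terraform code. The real inventory lives in the Terraform modules.

```python
from enum import Enum


class Kind(Enum):
    STATEFUL = "stateful"
    STATELESS = "stateless"


# Hypothetical inventory of the current Service Resources group, using the
# resource names and stateful/stateless classification given in this ADR.
SERVICE_RESOURCES = {
    "Storage Account": Kind.STATEFUL,
    "Redis cache": Kind.STATEFUL,
    "Postgres DB": Kind.STATEFUL,
    "VNet": Kind.STATELESS,
    "Subnets": Kind.STATELESS,
    "App Gateway": Kind.STATELESS,
    "AKS": Kind.STATELESS,
    "AKS Config": Kind.STATELESS,
    "Helm config": Kind.STATELESS,
}


def split_by_kind(inventory: dict[str, Kind]) -> tuple[list[str], list[str]]:
    """Return (resources that stay in Service Resources, resources that move
    to the new Deployment Resources group), each sorted by name."""
    stay = sorted(name for name, kind in inventory.items() if kind is Kind.STATEFUL)
    move = sorted(name for name, kind in inventory.items() if kind is Kind.STATELESS)
    return stay, move


if __name__ == "__main__":
    stay, move = split_by_kind(SERVICE_RESOURCES)
    print("Service Resources (stateful):", stay)
    print("Deployment Resources (stateless):", move)
```

Only the stateless side is recreated during the migration, so the stateful resources and their data are never touched.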
Phase – 1
A new resource group (for example, Deployment Resources) will be created to hold all the stateless resources that are currently part of the Service Resources group: VNet, Subnets, App Gateway, AKS, AKS Config (namespaces, secrets, etc.) and Helm config.
The client creates the new resource group and verifies that all services are up and running.
Once satisfied, the client is expected to perform a DNS switch to divert traffic to the newly created cluster setup. When traffic on the older setup has dropped to zero, Phase 2 can be invoked.
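The hand-off between the two phases acts as a gate. Phase 2 should only start once the new setup is healthy, DNS points at it and the old setup is idle. The following is a minimal sketch of that check. It assumes the health, DNS and traffic signals are collected elsewhere (health probes, DNS records, traffic metrics). `Phase1Status`, `phase2_blockers` and the service names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Phase1Status:
    # Hypothetical inputs: per-service health in the new resource group,
    # whether DNS now points at the new cluster, and the number of requests
    # the old cluster served in the latest measurement window.
    service_health: dict[str, bool] = field(default_factory=dict)
    dns_switched: bool = False
    old_cluster_requests: int = 0


def phase2_blockers(status: Phase1Status) -> list[str]:
    """List the reasons Phase 2 cannot start yet; an empty list means it may proceed."""
    blockers = []
    unhealthy = sorted(name for name, ok in status.service_health.items() if not ok)
    if not status.service_health:
        blockers.append("no services reported in the new resource group")
    elif unhealthy:
        blockers.append("unhealthy services: " + ", ".join(unhealthy))
    if not status.dns_switched:
        blockers.append("DNS has not been switched to the new cluster")
    if status.old_cluster_requests > 0:
        blockers.append(f"old cluster still serving {status.old_cluster_requests} requests")
    return blockers


if __name__ == "__main__":
    ready = Phase1Status({"svc-a": True, "svc-b": True}, dns_switched=True, old_cluster_requests=0)
    print(phase2_blockers(ready))
```

Returning the list of blockers, rather than a single boolean, tells the client exactly which Phase 1 condition is still unmet.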
Phase – 2
In this phase, all the stateless resources that were part of the Service Resources group are deleted. The Service Resources group then continues to host only the common stateful resources (such as Storage Account, Redis cache and Postgres DB) that are required by the OSDU services.
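Phase 2 deletes resources from a group that also holds production data. For that reason, the deletion list is safest when built from an explicit allowlist of stateless resources rather than by excluding known stateful ones. The following is a minimal sketch under that assumption. `STATELESS`, `phase2_deletions` and the plain resource-name strings are hypothetical stand-ins for the real Terraform resource addresses.

```python
# Hypothetical allowlist: only these resources may ever be deleted from the
# Service Resources group, and only after they exist in Deployment Resources.
STATELESS = {"VNet", "Subnets", "App Gateway", "AKS", "AKS Config", "Helm config"}


def phase2_deletions(service_group: set[str], deployment_group: set[str]) -> list[str]:
    """Return the stateless resources that can be deleted from Service Resources.

    Anything not on the allowlist (Storage Account, Redis cache, Postgres DB,
    or any unclassified resource) is never returned. Raises ValueError if an
    eligible resource has not yet been recreated in the Deployment Resources group.
    """
    eligible = service_group & STATELESS
    not_migrated = eligible - deployment_group
    if not_migrated:
        raise ValueError(
            "not yet recreated in Deployment Resources: " + ", ".join(sorted(not_migrated))
        )
    return sorted(eligible)


if __name__ == "__main__":
    service = {"Storage Account", "Redis cache", "Postgres DB", "VNet", "AKS"}
    deployment = {"VNet", "Subnets", "AKS"}
    print(phase2_deletions(service, deployment))
```

With an allowlist, a resource added to Service Resources later and not yet classified is kept by default, which errs on the side of not losing data.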
Rollout Strategy
- Create an ADR for the changes
Implementation plan
- Terraform changes: carve out the Terraform resources into a new resource group
- Changes required in ADO Pipelines
- Changes required in manual deployment
- Documentation: migration strategy
- Deadline to permanently delete the stateless resources from the Service Resources group