Architecture Change - Data Partition - Add dedicated Storage Account for use by Ingestion Service
Service name : Ingestion Service
Why is this needed
Currently, a single container stores the data for all DAG runs. Data is shared across tasks by generating SAS tokens at the container level, which means any DAG run can access the data of any other DAG run. This is a security concern for data storage and requires a change to the existing infrastructure.
Current behavior: SAS tokens are generated at the container level, on a container dedicated to storing the data required for all DAG runs.
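The exposure described above can be illustrated with a toy model. This is only a sketch: `ContainerToken`, `can_read`, and the container names are hypothetical stand-ins, not the Azure SAS format or SDK. It assumes only what this issue states, namely that a container-level token covers everything stored in that container.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerToken:
    """Toy stand-in for a container-scoped SAS token (hypothetical, not the Azure format)."""
    container: str


def can_read(token: ContainerToken, container: str, blob_path: str) -> bool:
    # A container-scoped token is checked against the container only;
    # the blob path inside the container plays no part in the decision.
    return token.container == container


# Current layout: one shared container, with each DAG run's data under its own path.
shared_token = ContainerToken(container="shared-container")
run_a_can_read_run_b = can_read(shared_token, "shared-container", "run-b/output.json")

# Proposed layout: one container per DAG run, so a token for run A does not match run B.
run_a_token = ContainerToken(container="run-a")
run_b_is_isolated = not can_read(run_a_token, "run-b", "output.json")
```

In this model, a token issued for run A on the shared container also authorizes reads of run B's data, while a per-run container keeps the two apart.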
The new change adds a storage account dedicated to ingestion-workflow, in which containers are created and deleted on the fly:
Created - whenever data needs to be shared across tasks in the workflow for a particular DAG run.
Deleted - once the DAG run completes, whether with success or failure, the container created for that DAG run is deleted.
Because containers are created and deleted on the fly, a dedicated storage account is needed for this use case so that these temporary containers don't pollute the existing storage account.
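The create/delete lifecycle described above could be sketched as follows. This is a minimal illustration, not the ingestion service implementation: `ContainerStore`, `InMemoryStore`, `dag_run_container`, and the `run-<id>` naming scheme are all hypothetical, and the in-memory store exists only so the sketch runs without a storage account. A real workflow spans several tasks, so the cleanup would more likely live in a final step of the DAG than in a single context manager.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol


class ContainerStore(Protocol):
    """Hypothetical minimal interface over the dedicated storage account."""

    def create_container(self, name: str) -> None: ...
    def delete_container(self, name: str) -> None: ...


class InMemoryStore:
    """In-memory stand-in, used only to make the sketch runnable."""

    def __init__(self) -> None:
        self.containers: set[str] = set()

    def create_container(self, name: str) -> None:
        self.containers.add(name)

    def delete_container(self, name: str) -> None:
        self.containers.discard(name)


@contextmanager
def dag_run_container(store: ContainerStore, run_id: str) -> Iterator[str]:
    # Assumed naming scheme: one container per DAG run.
    name = f"run-{run_id}".lower()
    store.create_container(name)
    try:
        yield name
    finally:
        # Runs on success and on failure, matching the "Deleted" rule above.
        store.delete_container(name)


store = InMemoryStore()

# Successful run: the container exists during the run and is gone afterwards.
with dag_run_container(store, "abc123") as name:
    existed_during_run = name in store.containers
remaining_after_success = set(store.containers)

# Failed run: the container is still removed.
try:
    with dag_run_container(store, "def456"):
        raise RuntimeError("simulated task failure")
except RuntimeError:
    pass
remaining_after_failure = set(store.containers)
```

The `try`/`finally` is the design point: deletion is tied to the end of the run rather than to its outcome, so failed runs do not leave containers behind.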
Other Solutions Considered
We explored handling this isolation at the directory level, using a single container to store the data for all DAG runs. However, we found no support for SAS generation at the directory level, which forced us to go with SAS generation at the container level.
- Adding a new storage account to the existing infra without breaking changes
- Ensure the unit tests for infra-azure-provisioning pass
- Update the ingestion service code to reflect the infra changes (getSignedUrl for the ingestion service should use the new storage account, where SAS tokens are generated for the newly created containers)
- Update all required documentation
- Update architecture diagram
Storage account config requirements -
- Replication type - LRS
- Backup requirements - No backup
- Data retention requirements - No data retention
The lock on the storage account must be removed before executing this change, because the change removes a container from the storage account.
- Creation of new Storage Account
- Deletion of storage container - "workflow-tasks-sharing"
- Obtain approval for any infrastructure requirements.
- Implement any required infrastructure changes.
- Obtain approval for merge request(s) containing infrastructure changes.
Integration Test Onboarding