[ADR] Metadata deletion service in SDMS
Status
- Initiated
- Proposed
- Under Review
- Approved
- Rejected
Problem statement
SDMS needs a way to delete millions of datasets (including metadata and files). A single delete operation can include up to 50 million datasets and last multiple hours or even days.
The implementation of this bulk-delete operation is CSP-specific. For Azure, we need to delete metadata from Cosmos DB and files from Blob Storage.
Proposed Solution
- Delegate deletion to the new metadata deletion service.
- Develop the metadata deletion service in the same .NET solution as the SDMS Sidecar, and dockerize it similarly.
- Deploy it to the same k8s cluster as the SDMS API.
- Let the SDMS API and the metadata deletion service communicate via Redis:
  - send deletion tasks as messages in a Redis list;
  - track deletion job status in a Redis hash.
- Store the task queue and the deletion statuses in the same Redis instance that the SDMS API currently uses for metadata locks.
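The list/hash protocol above can be sketched as follows. This is a minimal illustration only: it uses in-memory stand-ins for the Redis structures, and the key names and message shape (`sdms:delete:tasks`, `sdms:delete:status`, `jobId`) are assumptions for the sketch, not the actual implementation.

```python
import json
from collections import deque

# In-memory stand-ins for the Redis data structures (illustration only).
task_list = deque()   # Redis list: the deletion task queue (~ sdms:delete:tasks)
status_hash = {}      # Redis hash: job id -> serialized status (~ sdms:delete:status)

def enqueue_deletion(job_id, dataset_ids):
    """SDMS API side: push a deletion task onto the queue and
    initialize the job's status entry."""
    task = {"jobId": job_id, "datasets": dataset_ids}
    task_list.append(json.dumps(task))   # ~ RPUSH sdms:delete:tasks <task>
    status_hash[job_id] = json.dumps(    # ~ HSET sdms:delete:status <jobId> <status>
        {"state": "queued", "deleted": 0, "total": len(dataset_ids)})

def get_status(job_id):
    """SDMS API side: read the current job status for a client poll."""
    raw = status_hash.get(job_id)        # ~ HGET sdms:delete:status <jobId>
    return json.loads(raw) if raw else None

enqueue_deletion("job-1", ["ds-1", "ds-2", "ds-3"])
print(get_status("job-1"))  # {'state': 'queued', 'deleted': 0, 'total': 3}
```

In a real deployment the two stand-ins would be `RPUSH`/`HSET` calls against the shared Redis instance; the point of the sketch is only the shape of the exchange between the API and the service.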
Related ADR in SDMS repo: osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service#107
Sequence diagrams
Performing deletion:
Keeping track of the deletion job progress:
Rationale
Reusing the same Redis instance that is currently used for the locks
This Redis is already provisioned and is immediately available to use.
We expect to consume almost no extra capacity in this Redis instance, because the bulk delete operations are infrequent. Ballpark: creating ~10 small documents per day.
The new service will, however, regularly update the status of the job in Redis. These are atomic updates once every few seconds, so the extra load on the instance should be negligible.
If, however, separation of concerns becomes desirable, migrating to a dedicated Redis instance later will be straightforward.
Redis list as the task queue
A Redis list can also be used as a simple message queue, and Redis is already available to the SDMS API.
Ideally, we would use a proper message queue (such as a Service Bus queue) to schedule the bulk deletion jobs. This would give us retryability and observability of the jobs out of the box. We will explore this option in the next iterations of the deletion service.
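The consumer side of the Redis-list queue could look roughly like the sketch below. As before, this uses in-memory stand-ins and hypothetical names; a real worker would block-pop (`BLPOP`) from the shared Redis list, perform the CSP-specific deletes in large batches, and write progress to the status hash every few seconds rather than per batch.

```python
import json
from collections import deque

task_list = deque()   # stand-in for the Redis task list
status_hash = {}      # stand-in for the Redis status hash

BATCH_SIZE = 2  # illustrative; a real service would use much larger batches

def delete_batch(dataset_ids):
    """Placeholder for the CSP-specific delete (e.g. Cosmos DB docs + blobs)."""
    pass

def run_worker_once():
    """Deletion-service side: pop one task (~ BLPOP) and process it,
    writing progress back to the status hash as batches complete."""
    if not task_list:
        return
    task = json.loads(task_list.popleft())   # ~ BLPOP sdms:delete:tasks
    job_id, datasets = task["jobId"], task["datasets"]
    deleted = 0
    for i in range(0, len(datasets), BATCH_SIZE):
        batch = datasets[i:i + BATCH_SIZE]
        delete_batch(batch)
        deleted += len(batch)
        # ~ HSET sdms:delete:status <jobId> (throttled to once every few seconds)
        status_hash[job_id] = json.dumps(
            {"state": "running", "deleted": deleted, "total": len(datasets)})
    status_hash[job_id] = json.dumps(
        {"state": "done", "deleted": deleted, "total": len(datasets)})

task_list.append(json.dumps({"jobId": "job-1", "datasets": ["a", "b", "c"]}))
run_worker_once()
print(json.loads(status_hash["job-1"]))  # {'state': 'done', 'deleted': 3, 'total': 3}
```

One limitation of a plain list, visible in the sketch, is that a task popped by a crashing worker is simply lost; that is the retryability gap a proper message queue would close.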