# [ADR] Hierarchical deletion of datasets

## Introduction

We need a way to delete millions of datasets (including metadata and files in blob storage) in Seismic DMS (SDMS). A single delete operation can include up to 50 million datasets.

The purpose of this ADR is to define the approach to implementing a hierarchical delete feature in SDMS.
## Status

- Initiated
- Proposed
- Under Review
- Approved
- Rejected
## Problem statement

SDMS API currently exposes the following endpoints for deleting datasets:

- `DELETE /dataset/tenant/{tenantid}/subproject/{subprojectid}/dataset/{datasetid}` - deletes a single dataset.
- `DELETE /subproject/tenant/{tenantid}/subproject/{subprojectid}` - deletes a subproject.

The subproject deletion endpoint currently does not scale to the required number of datasets. The current implementation also allows an inconsistent state between the metadata and the files in blob storage: if some of the files fail to be deleted, the deletion of the metadata associated with those datasets is not reverted.

SDMS currently has no functionality for deleting only selected datasets in a subproject, filtered by path, tags, labels, etc.
## Proposed Solution

In short:

- Create new API endpoints to start and track the progress of an asynchronous deletion operation.
- Deploy a new service on K8s that asynchronously deletes datasets.
### Overview

We will introduce the bulk-delete feature as follows:

- Implement and deploy a separate application to the same K8s cluster: the deletion service. This service will accept bulk deletion requests from SDMS API, perform the deletion, and keep track of the progress of this long-running operation.
- Add a new endpoint to SDMS API to delete all datasets in a specified path:

  `PUT /operations/bulk-delete?sdpath={sdpath}`

  Status: `202 Accepted`

  `sdpath` is in the format `sd://tenant/subproject/path`.

  Response schema: `{ "operationId": "{string}" }`

- Add a new endpoint to SDMS API to view the status and progress of the delete operation:

  `GET /operations/bulk-delete/status/{operationid}`

  Status: `200 OK`

  Response schema: `{ "OperationId": "{string}", "CreatedAt": "{string}", "CreatedBy": "{string}", "LastUpdatedAt": "{string}", "Status": "{string}", "DatasetsCnt": "{int}", "DeletedCnt": "{int}", "FailedCnt": "{int}" }`

  The request headers will contain the `data-partition-id`, used to check that the user is registered in the partition before the operation status is retrieved.
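For illustration, the request URLs for the two endpoints above can be built as follows. This is a sketch: the endpoint paths and query parameter come from this ADR, while the base URL and helper names are illustrative assumptions.

```typescript
// Hypothetical client helpers for the two new endpoints.
const API_BASE = "https://sdms.example.com"; // assumed host, not part of SDMS API

// PUT /operations/bulk-delete?sdpath={sdpath} -> 202 { "operationId": "..." }
export function bulkDeleteUrl(sdpath: string): string {
  return `${API_BASE}/operations/bulk-delete?sdpath=${encodeURIComponent(sdpath)}`;
}

// GET /operations/bulk-delete/status/{operationid} -> 200 with the status schema
export function statusUrl(operationId: string): string {
  return `${API_BASE}/operations/bulk-delete/status/${encodeURIComponent(operationId)}`;
}
```

Note that the `sdpath` value must be URL-encoded, since the `sd://` scheme contains characters that are not valid in a query string.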
### Details

#### Initiating a delete operation

- The new `PUT` endpoint will support the following cases for the dataset path, provided in the `sdpath` parameter:
  - `path = /<path>/` - all datasets under the specified path are deleted.
  - path not specified - all datasets in the subproject are deleted. If deletion of the subproject itself (metadata and container) is also desired, clients should call the delete-subproject endpoint after the bulk delete operation completes; this keeps the subproject deletion non-blocking even when the subproject contains many datasets.
- The endpoint triggers the deletion job and returns the ID of the initiated operation.
- The delete operation is initiated in SDMS by pushing a message onto a queue (an Azure Storage queue in the Azure implementation; other CSPs can use a different queuing mechanism). The message contains the `operationId` and the parameters from the original request (tenant, subproject, path).
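A possible shape for that queue message is sketched below. The field names are assumptions; only the listed contents (`operationId`, tenant, subproject, path) come from this ADR.

```typescript
// Hypothetical queue message pushed by SDMS API and consumed by the deletion service.
export interface BulkDeleteMessage {
  operationId: string;
  tenant: string;
  subproject: string;
  path?: string; // omitted when the whole subproject is targeted
}

// The message would be serialized to JSON before being pushed onto the queue
// (an Azure Storage queue in the Azure implementation).
export function encodeMessage(msg: BulkDeleteMessage): string {
  return JSON.stringify(msg);
}

export function decodeMessage(raw: string): BulkDeleteMessage {
  return JSON.parse(raw) as BulkDeleteMessage;
}
```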
#### Deletion service

The deletion service is a separate component from SDMS API, deployed to the same K8s cluster. The implementation details of the service can be decided by the individual CSPs; this section describes the proposed implementation for Azure.

The source code of the new component will be contributed to the Sidecar solution in the `seismic-store-service` repository.
The logic of the deletion service will work as follows:

- The service consumes the message from the Azure Storage queue and initiates the deletion process.
- All items (dataset IDs and the `gcsurl`, which determines the location in blob storage) matching the provided subproject and path are retrieved from the Cosmos database.
- For each dataset, the deletion service checks whether it is locked.
  - If it is, the item is discarded from the delete operation.
  - If it is not, the deletion service locks the dataset. The lock value in this case will contain a string indicating that the dataset is locked for deletion (e.g. `WDELETE`). This allows another delete operation to delete the dataset if a previous deletion failed, while still preventing deletion of datasets holding a regular write lock, which would indicate that they are being actively used.
- The retrieved items are added to a store which allows the deletion service to keep track of the datasets to delete. In the first version of the implementation, the deletion service will store the retrieved datasets in memory. In a later phase we plan to use persistent storage (e.g. a Service Bus queue) for the items to be deleted; this will allow the service to resume deletion after a restart and to retry deletion for the datasets where it failed.
- The deletion service leverages existing Redis queues to keep track of the overall deletion operation status and progress.
- The deletion service retrieves and deletes the datasets by checking the store containing the items to be deleted. In the first version of the implementation it simply iterates over the items stored in memory.
  - The datasets are processed in batches; for each batch, the associated blobs are retrieved from the storage account using the `gcsurl` property of the metadata.
  - The blobs from the current batch are deleted.
  - The metadata documents are then deleted from Cosmos DB, leaving the ones for which blob deletion was unsuccessful. The deletion is considered successful if the blobs were not found, as we assume they were deleted earlier.
  - The deletion status is updated in Redis after each dataset is processed.
- At the end, the status of the completed operation (with or without errors) is saved in Redis. The deletion status is not deleted at this point, so that users can query the operation status after completion.
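The lock handling and batching described above can be sketched as two small pure functions. The `WDELETE` value comes from this ADR; the shape of regular write-lock values and the function names are assumptions.

```typescript
// Per-dataset lock decision: skip datasets that are actively locked, but
// re-acquire locks left behind by a previously failed delete operation.
export type LockAction = "skip" | "acquire";

export function lockDecision(currentLock: string | undefined): LockAction {
  if (currentLock === undefined) return "acquire"; // unlocked: lock it for deletion
  if (currentLock === "WDELETE") return "acquire"; // stale delete lock: retry the deletion
  return "skip"; // regular read/write lock: dataset is actively used
}

// Batch helper: split the in-memory list of datasets into fixed-size batches,
// so blobs and metadata can be deleted batch by batch.
export function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```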
Sequence diagram for the deletion operation
#### Deletion status

The status of delete operations will be saved in Redis. It will be written by the deletion service (updated with the current progress) and read by SDMS API (when users request the deletion status).

SDMS API and the deletion service will agree on a naming convention for the key in Redis, e.g. `deletequeue:status:{operationId}`.
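Keeping the key construction in one shared helper is a simple way to honor that convention on both sides; the sketch below uses the example prefix from this ADR, and the function name is an assumption.

```typescript
// Shared Redis key builder used by both SDMS API and the deletion service,
// following the example convention deletequeue:status:{operationId}.
export function statusKey(operationId: string): string {
  return `deletequeue:status:${operationId}`;
}
```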
The new `GET` endpoint allowing users to query the status of a delete operation will return the following information:

- **`OperationId`** - ID of the delete operation.
- **`Status`** - current status of the delete operation; possible values are: `Not started`, `Started`, `In progress`, `Completed`, `Completed with errors`.
- **`CreatedAt`** - timestamp of the creation of the delete operation.
- **`CreatedBy`** - entity that initiated the delete operation.
- **`LastUpdatedAt`** - timestamp of the last status update of the delete operation.
- `DatasetsCnt` - total number of datasets to be deleted; initially not set, until the enumeration of datasets for deletion is completed.
- `DeletedCnt` - number of deleted datasets; updated after each dataset processed by the deletion service, once both the blobs and the metadata are deleted.
- `FailedCnt` - number of datasets for which the deletion failed; updated after each dataset processed by the deletion service when a failure occurs.

(Only the fields in bold are required.)

(The dataset counts will be empty if the status is `Not started`.)
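The status record and its per-dataset update can be sketched as follows. The field names and status values come from the schema above; the update function is a hypothetical illustration of how the counters would advance.

```typescript
// Status record stored in Redis; count fields stay unset until the
// enumeration of datasets for deletion has completed.
export interface BulkDeleteStatus {
  OperationId: string;
  Status: "Not started" | "Started" | "In progress" | "Completed" | "Completed with errors";
  CreatedAt: string; // ISO-8601 timestamp
  CreatedBy: string;
  LastUpdatedAt: string;
  DatasetsCnt?: number;
  DeletedCnt?: number;
  FailedCnt?: number;
}

// Hypothetical progress update applied after each processed dataset.
export function recordResult(s: BulkDeleteStatus, ok: boolean, now: string): BulkDeleteStatus {
  return {
    ...s,
    Status: "In progress",
    LastUpdatedAt: now,
    DeletedCnt: (s.DeletedCnt ?? 0) + (ok ? 1 : 0),
    FailedCnt: (s.FailedCnt ?? 0) + (ok ? 0 : 1),
  };
}
```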
Sequence diagram for the deletion status
## Out of scope / limitations

- Detailed statistics about datasets which failed to be deleted. In the first phase of implementation, the deletion status endpoint will provide only the aggregated statistics mentioned in the *Deletion status* section. Users will need to refer to logs to find out which datasets failed to be deleted.
- Guaranteed continuation after a restart. The bulk-delete feature does not guarantee that the operation can continue after a restart of the deletion service. It is up to the individual CSPs to determine whether retry logic for failed datasets or recovery support is built into the service.
- Deleting 'orphan' blobs with missing metadata. Files without metadata containing a matching `gcsurl` will not be deleted as part of the delete operation, as the metadata is the source of truth for which blobs need to be deleted.
- Identifying blobs belonging to a different dataset but located in the same virtual folder as files of another dataset. Since `gcsurl` carries the location of the files to be deleted, the delete operation will not be able to detect 'unrelated' files erroneously uploaded to the same virtual folder.
## Consequences

The same bulk-deletion API endpoints can be implemented by any CSP besides Azure.

The status endpoint is not CSP-specific: as long as a bulk-delete implementation saves the job status to Redis with the same schema, the status endpoint will work for any other CSP out of the box.