Dataset Signed URLs do not use Staging container.
Decision Title
Dataset Signed URLs do not use Staging container.
Status
- Proposed
- Trialing
- Under review
- Approved
- Retired
Context & Scope
One of the security practices followed by File Service was that the Signed URLs for upload operations were generated against a Staging Blob Container.
After uploading, the client calls the Post FileMetadata API, which internally copies the blob from the Staging container to the Persistent container and updates the record's metadata in Storage Service.
Because the Signed URLs stay active for some time after they are issued, a client could update the blob at any point while a URL is still valid. Having separate containers in File Service ensured that, even if a client does update the blob object, the update has no effect until the client re-invokes the metadata API, so the system is not affected.
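The protection described above can be illustrated with a small in-memory model. This is only a sketch: the class `FileServiceModel`, its method names, and the blob and record names are hypothetical stand-ins, not the real File Service API, and dictionaries stand in for the blob containers and Storage Service.

```python
class FileServiceModel:
    """Toy in-memory model of the staging pattern; not the real File Service."""

    def __init__(self):
        self.staging = {}     # blob name -> bytes, writable through a signed URL
        self.persistent = {}  # blob name -> bytes, never exposed for upload
        self.records = {}     # record id -> metadata (stand-in for Storage Service)

    def upload_via_signed_url(self, blob_name, data):
        # A signed URL only grants write access to the staging container.
        self.staging[blob_name] = data

    def post_file_metadata(self, record_id, blob_name):
        # Copy staging -> persistent, then record metadata for the copied blob.
        data = self.staging[blob_name]
        self.persistent[blob_name] = data
        self.records[record_id] = {"blob": blob_name, "size": len(data)}


service = FileServiceModel()
service.upload_via_signed_url("well-log.las", b"version-1")
service.post_file_metadata("record-1", "well-log.las")

# The signed URL is still valid, so the client overwrites the staged blob.
service.upload_via_signed_url("well-log.las", b"version-2-longer")

# The persistent blob and its metadata still agree with each other.
assert service.persistent["well-log.las"] == b"version-1"
assert service.records["record-1"]["size"] == len(b"version-1")
```

If the signed URL pointed at the persistent container instead, the second upload would change the blob while the recorded size stayed the same, which is the inconsistency this decision addresses.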
However, the Dataset service architecture (Dataset + DMS services) does not have the same capability. It generates upload Signed URLs against the persistent container, which exposes a risk of inconsistency between the blob object and its metadata. For clients to migrate from File Service to Dataset, this gap should be fixed.
Decision
The decision is to add support for Staging containers, both to improve the security of the system and to reduce inconsistencies between the data and its metadata.
Rationale
The Dataset service would be more complete in managing a dataset as an entity. It would be built by leveraging existing functionality, which lets us reuse what already exists and extend its capabilities.
Consequences
The Dataset service will be responsible for supporting staging / landing zones for different DMS providers as needed. Some modifications to the existing upload functionality of the DMS services are proposed below.
Functionality | API | Status | Capability |
---|---|---|---|
Copy DMS | /copy | New | Copies the DMS content from Staging location to Persistent location |
Before updating the dataset's metadata in Storage Service, the Dataset service will ask the DMS service to copy the contents from the staging location to the persistent location.
Note: The Copy API will be optional for a DMS service. When registering a new DMS in the Dataset service, the DMS can specify whether it supports the Copy API. The Dataset service's "registerDataset" API will invoke the Copy API only if the DMS supports it.
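The optional copy step could be sketched as follows. This is a minimal illustration under stated assumptions: `DmsRegistration`, its `copy` field, the `register_dataset` function, and the location strings are hypothetical; a callable stands in for the DMS's `/copy` endpoint and a dictionary stands in for Storage Service. The behaviour for a DMS without copy support (metadata is recorded against the upload location as-is) is also an assumption, since the decision does not spell it out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class DmsRegistration:
    """Hypothetical registration entry for a DMS in the Dataset service."""
    name: str
    # Stand-in for the DMS's /copy endpoint; None means copy is not supported.
    copy: Optional[Callable[[str], str]] = None


def register_dataset(dms: DmsRegistration, upload_location: str,
                     storage: Dict[str, dict]) -> str:
    """Copy to the persistent location if the DMS supports it, then store metadata."""
    if dms.copy is not None:
        # The copy happens before the metadata update.
        location = dms.copy(upload_location)
    else:
        # Assumption: without copy support the upload location is used unchanged.
        location = upload_location
    storage[location] = {"dms": dms.name, "location": location}
    return location


def file_dms_copy(staging_location: str) -> str:
    # Hypothetical copy: maps a staging path to its persistent counterpart.
    return staging_location.replace("staging/", "persistent/", 1)


storage: Dict[str, dict] = {}
with_copy = DmsRegistration("file-dms", copy=file_dms_copy)
without_copy = DmsRegistration("legacy-dms")

copied = register_dataset(with_copy, "staging/a.bin", storage)
unchanged = register_dataset(without_copy, "container/b.bin", storage)
```

Modelling copy support as an optional capability on the registration keeps existing DMS providers working without changes, while providers that opt in get the staging protection.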