Issue created May 07, 2021 by Krishna Nikhil Vedurumudi (@krveduru, Maintainer). 1 of 5 checklist items completed.

Dataset Signed URLs do not use Staging container.

Decision Title

Dataset Signed URLs do not use Staging container.

Status

  • Proposed
  • Trialing
  • Under review
  • Approved
  • Retired

Context & Scope

One of the security practices followed by File Service is that Signed URLs for upload operations are generated against a Staging Blob Container.

After the upload, the client calls the Post FileMetadata API, which internally copies the blob from the Staging container to the Persistent container and updates the record's metadata in Storage Service.

Because Signed URLs remain active for a considerable amount of time, a client could update the blob at any point while the URL is valid. However, having separate containers in File Service ensures that, even if a client does update the blob object, the update has no effect until the client re-invokes the metadata API, so the system is not affected.
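To make the File Service behaviour described above concrete, here is a minimal in-memory sketch. It is not the File Service implementation: `BlobContainer`, `FileServiceSketch`, and the method names are hypothetical stand-ins, and plain dictionaries replace real blob storage and Storage Service.

```python
class BlobContainer:
    """In-memory stand-in for a blob container (hypothetical, not a cloud SDK)."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}

    def put(self, key, data):
        self.blobs[key] = data

    def get(self, key):
        return self.blobs[key]


class FileServiceSketch:
    """Hypothetical model of the two-container upload flow."""

    def __init__(self):
        self.staging = BlobContainer("staging")
        self.persistent = BlobContainer("persistent")
        self.records = {}  # stand-in for record metadata in Storage Service

    def upload_via_signed_url(self, key, data):
        # Upload Signed URLs only ever target the staging container.
        self.staging.put(key, data)

    def post_file_metadata(self, key, metadata):
        # Copy staging -> persistent, then store the record's metadata.
        self.persistent.put(key, self.staging.get(key))
        self.records[key] = dict(metadata)


svc = FileServiceSketch()
svc.upload_via_signed_url("well.las", b"v1")
svc.post_file_metadata("well.las", {"name": "well.las"})

# A late write through the still-valid Signed URL lands in staging only.
svc.upload_via_signed_url("well.las", b"v2-late-write")
persisted = svc.persistent.get("well.las")
```

In this sketch the persisted blob keeps its original content after the late write, because nothing copies from staging again until the metadata API is re-invoked.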

However, the Dataset service architecture (Dataset + DMS services) does not have the same capability. It generates upload Signed URLs against the persistent container, thereby exposing a risk of inconsistency between the blob object and its metadata. For clients to migrate from File Service to Dataset, this gap should be fixed.
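The risk can be illustrated with a small sketch. It assumes, purely for illustration, that the registered metadata includes a checksum of the blob; the `checksum` field and all function names below are hypothetical and are not part of the Dataset or DMS APIs.

```python
import hashlib

persistent = {}  # stand-in for the persistent container
records = {}     # stand-in for metadata held in Storage Service


def sha256(data):
    return hashlib.sha256(data).hexdigest()


def upload_via_signed_url(key, data):
    # Assumption: the Signed URL points directly at the persistent container.
    persistent[key] = data


def register_dataset(key):
    # Hypothetical metadata: a checksum captured at registration time.
    records[key] = {"checksum": sha256(persistent[key])}


def is_consistent(key):
    return records[key]["checksum"] == sha256(persistent[key])


upload_via_signed_url("ds-1", b"original")
register_dataset("ds-1")
consistent_before = is_consistent("ds-1")

# The Signed URL is still valid, so a later write replaces the persisted blob.
upload_via_signed_url("ds-1", b"changed after registration")
consistent_after = is_consistent("ds-1")
```

Here the blob and its metadata agree right after registration but diverge after the late write, which is the inconsistency a staging container is meant to prevent.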

Decision

The decision is to add support for Staging containers, both to enhance the security of the system and to reduce inconsistencies between the data and its metadata.

Rationale

Dataset service would be more complete in terms of managing a dataset as an entity. It would be built by leveraging existing functionality, which lets us re-use what exists and extend its capabilities.

Consequences

Dataset service will be responsible for supporting staging / landing zones for different DMS providers, based on need. Some modifications to the existing upload functionality of the DMS services are proposed below.

| Functionality | API | Status | Capability |
|---|---|---|---|
| Copy | DMS /copy | New | Copies the DMS content from the staging location to the persistent location |

Before updating the dataset's metadata in Storage service, Dataset service will have the DMS service copy the contents from the staging location to the persistent location.

Note: The Copy API will be optional for a DMS service. While registering a new DMS in Dataset service, the DMS can specify whether it supports the Copy API. In the "registerDataset" API, Dataset service will invoke the Copy API only if the DMS supports it.
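The optional-copy behaviour described in the note could look roughly like the sketch below. All class, field, and function names are hypothetical, the DMS call is modelled as a local function rather than an HTTP request to `/copy`, and the assumption that the copy returns the persistent location is for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DmsRegistration:
    """Hypothetical registration entry for a DMS in Dataset service."""
    name: str
    # None means the DMS did not declare support for the Copy API.
    copy_handler: Optional[Callable[[str], str]] = None

    @property
    def supports_copy(self):
        return self.copy_handler is not None


@dataclass
class DatasetServiceSketch:
    registry: dict = field(default_factory=dict)
    records: dict = field(default_factory=dict)  # stand-in for Storage service

    def register_dms(self, kind, dms):
        self.registry[kind] = dms

    def register_dataset(self, kind, dataset_id, location):
        dms = self.registry[kind]
        if dms.supports_copy:
            # Copy staging -> persistent before any metadata is written.
            location = dms.copy_handler(location)
        self.records[dataset_id] = {"kind": kind, "location": location}
        return self.records[dataset_id]


def file_dms_copy(staging_location):
    # Assumed behaviour: the copy returns the new persistent location.
    return staging_location.replace("staging/", "persistent/", 1)


service = DatasetServiceSketch()
service.register_dms("file", DmsRegistration("file-dms", copy_handler=file_dms_copy))
service.register_dms("external", DmsRegistration("external-dms"))

copied = service.register_dataset("file", "ds-1", "staging/ds-1.bin")
untouched = service.register_dataset("external", "ds-2", "staging/ds-2.bin")
```

Modelling the capability as an optional handler keeps the "registerDataset" path identical for every DMS; only the presence of the copy step differs.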

Dataset_Staging_Containers

Edited Jul 15, 2021 by preeti singh[Microsoft]