Issue #57 · Closed
Issue created Oct 08, 2020 by Chris Zhang (@ChrisZhang), Owner

Dataset service available as core service for all CSPs

Updated on Jan 15, 2021.

With the Dataset-as-a-core-service ADR approved, this issue now tracks the work needed for every CSP to make the Dataset service available on its cloud platform, including the SPI implementation and the pipeline and integration-test setup in GitLab.
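For context, the per-CSP work centers on implementing the service's provider interface (SPI) against each cloud's object storage. The sketch below is illustrative only; the interface name, method names, and return type are assumptions, not the actual OSDU Dataset service SPI.

```java
// Illustrative sketch only: names and signatures are assumptions,
// not the real OSDU Dataset service SPI.
package org.opengroup.osdu.dataset.provider;

import java.util.Map;

/** Contract each CSP module would implement for dataset storage access. */
public interface IStorageProvider {

    /**
     * Returns provider-specific instructions a client needs to upload a
     * dataset (e.g. target location plus short-lived credentials).
     */
    Map<String, Object> getStorageInstructions(String partitionId, String datasetKind);

    /**
     * Returns provider-specific instructions for downloading the files
     * that make up an existing dataset.
     */
    Map<String, Object> getRetrievalInstructions(String partitionId, String datasetRegistryId);
}
```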

The current code lives in the data-flow project (https://community.opengroup.org/osdu/platform/data-flow/data-workflow-framework/dataset-registry) and should be moved to the system project.

Original issue entry:

Title: File service to support use cases of multiple files

The File service design ADR limited its scope to single-file ingestion, and an endpoint was added to return an upload URL for that use case. Other use cases, such as VDS support, require handling multiple files or file locations. This item tracks the related work.

Quoting Joe's comment on the previous ADR: "If we can change the endpoint to getLocation, where this endpoint is partition and entitlement aware, and where this endpoint returns a JSON object that informs the data producer on all aspects required to execute a cloud SDK for their respective object storage, then this will support all cloud providers. I cannot support an approach that does not support all cloud providers. This endpoint would need to return things like 1) the base URL (based on partition), 2) the namespace within the base URL (aka folders), and 3) all other information required to get an access token from the SDK, versus generally providing a signedURL. This would also eliminate the need for a FileID. If you are stuck on providing a signedURL, then that should be an optional attribute within the object that I have described. That way, a producing application can use the signedURL if there is one, and be able to do something else if the signedURL is not provided. As it is, the design does not leave any options."
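To make that proposal concrete, here is a rough sketch of what such a getLocation response could look like, modeled as a Java record. Every field name below is an assumption for illustration; the actual contract was still to be designed at the time of this issue.

```java
// Illustrative sketch of a possible getLocation response.
// All field names are assumptions, not a defined contract.
import java.util.Map;
import java.util.Optional;

public record LocationResponse(
        String baseUrl,                  // 1) base URL, derived from the data partition
        String namespace,                // 2) namespace (aka folders) within the base URL
        Map<String, String> sdkParams,   // 3) whatever the cloud SDK needs to obtain an access token
        Optional<String> signedUrl       // optional: present only if the provider issues one
) {}
```

A producing application would use signedUrl when present and otherwise fall back to the cloud SDK with the returned parameters, which is exactly the fallback behavior the quoted comment asks for.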

Edited Jan 15, 2021 by Chris Zhang