[Datasets Change] Rework dataset management mechanism
Context
One of a DDMS's responsibilities in OSDU is managing derived domain data. Data preserved by a DDMS must be managed and accessed only via the DDMS API. That is, there must not be any way to change the data other than through the DDMS.
Problem Statement
- The current bulk data management mechanism uses the OSDU Dataset Service for bulk data ingestion and retrieval.
Note: this approach was implemented during the POC to delegate data residency management to the OSDU Dataset Service.
- During development of the search module, the team identified additional latency that can be eliminated by removing the extra request hop through the Dataset Service.
Scope
- Replace the file management logic with the use of a dedicated bucket (container).
- Change the logic for URI dataset management (DDMSDatasets[] property population).
- Update unit and integration tests.
- Develop a script to migrate data from the old approach to the new one.
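
As an illustration of the DDMSDatasets[] population change, the following is a minimal sketch of how a bulk file location in the dedicated bucket could be derived from the parent WPC record ID. Everything here is an assumption made for illustration and not part of this issue: the `ddms://` URI scheme, the `<record id>/<dataset name>.parquet` key layout, and the names `BulkLocation` and `bulk_location` are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class BulkLocation:
    """Hypothetical location of one parquet file in the dedicated bucket."""
    bucket: str
    key: str

    def uri(self) -> str:
        # Assumed URI scheme; the real format is not defined in this issue.
        return f"ddms://{self.bucket}/{self.key}"


def bulk_location(bucket: str, wpc_record_id: str, dataset_name: str) -> BulkLocation:
    """Build the object key from the parent WPC record ID (assumed layout)."""
    # Percent-encode both parts so that ':' and '/' cannot alter the key structure.
    safe_id = quote(wpc_record_id, safe="")
    safe_name = quote(dataset_name, safe="")
    return BulkLocation(bucket=bucket, key=f"{safe_id}/{safe_name}.parquet")


if __name__ == "__main__":
    loc = bulk_location("ddms-bulk", "p:wpc--X:1", "content")
    # The resulting URI is what would be stored in DDMSDatasets[] under this sketch.
    print(loc.uri())
```

Keying objects by the parent record ID keeps each file traceable to the record that governs access to it, which is one possible way to support the migration from the old references.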
Acceptance Criteria
- All bulk files (parquet) are uploaded into a dedicated bucket (container).
- Access to the bulk data is managed by the DDMS, based on the parent WPC record.
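
The second criterion could be checked along the following lines. This is a sketch under stated assumptions, not the DDMS implementation: it assumes the parent WPC record carries an `acl` object with `viewers` and `owners` group lists, and that the caller's group memberships are already resolved. The function name `can_read_bulk` is hypothetical.

```python
from typing import Iterable


def can_read_bulk(wpc_record: dict, user_groups: Iterable[str]) -> bool:
    """Return True if any of the user's groups appears in the parent record ACL."""
    # Assumed record shape: {"acl": {"viewers": [...], "owners": [...]}}.
    acl = wpc_record.get("acl", {})
    allowed = set(acl.get("viewers", [])) | set(acl.get("owners", []))
    # Read access is granted when the two group sets overlap.
    return bool(allowed & set(user_groups))


if __name__ == "__main__":
    record = {"acl": {"viewers": ["data.viewers@p"], "owners": ["data.owners@p"]}}
    print(can_read_bulk(record, ["data.viewers@p"]))
    print(can_read_bulk(record, ["other.group@p"]))
```

Under this sketch, the DDMS would run the check before serving a file from the bucket, so the bucket itself never needs to be exposed to callers.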