[ADR] Domain API
Introduction
To natively support seismic datasets as defined by the OSDU authority, and to keep each application from duplicating (and possibly diverging in) the logic that converts seismic data from one schema version to another, Seismic DMS should provide APIs that manage seismic datasets, validate them against their schema model, and return them in the latest version of the schema.
Status
- Proposed
- Trialing
- Under review
- Approved
- Retired
Context & Scope
(1) OSDU SCHEMAS ORGANIZATION
In the OSDU schema organization, schemas are grouped into different categories. A "dataset" schema describes a piece of bulk data along with its logical representation, while seismic records of the other categories must be linked to an existing (pre-ingested) dataset.
SDMS will provide a set of domain-specific APIs to support the following schema categories:
- FileCollection Datasets
- Work Product Components
- Master Data
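The linking rule for non-dataset categories can be illustrated with a small validation check. This is a minimal sketch, not the SDMS implementation: the `SeismicRecord` shape, the `linked_dataset_ids` field, the category strings, and the "at least one link" requirement are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SeismicRecord:
    # Hypothetical minimal record shape, used only for this illustration.
    record_id: str
    category: str  # assumed values: "dataset", "work-product-component", "master-data"
    linked_dataset_ids: list[str] = field(default_factory=list)


def validate_dataset_links(record: SeismicRecord, ingested_dataset_ids: set[str]) -> None:
    """Reject a non-dataset record that does not link to an already ingested dataset."""
    if record.category == "dataset":
        return  # dataset records describe the bulk data themselves
    if not record.linked_dataset_ids:
        # Assumption: every non-dataset record needs at least one link.
        raise ValueError(f"{record.record_id}: must link to at least one dataset")
    missing = [d for d in record.linked_dataset_ids if d not in ingested_dataset_ids]
    if missing:
        raise ValueError(f"{record.record_id}: unknown dataset(s) {missing}")
```

Raising on the first violation keeps the check cheap; a real service would more likely collect every validation error and report them together.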
References and Naming Conventions:
- SCHEMA: The seismic schema model, for example, SeismicTraceData / SeismicBinGrid / FileCollection.SEGY /...
- SCHEMA VERSION: A version of a schema model, for example, SeismicTraceData 1.0.0 / SeismicTraceData 1.6.2 /...
- RECORD: A seismic object, conforming to a SCHEMA, stored in the DE Storage Service
- RECORD-ID: The unique record identifier, for example, ABC1234
- RECORD-VERSION: A version of a RECORD, for example, ABC1234 V1 / ABC1234 V2 /...
(2) SDMS DOMAIN SPECIFIC APIs
SDMS will provide domain-specific APIs to handle the ingestion, schema validation, and underlying bulk management of the seismic dataset and component SCHEMAs defined by the OSDU authority.
For each supported SCHEMA, we will document the model with examples and provide APIs to manage both RECORDs and their RECORD-VERSIONs:
- An endpoint to ingest a seismic dataset:
  - If the RECORD-ID is not specified in the request model, a new RECORD will be created, and the generated RECORD-ID and RECORD-VERSION will be returned. In addition, for FileCollection dataset schemas only, a storage resource will be created to host the bulk data.
  - If the RECORD-ID is specified in the request model, the existing RECORD will be updated, and a newly generated RECORD-VERSION will be returned.
- An endpoint to list all datasets of a specific kind:
  - This endpoint will support paginated queries.
- An endpoint to retrieve the latest version of a RECORD-ID
- An endpoint to retrieve a specific version of a RECORD-ID
- An endpoint to retrieve all versions of a RECORD-ID
- An endpoint to delete a RECORD with all its associated versions:
  - This endpoint will perform a hard delete, removing all RECORD-VERSIONs and the associated bulk data.
- An endpoint to reindex datasets ingested with SDMS V3 into SDMS V4
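The create-versus-update behavior and the read, list, and delete endpoints above can be modeled with a small in-memory store. This is a hypothetical sketch of the semantics only, not the SDMS API: the class and method names are invented, versions are assumed to be sequential integers starting at 1, and pagination is assumed to use a simple offset cursor.

```python
import uuid
from typing import Optional


class InMemoryRecordStore:
    """Hypothetical in-memory model of the RECORD / RECORD-VERSION semantics."""

    def __init__(self) -> None:
        # record_id -> list of version bodies; index 0 holds version 1 (assumed numbering)
        self._records: dict[str, list[dict]] = {}
        self._kinds: dict[str, str] = {}

    def ingest(self, kind: str, body: dict, record_id: Optional[str] = None) -> tuple[str, int]:
        """Create a RECORD when record_id is None, otherwise append a new RECORD-VERSION."""
        if record_id is None:
            record_id = uuid.uuid4().hex
            self._records[record_id] = []
            self._kinds[record_id] = kind
        elif record_id not in self._records:
            raise KeyError(f"unknown record {record_id}")
        self._records[record_id].append(dict(body))
        return record_id, len(self._records[record_id])

    def list_by_kind(self, kind: str, limit: int, cursor: int = 0) -> tuple[list[str], Optional[int]]:
        """Return one page of record IDs and the next cursor (None on the last page)."""
        ids = sorted(r for r, k in self._kinds.items() if k == kind)
        page = ids[cursor:cursor + limit]
        next_cursor = cursor + limit if cursor + limit < len(ids) else None
        return page, next_cursor

    def get_latest(self, record_id: str) -> dict:
        return self._records[record_id][-1]

    def get_version(self, record_id: str, version: int) -> dict:
        versions = self._records[record_id]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{record_id} has no version {version}")
        return versions[version - 1]

    def get_all_versions(self, record_id: str) -> list[int]:
        return list(range(1, len(self._records[record_id]) + 1))

    def hard_delete(self, record_id: str) -> None:
        """Remove the RECORD and every RECORD-VERSION (bulk cleanup would also go here)."""
        del self._records[record_id]
        del self._kinds[record_id]
```

Keeping every version body, rather than only the latest, is what makes the "specific version" and "all versions" endpoints possible, and it is also why the delete endpoint has to be a hard delete across all versions.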
SDMS will support the highest Minor.Patch version of each Major version and will convert automatically between versions. For example, if a client calls the v1 endpoint, which supports schema version 1.1.0, to request a record that was ingested with schema version 1.0.0, SDMS will automatically convert the record from the ingested version 1.0.0 to the supported version 1.1.0. Conversion between Major versions will also be supported, provided that the conversion rules have been correctly specified; otherwise, an error will be raised.
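The version policy above can be sketched as two small functions: one that picks the highest Minor.Patch version for a Major, and one that upgrades a record by chaining step-by-step conversion rules, failing when no rule exists. This is an illustrative sketch under stated assumptions: the `RULES` registry, its example field changes (`Description`, `LegacyField`), and the upgrade-only behavior are hypothetical and are not taken from the OSDU schemas.

```python
from typing import Callable

Converter = Callable[[dict], dict]

# Hypothetical registry: source version -> (next version, conversion step).
# The field changes below are invented examples, not real schema differences.
RULES: dict[str, tuple[str, Converter]] = {
    "1.0.0": ("1.1.0", lambda r: {**r, "Description": r.get("Description", "")}),
    "1.1.0": ("2.0.0", lambda r: {k: v for k, v in r.items() if k != "LegacyField"}),
}


def parse(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(p) for p in version.split("."))
    return major, minor, patch


def latest_in_major(major: int, available: list[str]) -> str:
    """Pick the highest Minor.Patch version published for a given Major."""
    return max((v for v in available if parse(v)[0] == major), key=parse)


def convert(record: dict, from_version: str, to_version: str) -> dict:
    """Upgrade a record one rule at a time until it reaches to_version."""
    if parse(from_version) > parse(to_version):
        raise ValueError("downgrades are not supported in this sketch")
    current, data = from_version, dict(record)
    while current != to_version:
        if current not in RULES:
            raise ValueError(f"no conversion rule from {current} towards {to_version}")
        current, step = RULES[current]
        if parse(current) > parse(to_version):
            raise ValueError(f"no conversion path from {from_version} to {to_version}")
        data = step(data)
    return data
```

Versions are compared as integer tuples rather than strings, so `1.10.0` correctly ranks above `1.6.2`. Chaining adjacent rules means each new schema version only needs one rule from its predecessor, and a missing Major-version rule surfaces as an error, matching the behavior described above.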
For each SCHEMA VERSION, the schema model will be documented in the shared Swagger definition, together with examples.
(3) STORAGE ORGANIZATION AND CONNECTION STRINGS
Each time a FileCollection dataset is ingested in SDMS, a new storage container is created in the CSP storage service. SDMS generates the container name automatically by hashing the dataset name specified in the request together with the generated ID, which guarantees that the storage resource is unique within each partition. SDMS will provide specific endpoints that generate connection strings for a RECORD-ID and/or RECORD-VERSION, so that the caller can transfer the associated bulk data independently. These storage resources are protected: connection strings are released only after the service has authorized the caller through the shared Entitlements Service (ACL check).
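The naming and authorization steps can be sketched as follows. This is a hypothetical illustration, not the SDMS algorithm: SHA-256, the `ds` prefix, the 32-character truncation, the `owners`/`viewers` ACL layout, and the rule that uploads need owner rights are all assumptions, and the returned string is a placeholder where a real service would mint a short-lived, CSP-specific credential.

```python
import hashlib
from typing import Callable


def container_name(dataset_name: str, record_id: str) -> str:
    """Derive a deterministic container name from the dataset name and the generated ID."""
    digest = hashlib.sha256(f"{dataset_name}:{record_id}".encode("utf-8")).hexdigest()
    # Assumed format: "ds" + 32 lowercase hex characters (34 characters in total).
    return "ds" + digest[:32]


def issue_connection_string(
    user: str,
    acl: dict[str, set[str]],
    groups_of: Callable[[str], set[str]],
    container: str,
    mode: str,
) -> str:
    """Release a connection string only after an ACL check (hypothetical rules)."""
    # Assumption: upload requires an owners group; download accepts owners or viewers.
    if mode == "upload":
        allowed = acl.get("owners", set())
    elif mode == "download":
        allowed = acl.get("owners", set()) | acl.get("viewers", set())
    else:
        raise ValueError(f"unknown mode {mode!r}")
    if not allowed & groups_of(user):
        raise PermissionError(f"{user} may not {mode} {container}")
    # Placeholder only: a real service would return a CSP-specific, time-limited credential.
    return f"placeholder://{container}?mode={mode}"
```

Because the record ID is part of the hash input, two datasets with the same name still map to different containers, and the name can be recomputed from the record alone rather than stored separately.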
SDMS will expose dedicated endpoints for generating upload and download connection strings.
(4) DATASET UPLOAD EXAMPLE WORKFLOW
(5) DATASET DOWNLOAD EXAMPLE WORKFLOW
(6) Implementation
See the merge requests associated with this issue and the OpenAPI definition for details.