Home

This is the home project for the OSDU data platform services

Domain Data Management Services

Introduction

Generic APIs are the foundation on which the OSDU data platform is built. They are published as open endpoints for accessing metadata and data. Because generic types allow new data types to be added continuously, they support an ecosystem of:

  • System developers extending the platform
  • Systems integrators integrating the platform
  • Customers adopting the platform

Domain APIs provide an exploration and production (E&P) domain-specific way of accessing the system. They carry semantics that a generic API cannot express, making them a powerful tool for application developers who need the best possible performance when accessing data. These type-safe, optimized APIs may need updating every few years, but unlike generic APIs they allow storage to be optimized for specific access patterns.
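The contrast between the two API styles can be sketched in a few lines. This is an illustration only: `GenericRecord`, `WellboreTrajectory`, `to_trajectory`, the `"trajectory"` kind, and the `md`/`incl` field names are hypothetical and do not correspond to actual OSDU schemas or endpoints.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class GenericRecord:
    """Generic style: one shape for every data type, with an untyped payload."""
    id: str
    kind: str
    data: Dict[str, Any]


@dataclass(frozen=True)
class WellboreTrajectory:
    """Domain style: a typed view that carries domain semantics (hypothetical fields)."""
    id: str
    measured_depths: List[float]
    inclinations: List[float]

    def max_measured_depth(self) -> float:
        # A domain-specific question that a generic API cannot answer directly.
        return max(self.measured_depths)


def to_trajectory(record: GenericRecord) -> WellboreTrajectory:
    """Validate a generic record and convert it into the typed domain view."""
    if record.kind != "trajectory":
        raise ValueError(f"expected kind 'trajectory', got {record.kind!r}")
    md = [float(v) for v in record.data["md"]]
    incl = [float(v) for v in record.data["incl"]]
    if len(md) != len(incl):
        raise ValueError("md and incl must have the same length")
    return WellboreTrajectory(record.id, md, incl)
```

With the generic record, a caller has to know the payload keys and check them at run time; with the typed view, those checks happen once at the boundary and the rest of the application works with named, typed fields.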

A Domain Data Management Service (DDMS) persists data of a specific domain and provides access to it through optimized domain APIs. It is governed by the overall platform but is developed and evolves independently.

Context and Background

It is important to understand where a DDMS fits within the OSDU data architecture. As a quick recap, let us review the OSDU data flow:

[Figure: OSDU Data flow]

When data is enriched from several sources, a consumable version becomes available. The DDMS is the repository for this version and provides type-safe access to its model. It also takes the extracted array data and reorganizes it so that applications running on OSDU get high-performance access to that data.

Not all array data is the same, so storage and access must be optimized for the kind of data. The following picture illustrates several kinds: large-volume seismic data; high-cardinality, relatively small arrays for geology data (trajectories, logs); temporally sensitive, streaming drilling-operations and production data; and geospatial data.

[Figure: Different Array Data]

OSDU therefore encourages optimized, domain-centric microservices to address this need; such services are called Domain Data Management Services (DDMS). The OSDU platform becomes a framework in which such extensions can be registered, discovered, and invoked programmatically. This allows both open-source and proprietary extensions to be added to the platform:

  • Ability to develop domain extensions in parallel
    • Split domain data flows, domain storage/access concerns
  • Support domain developers through type-safe interfaces instead of generic core services
  • Address domain specific concerns
    • Core Platform governs minimum principles across domains and shared services
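The idea of a platform where extensions are registered, discovered, and invoked programmatically can be sketched as a small in-memory registry. This is a minimal sketch under stated assumptions: `DdmsRegistry`, its method names, and the handler signature are hypothetical and are not the OSDU registration API.

```python
from typing import Callable, Dict, List

# Hypothetical handler signature: takes an entity id, returns a dict.
Handler = Callable[[str], dict]


class DdmsRegistry:
    """Toy registry illustrating register / discover / invoke."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, domain: str, handler: Handler) -> None:
        # Each domain extension registers exactly once.
        if domain in self._handlers:
            raise ValueError(f"domain {domain!r} is already registered")
        self._handlers[domain] = handler

    def discover(self) -> List[str]:
        # Callers can list the available domains without knowing them up front.
        return sorted(self._handlers)

    def invoke(self, domain: str, entity_id: str) -> dict:
        # Dispatch to whichever extension owns the domain.
        try:
            handler = self._handlers[domain]
        except KeyError:
            raise LookupError(f"no DDMS registered for domain {domain!r}") from None
        return handler(entity_id)
```

Because the registry only depends on the handler signature, an open-source and a proprietary extension could be registered side by side and developed in parallel.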

[Figure: Platform and DDMS Extensions]

Capabilities

A DDMS is a set of:

  • Highly optimized storage and access for bulk data, with highly opinionated APIs that deliver the data required to enable domain workflows
  • Strongly governed schemas that incorporate a domain-specific perspective, with type-safe accessors for registered entity types

A Domain Data Management Service (DDMS) can be seen as any source of truth for data that manages the data life cycle, satisfies the mandatory data access concerns, and makes its data globally discoverable and retrievable through the OSDU Data Platform. Domain Data Management Services provide type-safe entity access and optimized accessors for bulk data, so that 1D and nD data can be read by point, plane, volume, and other consumption-centric access patterns.
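To make the point, plane, and volume access patterns concrete, here is a minimal sketch over a small 3D array held as nested lists. The `[inline][crossline][sample]` axis order and all function names are assumptions chosen for illustration; a real DDMS would use an optimized storage layout rather than Python lists.

```python
from typing import List

# Hypothetical layout: volume[inline][crossline][sample]
Volume = List[List[List[float]]]


def read_point(vol: Volume, i: int, j: int, k: int) -> float:
    """Point access: a single sample."""
    return vol[i][j][k]


def read_trace(vol: Volume, i: int, j: int) -> List[float]:
    """1D access: all samples at one inline/crossline position."""
    return list(vol[i][j])


def read_inline_plane(vol: Volume, i: int) -> List[List[float]]:
    """Plane access along the first axis: contiguous in this layout."""
    return [list(trace) for trace in vol[i]]


def read_sample_slice(vol: Volume, k: int) -> List[List[float]]:
    """Plane access along the last axis: touches every trace in this layout."""
    return [[trace[k] for trace in inline] for inline in vol]


def read_subvolume(vol: Volume, i0: int, i1: int, j0: int, j1: int,
                   k0: int, k1: int) -> Volume:
    """Volume access: a half-open box [i0:i1, j0:j1, k0:k1]."""
    return [[trace[k0:k1] for trace in inline[j0:j1]] for inline in vol[i0:i1]]
```

The sketch shows why a single layout cannot serve every pattern equally well: `read_inline_plane` reads one contiguous block, while `read_sample_slice` has to visit every trace. Choosing the layout per domain and per access pattern is the optimization a DDMS is meant to provide.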

A DDMS can implement a "shadow record" pattern: for each piece of data stored in the DDMS, a corresponding record is created in OSDU platform core storage. This record helps enable and enforce the following characteristics:

  • Legal compliance
  • Data access authorization
  • Discovery of data
  • Retrieval of the data
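The shadow record pattern and the four characteristics it supports can be sketched as follows. This is a minimal sketch, assuming in-memory stores: `ShadowRecord`, `CoreStorage`, `SeismicDdms`, the `"seismic"` kind, and all field names are hypothetical and are not the OSDU Storage, Legal, or Entitlements APIs.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class ShadowRecord:
    """Core-storage record that shadows one piece of bulk data in a DDMS."""
    record_id: str
    kind: str
    legal_tags: FrozenSet[str]   # legal compliance
    viewers: FrozenSet[str]      # data access authorization
    ddms_ref: str                # where the bulk data can be retrieved


class CoreStorage:
    """Stand-in for platform core storage (hypothetical)."""

    def __init__(self) -> None:
        self.records: Dict[str, ShadowRecord] = {}

    def put(self, record: ShadowRecord) -> None:
        if not record.legal_tags:
            raise ValueError("a record must carry at least one legal tag")
        self.records[record.record_id] = record

    def search(self, kind: str) -> List[str]:
        # Discovery: find records by kind without touching the DDMS.
        return sorted(r.record_id for r in self.records.values() if r.kind == kind)


class SeismicDdms:
    """Stand-in for a domain service that keeps bulk data itself."""

    def __init__(self, core: CoreStorage) -> None:
        self._core = core
        self._bulk: Dict[str, bytes] = {}

    def ingest(self, record_id: str, payload: bytes,
               legal_tags: Iterable[str], viewers: Iterable[str]) -> ShadowRecord:
        record = ShadowRecord(record_id, "seismic", frozenset(legal_tags),
                              frozenset(viewers), f"seismic-ddms://{record_id}")
        self._core.put(record)            # fails before any bulk data is stored
        self._bulk[record_id] = payload
        return record

    def retrieve(self, record_id: str, user: str) -> bytes:
        # Retrieval is authorized against the shadow record, not the bulk store.
        record = self._core.records[record_id]
        if user not in record.viewers:
            raise PermissionError(f"{user!r} may not read {record_id!r}")
        return self._bulk[record_id]
```

Writing the shadow record first means that data without legal tags never reaches the bulk store, and every later read passes through the same access check.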

An example of the kind of data managed by a seismic domain DDMS is as follows:

[Figure: Seismic Survey in a DDMS]

Additional Details

Here is a big-picture view of where a DDMS fits within the overall architecture of the platform:

[Figure: DDMS Conceptual Architecture]

These are the key principles that all DDMS should follow:

  • All raw data is preserved
    • Ingestion and retention minimize data loss.
  • Data is globally identifiable
    • Context-specific data identity prevents data from being leveraged elsewhere.
  • Data is immutable
    • Metadata is immutable without exception. By exception, transient data is subject to cost considerations. Versioning is preferred.
  • Data is access controlled
    • Only authorized identities have access to data.
  • Data is governed for right of use
    • Data compliance is enforced in all entry/exit points.
  • Data is discoverable
    • All domain data management services provide an index.
  • Data is consumable
    • Each domain data management service must register itself and provide a standardized way of consuming its data.
  • Improved data is new data
    • Enrichment is a workflow and results in new data.
  • Data lineage is tracked
    • All transformations and workflows must provide lineage.
  • Data is owned and managed by producers
    • Data persistence and management is decentralized.
  • Data is referenced
    • Copies of data are possible in different consumption zones (e.g., data ponds) but domain data management stores are the ultimate sources of truth.
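Three of these principles (data is immutable, improved data is new data, and data lineage is tracked) can be illustrated together with a small append-only store. This is a sketch under stated assumptions: `Version`, `VersionedStore`, and the `derived_from` field are hypothetical names, not OSDU record properties.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Version:
    """One immutable version of a record."""
    record_id: str
    version: int
    payload: str
    derived_from: Tuple[str, ...] = ()


class VersionedStore:
    """Append-only store: a write never overwrites, it adds a version."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Version]] = {}

    def write(self, record_id: str, payload: str,
              derived_from: Iterable[str] = ()) -> Version:
        history = self._versions.setdefault(record_id, [])
        version = Version(record_id, len(history) + 1, payload, tuple(derived_from))
        history.append(version)
        return version

    def latest(self, record_id: str) -> Version:
        return self._versions[record_id][-1]

    def history(self, record_id: str) -> List[Version]:
        return list(self._versions[record_id])

    def lineage(self, record_id: str) -> List[str]:
        """Return the ids of all known ancestors, nearest first."""
        seen: List[str] = []
        stack = list(self.latest(record_id).derived_from)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.append(parent)
                if parent in self._versions:
                    stack.extend(self.latest(parent).derived_from)
        return seen
```

An enrichment workflow would call `write` with a new record id and list its inputs in `derived_from`, so the improved data is stored as new data and its origin can be traced back through `lineage`.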

Please refer to Conceptual Architecture for OSDU for details on DDMS.

R3 Scope

The following domain data management services are planned for the R3 timeframe: