ADR: Contribution of MDIO Components to OSDU Seismic DMS

Table of Contents

  1. Status
  2. Context & Scope
  3. Decision
  4. Rationale
  5. Consequences
  6. When to revisit
  7. Tradeoff Analysis - Input to decision
  8. Decision criteria and tradeoffs
  9. Decision timeline
  10. Artifacts and Implementation
  11. Review Process Management

Status

  • Proposed
  • Trialing
  • Under review
  • Approved
  • Retired

Context & Scope

This ADR proposes integrating MDIO as a standard method for storing and accessing seismic data in the OSDU Seismic DMS. MDIO is an open-source format for multi-dimensional data indexing that stores large seismic datasets in a cloud-compatible layout on Azure, AWS, and GCP. By contributing the MDIO components, we intend to streamline seismic data management within OSDU and broaden the range of data formats that the OSDU Seismic DMS can store.

The primary scope includes:

  1. Implementing the SDFS driver to enable OSDU as an MDIO storage target.
  2. Enhancing SDUTIL with support for storing and accessing any file collection, including MDIO.
  3. Introducing SDFS-UTIL as a higher-performance alternative to SDUTIL for multi-threaded data transfers.
  4. Developing a utility that converts SEGY files to MDIO collections within the OSDU Seismic DMS and is triggered and run by an Airflow DAG.
  5. Registering MDIO datasets in the OSDU catalog through a custom schema, dataset--FileCollection.MDIO, for structured metadata integration.
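
The idea behind a multi-threaded transfer tool such as SDFS-UTIL can be sketched with the Python standard library alone. This is not SDFS-UTIL code: two local folders stand in for a local disk and a Seismic DMS dataset, and the function names and the file layout are hypothetical.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def make_demo_collection(root: Path) -> None:
    """Create a tiny file tree standing in for a file collection (hypothetical layout)."""
    (root / "data").mkdir(parents=True)
    (root / "metadata.json").write_text('{"note": "placeholder"}')
    (root / "data" / "0.0").write_bytes(b"\x00" * 1024)
    (root / "data" / "0.1").write_bytes(b"\x01" * 1024)


def copy_collection(src_root: Path, dst_root: Path, max_workers: int = 8) -> int:
    """Copy every file under src_root to dst_root in parallel and return the file count."""
    files = [p for p in src_root.rglob("*") if p.is_file()]

    def copy_one(src: Path) -> None:
        dst = dst_root / src.relative_to(src_root)
        # exist_ok=True keeps concurrent creation of a shared parent directory harmless
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for all copies and re-raises the first worker exception, if any
        list(pool.map(copy_one, files))
    return len(files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "src", Path(tmp) / "dst"
        make_demo_collection(src)
        print(copy_collection(src, dst, max_workers=4), "files copied")
```

Threads suit this workload because each copy spends most of its time waiting on I/O. A real tool would also need authentication, retries, and progress reporting, none of which is shown here.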

This integration aligns MDIO’s capabilities with OSDU standards, enabling seamless seismic data management across cloud platforms and supporting interoperability for seismic data workflows.

Decision

We propose to contribute the MDIO ecosystem to OSDU, supporting its use for seismic data in the OSDU Seismic DMS. By incorporating MDIO into OSDU, this decision will:

  • Enable MDIO format as a viable storage and access method within OSDU.
  • Support OSDU’s commitment to interoperability by broadening storage compatibility for seismic data.
  • Enhance community collaboration by providing the industry with a robust alternative for seismic data management, enabling efficient, accessible data handling.

Rationale

MDIO’s integration provides substantial technical and operational advantages for seismic data management:

  • Interoperability:
    Enabling MDIO storage compatibility within OSDU allows seamless cross-platform support for seismic data across major cloud providers.
  • Performance:
    MDIO’s indexing capabilities offer efficient storage and retrieval, especially for large seismic datasets, with segment-level data access.
  • Community Value:
    By supporting an industry-aligned format like MDIO, OSDU can attract geoscience professionals, catering to varied and complex data handling needs.
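
The performance point can be made concrete with a little arithmetic. The sketch below assumes that "segment-level data access" means reading individual chunks of a chunked multi-dimensional array. The volume shape, the chunk shape, and the function names are hypothetical and are not taken from MDIO itself.

```python
from itertools import product
from math import ceil, prod


def chunks_for_selection(shape, chunk_shape, selection):
    """Return the grid indices of all chunks that intersect a selection.

    selection holds one half-open (start, stop) pair per axis.
    """
    per_axis = []
    for size, chunk, (start, stop) in zip(shape, chunk_shape, selection):
        if not 0 <= start < stop <= size:
            raise ValueError(f"selection ({start}, {stop}) is outside axis of size {size}")
        first = start // chunk        # chunk holding the first selected element
        last = (stop - 1) // chunk    # chunk holding the last selected element
        per_axis.append(range(first, last + 1))
    return list(product(*per_axis))


def total_chunks(shape, chunk_shape):
    """Number of chunks needed to cover the whole array."""
    return prod(ceil(size / chunk) for size, chunk in zip(shape, chunk_shape))


if __name__ == "__main__":
    shape = (512, 512, 1024)    # hypothetical (inline, crossline, sample) volume
    chunk_shape = (64, 64, 64)  # hypothetical chunking
    inline = chunks_for_selection(shape, chunk_shape, ((100, 101), (0, 512), (0, 1024)))
    time_slice = chunks_for_selection(shape, chunk_shape, ((0, 512), (0, 512), (500, 501)))
    print(len(inline), len(time_slice), total_chunks(shape, chunk_shape))  # 128 64 1024
```

Under these assumptions a single inline touches 128 of 1024 chunks and a single time slice touches 64, so neither request needs to read the full volume.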

Consequences

Positive outcomes of this contribution include:

  • Broader support for seismic data formats within OSDU, specifically MDIO.
  • Improved data access performance and interoperability across cloud providers.

Potential challenges include:

  • Additional maintenance and support requirements within OSDU for the contributed MDIO components.
  • Ensuring backward compatibility and integration with existing OSDU data workflows.

When to revisit

This ADR should be revisited if:

  • OSDU requirements evolve significantly, impacting MDIO’s functionality.
  • New formats or standards emerge, prompting an assessment of MDIO’s continued compatibility and relevance within OSDU.
  • Significant community feedback suggests adjustments to improve integration.

Tradeoff Analysis - Input to decision

Alternatives and implications

Alternative Approaches Considered

  • Develop a Custom Format for OSDU Seismic DMS:
    This would involve creating a format tailored specifically to OSDU. However, this approach would require considerable development resources and may not align with existing industry practices or address complex data segment requirements effectively.
  • Rely Solely on Existing Formats without MDIO:
    OSDU already supports SEGY, a legacy format, as a seismic data standard. However, SEGY is increasingly viewed as obsolete, lacking segment-level data access and posing usability challenges, making it less suitable for modern seismic workflows.
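
One way to see the access limitation of SEGY is to compute where samples sit in the file. The sketch below assumes the simplest layout: a 3200-byte textual header, a 400-byte binary header, no extended textual headers, fixed-length traces with 240-byte trace headers, and 4-byte samples. The function names are illustrative only.

```python
TEXT_HEADER_BYTES = 3200    # textual file header
BINARY_HEADER_BYTES = 400   # binary file header
TRACE_HEADER_BYTES = 240    # header preceding every trace


def trace_offset(trace_index, samples_per_trace, bytes_per_sample=4):
    """Byte offset of the start of a trace (its header) in a fixed-length SEGY file."""
    trace_bytes = TRACE_HEADER_BYTES + samples_per_trace * bytes_per_sample
    return TEXT_HEADER_BYTES + BINARY_HEADER_BYTES + trace_index * trace_bytes


def time_slice_offsets(n_traces, sample_index, samples_per_trace, bytes_per_sample=4):
    """Byte offsets of one sample index across all traces: one small read per trace."""
    return [
        trace_offset(i, samples_per_trace, bytes_per_sample)
        + TRACE_HEADER_BYTES
        + sample_index * bytes_per_sample
        for i in range(n_traces)
    ]


if __name__ == "__main__":
    offsets = time_slice_offsets(n_traces=512 * 512, sample_index=500, samples_per_trace=1024)
    # Every trace contributes one 4-byte read, each separated by a full trace length.
    print(len(offsets), offsets[1] - offsets[0])  # 262144 4336
```

Under these assumptions a single trace can be located directly, but a time slice over a 512 x 512 survey needs 262,144 separate 4-byte reads spread across the whole file, which is the kind of access pattern a chunked format avoids.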

Implications

Integrating MDIO minimizes development overhead, leverages a format familiar to the community, and aligns with OSDU’s standards for data accessibility across multiple cloud providers.

Decision criteria and tradeoffs

Decision Criteria:

  1. Alignment with Industry Standards:
    MDIO’s adoption within geoscience makes it a suitable addition to OSDU.
  2. Performance and Efficiency:
    MDIO’s efficient indexing and data segment access address the demands of handling large seismic datasets.
  3. Community Demand:
    Widespread industry use of MDIO and its alignment with seismic data workflows make it an appealing option for OSDU adoption.

Tradeoffs:

  • Initial Effort for Integration:
    While incorporating MDIO requires some upfront effort, it reduces long-term maintenance compared to custom solutions.
  • Resource Utilization:
    Leveraging MDIO may necessitate some orientation for OSDU users unfamiliar with the format but ultimately ensures alignment with industry standards.

Decision timeline

The anticipated timeline for community review and approval aligns with the next OSDU release cycle. Initial feedback will guide adjustments to the ADR and the submitted artifacts, with the goal of achieving final approval within 3 weeks.

Artifacts and Implementation

Our team has contributed the following MDIO-related artifacts for integration:

  1. SDFS:
    An fsspec-based Python driver enabling OSDU Seismic DMS as a storage destination for TGS’s MDIO data.
  2. OSDU SDUTIL Upgrade:
    Extends SDUTIL with support for managing any file collection, including MDIO, while maintaining compatibility with single seismic files.
  3. SDFS-UTIL:
    A CLI utility based on SDFS, offering a higher-performance alternative to SDUTIL for multi-threaded storage and retrieval of seismic data collections from OSDU Seismic DMS.
  4. SEGY-TO-MDIO-CONVERSION:
    A utility that converts SEGY files to MDIO collections within the OSDU Seismic DMS; it is triggered and run by an Airflow DAG.
  5. SEGY-TO-MDIO-CONVERSION-DAG:
    The Airflow DAG that triggers and runs the SEGY-TO-MDIO-CONVERSION utility.
  6. Register schema: dataset--FileCollection.MDIO:
    A custom schema for registering MDIO datasets within the OSDU catalog, ensuring structured metadata integration and discoverability.
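
To give reviewers a feel for the catalog registration, the sketch below builds a record of kind dataset--FileCollection.MDIO as a plain Python dictionary. The field layout is an assumption modeled on OSDU's generic file collection records (acl, legal, data.DatasetProperties); the contributed schema may define different or additional properties. The authority, version, sd:// path, group names, and legal tag are hypothetical placeholders.

```python
import json


def build_mdio_record(authority, collection_path, file_names, owners, viewers,
                      legal_tags, countries, schema_version="1.0.0"):
    """Assemble a catalog record for an MDIO file collection (assumed field layout)."""
    return {
        # kind follows the OSDU pattern authority:source:entity:version
        "kind": f"{authority}:wks:dataset--FileCollection.MDIO:{schema_version}",
        "acl": {"owners": list(owners), "viewers": list(viewers)},
        "legal": {
            "legaltags": list(legal_tags),
            "otherRelevantDataCountries": list(countries),
        },
        "data": {
            "DatasetProperties": {
                "FileCollectionPath": collection_path,
                "FileSourceInfos": [
                    {"FileSource": name, "Name": name} for name in file_names
                ],
            }
        },
    }


if __name__ == "__main__":
    record = build_mdio_record(
        authority="example",  # hypothetical schema authority
        collection_path="sd://example-tenant/example-subproject/surveys/demo.mdio/",
        file_names=["metadata.json", "data/0.0"],
        owners=["data.default.owners@example-partition.example.com"],
        viewers=["data.default.viewers@example-partition.example.com"],
        legal_tags=["example-partition-demo-legal-tag"],
        countries=["US"],
    )
    print(json.dumps(record, indent=2))
```

Keeping the record a plain dictionary makes it easy to serialize to JSON and compare against the schema once the dataset--FileCollection.MDIO definition is available for review.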

OSDU repositories/artifacts to be created/upgraded

[Figure: OSDU repositories and artifacts to be created or upgraded]

Review Process Management

Our team will:

  • Place these artifacts in feature branches of the appropriate OSDU GitLab repositories.
  • Reference these branches in the ADR to allow reviewers to assess, discuss, and provide feedback on each component.
  • Coordinate with the OSDU community throughout the review process, promptly addressing feedback and adjusting artifacts as needed to facilitate approvals.
Edited by Rostislav Dublin (EPAM)