ADR: EDS Triggering & Monitoring Custom Post Processing API
Introduction
The purpose of this functionality is to enable custom post-processing workflows outside of the EDS scope, while keeping EDS a core service.
Examples of custom post-processing workflows include, but are not limited to:
- DDMS API calls (e.g. Seismic DDMS, Wellbore DDMS)
- Custom data-loading workflows with data sources other than OSDU (i.e. local storage, private on-premises storage)
Status
- Drafting
- Proposed
- Trialing
- Under review
- Approved
- Retired
Business Use Case
Current EDS ingestion handles metadata ingestion and file ingestion into OSDU dataset storage well. Domain-specific data ingestion, such as the Seismic Data Store or Wellbore DDMS bulk data, is not within the scope of EDS. There is also a need for Data Providers and Operators to handle different data delivery methods and setups.
Example 1 - Seismic Data ingestion (Post-Ingest):
- Seismic metadata is ingested via EDS ingest.
- EDS Ingest will trigger an external process to perform the SEGY file upload to the OSDU Seismic Data Store using SDMS or SD Util.
Example 2 - Well Log data ingestion (Post-Naturalization):
- Well Log metadata is ingested via EDS ingest.
- EDS Naturalization is automatically triggered after EDS Ingest.
- Well Log (LAS file) is naturalized by EDS into OSDU Storage.
- EDS Naturalization triggers an external process to ingest Well Log bulk data via Wellbore DDMS.
Note: The external process might include operator-specific data processing, such as fetching the data from a private source; further processing of the data might also be applied before or after the EDS data ingestion.
Problem Statement
- OSDU Domain Services
In R3 (Mercury Release), 4 domain services were part of the scope. As of 4 Dec 2024, a total of 7 domain services are in active development. Each domain service has its own specific use cases, and domain-specific knowledge/data/records are required.
It is important to understand that not all array data are the same and therefore there is a need to optimize the storage and access based on the kind of data. [OSDU Software / OSDU Data Platform / Domain Data Management Services / Home]
EDS aims to be a core service that is not tied to any specific domain. As EDS continues to grow and be adopted by Operators and Providers, the demand for EDS integration with other domain services is growing rapidly.
- Introduction of unique use cases specific to Operators/Providers
As more Operators and Providers adopt OSDU and EDS, we expect more domain-unique use cases to be introduced. EDS should avoid tailoring specific use cases into the EDS logic/code, so we propose a solution generic enough to handle different use cases.
- The limitations of Airflow DAGs currently are as follows:
- Limited storage for processing
- Limited computational power
- Must be written in Python
- Must use the Python version that the Airflow platform utilizes
- Must use only the pre-installed Python packages that the Airflow platform provides (no additional packages can be installed)
Value Proposition
- Provide an extension point and the flexibility for EDS users to extend functionality beyond EDS. This will fulfill and integrate any domain- or business-specific requirements via the extensions.
- Keep EDS data ingestion generic as a core data ingestion service, without domain-specific functionality such as DDMS services. DDMS data ingestion will be provided by the users' custom APIs, integrated through the EDS Post Processing Trigger and Monitoring.
- Enable the utilization of optimized storage technologies as offered by OSDU Domain Data Management Services.
A Domain Data Mgmt Service (or DDMS service) is one that persists data of a specific domain and provides access through optimized domain APIs. It is governed by the overall platform but is developed and evolved independently.
Proposal
- Allow users to configure API endpoints to be triggered by EDS major events.
- Post Ingestion
- Post Naturalization
- Once the EDS job has completed the above process, it will fire the call to the API configured in the CSRE (Security and Authentication Scheme) and the CSDJ (Endpoint information).
- Custom API specs will be defined by the EDS Team. The API will need to conform to the EDS guideline in order to integrate with EDS.
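As a sketch of the trigger step, the snippet below shows how EDS might fire the configured endpoint once a job completes. The helper name, payload shape, and bearer-token scheme are illustrative assumptions, not part of the EDS spec; in practice the URL would come from the CSDJ `ExternalProcesses` entry and the credentials from the security scheme named in the CSRE.

```python
import json
import urllib.request


def build_trigger_request(url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build the POST call that fires a configured external process.

    Hypothetical helper: `url` would come from the ConnectedSourceDataJob
    (CSDJ) ExternalProcesses entry, and `token` would be obtained via the
    security scheme named in the ConnectedSourceRegistryEntry (CSRE).
    """
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The actual send would then be:
#   urllib.request.urlopen(build_trigger_request(url, token, payload))
```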
Objective
- Post-processing [i.e. Post-Ingestion & Post-Naturalization]
- Able to trigger the custom post-processing API after the EDS job is completed
- Able to monitor the status of the custom post-processing
- Able to add the messages and results of the custom post-processing to the Activity record
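The monitoring objective above can be sketched as a simple polling loop. The status vocabulary (RUNNING/SUCCEEDED/FAILED) and the callable-based design are assumptions for illustration only; the real status endpoint and state names will be defined by the Guideline for Post-Processing API Adoption (#130).

```python
import time

# Assumed terminal states; the actual vocabulary is defined by the
# Post-Processing API guideline (#130).
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def monitor_external_process(fetch_status, interval_s=30.0, timeout_s=3600.0, sleep=time.sleep):
    """Poll the custom post-processing API until it reaches a terminal state.

    `fetch_status` is any callable returning the current status string
    (e.g. an HTTP GET against the external process's status endpoint).
    Returns the final status, or raises TimeoutError if the process does
    not finish within `timeout_s` seconds.
    """
    elapsed = 0.0
    while elapsed <= timeout_s:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        sleep(interval_s)
        elapsed += interval_s
    raise TimeoutError("external post-processing did not finish in time")
```

The final status string would then be written into the EDS Activity record alongside any messages returned by the external process.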
Assumption/Guideline
- The Post-Processing API (Post-Ingest & Post-Naturalization) will need to follow the Technical Details: Guideline for Post-Processing API Adoption (#130)
Required Changes
- [Data Definition Team] - Adding properties for External Processes (EDS) (#95) || Adding properties for External Processes (EDS) - from Community Gitlab (#11)
- Changes to ConnectedSourceDataJob
| Name | Value Type | Required? | Title | Description | Pattern |
|---|---|---|---|---|---|
| data.ExternalProcesses[] | object | optional | ExternalProcesses | A list of external processes configuration to be executed by EDS | (No pattern) |
| data.ExternalProcesses[].SecuritySchemeName | string | required | Security Scheme Name | Reference name for the security scheme in the ConnectedSourceRegistryEntry document this external process belongs to. | (No pattern) |
| data.ExternalProcesses[].Url | string | required | Endpoint Url | External Process endpoint | (No pattern) |
| data.ExternalProcesses[].EdsExternalProcessType | string | required | EDS External Process Type | ID reference of the External Process Type | `^[\w-.]+:reference-data--EdsExternalProcessType:[\w-.:%]+:[0-9]*$` |
Reference values for reference-data--EdsExternalProcessType (id with prefix namespace:reference-data--EdsExternalProcessType:):

| Code | Name | NameAlias | Description | AttributionAuthority |
|---|---|---|---|---|
| PostIngestion | PostIngestion | PostIngestion | The External Process will be triggered after EDS Ingest is completed | osdu |
| PostNaturalization | PostNaturalization | PostNaturalization | The External Process will be triggered after EDS Naturalization is completed | osdu |
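Putting the two definitions above together, a ConnectedSourceDataJob record could carry an ExternalProcesses block like the following. All values are hypothetical examples: the SecuritySchemeName must match a scheme defined in the job's ConnectedSourceRegistryEntry, and the type id must satisfy the published pattern.

```json
{
  "data": {
    "ExternalProcesses": [
      {
        "SecuritySchemeName": "MyOAuth2Scheme",
        "Url": "https://example.com/eds/post-ingest",
        "EdsExternalProcessType": "osdu:reference-data--EdsExternalProcessType:PostIngestion:"
      }
    ]
  }
}
```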
- [EDS Team] Perform the required code changes as defined in Planned Changes in EDS Post Processing Implementation (#132)
Functional Requirement
- Able to trigger the custom post-processing at post-EDS-Ingest and post-EDS-Naturalization
- Able to monitor the custom post-processing
- Able to record the completion results of the post-processing in the Activity Template
Non-functional requirement
- Maintainability: Code is written using Pydantic data models and follows coding best practices
- Compatibility: The system should be able to run all of the current processes without any change in behaviour
- Local testing: Unit tests are created and test cases are included as part of local development testing
- Adaptability: EDS provides documentation on how to adopt this Post Processing API
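As a maintainability sketch, an ExternalProcesses entry could be modeled as a typed, validating class. Pydantic may not be available in every environment, so this example uses a standard-library dataclass with equivalent validation in `__post_init__`; an actual EDS implementation would likely subclass `pydantic.BaseModel` instead. Field names mirror the schema change above; any validation beyond the published regex (e.g. the URL-scheme check) is an assumption.

```python
import re
from dataclasses import dataclass

# Pattern published in the EdsExternalProcessType schema change.
_TYPE_PATTERN = re.compile(
    r"^[\w\-\.]+:reference-data--EdsExternalProcessType:[\w\-\.\:\%]+:[0-9]*$"
)


@dataclass(frozen=True)
class ExternalProcess:
    """One entry of data.ExternalProcesses[] (illustrative stdlib stand-in
    for a Pydantic model)."""

    SecuritySchemeName: str
    Url: str
    EdsExternalProcessType: str

    def __post_init__(self):
        # Assumed extra check: endpoints must be HTTP(S).
        if not self.Url.startswith(("http://", "https://")):
            raise ValueError(f"invalid endpoint URL: {self.Url}")
        # Enforce the schema's published id pattern.
        if not _TYPE_PATTERN.match(self.EdsExternalProcessType):
            raise ValueError(f"invalid process type id: {self.EdsExternalProcessType}")
```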
Progress Checker
Add your task for tracking
