ADR: EDS Ingest - Push Functionality using Workflow API

Introduction: The purpose of this ADR is to enable ad-hoc push requests so that data from the Provider can be loaded into the Operator on demand.

More Background: https://osdu-community.ideas.aha.io/ideas/IDEA-I-118

Objective: The objective of this ADR is as follows:

  • A Provider user can manually load data from the Provider to the Operator using the existing configurations (CSRE & CSDJ), as agreed between Provider and Operator
  • The Provider supplies the payload below to trigger EDS Ingest via the Workflow API
    1. (Required) connectedSourceDataJobId: str = A valid, existing CSDJ ID in the Operator, agreed by both parties
    2. [New] (Optional) overrideDataFilter: str = A parameter used to override data.Filter in CSDJ record
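As an illustration of the payload described above, the following minimal sketch builds a trigger body for the two fields. Wrapping the fields in "executionContext" is an assumption about how the Workflow API passes parameters to the workflow. The helper name build_trigger_body, the CSDJ ID and the filter value are hypothetical placeholders.

```python
import json
from typing import Optional


def build_trigger_body(csdj_id: str, override_filter: Optional[str] = None) -> dict:
    """Build a request body for triggering EDS Ingest via the Workflow API.

    Assumption: the two fields travel inside "executionContext".
    """
    context = {"connectedSourceDataJobId": csdj_id}
    # overrideDataFilter is optional, so it is only sent when supplied.
    if override_filter is not None:
        context["overrideDataFilter"] = override_filter
    return {"executionContext": context}


# Hypothetical values, for illustration only.
body = build_trigger_body(
    "opendes:master-data--ConnectedSourceDataJob:example-job",
    override_filter="<filter agreed between Provider and Operator>",
)
print(json.dumps(body, indent=2))
```

Omitting the second argument produces the payload for the existing behaviour, where only connectedSourceDataJobId is sent.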

Status

  • Drafting
  • Proposed
  • Trialing
  • Under review
  • Approved
  • Retired

Scope: The scope of this ADR includes the following scenarios:

Scenario 1: EDS Ingest is triggered by the Workflow API using an invalid payload

Expected Behaviour (Existing Behavior): EDS Ingest will not execute any logic and will raise an error message.

Scenario 2: EDS Ingest is triggered by the Workflow API using the payload connectedSourceDataJobId: str only

Expected Behaviour (Existing Behavior): EDS Ingest will run using the CSDJ configuration, without being scheduled, and execute the underlying EDS logic.

Scenario 3: EDS Ingest is triggered by the Workflow API using the payload connectedSourceDataJobId: str and overrideDataFilter: str

Expected Behaviour (New Behavior): EDS Ingest will run using the CSDJ configuration and override/mask the data.Filter field with the overrideDataFilter content for this particular job only (without saving the record) before executing the underlying EDS logic.

Upon completion of EDS Ingest, the CSDJ record will not be updated. This includes the fields data.LastSuccessfulRunDateUTC, data.FailedRecords and data.CreateTimeMax.

Reason: Updating the record would affect the next incremental load run (either scheduled or non-scheduled), which would have the failed records appended.
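The "override for this job only, without record save" behaviour can be sketched as follows. This is a minimal illustration that assumes the CSDJ record is available as a plain dictionary. The function name apply_filter_override and the sample record values are hypothetical and are not taken from the EDS code base.

```python
import copy
from typing import Any, Dict, Optional


def apply_filter_override(
    csdj_record: Dict[str, Any], override_filter: Optional[str]
) -> Dict[str, Any]:
    """Return the CSDJ configuration to use for this run only.

    The stored record is never mutated; the override lives on a copy.
    """
    run_config = copy.deepcopy(csdj_record)
    if override_filter is not None:
        # Set data.Filter directly, without iterating over other fields.
        run_config.setdefault("data", {})["Filter"] = override_filter
    return run_config


# Hypothetical stored record, for illustration only.
stored = {
    "id": "opendes:master-data--ConnectedSourceDataJob:example-job",
    "data": {"Filter": "stored filter", "FailedRecords": []},
}
run_config = apply_filter_override(stored, "ad-hoc filter")
print(run_config["data"]["Filter"])  # filter used for this run
print(stored["data"]["Filter"])      # stored record is unchanged
```

Working on a copy keeps the stored CSDJ record untouched, so a later scheduled run still sees the original data.Filter.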

Given/Assumptions:

  1. Data Loader (Provider/Operator) should agree on who will be loading the ad-hoc data.
  2. Data Loader (Provider/Operator) should have the role 'service.workflow.creator', as required by the Workflow Service, to trigger any workflow (e.g. eds_ingest/osdu_ingest)
  3. Provider & Operator should agree on the existing CSDJ and that the records given are configured correctly.
  4. Provider & Operator should agree on the ad-hoc data to be loaded, and overrideDataFilter: str is provided by the Provider
  5. Provider & Operator will be responsible for the quality of the data and the timely delivery of the data

Required Changes:

  1. Implementation of the input payload model using a Pydantic data class so that the input is validated
    1. (Required) connectedSourceDataJobId: str = A valid, existing CSDJ ID in the Operator, agreed by both parties
    2. [New] (Optional) overrideDataFilter: str = A parameter used to override data.Filter in CSDJ record
  2. Override logic on data.Filter field using overrideDataFilter: str
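A minimal sketch of the input model is shown below. It assumes the pydantic package is installed and uses BaseModel, which is one possible way to realise the "Pydantic data class" mentioned above. The class name EdsIngestInput, the helper parse_payload and the ID pattern are illustrative assumptions, not the final implementation.

```python
import re
from typing import Optional

from pydantic import BaseModel, ValidationError

# Assumed ID shape: <partition>:master-data--ConnectedSourceDataJob:<name>
CSDJ_ID_PATTERN = re.compile(
    r"^[\w\-.]+:master-data--ConnectedSourceDataJob:[\w\-.:%]+$"
)


class EdsIngestInput(BaseModel):
    connectedSourceDataJobId: str              # required
    overrideDataFilter: Optional[str] = None   # optional, new


def parse_payload(payload: dict) -> EdsIngestInput:
    """Validate the trigger payload; raises ValueError when it is invalid."""
    # A missing required field raises pydantic's ValidationError,
    # which is a subclass of ValueError.
    model = EdsIngestInput(**payload)
    if not CSDJ_ID_PATTERN.match(model.connectedSourceDataJobId):
        raise ValueError("connectedSourceDataJobId is not a valid CSDJ record ID")
    return model


# Hypothetical payload, for illustration only.
valid = parse_payload(
    {"connectedSourceDataJobId": "opendes:master-data--ConnectedSourceDataJob:example-job"}
)
print(valid.overrideDataFilter)  # None when the optional field is omitted

try:
    parse_payload({"overrideDataFilter": "some filter"})
except ValidationError as exc:
    print("invalid payload:", type(exc).__name__)
```

Rejecting the payload before any EDS logic runs matches the existing behaviour for an invalid payload, and leaving overrideDataFilter as None keeps the connectedSourceDataJobId-only path unchanged.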

Implementation:

  1. Create the input model and validate the input, including the OSDU record ID, using a Pydantic data class
  2. Create the logic to override the field data.Filter if overrideDataFilter: str exists

Sequence Diagram: image

Functional Requirements:

  1. The system should be able to handle the different inputs defined in the Objective
  2. The system should be able to execute the current processes of EDS Ingest (i.e. processing of parent mapping and reference mapping)

Non-functional Requirements:

  1. Performance/Maintainability: The system should update the CSDJ record directly at data.Filter without looping through all of the other fields
  2. Compatibility: The system should be able to run all of the current processes without any change in behaviour
Edited by Teo Sheng Pu