Enhanced API Specs for Ingestion Workflow Service
- Under review
Context & Scope
In OSDU R2, the Ingestion Workflow Service (https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow) handles management operations specific to the orchestration tool (Airflow). For OSDU R3, the proposal is an enhanced version of the Ingestion Workflow Service. Both versions serve a similar purpose: they wrap the orchestrator tool and carry out CRUD operations on domain workflows and domain workflow runs. The R2 version exposes only three APIs (startWorkflow, updateWorkflowStatus, getStatus) and is tightly coupled to Apache Airflow as the orchestration tool. This ADR introduces an enhanced Ingestion Workflow Service API that caters to more complex ingestion workflow scenarios and aims for a specification that is independent of the orchestrator tool.
In the OSDU R3 framework, the Ingestion Workflow Service will be responsible for end-to-end management (creation, modification, execution and monitoring) of ingestion workflows from the user's perspective. It will become the way to create domain workflows in the OSDU Data Platform. Users with workflow creation and trigger roles are completely abstracted from the technical complexities of the orchestration tool in use.
The new version supports the following workflow operations:
CRUD operations for Workflow:
- Creation/registration of a workflow. (new)
- Updating/editing of a workflow. (new)
- Querying details of a workflow. (new)
- Listing all configured workflows. (new)
- Deletion of a workflow. (new)
CRUD operations for Workflow Run:
- Triggering a workflow. (already exists)
- Querying details of all workflow runs for a workflow.
- Querying details of a workflow run. (already exists)
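The operations listed above can be pictured as a small service surface. The following is only a minimal in-memory sketch, not the proposed API: the class name `WorkflowService`, every method name, the record fields and the `"submitted"` status value are hypothetical and chosen purely for illustration.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    description: str = ""


@dataclass
class WorkflowRun:
    run_id: str
    workflow_name: str
    status: str = "submitted"  # hypothetical status value


class WorkflowService:
    """In-memory stand-in for the listed operations (all names hypothetical)."""

    def __init__(self):
        self._workflows = {}  # workflow name -> Workflow
        self._runs = {}       # workflow name -> {run_id: WorkflowRun}

    # --- CRUD operations for Workflow ---
    def create_workflow(self, name, description=""):
        if name in self._workflows:
            raise ValueError(f"workflow {name!r} already exists")
        self._workflows[name] = Workflow(name, description)
        self._runs[name] = {}
        return self._workflows[name]

    def update_workflow(self, name, description):
        workflow = self.get_workflow(name)
        workflow.description = description
        return workflow

    def get_workflow(self, name):
        if name not in self._workflows:
            raise KeyError(f"unknown workflow {name!r}")
        return self._workflows[name]

    def list_workflows(self):
        return list(self._workflows.values())

    def delete_workflow(self, name):
        self.get_workflow(name)  # raises KeyError if the workflow is unknown
        del self._workflows[name]
        del self._runs[name]

    # --- Operations for Workflow Run ---
    def trigger_workflow(self, name):
        self.get_workflow(name)
        run = WorkflowRun(run_id=str(uuid.uuid4()), workflow_name=name)
        self._runs[name][run.run_id] = run
        return run

    def list_runs(self, name):
        self.get_workflow(name)
        return list(self._runs[name].values())

    def get_run(self, name, run_id):
        self.get_workflow(name)
        return self._runs[name][run_id]
```

A real service would persist this state and delegate execution to the orchestrator; the sketch only shows how the eight operations relate to each other.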
The orchestration tools available on the market do not support the expectations of domain workflows out of the box, and those expectations differ from one domain workflow to another. It is therefore important to have a wrapper layer that adapts the behaviour of the orchestrator tool to the custom behaviour suited to executing domain workflows on the Data Platform. This also enables loose coupling between the orchestrator tool (currently Airflow) and the Data Platform, so adoption of alternative orchestration tools will be easier to manage.
Mismatches between the Data Platform's workflow expectations and the orchestrator tool's behaviour will be easier to manage. When the orchestrator tool changes, the changes can be incorporated without multiple touch points spread across the Data Platform. Breaking changes during version upgrades (if any) and the implementation of an alternative tool will become controlled activities.
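One way to picture the wrapper layer is as an adapter interface that platform code depends on, with one implementation per orchestrator tool. The sketch below is an assumption-laden illustration: `OrchestratorAdapter`, `FakeSchedulerAdapter`, `WorkflowRunner`, the platform state names and the native states `QUEUED`/`ACTIVE`/`DONE`/`ERROR` are all hypothetical, and the fake adapter does not model Airflow or any real tool.

```python
from abc import ABC, abstractmethod


class OrchestratorAdapter(ABC):
    """Interface the platform depends on (hypothetical)."""

    @abstractmethod
    def trigger(self, workflow_name: str, payload: dict) -> str:
        """Start a run and return an orchestrator-specific run handle."""

    @abstractmethod
    def run_state(self, handle: str) -> str:
        """Return the run state translated into a platform-level state."""


class FakeSchedulerAdapter(OrchestratorAdapter):
    """Adapter for an imaginary tool whose native states differ from the platform's."""

    # Imaginary native state -> hypothetical platform-level state.
    _STATE_MAP = {
        "QUEUED": "submitted",
        "ACTIVE": "running",
        "DONE": "finished",
        "ERROR": "failed",
    }

    def __init__(self):
        self._native_states = {}

    def trigger(self, workflow_name, payload):
        handle = f"{workflow_name}-{len(self._native_states) + 1}"
        self._native_states[handle] = "QUEUED"
        return handle

    def set_native_state(self, handle, state):
        # Helper that stands in for the tool moving a run forward.
        self._native_states[handle] = state

    def run_state(self, handle):
        return self._STATE_MAP[self._native_states[handle]]


class WorkflowRunner:
    """Platform-facing code that knows only the adapter interface."""

    def __init__(self, adapter: OrchestratorAdapter):
        self._adapter = adapter

    def start(self, workflow_name, payload=None):
        return self._adapter.trigger(workflow_name, payload or {})

    def status(self, handle):
        return self._adapter.run_state(handle)
```

Under this arrangement, swapping the orchestrator means writing another `OrchestratorAdapter` subclass; `WorkflowRunner` and its callers stay unchanged, which is the single touch point the text argues for.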
Tradeoff Analysis - Input to decision
The proposed new version will fully support the following:
- Users are completely abstracted from the underlying orchestrator tool. Workflow editors can create and manage complex workflows without being technical experts on the orchestrator tool.
- The OSDU Data Platform will be able to integrate future changes to the orchestrator framework easily. Technical changes will stay within the boundary of the Ingestion Workflow Service.
- Users can query historical runs of workflows, and the status of ingestion can be tracked at the right granularity. This will enable end-user reporting of domain workflow success, failure and in-progress status.
- Users will have fine-grained control over the attributes of a workflow (for example, max concurrency and active/inactive flags).
This version of the service will have no negative impact compared to the current version of the Ingestion Workflow Service in terms of:
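As an illustration of the per-workflow attributes mentioned in the last point, the sketch below shows how a trigger request might be checked against them. This is a guess at one possible behaviour, not part of the proposal: `WorkflowSettings`, `TriggerRejected`, `check_trigger_allowed` and the field names `max_concurrency` and `active` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkflowSettings:
    max_concurrency: int = 1  # hypothetical: maximum simultaneous runs
    active: bool = True       # hypothetical: inactive workflows cannot be triggered


class TriggerRejected(Exception):
    """Raised when a trigger request violates the workflow's settings."""


def check_trigger_allowed(settings: WorkflowSettings, running_count: int) -> None:
    """Raise TriggerRejected if a new run must not start; otherwise return None."""
    if not settings.active:
        raise TriggerRejected("workflow is inactive")
    if running_count >= settings.max_concurrency:
        raise TriggerRejected(
            f"{running_count} run(s) in progress, limit is {settings.max_concurrency}"
        )
```

Keeping such checks in the service rather than in orchestrator configuration is one way the attributes could remain independent of the tool in use.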
Decision criteria and tradeoffs
- Cost of Implementation