ADR: Workflow Versioning and Update Workflow API ([issue #136](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/136), Vineeth Guna [Microsoft], 2024-01-17)

# Workflow Versioning
Workflow versioning is a feature that lets the ingestion workflow service seamlessly run a newer version of an existing workflow. The design challenges and questions around workflow versioning are discussed below.
## How do we create a new version of an OSDU Airflow DAG?
Airflow has no built-in way to distinguish between two versions of the same DAG, so we build this functionality on top of existing Airflow capabilities.
Airflow identifies DAGs by their name (the `dag_id`), so we can create multiple versions of a single DAG by appending the version as a suffix to the DAG name. For example:
|Workflow Name|Workflow Version|DAG Name|
|-------------|----------------|--------|
|CSV Parser| 1.0.0 |csv-parser-1.0.0|
|CSV Parser| 2.0.0 |csv-parser-2.0.0|
|CSV Parser| 1.3.1 |csv-parser-1.3.1|
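The naming convention in the table can be expressed as a small helper. This is a minimal sketch, assuming the DAG name is the lowercased workflow name with spaces replaced by hyphens, followed by `-<version>`; `dag_name_for` is a hypothetical name, not part of any OSDU tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive a versioned Airflow DAG name ("<slug>-<version>")
# from a workflow name and version, following the convention in the table.
dag_name_for() {
  local workflow_name="$1" version="$2" slug
  # Lowercase the name, then turn spaces into hyphens (squeezing repeats).
  slug=$(printf '%s' "$workflow_name" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '-')
  printf '%s-%s\n' "$slug" "$version"
}

# Example: dag_name_for "CSV Parser" "2.0.0"   prints csv-parser-2.0.0
```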
The workflow version can be one or more of the following:
- Git SHA
- Release Version
We can leverage the pipelines that build the final DAG or packaged DAG to append this version to the Airflow DAG name before generating the final artifact. Airflow then consumes that artifact, which brings the new version of the existing DAG up and running.
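A pipeline step along these lines could pick the version suffix and stamp it into the DAG source. This is a sketch under stated assumptions: the DAG file contains a hypothetical `__DAG_NAME__` placeholder where the `dag_id` goes, and a release version, when provided, takes precedence over the short Git SHA. `resolve_dag_version` and `stamp_dag_file` are illustrative names, not existing OSDU pipeline steps.

```bash
#!/usr/bin/env bash
# Hypothetical CI helpers; not part of any existing OSDU pipeline.

# Use the release version when one is given, otherwise fall back to the
# short Git SHA of the current commit.
resolve_dag_version() {
  local release_version="${1:-}"
  if [ -n "$release_version" ]; then
    printf '%s\n' "$release_version"
  else
    git rev-parse --short HEAD
  fi
}

# Copy the DAG source, replacing the assumed __DAG_NAME__ placeholder with
# "<base name>-<version>" so Airflow registers it as a separate DAG.
# Assumes neither value contains '/' or '&' (true for semver and Git SHAs).
stamp_dag_file() {
  local src="$1" dst="$2" base_name="$3" version="$4"
  sed "s/__DAG_NAME__/${base_name}-${version}/g" "$src" > "$dst"
}

# Example:
#   version=$(resolve_dag_version "$RELEASE_VERSION")
#   stamp_dag_file dags/csv_parser.py build/csv_parser.py csv-parser "$version"
```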
## How does ingestion workflow distinguish between versions of an existing workflow/DAG?
Workflow metadata in ingestion workflow consists of the following properties:
- Workflow ID
- Workflow Name
- Version
- Registration Instructions
- DAG Name (In Airflow)
The combination of version and DAG name identifies each version of a workflow. For example:
|Workflow Name| Workflow Version| DAG Name| Explanation|
|-------------|-----------------|---------------|------------------|
|csv-parser| 1 |csv-parser-1.0.0| This corresponds to a workflow with name “csv-parser” with version “1” which when triggered will use “csv-parser-1.0.0” as the DAG to create a DAG run|
|csv-parser| 2 |csv-parser-1.2.0| This corresponds to a workflow with name “csv-parser” with version “2” which when triggered will use “csv-parser-1.2.0” as the DAG to create a DAG run, note that this is a minor version change |
|csv-parser| 3 |csv-parser-2.0.0| This corresponds to a workflow with name “csv-parser” with version “3” which when triggered will use “csv-parser-2.0.0” as the DAG to create a DAG run, note that this is a major version change|
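Resolving the DAG for a given workflow version then amounts to a lookup keyed on name and version. This minimal sketch assumes the metadata is available as hypothetical whitespace-separated records mirroring the table; `WORKFLOW_METADATA` and `dag_for_version` are illustrative names, not the service's storage model.

```bash
#!/usr/bin/env bash
# Hypothetical metadata records, one per line:
# "<workflow name> <workflow version> <DAG name>"
WORKFLOW_METADATA='csv-parser 1 csv-parser-1.0.0
csv-parser 2 csv-parser-1.2.0
csv-parser 3 csv-parser-2.0.0'

# Print the DAG registered for a (workflow name, version) pair;
# prints nothing if that version does not exist.
dag_for_version() {
  printf '%s\n' "$WORKFLOW_METADATA" |
    awk -v name="$1" -v ver="$2" '$1 == name && $2 == ver { print $3 }'
}

# Example: dag_for_version csv-parser 2   prints csv-parser-1.2.0
```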
## Can we trigger different versions of a workflow?
Only one version of a workflow is active at any time, and only that version can be triggered. In other words, we cannot trigger different versions of the same workflow; we can only trigger its active version.
The existing trigger workflow API does not support triggering a specific version of a workflow.
Consider the following example:
|Workflow Name| Workflow Version| DAG Name| ACTIVE?|
|-------------|-----------------|---------------|--------------|
|Foo| 1| Foo-1.0.0| Yes|
|Foo| 2| Foo-2.0.0| No|
|Bar| 2| Bar-2.0.0| Yes|
|Bar| 1| Bar-1.0.0| No|
In this case:
- When the “Foo” workflow is triggered, ingestion workflow triggers the DAG associated with the active version, i.e. “1”, so it triggers the “Foo-1.0.0” DAG on Airflow
- When the “Bar” workflow is triggered, ingestion workflow triggers the DAG associated with the active version, i.e. “2”, so it triggers the “Bar-2.0.0” DAG on Airflow

**A workflow can only ever have one active version**
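Trigger-time resolution can be sketched as a lookup that returns the DAG of the single active version and fails otherwise. This is a minimal sketch over hypothetical records copied from the table above; `WORKFLOWS` and `active_dag_for` are illustrative names, and the real service presumably resolves this from its own metadata store.

```bash
#!/usr/bin/env bash
# Hypothetical records mirroring the table above:
# "<workflow name> <version> <DAG name> <active?>"
WORKFLOWS='Foo 1 Foo-1.0.0 yes
Foo 2 Foo-2.0.0 no
Bar 2 Bar-2.0.0 yes
Bar 1 Bar-1.0.0 no'

# Print the DAG of the workflow's active version. Exit non-zero unless exactly
# one active version exists, enforcing the single-active-version rule.
active_dag_for() {
  printf '%s\n' "$WORKFLOWS" |
    awk -v name="$1" '
      $1 == name && $4 == "yes" { dag = $3; n++ }
      END { if (n != 1) exit 1; print dag }'
}

# Example: active_dag_for Bar   prints Bar-2.0.0
```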
## How do we add a new version of a workflow?
Use the update workflow API to add a new version of a workflow; the API is described in the Workflow Update API section below.
## How do we mark a version of a workflow as active?
Use the update workflow API to mark a version of a workflow as active; the API is described in the Workflow Update API section below.
## How do we get all versions of a workflow?
A new API is introduced to get all versions of a workflow. It returns the workflow metadata for every version present in the system.
Refer to the get versions API in the [API Specification](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/update_api_spec/docs/api/openapi.workflow.yaml).
## How do we activate an older version of a workflow?
By default, a new version added through the update workflow API becomes active immediately. To make an older version active again, use the mark workflow version active API.
Steps to activate an older version of a workflow:
1. Call the get all versions API to fetch the existing versions of the workflow
2. Determine which version of the workflow needs to be activated
3. Call the mark workflow version active API, passing the version and the workflow name

The mark workflow version active API is described in the [API Specification](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/update_api_spec/docs/api/openapi.workflow.yaml).
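The activation steps could be scripted roughly as follows. This is a sketch under stated assumptions: `OSDU_ENDPOINT` and `API_KEY` are hypothetical environment variables, and the `/versions` route used for step 1 is a placeholder (the actual get versions route is defined in the API specification). The activation route matches the curl example in the Workflow Update API section.

```bash
#!/usr/bin/env bash
# Assumptions (not confirmed by the spec): OSDU_ENDPOINT and API_KEY are
# hypothetical settings; the "/versions" route below is a placeholder.
OSDU_ENDPOINT="${OSDU_ENDPOINT:-https://osdu.example.com}"
API_KEY="${API_KEY:-changeme}"

# Step 1: fetch all versions of a workflow (placeholder route).
list_workflow_versions() {
  local workflow="$1"
  curl --silent --fail \
    --header "Authorization: ${API_KEY}" \
    "${OSDU_ENDPOINT}/v1/workflow/${workflow}/versions"
}

# Step 3: mark the chosen version active, using the route shown in the
# Workflow Update API examples.
activate_workflow_version() {
  local workflow="$1" version="$2"
  curl --silent --fail --request PUT \
    --header 'Content-Type: application/json' \
    --header "Authorization: ${API_KEY}" \
    "${OSDU_ENDPOINT}/v1/workflow/${workflow}/version/${version}/active"
}

# Usage (step 2, choosing the version, is a manual decision):
#   list_workflow_versions csv-parser
#   activate_workflow_version csv-parser 1
```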
## Will updates to a workflow affect existing in-progress workflow runs?
Existing workflow runs are not affected by an update; they continue to run until they reach a completion state.
Any workflow runs triggered after the update use the DAG associated with the active version, as described above.
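One way to guarantee this is to record the DAG name on each run when it is triggered and resolve status from that record. This minimal sketch assumes hypothetical run records of the form `<run id> <workflow name> <DAG name>`; `RUNS` and `dag_for_run` are illustrative names, not the service's data model.

```bash
#!/usr/bin/env bash
# Hypothetical run records written when a run is triggered:
# "<run id> <workflow name> <DAG name captured at trigger time>"
RUNS='run-001 csv-parser csv-parser-1.0.0
run-002 csv-parser csv-parser-2.0.0'

# Look up a run's DAG from its own record. Activating another version later
# changes only which DAG new triggers use, never what existing runs point at.
dag_for_run() {
  printf '%s\n' "$RUNS" | awk -v id="$1" '$1 == id { print $3 }'
}

# Example: dag_for_run run-001   prints csv-parser-1.0.0
```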
## Any changes to the existing API specifications?
|API| Any Changes?| API Specification Changes| Behavioral Changes|
|---|---------------|----------------------------|------------------------|
|Register workflow| No| N/A| N/A|
|Get all workflows| Yes| N/A| Returns the active version of each workflow|
|Delete workflow| Yes| N/A| Deletes all versions of the workflow|
|Get workflow by name| Yes| N/A| Returns the active version of the workflow|
|Trigger workflow| Yes| N/A| Triggers only the active version of the workflow|
|Get all workflow runs| Yes| N/A| Returns all workflow runs across all versions of the workflow|
|Get specific workflow run| Yes| N/A| Gets the status of the workflow run from the DAG associated with the run's version|
|Update workflow run| No| N/A| N/A|
|Info| No| N/A| N/A|
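The behavioral changes for get all workflows and delete workflow can be illustrated over the same kind of records. This is a minimal sketch assuming hypothetical `<workflow name> <version> <DAG name> <active?>` records; `get_all_workflows` and `delete_workflow` are illustrative stand-ins for the API behavior, not its implementation.

```bash
#!/usr/bin/env bash
# Hypothetical records: "<workflow name> <version> <DAG name> <active?>"
WORKFLOWS='Foo 1 Foo-1.0.0 yes
Foo 2 Foo-2.0.0 no
Bar 2 Bar-2.0.0 yes
Bar 1 Bar-1.0.0 no'

# Get all workflows: only the active version of each workflow is listed.
get_all_workflows() {
  printf '%s\n' "$WORKFLOWS" | awk '$4 == "yes"'
}

# Delete workflow: remove every version of the named workflow.
delete_workflow() {
  WORKFLOWS=$(printf '%s\n' "$WORKFLOWS" | awk -v name="$1" '$1 != name')
}
```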
## How does this change affect existing workflows in the system?
All existing workflows have only one version, so they are treated as single-version workflows and continue to trigger their respective DAGs.
If any of the new metadata is missing from an existing workflow, the workflow can be updated with it where that brings a benefit; otherwise it can be left as is.
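Existing single-version workflows could also be handled without rewriting them by defaulting the missing field when records are read. This minimal sketch assumes, hypothetically, that the new metadata is an active flag stored as a fourth field; `normalize_records` is an illustrative name.

```bash
#!/usr/bin/env bash
# Hypothetical migration on read: legacy records have three fields
# "<workflow name> <version> <DAG name>" and no active flag. Each one is the
# workflow's only version, so it is treated as active. Newer records pass through.
normalize_records() {
  awk 'NF == 3 { print $0, "yes"; next } { print }'
}

# Example:
#   printf 'Foo 1 Foo-1.0.0\nBar 2 Bar-2.0.0 yes\n' | normalize_records
```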
## Any limitations on the number of versions supported per workflow?
No limit is set for now; we can revisit this if issues arise.
## Can we disable DAGs in Airflow for inactive versions?
We cannot disable DAGs in Airflow, as that would stop all in-progress DAG runs, which is not acceptable.
If we build a solution that asynchronously checks whether all DAG runs of an inactive DAG have completed, we can then disable that DAG. This is only needed if it improves Airflow performance.
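In Airflow, disabling a DAG corresponds to pausing it (`is_paused`). A periodic job along these lines could perform the asynchronous check against the Airflow 2.x stable REST API, pausing an inactive DAG only when none of its runs are queued or running. This is a sketch under stated assumptions: `AIRFLOW_URL`, `AIRFLOW_USER` and `AIRFLOW_PASSWORD` are hypothetical settings, the basic-auth API backend is assumed to be enabled, and nothing here is part of ingestion workflow today.

```bash
#!/usr/bin/env bash
# Assumptions: AIRFLOW_URL, AIRFLOW_USER and AIRFLOW_PASSWORD are hypothetical
# settings, and Airflow's basic-auth API backend is enabled.
AIRFLOW_URL="${AIRFLOW_URL:-http://localhost:8080}"

# Print how many runs of the DAG are still queued or running
# (read from "total_entries" in the list-DAG-runs response).
unfinished_runs() {
  curl --silent --fail --user "${AIRFLOW_USER}:${AIRFLOW_PASSWORD}" \
    "${AIRFLOW_URL}/api/v1/dags/$1/dagRuns?state=queued&state=running&limit=1" |
    grep -o '"total_entries": *[0-9]*' | grep -o '[0-9]*$'
}

# Pause the DAG only once nothing is left in flight; otherwise return 1
# so the job can retry later.
pause_if_drained() {
  local dag_id="$1" count
  count=$(unfinished_runs "$dag_id") || return 1
  [ "$count" -eq 0 ] || return 1
  curl --silent --fail --user "${AIRFLOW_USER}:${AIRFLOW_PASSWORD}" \
    --request PATCH --header 'Content-Type: application/json' \
    --data '{"is_paused": true}' \
    "${AIRFLOW_URL}/api/v1/dags/${dag_id}"
}

# Example: pause_if_drained csv-parser-1.0.0
```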
## How does workflow versioning apply to system workflows?
Versioning works the same way for system workflows as it does for normal workflows. Because system workflows apply to all data partitions, any version change to a system workflow affects all data partitions.
# Workflow Update API
See the update API in the [API Specification](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/update_api_spec/docs/api/openapi.workflow.yaml).
## Update API supports the following
- Add a new version of a workflow and activate it
```bash
curl --location --request PUT 'https://<osdu_endpoint>/v1/workflow/csv-parser' \
--header 'Content-Type: application/json' \
--header 'Authorization: <API Key>' \
--data-raw '{
"registrationInstructions": {
"dagName": "csv-parser-2.0.0",
"dagContent": ""
}
}'
```
- Activate an older version (version 1) of a workflow
```bash
curl --location --request PUT 'https://<osdu_endpoint>/v1/workflow/csv-parser/version/1/active' \
--header 'Content-Type: application/json' \
--header 'Authorization: <API Key>'
```
## Update API Limitations
- Cannot update the dagName of an existing workflow version
- Cannot update the description of a workflow
- Cannot disable any version of a workflow
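The effect of an update call on a workflow's versions, including the first limitation, can be modelled as appending a new version rather than editing an existing one. This is a minimal sketch assuming hypothetical `<workflow name> <version> <DAG name> <active?>` records with integer versions; `add_workflow_version` is an illustrative name, not the service's implementation.

```bash
#!/usr/bin/env bash
# Hypothetical records: "<workflow name> <version> <DAG name> <active?>"
WORKFLOWS='csv-parser 1 csv-parser-1.0.0 yes'

# Model of the update call: append version max+1 with the new DAG name and
# make it the only active version. Existing rows keep their DAG names, since
# the dagName of an existing version cannot be updated.
add_workflow_version() {
  WORKFLOWS=$(printf '%s\n' "$WORKFLOWS" | awk -v name="$1" -v dag="$2" '
    NF == 0 { next }
    $1 == name { if ($2 + 0 > max) max = $2 + 0; $4 = "no" }
    { print }
    END { print name, max + 1, dag, "yes" }')
}

# Example: add_workflow_version csv-parser csv-parser-2.0.0
```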
# Sequence Diagrams for APIs after Introducing This Feature
## Get workflow by name
![Get_Workflow_By_Name](/uploads/8b39b7ceaab1169c03da21309687c800/Get_Workflow_By_Name.png)
## Get all workflows
![Get_All_Workflows](/uploads/626124ec56f069e39ad6c342dbaf4426/Get_All_Workflows.png)
## Trigger workflow
![Trigger_workflow](/uploads/455ac50e420c910f1df519d1a0ef292f/Trigger_workflow.png)
## Get workflow run
![Get_workflow_run](/uploads/20b719d081b6ab83f821a0e2719923ea/Get_workflow_run.png)
## Delete workflow
![Delete_Workflow](/uploads/c7bcc09428fa07bc1af456cda63b0636/Delete_Workflow.png)