
Created Dec 31, 2021 by Vineeth Guna [Microsoft] (@vineethguna, Maintainer)

ADR: Workflow Versioning and Update Workflow API

Workflow Versioning

Workflow versioning lets us seamlessly run a newer version of an existing workflow through ingestion workflow. The design challenges and questions around workflow versioning are discussed below.

How do we create a new version of an Airflow OSDU DAG?

Airflow has no built-in way to distinguish between two versions of the same DAG, so we will build this functionality around Airflow's existing capabilities.

Airflow distinguishes DAGs by DAG name, so we can create multiple versions of a single DAG by adding the version as a suffix to the DAG name. For example:

| Workflow Name | Workflow Version | DAG Name         |
|---------------|------------------|------------------|
| CSV Parser    | 1.0.0            | csv-parser-1.0.0 |
| CSV Parser    | 2.0.0            | csv-parser-2.0.0 |
| CSV Parser    | 1.3.1            | csv-parser-1.3.1 |

The workflow version can be one or more of the following:

  • Git SHA
  • Release Version

The pipelines that build the final DAG (or packaged DAG) can append this version to the Airflow DAG name before generating the final artifact. Airflow then consumes that artifact to get the new version of an existing DAG up and running.
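The naming scheme above can be sketched as a small helper that a build pipeline might call. This is only an illustration: the function name `versioned_dag_name` and the slug rule (lowercase, non-alphanumeric runs collapsed to a hyphen) are assumptions inferred from the example table, not part of the specification.

```python
import re


def versioned_dag_name(workflow_name: str, version: str) -> str:
    """Return a DAG name made of a slugified workflow name plus a version suffix.

    Hypothetical helper: the slug rule is inferred from the "CSV Parser" ->
    "csv-parser" example and is not defined by the ADR.
    """
    # Lowercase, then collapse every run of non-alphanumeric characters into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", workflow_name.lower()).strip("-")
    return f"{slug}-{version}"


# The version may be a release version or a Git SHA.
release_name = versioned_dag_name("CSV Parser", "2.0.0")
sha_name = versioned_dag_name("CSV Parser", "3f2a9c1")
```

Because Airflow only sees the resulting DAG name, each generated name registers as an independent DAG.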

How does ingestion workflow distinguish different versions of an existing workflow/DAG?

Workflow metadata in ingestion consists of the following properties:

  • Workflow ID
  • Workflow Name
  • Version
  • Registration Instructions
  • DAG Name (In Airflow)

We can use the combination of version and DAG name to identify different versions of a workflow. For example:

| Workflow Name | Workflow Version | DAG Name         | Explanation |
|---------------|------------------|------------------|-------------|
| csv-parser    | 1                | csv-parser-1.0.0 | Workflow “csv-parser”, version “1”. When triggered, it uses “csv-parser-1.0.0” as the DAG to create a DAG run. |
| csv-parser    | 2                | csv-parser-1.2.0 | Workflow “csv-parser”, version “2”. When triggered, it uses “csv-parser-1.2.0” as the DAG to create a DAG run. Note that this is a minor version change. |
| csv-parser    | 3                | csv-parser-2.0.0 | Workflow “csv-parser”, version “3”. When triggered, it uses “csv-parser-2.0.0” as the DAG to create a DAG run. Note that this is a major version change. |
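As a rough illustration of this metadata, the properties listed above could be modelled as a record keyed by workflow name and version. The class name, field names, and workflow ID values below are hypothetical; the actual storage model of ingestion workflow is not described here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkflowMetadata:
    """One version of a workflow (field names are illustrative)."""
    workflow_id: str
    workflow_name: str
    version: int
    dag_name: str
    registration_instructions: dict = field(default_factory=dict)


# The three rows of the example table; the IDs are made up.
records = [
    WorkflowMetadata("id-1", "csv-parser", 1, "csv-parser-1.0.0"),
    WorkflowMetadata("id-2", "csv-parser", 2, "csv-parser-1.2.0"),
    WorkflowMetadata("id-3", "csv-parser", 3, "csv-parser-2.0.0"),
]

# (workflow name, version) identifies exactly one DAG.
dag_by_version = {(r.workflow_name, r.version): r.dag_name for r in records}
```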

Can we trigger different versions of a workflow?

No. Only one version of a workflow is active at any time, and only that active version can be triggered.

The existing trigger workflow API does not support triggering different versions of a workflow.

The following example illustrates this:

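| Workflow Name | Workflow Version | DAG Name  | Active? |
|---------------|------------------|-----------|---------|
| Foo           | 1                | Foo-1.0.0 | Yes     |
| Foo           | 2                | Foo-2.0.0 | No      |
| Bar           | 2                | Bar-2.0.0 | Yes     |
| Bar           | 1                | Bar-1.0.0 | No      |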

In this case, triggering works as follows:

  • When the “Foo” workflow is triggered, ingestion workflow triggers the DAG associated with the active version, i.e. “1”, so it triggers the “Foo-1.0.0” DAG on Airflow
  • When the “Bar” workflow is triggered, ingestion workflow triggers the DAG associated with the active version, i.e. “2”, so it triggers the “Bar-2.0.0” DAG on Airflow

A workflow can have only one active version at any time.
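The Foo/Bar example can be sketched as a lookup from workflow name to the DAG of its active version. The dictionaries and the `resolve_dag` function are hypothetical names used only for illustration. Storing the active version as a single value per workflow name is one possible way to guarantee that there is never more than one active version.

```python
# (workflow name, version) -> DAG name, mirroring the example table.
DAGS = {
    ("Foo", 1): "Foo-1.0.0",
    ("Foo", 2): "Foo-2.0.0",
    ("Bar", 1): "Bar-1.0.0",
    ("Bar", 2): "Bar-2.0.0",
}

# Workflow name -> active version. One entry per workflow, so a workflow
# cannot have two active versions.
ACTIVE_VERSION = {"Foo": 1, "Bar": 2}


def resolve_dag(workflow_name: str) -> str:
    """Return the DAG name that a trigger request for this workflow would use."""
    version = ACTIVE_VERSION[workflow_name]
    return DAGS[(workflow_name, version)]


foo_dag = resolve_dag("Foo")
bar_dag = resolve_dag("Bar")
```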

How do we add a new version of a workflow?

We can use the update workflow API to add a new version of a workflow. The details of the API are discussed below.

How do we mark a version of a workflow as active?

We can use the update workflow API to mark a version of a workflow as active. The details of the API are discussed below.

How do we get all versions of a workflow?

A new API is introduced to get all the versions of a workflow. It should return the workflow metadata for every version present in the system.

Refer to the get versions API in the API Specification.

How do we mark a version of a workflow as active?

By default, a new version added through the update workflow API becomes active. To make an older version active, use the mark workflow version active API.

Steps to activate an older version of a workflow

  1. Call the get all versions API to fetch the existing versions of the workflow
  2. Determine the version of the workflow that needs to be activated
  3. Call the mark workflow version active API, passing the version and the workflow name

Refer to the API Specification for the mark workflow version active API.
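The three steps can be walked through with a small in-memory model. This is a sketch, not the service implementation: the class `WorkflowVersions` and its methods are hypothetical, and the sequential numbering of versions (1, 2, 3, ...) is an assumption based on the examples in this document.

```python
class WorkflowVersions:
    """In-memory stand-in for the versions of a single workflow."""

    def __init__(self) -> None:
        self._dags = {}       # version -> DAG name
        self._active = None   # currently active version

    def add_version(self, dag_name: str) -> int:
        # A newly added version becomes active by default.
        version = len(self._dags) + 1
        self._dags[version] = dag_name
        self._active = version
        return version

    def get_all_versions(self) -> list:
        return sorted(self._dags)

    def mark_active(self, version: int) -> None:
        if version not in self._dags:
            raise KeyError(f"unknown version: {version}")
        self._active = version

    @property
    def active_dag(self) -> str:
        return self._dags[self._active]


workflow = WorkflowVersions()
workflow.add_version("csv-parser-1.0.0")
workflow.add_version("csv-parser-2.0.0")   # version 2 is now active
dag_before = workflow.active_dag

versions = workflow.get_all_versions()     # step 1: fetch existing versions
target = versions[0]                       # step 2: choose the older version
workflow.mark_active(target)               # step 3: mark it active
dag_after = workflow.active_dag
```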

Will updates to a workflow affect in-progress workflow runs?

Existing workflow runs are not affected by this change; they continue to run until they reach a completion state. Any new workflow run triggered after the change triggers the DAG associated with the active version, as discussed above.
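One way to picture this behaviour is a run record that captures the DAG name at trigger time, so a later change of the active version does not touch runs that already exist. The `WorkflowRun` class, the `active` mapping, and the `trigger` function are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowRun:
    """A run remembers which version and DAG it was started with."""
    run_id: str
    workflow_name: str
    version: int
    dag_name: str


# Workflow name -> (active version, DAG name of that version).
active = {"csv-parser": (1, "csv-parser-1.0.0")}


def trigger(workflow_name: str, run_id: str) -> WorkflowRun:
    version, dag_name = active[workflow_name]
    return WorkflowRun(run_id, workflow_name, version, dag_name)


run_a = trigger("csv-parser", "run-a")          # starts on version 1
active["csv-parser"] = (2, "csv-parser-2.0.0")  # a new version becomes active
run_b = trigger("csv-parser", "run-b")          # starts on version 2
```

Looking up the status of `run_a` would still go to the `csv-parser-1.0.0` DAG, which matches the intended behaviour of the get specific workflow run API.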

Any changes to the existing API specifications?

| API                       | Any Changes? | API Specification Changes | Behavioral Changes |
|---------------------------|--------------|---------------------------|--------------------|
| Register workflow         | No           | N/A                       | N/A |
| Get all workflows         | Yes          | N/A                       | It should return the active version of each workflow |
| Delete workflow           | Yes          | N/A                       | It should delete all versions of a workflow |
| Get workflow by name      | Yes          | N/A                       | It should return the active version of the workflow |
| Trigger workflow          | Yes          | N/A                       | It should only trigger the active version of a workflow |
| Get all workflow runs     | Yes          | N/A                       | It should return all workflow runs across all versions of a workflow |
| Get specific workflow run | Yes          | N/A                       | It should get the status of the workflow run based on the DAG associated with the version |
| Update workflow run       | No           | N/A                       | N/A |
| Info                      | No           | N/A                       | N/A |

How does this change affect existing workflows in the system?

All existing workflows have only one version, so we treat each of them as a single-version workflow and use that version to trigger the respective DAG. If an existing workflow is missing any of the new metadata, it should be updated with that metadata where this brings some benefit; otherwise we can keep it as is.

Any limitations on the number of versions supported per workflow?

For now, no limit is set, but we can revisit this if we see any issues.

Can we disable DAGs in Airflow for inactive versions?

We cannot disable DAGs in Airflow for inactive versions, because doing so would stop all in-progress DAG runs, which is not acceptable. If we can build a solution that asynchronously checks whether all DAG runs of an inactive DAG have completed, we can disable that DAG afterwards. This is only needed if it helps improve Airflow performance.
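The check such an asynchronous solution would need can be sketched as a pure function. Everything here is an assumption for illustration: the function name `can_disable`, the state strings, and the idea that run states are fetched elsewhere and passed in as a list.

```python
# Run states treated as "completed" in this sketch (illustrative strings).
TERMINAL_STATES = {"success", "failed"}


def can_disable(dag_name: str, active_dag_name: str, run_states: list) -> bool:
    """Return True only for an inactive DAG whose runs have all completed."""
    if dag_name == active_dag_name:
        # The active version must stay enabled so it can be triggered.
        return False
    return all(state in TERMINAL_STATES for state in run_states)


still_running = can_disable("Foo-1.0.0", "Foo-2.0.0", ["success", "running"])
all_done = can_disable("Foo-1.0.0", "Foo-2.0.0", ["success", "failed"])
is_active = can_disable("Foo-2.0.0", "Foo-2.0.0", ["success"])
```

A background job could run this check periodically and disable a DAG only once it returns true.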

How does workflow versioning apply to system workflows?

Versioning applies to system workflows in the same way as to normal workflows. Because system workflows apply to all data partitions, any change in the version of a system workflow affects all data partitions.

Workflow Update API

Refer to the update API in the API Specification.

The update API supports the following:

  • To add a new version of workflow and activate it
curl --location --request PUT 'https://<osdu_endpoint>/v1/workflow/csv-parser' \
--header 'Content-Type: application/json' \
--header 'Authorization: <API Key>' \
--data-raw '{
    "registrationInstructions": {
        "dagName": "csv-parser-2.0.0",
        "dagContent": ""
    }
}'
  • To activate an older version (version 1) of workflow
curl --location --request PUT 'https://<osdu_endpoint>/v1/workflow/csv-parser/version/1/active' \
--header 'Content-Type: application/json' \
--header 'Authorization: <API Key>' 
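For callers that prefer Python over curl, the two requests above can be built with the standard library. This sketch only constructs the request objects and sends nothing. The function names and the base URL `https://osdu.example.com` are placeholders, while the paths, headers, and body are copied from the curl commands.

```python
import json
import urllib.request


def add_version_request(base_url, workflow_name, dag_name, api_key):
    """Build the PUT request that adds a new version and activates it."""
    body = {"registrationInstructions": {"dagName": dag_name, "dagContent": ""}}
    return urllib.request.Request(
        url=f"{base_url}/v1/workflow/{workflow_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="PUT",
    )


def activate_version_request(base_url, workflow_name, version, api_key):
    """Build the PUT request that marks an existing version as active."""
    return urllib.request.Request(
        url=f"{base_url}/v1/workflow/{workflow_name}/version/{version}/active",
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="PUT",
    )


BASE_URL = "https://osdu.example.com"  # placeholder for <osdu_endpoint>
add_req = add_version_request(BASE_URL, "csv-parser", "csv-parser-2.0.0", "<API Key>")
activate_req = activate_version_request(BASE_URL, "csv-parser", 1, "<API Key>")
```

Passing either object to `urllib.request.urlopen` would send it to the configured endpoint.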

Update API Limitations

  • Cannot update dagName for an already existing version of workflow
  • Cannot update description of workflow
  • Cannot disable any version of workflow

Sequence Diagrams for the APIs after introducing this feature

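Get workflow by name

[Sequence diagram: Get_Workflow_By_Name]

Get all workflows

[Sequence diagram: Get_All_Workflows]

Trigger workflow

[Sequence diagram: Trigger_workflow]

Get workflow run

[Sequence diagram: Get_workflow_run]

Delete workflow

[Sequence diagram: Delete_Workflow]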

Edited Feb 16, 2022 by Vineeth Guna [Microsoft]