Packaged DAGs for Deployment and Maintenance of DAGs

Decision Title

Packaged DAGs for Deployment and Maintenance of DAGs

Status

  • Proposed
  • Trialing
  • Under review
  • Approved
  • Retired

Context & Scope

Currently, a DAG repository consists of:

  • DAG files
  • supporting Python files
  • operators
  • hooks
  • sensors

For a DAG to be functional, these files have to be deployed into the Airflow cluster. For manifest ingestion we have around 30-40 Python files that need to be copied into different locations in Airflow, such as dags, operators, sensors, and hooks.

There are a couple of concerns with this:

  • We have to deploy these DAGs at the root dags folder of Airflow, because the DAGs were written assuming they would be deployed in Airflow's root folders for DAGs, operators, and sensors.
  • There is no single deployment unit for DAGs.

Decision

  • DAGs must be distributed as a single deployable unit. This can be achieved by packaging the DAGs into a single zip file.
  • DAG contributors must follow the folder structure below:
├── osdu_dag
│   ├── __init__.py
│   ├── custom_lib
│   │   ├── __init__.py
│   │   └── utils.py
│   └── operators
│       ├── __init__.py
│       └── customOperator1.py
└── test_dag.py
  • In the above structure:
    • test_dag.py is the actual DAG file (see the sketch after this list)
    • "osdu_dag" creates a namespace for the DAGs. All needed dependencies, such as Python files, operators, sensors, and hooks, can live inside that folder
  • A zip file is created and shared as the deployable unit.
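
In practice, test_dag.py at the root of the zip could look roughly like the sketch below. This is a minimal sketch only: the CustomOperator1 class name, its arguments, and the utils helper are assumptions for illustration; only the import paths through the osdu_dag namespace follow from the decision.

# test_dag.py -- minimal sketch of the DAG file at the zip root
from datetime import datetime

from airflow import DAG

# Dependencies resolve through the osdu_dag namespace instead of
# Airflow's root dags/operators/sensors/hooks folders.
from osdu_dag.custom_lib import utils  # shared helpers, imported the same way
from osdu_dag.operators.customOperator1 import CustomOperator1  # assumed class name

with DAG(
    dag_id="osdu_test_dag",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    run_step = CustomOperator1(task_id="run_custom_step")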

Consequences

The DAG repositories need to rearrange their code to support the above structure. There will be some changes to import statements.
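
For example, an operator import that previously resolved against Airflow's root folders would change roughly as follows (the old path is an assumption about the current layout):

# Before: resolved against Airflow's root operators folder (illustrative)
# from operators.customOperator1 import CustomOperator1

# After: resolved through the packaged osdu_dag namespace
from osdu_dag.operators.customOperator1 import CustomOperator1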

Rationale

This will give consumers the flexibility to deploy DAGs at any location in Airflow.

Example

I have created an example zip file with a test DAG, a test custom operator, and some util files: zip_dag.zip
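
A zip with this layout could be produced with a small helper such as the sketch below (an assumption, run from the repository root); Airflow's packaged-DAG support expects the DAG file itself at the root of the archive.

# build_zip.py -- minimal sketch for producing the deployable zip
import os
import zipfile

def build_package(archive_name="zip_dag.zip"):
    with zipfile.ZipFile(archive_name, "w", zipfile.ZIP_DEFLATED) as zf:
        # The DAG file must sit at the root of the archive so the
        # scheduler can discover it inside the zip.
        zf.write("test_dag.py")
        # Walk the osdu_dag package, preserving relative paths so the
        # namespace stays intact for imports.
        for root, _dirs, files in os.walk("osdu_dag"):
            for name in files:
                zf.write(os.path.join(root, name))

if __name__ == "__main__":
    build_package()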
