Packaged DAGs for Deployment and Maintenance of DAGs
Decision Title
Packaged DAGs for Deployment and Maintenance of DAGs
Status
- Proposed
- Trialing
- Under review
- Approved
- Retired
Context & Scope
Currently, a DAG repository consists of:
- DAG files
- supporting Python files
- operators
- hooks
- sensors
For a DAG to be functional, all of these have to be deployed into the Airflow cluster. For manifest ingestion, we have around 30-40 Python files that need to be copied into different locations in Airflow, such as dags, operators, sensors, and hooks.
There are a couple of concerns with this:
- The DAGs have to be deployed to the root dags folder of Airflow, because the code was written on the assumption that DAGs, operators, and sensors are deployed to the corresponding Airflow root folders.
- There is no single deployment unit for DAGs.
Decision
- DAGs must be distributed as a single deployable unit. This can be achieved by packaging the DAGs into a single zip file.
- DAG contributors must follow this folder structure:
├── osdu_dag
│   ├── __init__.py
│   ├── custom_lib
│   │   ├── __init__.py
│   │   └── utils.py
│   └── operators
│       ├── __init__.py
│       └── customOperator1.py
└── test_dag.py
- In the above structure:
- test_dag.py is the actual DAG file.
- "osdu_dag" creates a namespace for the DAGs. All required dependencies, such as Python files, operators, sensors, and hooks, can live inside that folder.
- A zip file is created from this structure and shared as the deployable unit.
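The packaging step can be sketched with the Python standard library. This is only a minimal illustration, not a prescribed build script: the function name package_dag and its arguments are hypothetical, and it assumes the zip file is written outside the source folder.

```python
import zipfile
from pathlib import Path


def package_dag(source_dir, zip_path):
    """Zip every file under source_dir, keeping paths relative to source_dir.

    Hypothetical helper: test_dag.py ends up at the root of the archive
    and osdu_dag/ sits next to it, matching the folder structure above.
    Returns the list of archive member names that were written.
    """
    source = Path(source_dir)
    target = Path(zip_path).resolve()
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            relative = path.relative_to(source)
            # Skip directories, bytecode caches, and the archive itself
            if not path.is_file():
                continue
            if "__pycache__" in relative.parts or path.suffix == ".pyc":
                continue
            if path.resolve() == target:
                continue
            archive.write(path, relative.as_posix())
            archived.append(relative.as_posix())
    return archived
```

A call such as package_dag("my_dag_repo", "zip_dag.zip") (both names are placeholders) would produce one archive that can then be copied into any dags folder.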
Consequences
The DAG repositories need to rearrange their code to support the above structure. This will require some changes to import statements, because supporting modules are now imported through the "osdu_dag" namespace.
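The effect on imports can be tried out without Airflow. The sketch below builds a tiny archive with the same layout and imports a module from it through the "osdu_dag" namespace. It assumes that Airflow makes the zip archive importable by placing it on the module search path, which is how its packaged DAG loader is understood to work. The greet function and both helper names are hypothetical and exist only for illustration.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def build_example_zip(zip_path):
    """Write a tiny archive that mirrors the folder structure above."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("osdu_dag/__init__.py", "")
        archive.writestr("osdu_dag/custom_lib/__init__.py", "")
        # Hypothetical utility function, for illustration only
        archive.writestr(
            "osdu_dag/custom_lib/utils.py",
            "def greet(name):\n    return 'hello ' + name\n",
        )


def import_from_zip(zip_path, module_name):
    """Put the archive on sys.path and import a module from inside it."""
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


zip_file = str(Path(tempfile.mkdtemp()) / "zip_dag.zip")
build_example_zip(zip_file)

# Inside test_dag.py the equivalent statement would be:
#     from osdu_dag.custom_lib import utils
utils = import_from_zip(zip_file, "osdu_dag.custom_lib.utils")
print(utils.greet("osdu"))  # hello osdu
```

Because every supporting module is addressed through "osdu_dag", the imports no longer depend on where the archive is placed in the Airflow installation.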
Rationale
This gives consumers the flexibility to deploy DAGs at any location within Airflow.
Example
I have created an example zip file containing a test DAG, a test custom operator, and some utility files: zip_dag.zip
References
- Airflow packaged DAGs https://airflow.apache.org/docs/apache-airflow/1.10.12/concepts.html#packaged-dags
Edited by Chris Zhang