ADR: Airflow dependency isolation and KubernetesPodOperators
Decision Title
Implement KubernetesPodOperator for Airflow DAG Operators to Eliminate Dependency Installation in the Airflow Environment.
Status
- Proposed
- Trialing
- Under review
- Approved
- Retired
Context & Scope
We rely heavily on Airflow DAGs for data processing, and these DAGs depend on various external libraries. The libraries are installed directly into the Airflow environment, leading to frequent dependency conflicts.
Additionally, updating these libraries often requires redeploying the entire Airflow setup, causing operational interruptions.
Decision
We propose adopting the KubernetesPodOperator to run the tasks in our Airflow DAGs. Each task within a DAG will run in its own Kubernetes pod, i.e., in a separate container, with its dependencies packaged in the container image rather than installed into Airflow. This isolates dependencies per task and allows tasks to be written in languages other than Python.
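As an illustration of the decision, the following is a minimal sketch of a DAG whose two tasks each run in their own pod, one of them a non-Python task. The namespace, image names, tags, and DAG and task IDs are hypothetical. The import path is the one used by recent releases of the `apache-airflow-providers-cncf-kubernetes` package (older releases expose the operator under `operators.kubernetes_pod`), and the `schedule` argument assumes Airflow 2.4 or later.

```python
from datetime import datetime


def pod_task_kwargs(task_id: str, image: str, cmds: list[str]) -> dict:
    """Build keyword arguments for one KubernetesPodOperator task."""
    return {
        "task_id": task_id,
        # Kubernetes object names may not contain underscores.
        "name": task_id.replace("_", "-"),
        "namespace": "airflow-tasks",  # hypothetical namespace
        "image": image,                # the task's dependencies live in this image
        "cmds": cmds,
        "get_logs": True,              # stream the pod's stdout into the Airflow task log
    }


def build_dag():
    """Assemble the example DAG. Airflow is imported here so that the
    helper above can be used and tested without Airflow installed."""
    from airflow import DAG
    from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

    with DAG(
        dag_id="pod_isolation_example",  # hypothetical DAG ID
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        extract = KubernetesPodOperator(
            **pod_task_kwargs(
                "extract_orders",
                "registry.example.com/etl/extract:1.4.2",  # hypothetical image
                ["python", "-m", "extract"],
            )
        )
        report = KubernetesPodOperator(
            **pod_task_kwargs(
                "render_report",
                "registry.example.com/reports/render:0.9.0",  # hypothetical image
                ["Rscript", "render.R"],  # a non-Python task
            )
        )
        extract >> report
    return dag
```

In an actual DAG file, `build_dag()` would be called at module level (for example `dag = build_dag()`) so that the scheduler discovers the DAG. The Airflow environment itself then only needs the Kubernetes provider package, not the libraries used inside the task images.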
Rationale
Benefits of using the KubernetesPodOperator include:
- Dependency Isolation: Each task runs in a separate container, preventing conflicts.
- Scalability: Kubernetes can efficiently manage and scale these containers as needed.
- Flexibility: Teams can reuse their own Airflow instances without worrying about installing extra dependencies.
- Easy Updates: Task dependencies can be updated by changing the container tag in the corresponding Airflow Variables, without affecting other tasks or the entire Airflow system.
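The "Easy Updates" point can be sketched as follows. The helper builds an image reference whose tag is a Jinja template resolved from an Airflow Variable when the task runs. This assumes that `image` is a templated field of the KubernetesPodOperator, as it is in recent provider releases. The repository name and the Variable key `extract_image_tag` are hypothetical.

```python
def templated_image(repository: str, variable_key: str) -> str:
    """Return an image reference whose tag Airflow renders from a Variable.

    The doubled braces in the f-string produce literal '{{' and '}}', so the
    result contains the Jinja expression '{{ var.value.<variable_key> }}'.
    """
    return f"{repository}:{{{{ var.value.{variable_key} }}}}"


# Hypothetical repository and Variable key, for illustration only.
image = templated_image("registry.example.com/etl/extract", "extract_image_tag")
print(image)
# registry.example.com/etl/extract:{{ var.value.extract_image_tag }}
```

Passing this string as the operator's `image` argument means that changing the value of the Variable (for example from `1.4.2` to `1.5.0`) switches the task to the new image on its next run, with no redeployment of Airflow. Templating is used here instead of calling `Variable.get` in the DAG file because the template is only resolved at run time, not on every parse of the file.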
Consequences
- Complexity: Managing tasks as Kubernetes pods involves a more complex setup.
- Learning Curve: Teams will need to learn how to manage tasks using Kubernetes and work with KubernetesPodOperators.
- Local Machine Setup: Running DAGs locally requires either setting up a Kubernetes cluster on the developer's machine or connecting to an external cluster.
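For the local setup point, one possible approach is to switch the operator's cluster connection settings by environment. The sketch below uses the `in_cluster`, `config_file`, and `cluster_context` parameters of the KubernetesPodOperator. The environment names, the `minikube` context, and the kubeconfig location are assumptions for illustration.

```python
import os


def cluster_kwargs(environment: str) -> dict:
    """Return KubernetesPodOperator connection arguments for an environment."""
    if environment == "local":
        return {
            # Airflow runs outside the cluster, so read a kubeconfig file.
            "in_cluster": False,
            "config_file": os.path.expanduser("~/.kube/config"),
            "cluster_context": "minikube",  # hypothetical local context name
        }
    # Deployed Airflow is assumed to run inside the cluster and to use
    # its service account credentials.
    return {"in_cluster": True}


local = cluster_kwargs("local")
deployed = cluster_kwargs("production")
```

The returned dictionary can be unpacked into the operator call alongside the task's other arguments, so the same DAG file works against a local cluster and the deployed one.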
Tradeoff Analysis - Input to Decision
- Pros:
  - Resolves issues with dependency conflicts.
  - Utilizes Kubernetes for efficient resource management.
  - Facilitates easy updates of individual DAGs.
- Cons:
  - Increases the complexity of the deployment and maintenance processes.
  - Necessitates training for the teams to adapt to new processes.