ADR: Airflow dependency isolation and KubernetesPodOperators

Decision Title

Implement KubernetesPodOperator for Airflow DAG Operators to Eliminate Dependency Installation in the Airflow Environment.

Status

  • Proposed
  • Trialing
  • Under review
  • Approved
  • Retired

Context & Scope

We use Airflow DAGs extensively for data processing, and these DAGs depend on various external libraries. The libraries are installed directly into the Airflow environment, which leads to frequent dependency conflicts.

Additionally, updating these libraries often requires redeploying the entire Airflow setup, causing operational interruptions.

Decision

We propose adopting the KubernetesPodOperator for our Airflow tasks. Each task within a DAG will then run in its own Kubernetes pod, that is, in a separate container built from its own image. This keeps task dependencies out of the Airflow environment and allows tasks to be written in languages other than Python.
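As an illustration of the decision, the following is a minimal sketch of a single task run through the KubernetesPodOperator. The namespace, image name, task id, and arguments are hypothetical placeholders, not values from our deployment. The import path shown is the one used by recent releases of the cncf.kubernetes provider (older releases expose the operator from the kubernetes_pod module instead), and the schedule argument assumes Airflow 2.4 or later.

```python
from datetime import datetime


def pod_task_kwargs(task_id: str, image: str, arguments: list[str]) -> dict:
    """Build keyword arguments for one containerised task."""
    return {
        "task_id": task_id,
        # Pod names must be DNS-compatible, so underscores become hyphens.
        "name": task_id.replace("_", "-"),
        "namespace": "airflow-tasks",  # hypothetical namespace
        "image": image,                # the image carries the task's dependencies
        "arguments": list(arguments),  # passed to the image's entrypoint
        "get_logs": True,              # stream container output into the task log
    }


def make_dag():
    """Create a one-task DAG; needs Airflow and the cncf.kubernetes provider."""
    # Imported here so the helper above stays usable without Airflow installed.
    from airflow import DAG
    from airflow.providers.cncf.kubernetes.operators.pod import (
        KubernetesPodOperator,
    )

    with DAG(
        dag_id="pod_operator_sketch",   # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule=None,                  # trigger manually
    ) as dag:
        KubernetesPodOperator(
            **pod_task_kwargs(
                task_id="convert_file",
                image="registry.example.com/converter:1.0.0",  # hypothetical
                arguments=["--input", "in.dat"],
            )
        )
    return dag
```

In a DAG file the result would be assigned at module level (for example, dag = make_dag()) so that the scheduler can discover it. Nothing the task needs is installed into the Airflow environment itself; only the provider package that ships the operator is required there.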

Rationale

Benefits of using the KubernetesPodOperator include:

  • Dependency Isolation: Each task runs in a separate container, preventing conflicts.
  • Scalability: Kubernetes can efficiently manage and scale these containers as needed.
  • Flexibility: Teams can reuse their own Airflow instances without worrying about installing extra dependencies.
  • Easy Updates: Task dependencies can be updated by changing the container tag in the corresponding Airflow Variables, without affecting other tasks or the entire Airflow system.
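The "Easy Updates" point can be sketched as follows. The example assumes the image tag is stored in an Airflow Variable and that the image field of the KubernetesPodOperator is rendered as a Jinja template at run time; the repository name and the Variable key are hypothetical.

```python
def image_template(repository: str, variable_key: str) -> str:
    """Return an image reference whose tag is read from an Airflow Variable.

    Airflow renders the {{ var.value.<key> }} expression when the task runs,
    so changing the Variable changes the image without editing the DAG.
    """
    return f"{repository}:{{{{ var.value.{variable_key} }}}}"


# Hypothetical repository and Variable key, for illustration only.
IMAGE = image_template("registry.example.com/converter", "converter_image_tag")
# IMAGE would be passed as image=IMAGE to the KubernetesPodOperator.
```

With this arrangement, setting the converter_image_tag Variable to a new tag affects only the tasks that reference it; other tasks and the Airflow deployment itself are left unchanged.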

Consequences

  • Complexity: Managing tasks as Kubernetes pods involves a more complex setup.
  • Learning Curve: Teams will need to learn how to manage tasks using Kubernetes and work with KubernetesPodOperators.
  • Local Machine Setup: Running DAGs locally requires either setting up a Kubernetes cluster on the developer's machine or connecting to an external cluster.
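For the local setup case, one possible arrangement is to switch the operator's cluster connection settings depending on where Airflow runs. The sketch below is an assumption about how this could be organised, not an agreed convention: the helper name, the kubeconfig path, and the context name are hypothetical, while in_cluster, config_file, and cluster_context are parameters of the KubernetesPodOperator.

```python
import os


def cluster_kwargs(local: bool) -> dict:
    """Return connection arguments for the KubernetesPodOperator."""
    if local:
        return {
            "in_cluster": False,  # Airflow runs outside the cluster
            "config_file": os.path.expanduser("~/.kube/config"),
            "cluster_context": "minikube",  # hypothetical local context name
        }
    # When Airflow itself runs inside Kubernetes, use its service account.
    return {"in_cluster": True}
```

The returned dictionary would be merged into the operator's other keyword arguments, so the same DAG code can target a local cluster during development and the deployment cluster otherwise.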

Tradeoff Analysis - Input to Decision

  • Pros:

    • Resolves issues with dependency conflicts.
    • Utilizes Kubernetes for efficient resource management.
    • Facilitates easy updates of individual DAGs.
  • Cons:

    • Increases the complexity of the deployment and maintenance processes.
    • Necessitates training for the teams to adapt to new processes.

See also

KubernetesPodOperator official documentation

OpenZGY DAG

Edited by Yan Sushchynski (EPAM)