Commit 35490ce1 authored by Kishore Battula

Merge branch 'airflow_autoscaling_documentation' into 'master'

Airflow autoscaling documentation

See merge request !432
parents d7dd7baf 74f8c3ba
Pipeline #55891 passed with stages
in 1 minute and 24 seconds
# Airflow Autoscaling Guide
The Airflow autoscaling feature enables autoscaling of the Airflow components deployed in an AKS cluster. The following Airflow components can be autoscaled:
- Airflow Web Server
- Airflow Worker
## Prerequisites
- Enable AKS cluster autoscaling by following the instructions [here](docs/autoscaling.md)
- Upgrade KEDA from 1.5.0 to 2.2.0
## FAQ
### How to enable Airflow web server autoscaling?
To enable autoscaling for the service resources Airflow cluster, follow the steps below:
- Set the autoscale configuration to **true** in [helm-config.yaml](charts/airflow/helm-config.yaml) as shown below
```yaml
airflow:
  web:
    autoscale:
      enabled: true
```
- Set the autoscale label to **true** in [helm-config.yaml](charts/airflow/helm-config.yaml) as shown below
```yaml
airflow:
  web:
    labels:
      # DO NOT DELETE THIS LABEL. SET IT TO "false" WHEN AUTOSCALING IS DISABLED, SET IT TO "true" WHEN AUTOSCALING IS ENABLED
      autoscalingEnabled: "true"
```
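Both settings above sit under the same `airflow.web` key of the same file, so they can be kept side by side. A minimal sketch of how the two fragments might look once merged in helm-config.yaml (the merged layout is an assumption; only keys already shown in this guide are used):

```yaml
airflow:
  web:
    autoscale:
      enabled: true
    labels:
      # Keep this label in sync with autoscale.enabled
      autoscalingEnabled: "true"
```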
To enable autoscaling for a data partition specific Airflow cluster, follow the steps below:
- Set the autoscale configuration to **true** in [helm-config-dp.yaml](charts/airflow/helm-config-dp.yaml) as shown below
```yaml
airflow:
  web:
    autoscale:
      enabled: true
```
### How to enable Airflow worker autoscaling?
To enable autoscaling for the service resources Airflow cluster, follow the steps below:
- Set the KEDA version 2 feature flag to **true** in [helm-config.yaml](charts/airflow/helm-config.yaml) as shown below
```yaml
keda:
  version_2_enabled: true
```
- Set the autoscale configuration to **true** in [helm-config.yaml](charts/airflow/helm-config.yaml) as shown below
```yaml
airflow:
  workers:
    autoscale:
      enabled: true
```
- Set the autoscale label to **true** in [helm-config.yaml](charts/airflow/helm-config.yaml) as shown below
```yaml
airflow:
  workers:
    labels:
      # DO NOT DELETE THIS LABEL. SET IT TO "false" WHEN AUTOSCALING IS DISABLED, SET IT TO "true" WHEN AUTOSCALING IS ENABLED
      autoscalingEnabled: "true"
```
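Taken together, the three worker settings above touch two top-level keys of the same file. A sketch of the combined result in helm-config.yaml, assuming the fragments are simply merged under their shared parent keys (no keys beyond those shown in this guide are introduced):

```yaml
keda:
  # KEDA 2.x must be enabled before worker autoscaling is switched on
  version_2_enabled: true

airflow:
  workers:
    autoscale:
      enabled: true
    labels:
      # Keep this label in sync with autoscale.enabled
      autoscalingEnabled: "true"
```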
To enable autoscaling for a data partition specific Airflow cluster, follow the steps below:
- Set the KEDA version 2 feature flag to **true** in [helm-config-dp.yaml](charts/airflow/helm-config-dp.yaml) as shown below
```yaml
keda:
  version_2_enabled: true
```
- Set the autoscale configuration to **true** in [helm-config-dp.yaml](charts/airflow/helm-config-dp.yaml) as shown below
```yaml
airflow:
  workers:
    autoscale:
      enabled: true
```
### What are the tuning parameters available for autoscaling and their significance?
| Parameter | Usage |
| --- | --- |
| minReplicas | Minimum number of pods that are always kept running |
| maxReplicas | Maximum number of pods; no scale-up happens beyond this number |
| scaleDown.coolDownPeriod | Time interval between two consecutive scale-down events <br><br> **Example:** With a cooldown period of 5 minutes, if a scale-down from 5 pods to 4 pods happens at 11:00, the next scale-down from 4 pods to 3 pods happens at 11:05 at the earliest |
### How to set tuning parameters for autoscaling?
The tuning parameters described above can be configured as follows:
- For the service resources Airflow cluster, change the autoscale tuning configuration in [helm-config.yaml](charts/airflow/helm-config.yaml) as shown below
- For a data partition specific Airflow cluster, change the autoscale tuning configuration in [helm-config-dp.yaml](charts/airflow/helm-config-dp.yaml) as shown below
```yaml
airflow:
  <web|workers>:
    autoscale:
      minReplicas: <Numerical value>
      maxReplicas: <Numerical value>
      scaleDown:
        coolDownPeriod: <Value in seconds>
```
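As an illustration only, the template above might be filled in for workers as follows. The numbers are assumptions chosen for the example (they are not recommended defaults); the 300-second cooldown corresponds to the 5-minute example in the parameter table.

```yaml
airflow:
  workers:
    autoscale:
      minReplicas: 2        # illustrative: never run fewer than 2 worker pods
      maxReplicas: 10       # illustrative: never scale beyond 10 worker pods
      scaleDown:
        coolDownPeriod: 300 # illustrative: 5 minutes, expressed in seconds
```

The same structure applies under `web` for the web server.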
### What is graceful termination in Airflow workers?
An Airflow worker process can terminate gracefully when it receives a stop signal as part of pod termination. During graceful termination, the following happens in the Airflow worker process:
- The Airflow worker does not accept any new tasks
- The Airflow worker waits for the running tasks to complete
- Once the running tasks have completed, the Airflow worker terminates successfully

Kubernetes waits a limited amount of time for the Airflow worker pod to terminate gracefully. If graceful termination does not complete within that time, Kubernetes terminates the pod forcefully.
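In plain Kubernetes terms, this waiting time is the pod's `terminationGracePeriodSeconds`: Kubernetes sends the stop signal (SIGTERM), waits up to that many seconds, and then kills the remaining processes. The standalone manifest below is only a sketch of that mechanism; the pod name, image placeholder, and 600-second value are hypothetical, and how the Airflow chart maps its own settings onto this field is not covered by this guide.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker-example        # hypothetical name
spec:
  # After SIGTERM, Kubernetes waits up to this many seconds before force-killing
  terminationGracePeriodSeconds: 600  # illustrative value
  containers:
    - name: worker
      image: <airflow-worker-image>   # placeholder, not a real image reference
```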
### How to set the graceful termination timeout for Airflow workers?
To set the graceful termination timeout for Airflow workers, follow the steps below:
- For the service resources Airflow cluster, change the graceful termination timeout in [helm-config.yaml](charts/airflow/helm-config.yaml) as shown below
- For a data partition specific Airflow cluster, change the graceful termination timeout in [helm-config-dp.yaml](charts/airflow/helm-config-dp.yaml) as shown below
```yaml
airflow:
  workers:
    celery:
      gracefullTermination: true
      gracefullTerminationPeriod: <Value in seconds>
```
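A filled-in example, assuming the longest-running task on a worker is expected to finish within about ten minutes. The 600-second value is illustrative and should be sized to the actual workload; a period shorter than the longest task risks that task being killed mid-run.

```yaml
airflow:
  workers:
    celery:
      gracefullTermination: true
      gracefullTerminationPeriod: 600  # illustrative: 10 minutes, in seconds
```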
......@@ -146,6 +146,14 @@ Data Ingestion is currently under development and due to initial OSDU community
Airflow web authentication is now RBAC-enabled. To understand user roles, create new users, and manage users, please refer to the guide [here](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/docs/airflow-rbac-guide.md).
## Where can I find the guidelines with respect to scaling airflow?
Please refer to [this](docs/airflow-scalability-guide.md) guide. It contains all information regarding Airflow scalability.
## How do I enable autoscaling for airflow?
Please refer to [this](docs/airflow-autoscaling-guide.md) guide. It contains all information regarding Airflow autoscaling.
## How to create a User in Entitlements V2
Users in Entitlements V2 can be created and managed using the REST client [here](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/tools/rest/entitlement_manage.http).
......