Airflow Performance / Load testing
In conversations with Data Loading (Michaël, Ash, and others), we identified a need to develop an approach for determining performance requirements for the workflow service. Based on implementation experience, concerns have been raised that Airflow will not scale to meet anticipated data loading demands.
I've expanded this issue to include representation from Data Loading, all 4 CSPs, and @Jane from EA. We should begin addressing this for M5 or shortly thereafter.
Initial discussions have identified two potential areas for improvement:
- Configure Airflow within the infrastructure as always-on vs. spin-up-on-demand. This approach increases cost but improves performance as it minimizes the delay in initiating a workflow.
- Introduce a throttling mechanism for workflow run requests to ensure Airflow is not overwhelmed to the point of failure by large numbers of requests.
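To make the throttling idea concrete, here is a minimal sketch of one possible approach, a token bucket placed in front of the workflow run endpoint. This is an illustration for discussion, not an agreed design: `TokenBucket`, `submit_workflow_run`, the `trigger` callback, and the rate and capacity values are all hypothetical names and numbers, and the sketch assumes a single process with no concurrent callers (a real service would need locking or a shared store).

```python
import time
from typing import Callable


class TokenBucket:
    """Admit roughly `rate` requests per second, with bursts up to `capacity`.

    Hypothetical helper for illustration; not part of Airflow.
    """

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False if the caller should back off."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(float(self.capacity), self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


def submit_workflow_run(bucket: TokenBucket, trigger: Callable[[], None]) -> str:
    """Forward the run request only if the bucket admits it.

    `trigger` stands in for whatever call actually starts the workflow run.
    """
    if bucket.try_acquire():
        trigger()
        return "accepted"
    return "throttled"
```

A caller that receives "throttled" could retry later or be queued. Which of those behaviors is appropriate, and what limits to set, would depend on the performance requirements this issue is meant to establish.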
There are likely other performance improvements to consider. We will update this description as those are discussed.