M7 Manifest based ingestion - Load Testing
Load Testing - The number of records or manifests that can be processed at a time.
Throughout M7, EPAM has delivered a number of performance improvements, alongside work on configuration and related issues. We expect these changes to have significantly improved the capacity of manifest-based ingestion, but we do not yet have a specific figure.
The load testing process should be repeatable, with the expectation that it will be applied to the upcoming Airflow 2+ changes.
We need the "5000 manifest test" (@debasisc @todaiks) to be re-run on the M7 release. The result should be a binary pass/fail together with the wall-time for executing the job. For completeness, Table 1 shows a set of recommended test cases that we believe should ultimately be automated and runnable through the QA group.
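The pass/fail and wall-time result described above could be captured with a small harness along the following lines. This is only a sketch: `run_load_test`, `LoadTestResult` and `fake_ingest_job` are hypothetical names introduced for illustration, and the job itself is a stand-in rather than a call to the real ingestion workflow.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoadTestResult:
    name: str
    passed: bool          # binary pass/fail
    wall_time_s: float    # wall-clock time for the whole job
    error: str = ""


def run_load_test(name: str, job: Callable[[], bool]) -> LoadTestResult:
    """Run one test case and record pass/fail plus wall-clock time."""
    start = time.perf_counter()
    try:
        passed = bool(job())
        error = ""
    except Exception as exc:
        # An exception raised by the job is recorded as a failed test.
        passed = False
        error = repr(exc)
    elapsed = time.perf_counter() - start
    return LoadTestResult(name, passed, elapsed, error)


def fake_ingest_job() -> bool:
    # Hypothetical stand-in. A real job would submit the manifest and
    # poll the workflow run until it finishes, returning True on success.
    time.sleep(0.01)
    return True


if __name__ == "__main__":
    result = run_load_test("5000 manifest test", fake_ingest_job)
    status = "PASS" if result.passed else "FAIL"
    print(f"{result.name}: {status} in {result.wall_time_s:.2f}s")
```

Keeping the job behind a plain callable means the same harness could wrap each test case without changes, which may help with repeatability across the Airflow 2+ migration.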
|the "5000 manifest"||Our current baseline|
|1 Manifest with 5,000 records|
|1 Manifest with 20,000 records|
|1 Manifest with 50,000 records||Limit on the size of the request body|
|50K manifests in multiple requests, not simultaneously||Airflow 1.X doesn’t allow sending multiple requests (Fixed in Airflow 2.0)|
|chunks of 50, 1000 DAG runs||1. max_active_runs (50) limitation; 2. limitation of workflow service: Java heap error (Issue 64); 3. Storage Service has a limitation of storing no more than 500 records/s|
|chunks of 1000||see above|
|50 DAG runs|
|Launch several different DAGs simultaneously|
|Ingest the Volve data||to promote adoption|
|Ingest the TNO data||to promote adoption|
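The chunked test cases and the Storage Service limit in the table can be related with some simple arithmetic. The sketch below assumes 50,000 records in total and takes the 500 records/s figure from the table at face value. `chunked` and `min_storage_seconds` are hypothetical helpers, not part of any existing service, and the record shape is made up for illustration.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Figure quoted in the table; treat it as an assumption to re-measure.
STORAGE_RECORDS_PER_SECOND = 500


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def min_storage_seconds(total_records: int,
                        rate: int = STORAGE_RECORDS_PER_SECOND) -> float:
    """Lower bound on wall time if storage throughput is the only limit."""
    return total_records / rate


if __name__ == "__main__":
    # Illustrative records only; real manifests carry far more structure.
    records = [{"id": f"record-{i}"} for i in range(50_000)]
    for size in (50, 1000):
        n_chunks = sum(1 for _ in chunked(records, size))
        print(f"chunk size {size}: {n_chunks} chunks")
    print(f"storage lower bound: {min_storage_seconds(len(records)):.0f}s")
```

Under these assumptions, chunks of 50 give 1,000 chunks and chunks of 1000 give 50 chunks, which appears consistent with the DAG run counts listed in the table. The storage limit alone would imply at least 100 seconds of wall time for 50,000 records, so a measured wall time near that figure would suggest the Storage Service is the bottleneck.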