WIP - Performance Benchmarking
The purpose of this WIP issue is to plan how we will benchmark ingestion performance. We need to address the performance of the ingestion mechanisms: expected performance in a production environment is in excess of 33k records per minute (wells, well logs, trajectories, etc.).
- @Devendra_R and @npickus to connect with @todaiks and @chad, if possible, to confirm the testing approach, timing, and feedback cycles. Data examples from real use cases include wells, wellbores, trajectories, etc. (already using TNO and Volve).
- Nick to check with Chevron teams whether there is an opportunity to schedule a test of the current manifest ingestion in a real production environment, to compare against current test rates within the Forum. The team can use the script from Jean Rainauld and test with the same synthetic data as the Forum tests (testing info).
- @debasisc to follow up with CSPs to gain alignment on CSPs testing ingestion in their own environments.
Issues
- No defined custodians and developers for manifest ingestion
- Data sets used for testing are not representative of real data (only master data)
- Testing requires close coordination with CSPs
Load Testing & Performance
Performance changes since the M6 load testing issue
Basic Load testing
Testing the ingestion of 500-, 1,000-, and 50,000-record manifests.
Uses synthetic manifests to perform basic testing of the ingestion. Load testing is run by pre-shipping for each release, one release in arrears, which means MX is tested during the development cycle of M(X+1). A spreadsheet showing the pass/fail result and timing per CSP is provided by pre-shipping at the conclusion of the test.
Additional information about run time and latency of the Airflow scheduler can be found in the Airflow console's Gantt chart. This data provides a view of where the performance bottlenecks might be.
Assets:
- Basic load testing passed: true/false (per release)
- Timing information (scaling as a function of manifest size)
- Snapshots of the Airflow Gantt chart
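The basic load-testing step above can be sketched as a small driver that builds synthetic manifests of each size and reports throughput in the records-per-minute metric used in this issue. This is a hedged illustration: the manifest shape and the serialization placeholder are assumptions, not the actual OSDU manifest schema or the Workflow service call.

```python
import json
import time

def make_manifest(n_records):
    """Build a synthetic manifest with n_records master-data well records.
    The record shape here is illustrative, not the exact OSDU schema."""
    return {
        "kind": "osdu:wks:Manifest:1.0.0",
        "MasterData": [
            {
                "id": f"opendes:master-data--Well:{i:06d}",
                "kind": "osdu:wks:master-data--Well:1.0.0",
                "data": {"FacilityName": f"SYNTHETIC-WELL-{i:06d}"},
            }
            for i in range(n_records)
        ],
    }

def records_per_minute(n_records, elapsed_seconds):
    """Throughput metric used throughout this issue."""
    return n_records * 60.0 / elapsed_seconds

if __name__ == "__main__":
    for size in (500, 1_000, 50_000):
        manifest = make_manifest(size)
        start = time.perf_counter()
        # Placeholder: replace with the actual POST to the Workflow
        # service that kicks off manifest ingestion for this manifest.
        json.dumps(manifest)
        elapsed = time.perf_counter() - start
        print(f"{size}-record manifest prepared in {elapsed:.3f}s")
```

With the placeholder replaced by the real ingestion call (and a wait for the DAG run to complete), the timing information per manifest size falls out directly.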
Advanced Load testing
The other items (beyond basic load testing) give insight into how sensitive performance is to the Airflow configuration. This may be carried out if resources are available.
@npickus
Defining standards
Today, teams are loading 8-10 million records (including validations), outside of the Manifest or CSV ingestion mechanisms, in roughly 5 hours, for a rate of about 33k records per minute.
- Collect two user stories from operators.
- Define OSDU EA/community expectations.
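For reference, the 33k figure above is just the stated bulk-load numbers converted to a per-minute rate:

```python
records = 10_000_000   # upper end of the 8-10 million range
minutes = 5 * 60       # ~5 hours of loading
rate = records / minutes
print(round(rate))     # ~33,333 records per minute
```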
@epeysson
For application developers (local mode)
There is a use case for application developers to run the Airflow part of the ingestion locally (with the core services made accessible through a REST API). This local mode can be used to profile the performance of just the Airflow component, in a CSP-agnostic way.
Standalone installation instructions are here.
- Complete basic load testing with standalone Airflow.