[Storage] Huge slowdown in Storage Stability and Transaction Size
The latest Azure version of the Storage service Record creation API (the post-AKS-refactoring build, deployed to the client test environment; the tests below were run in-house at SLB in dev) has two significant problems:
- Overall loading speed appears to be much slower than with the original R2+ contributed version on GCP, roughly 15x in the timings below. Using identical Python loading code, Thomas Dombrowsky measured the following (the 1M-record figures are extrapolated from the 10,000-record runs):
Tentative timings on Azure OSDU EVQ and DELFI P4D

| Platform | 10,000 records | 1M records (extrapolated) |
|---|---|---|
| OSDU Azure | 7 min 40 sec | 12 h 46 min 40 sec |
| DELFI DM | 30 sec | 50 min |

- The number of records that can be reliably sent as the payload of a single call is much smaller than on the GCP version. Per Thomas D. again:
The real killer is that the Azure Storage API fails with large payloads, so the number of records that can be ingested per API call is low. The failure is not caused by a hard limit. With one-record payloads, about 1% of calls fail at random (the record is ingested just fine on the next attempt). With 20-record payloads, there are more failed requests than successful ones. With 100 records per payload, it takes dozens of attempts to get a single payload through.
By contrast, on DELFI GCP we use 100-record payloads and they succeed every time.
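
To make the failure rates above easier to reproduce and compare across payload sizes, a small harness along the following lines could be used. This is only a sketch, not the loading code used for the measurements: `load_records`, `chunked`, and the `send` callable are hypothetical names introduced here, and `send` stands in for whatever function performs one Record creation call and reports success or failure.

```python
import time
from typing import Callable, Iterator, Sequence


def chunked(records: Sequence[dict], size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def load_records(
    records: Sequence[dict],
    send: Callable[[Sequence[dict]], bool],
    batch_size: int,
    max_attempts: int = 5,
    backoff_s: float = 0.0,
) -> dict:
    """Send records in batches, retrying each failed batch.

    `send` is a hypothetical callable that performs one Record creation
    call and returns True on success; it is not part of any OSDU client.
    """
    stats = {
        "calls": 0,
        "failed_calls": 0,
        "records_loaded": 0,
        "batches_abandoned": 0,
    }
    for batch in chunked(records, batch_size):
        for attempt in range(1, max_attempts + 1):
            stats["calls"] += 1
            if send(batch):
                stats["records_loaded"] += len(batch)
                break
            stats["failed_calls"] += 1
            # Linear backoff between attempts; backoff_s=0 disables the wait.
            time.sleep(backoff_s * attempt)
        else:
            # Every attempt failed, so this batch is given up on.
            stats["batches_abandoned"] += 1
    return stats
```

Running the same record set with `batch_size` set to 1, 20, and 100 and comparing `failed_calls` against `calls` would give a failure rate per payload size that can be set side by side for the Azure and GCP deployments.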