Azure: adding Cosmos bulk operations
Type of change
- [ ] Bug Fix
- [x] Feature
Please provide link to gitlab issue or ADR(Architecture Decision Record)
N/A
Does this introduce a change in the core logic?
- [YES/NO]
Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [x] Azure
- [ ] GCP
- [ ] IBM
Does this introduce a breaking change?
- [YES/NO]
What is the current behavior?
- For any batch size, each record is queried individually in Cosmos to check whether it already exists
- For any batch size, records are uploaded to Cosmos one at a time
- Old, unused pipelines remain in the service repo
What is the new/expected behavior?
- An environment variable sets the minimum batch size for using the DocumentBulkExecutor. If the batch size is >= this minimum, the records are uploaded in parallel via the bulk executor; if it is below the minimum, the records are uploaded serially, as before.
- For all batch sizes, records are queried with a single SQL query of the form `SELECT * FROM c WHERE c.id IN (list_of_ids_here)` to reduce query time for large batch sizes
- Old pipelines are deleted
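The dispatch described above can be sketched as follows. This is an illustrative outline only, not the service code: the variable name `BULK_MIN_BATCH_SIZE`, its default value, and the `bulk_upload`/`serial_upload` callables are hypothetical stand-ins for the environment variable and the DocumentBulkExecutor / serial paths this MR introduces.

```python
import os

# Hypothetical environment variable name and default; the real name and
# default live in the service configuration.
BULK_MIN_BATCH_SIZE = int(os.environ.get("BULK_MIN_BATCH_SIZE", "50"))


def build_exists_query(record_ids):
    """Build the single IN query used to check which ids already exist."""
    id_list = ", ".join(f'"{rid}"' for rid in record_ids)
    return f"SELECT * FROM c WHERE c.id IN ({id_list})"


def upload(records, bulk_upload, serial_upload):
    """Dispatch to bulk (parallel) or serial upload based on batch size."""
    if len(records) >= BULK_MIN_BATCH_SIZE:
        # Large batch: parallel upload (DocumentBulkExecutor in the service)
        bulk_upload(records)
    else:
        # Small batch: one-at-a-time upload, as before
        serial_upload(records)
```

In practice the callables would wrap the Cosmos client; the point is that the threshold check happens once per batch, and existence checks collapse into a single query regardless of which upload path is taken.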
Have you added/updated Unit Tests and Integration Tests?
Any other useful information
- The MR adding the bulk executor to core-lib-azure can be found here.