Azure adding cosmos bulk operations

Jason requested to merge azure-adding-cosmos-bulk-operations into master

Type of change

  • Bug Fix
  • Feature

Please provide a link to the GitLab issue or ADR (Architecture Decision Record)

Does this introduce a change in the core logic?

  • [YES/NO]

Does this introduce a change in the cloud provider implementation, if so which cloud?

  • AWS
  • Azure
  • GCP
  • IBM

Does this introduce a breaking change?

  • [YES/NO]

What is the current behavior?

  • For any batch size, each record is queried in Cosmos individually to see if it exists
  • For any batch size, the records are uploaded one at a time to Cosmos
  • Old unused pipelines are in the service repo

What is the new/expected behavior?

  • An environment variable sets the minimum batch size at which the DocumentBulkExecutor is used. If the batch size is >= this minimum, the records are uploaded in parallel via the bulk executor; if it is below the minimum, the records are uploaded serially, as before.
  • For all batch sizes, records are checked for existence with a single SQL query of the form SELECT * FROM c WHERE c.id IN (<list_of_ids_here>), reducing query time for large batch sizes.
  • Old pipelines are deleted
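The batch-size dispatch above can be sketched roughly as follows. This is an illustrative outline only, not the service's actual code: the class and method names are hypothetical, the threshold is hard-coded where the real implementation would read it from the environment variable, and the comments mark where DocumentBulkExecutor and the serial path would plug in.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the new upload dispatch. In the real service the
// threshold comes from an environment variable and the "bulk" branch calls
// DocumentBulkExecutor; both are stubbed out here.
public class BulkDispatchSketch {

    // Normally read via System.getenv(...); value here is illustrative.
    static final int BULK_IMPORT_MIN_BATCH_SIZE = 50;

    // Returns which upload path a batch of records would take.
    static String chooseUploadPath(List<String> records) {
        if (records.size() >= BULK_IMPORT_MIN_BATCH_SIZE) {
            // Real code: DocumentBulkExecutor imports the whole batch in parallel.
            return "bulk";
        }
        // Real code: records are upserted one at a time, as before this MR.
        return "serial";
    }

    public static void main(String[] args) {
        System.out.println(chooseUploadPath(Collections.nCopies(10, "record")));
        System.out.println(chooseUploadPath(Collections.nCopies(100, "record")));
    }
}
```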
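The batched existence check can likewise be sketched as a query builder. Again this is an assumed helper, not the MR's code; in a production path the ids should be passed as query parameters rather than concatenated, to avoid injection, but the string form shows the shape of the query the bullet describes.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper showing the single IN query that replaces
// one existence lookup per record.
public class IdInQueryBuilder {

    // Builds: SELECT * FROM c WHERE c.id IN ("id1", "id2", ...)
    static String buildQuery(List<String> ids) {
        String inList = ids.stream()
                .map(id -> "\"" + id + "\"")
                .collect(Collectors.joining(", "));
        return "SELECT * FROM c WHERE c.id IN (" + inList + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(List.of("rec-1", "rec-2")));
        // SELECT * FROM c WHERE c.id IN ("rec-1", "rec-2")
    }
}
```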

Have you added/updated Unit Tests and Integration Tests?

Any other useful information

  • MR adding bulk executor to core-lib-azure can be found here.