Open Subsurface Data Universe Software / Platform / System / Storage / Merge requests / !124

Azure adding cosmos bulk operations

Merged: Jason requested to merge azure-adding-cosmos-bulk-operations into master on Jan 18, 2021

Type of change

  • Bug Fix
  • Feature

Please provide a link to the GitLab issue or ADR (Architecture Decision Record)
N/A

Does this introduce a change in the core logic?

  • [YES/NO]

Does this introduce a change in the cloud provider implementation, if so which cloud?

  • AWS
  • Azure
  • GCP
  • IBM

Does this introduce a breaking change?

  • [YES/NO]

What is the current behavior?

  • For any batch size, each record is queried in Cosmos individually to check whether it already exists
  • For any batch size, the records are uploaded to Cosmos one at a time
  • Old, unused pipelines are still in the service repo

What is the new/expected behavior?

  • An environment variable sets the minimum batch size at which DocumentBulkExecutor is used. If the batch size is >= this minimum, the records are uploaded in parallel with the bulk executor. If the batch size is below the minimum, the records are uploaded serially, as before.
  • For all batch sizes, records are queried with a SQL query of the form SELECT * FROM c WHERE c.id IN (list_of_ids_here) instead of one query per record, which reduces query time for large batches.
  • Old pipelines are deleted
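
For reviewers who want a quick picture of the two changes above, here is a minimal, language-agnostic sketch in Python. It is not the code in this MR. The environment variable name `MIN_BULK_BATCH_SIZE`, its default of 50, and the `upload_one` / `upload_bulk` callables are all hypothetical stand-ins. The query builder emits one named parameter per id rather than splicing ids into the SQL string, which is one common way to express an IN list safely.

```python
import os
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical name and default; the MR description gives neither.
MIN_BULK_BATCH_SIZE_ENV = "MIN_BULK_BATCH_SIZE"
DEFAULT_MIN_BULK_BATCH_SIZE = 50


def build_exists_query(ids: Sequence[str]) -> Tuple[str, List[Dict[str, str]]]:
    """Build one parameterized IN query that covers every id in the batch."""
    if not ids:
        raise ValueError("ids must not be empty")
    # One named parameter per id: @id0, @id1, ...
    names = [f"@id{i}" for i in range(len(ids))]
    query = f"SELECT * FROM c WHERE c.id IN ({', '.join(names)})"
    params = [{"name": name, "value": value} for name, value in zip(names, ids)]
    return query, params


def min_bulk_batch_size() -> int:
    """Read the bulk threshold from the environment, falling back to the default."""
    return int(os.environ.get(MIN_BULK_BATCH_SIZE_ENV, DEFAULT_MIN_BULK_BATCH_SIZE))


def upload_records(
    records: Sequence[Any],
    upload_one: Callable[[Any], None],
    upload_bulk: Callable[[Sequence[Any]], None],
    threshold: Optional[int] = None,
) -> str:
    """Use the bulk path when the batch reaches the threshold, else upload serially."""
    if threshold is None:
        threshold = min_bulk_batch_size()
    if len(records) >= threshold:
        upload_bulk(records)  # stand-in for the bulk executor path
        return "bulk"
    for record in records:
        upload_one(record)  # stand-in for the existing one-at-a-time path
    return "serial"
```

A batch exactly at the threshold takes the bulk path, matching the ">=" wording in the description; anything smaller keeps the previous serial behavior.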

Have you added/updated Unit Tests and Integration Tests?

Any other useful information

  • MR adding bulk executor to core-lib-azure can be found here.
Edited Jan 19, 2021 by Jason
Source branch: azure-adding-cosmos-bulk-operations