
Draft: CosmosDB queries optimization [Not for review]

Izabela Kulakowska requested to merge msft/spike/query-improvements into master

Type of change

  • Bug Fix
  • Feature
  • Pipeline
  • Test
  • Documentation

Does this introduce a change in the core logic?

  • No
  • Yes

Does this introduce a change in the cloud provider implementation, if so which cloud?

  • AWS
  • Anthos
  • Azure
  • GCP
  • IBM

Does this follow conventional commits spec?

  • No
  • Yes

Have you set the target Milestone?

  • No
  • Yes

Have you set the no-detached-pipeline label?

  • No
  • Yes

Updates description?

This PR changes two CosmosDB queries:

  1. Query to fetch the subfolders
     • Fetch all distinct paths instead
     • Perform the string transformation in code
     • Use equality filters instead of RegexMatch
  2. Query to fetch datasets
     • Use equality filters instead of RegexMatch
     • Please note that this change only presents and tests the approach; it still needs a proper, cleaner solution (currently it won't work for a tenant name containing '-')
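
To illustrate the subfolder approach described above, here is a minimal sketch in Python. It is not the code from this PR. It assumes the documents expose `data.tenant`, `data.subproject` and `data.path` (the field names are taken from the indexing policy in this description), and that paths are '/'-delimited strings such as `/a/b/`. The function names `distinct_paths_query` and `immediate_subfolders` are hypothetical.

```python
from typing import Iterable


def distinct_paths_query(tenant: str, subproject: str) -> dict:
    """Build a parameterized query that uses equality filters only (no RegexMatch).

    Assumption: documents are shaped like {"data": {"tenant", "subproject", "path"}}.
    """
    return {
        "query": (
            "SELECT DISTINCT VALUE c.data.path FROM c "
            "WHERE c.data.tenant = @tenant AND c.data.subproject = @subproject"
        ),
        "parameters": [
            {"name": "@tenant", "value": tenant},
            {"name": "@subproject", "value": subproject},
        ],
    }


def immediate_subfolders(paths: Iterable[str], parent: str = "/") -> list:
    """Derive the direct child folder names of `parent` from a list of distinct paths.

    This is the string transformation done in code instead of in the query.
    Assumption: paths look like "/a/b/" (hypothetical format).
    """
    if not parent.endswith("/"):
        parent += "/"
    found = set()
    for path in paths:
        if not path.startswith(parent):
            continue
        # Keep only the first segment below the parent folder.
        first_segment = path[len(parent):].split("/", 1)[0]
        if first_segment:
            found.add(first_segment)
    return sorted(found)


if __name__ == "__main__":
    sample_paths = ["/a/", "/a/b/", "/a/c/d/", "/x/"]
    print(immediate_subfolders(sample_paths, "/a/"))
```

The trade-off in this sketch is that the query returns every distinct path for the tenant and subproject, and the filtering down to direct children happens on the client.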

Query execution performance improves significantly when these changes are combined with the following custom indexing policy on the CosmosDB container:

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*"
        }
    ],
    "excludedPaths": [
        {
            "path": "/\"_etag\"/?"
        },
        {
            "path": "/data/path/?"
        }
    ],
    "compositeIndexes": [
        [
            {
                "path": "/data/tenant",
                "order": "ascending"
            },
            {
                "path": "/data/subproject",
                "order": "ascending"
            },
            {
                "path": "/data/path",
                "order": "ascending"
            }
        ]
    ]
}

Please note that this PR is for testing purposes only. It presents the concept of the proposed approach, but it is not production code.

Edited by Izabela Kulakowska
