Draft: CosmosDB queries optimization [Not for review]
Type of change
- Bug Fix
- Feature
- Pipeline
- Test
- Documentation
Does this introduce a change in the core logic?
- No
- Yes
Does this introduce a change in the cloud provider implementation, if so which cloud?
- AWS
- Anthos
- Azure
- GCP
- IBM
Does this follow conventional commits spec?
- No
- Yes
Have you set the target Milestone?
- No
- Yes
Have you set the no-detached-pipeline label?
- No
- Yes
Updates description?
This PR changes two CosmosDB queries:
- Query to fetch the subfolders
  - Fetch all distinct paths instead
  - Perform the string transformation in code
  - Use equality filters instead of RegexMatch
- Query to fetch datasets
  - Use equality filters instead of RegexMatch
- Please note that this change only presents and tests the approach; it still needs a proper, cleaner solution (currently it won't work for a tenant containing '-')
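The subfolder approach listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the query text, the `c` alias, the helper names `build_query_spec` and `direct_subfolders`, and the assumption that `data.path` values are slash-delimited strings such as `/a/b/` are all hypothetical. The query filters only by equality on tenant and subproject, and the subfolder names are derived in code from the distinct paths.

```python
from typing import Dict, Iterable, List

# Hypothetical query text: equality filters only, no RegexMatch.
DISTINCT_PATHS_QUERY = (
    "SELECT DISTINCT VALUE c.data.path FROM c "
    "WHERE c.data.tenant = @tenant AND c.data.subproject = @subproject"
)


def build_query_spec(tenant: str, subproject: str) -> Dict[str, object]:
    """Build a parameterized query spec (query text plus named parameters)."""
    return {
        "query": DISTINCT_PATHS_QUERY,
        "parameters": [
            {"name": "@tenant", "value": tenant},
            {"name": "@subproject", "value": subproject},
        ],
    }


def direct_subfolders(paths: Iterable[str], parent: str) -> List[str]:
    """Return the sorted names of folders located directly under `parent`.

    Assumes every path is slash-delimited, e.g. "/a/b/".
    """
    prefix = parent if parent.endswith("/") else parent + "/"
    names = set()
    for path in paths:
        if not path.startswith(prefix):
            continue  # path is outside the requested folder
        # Keep only the first segment after the parent prefix.
        first_segment = path[len(prefix):].split("/", 1)[0]
        if first_segment:
            names.add(first_segment)
    return sorted(names)


if __name__ == "__main__":
    sample_paths = ["/a/b/", "/a/b/c/", "/a/d/", "/x/"]
    print(build_query_spec("tenant1", "subproject1"))
    print(direct_subfolders(sample_paths, "/a/"))
```

Because the tenant and subproject values are passed as parameters and compared by equality, this sketch does not depend on characters such as '-' being regex-safe, and the string handling stays in application code rather than in the query.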
Query execution performance improves significantly when these changes and the following custom indexing policy are applied to the CosmosDB container:
{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*"
        }
    ],
    "excludedPaths": [
        {
            "path": "/\"_etag\"/?"
        },
        {
            "path": "/data/path/?"
        }
    ],
    "compositeIndexes": [
        [
            {
                "path": "/data/tenant",
                "order": "ascending"
            },
            {
                "path": "/data/subproject",
                "order": "ascending"
            },
            {
                "path": "/data/path",
                "order": "ascending"
            }
        ]
    ]
}
Please note that this PR is for testing purposes only: it presents the concept of the proposed approach, but it is not production code.
Edited by Izabela Kulakowska