feat: use optimised CosmosDB queries to retrieve datasets
Type of change
- Bug Fix
- Feature
- Pipeline
- Test
- Documentation
Does this introduce a change in the core logic?
- No
- Yes
Does this introduce a change in the cloud provider implementation, if so which cloud?
- AWS
- Anthos
- Azure
- GCP
- IBM
Does this follow conventional commits spec?
- No
- Yes
Have you set the target Milestone?
- No
- Yes
Have you set the no-detached-pipeline label?
- No
- Yes
Updates description?
This PR introduces optimised queries for fetching datasets in the Azure provider. Changes:
- changing the dataset query to filter by tenant, subproject and path instead of using a regex match on the id
- introducing a new method `listDatasets` in the IJournal interface
- providing a default implementation for `listDatasets` in the AbstractJournal class
- small refactoring in dao.ts that unifies the listDatasets method implementations
- adding unit tests for TestAzureCosmosDbDAO to test the `listDatasets` method
- removing unused variables from the test class
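
To illustrate the first change, here is a minimal sketch of a parameterised query that filters on separate fields rather than pattern-matching the id. The property paths (`c.data.tenant`, `c.data.subproject`, `c.data.path`), the `QuerySpec` shape and the helper name `buildListDatasetsQuery` are assumptions made for illustration and are not taken from the PR's code.

```typescript
// Hypothetical shape of a parameterised Cosmos DB SQL query (assumed, not from the PR).
interface QueryParameter {
  name: string;
  value: string;
}

interface QuerySpec {
  query: string;
  parameters: QueryParameter[];
}

// Builds an equality-filter query on tenant and subproject, adding path only when given.
// The c.data.* property paths are assumptions about the document layout.
function buildListDatasetsQuery(tenant: string, subproject: string, path?: string): QuerySpec {
  let query = 'SELECT * FROM c WHERE c.data.tenant = @tenant AND c.data.subproject = @subproject';
  const parameters: QueryParameter[] = [
    { name: '@tenant', value: tenant },
    { name: '@subproject', value: subproject },
  ];
  if (path !== undefined) {
    query += ' AND c.data.path = @path';
    parameters.push({ name: '@path', value: path });
  }
  return { query, parameters };
}
```

Equality filters on individual properties can typically be served from the index, whereas a regex on the id generally requires evaluating the pattern against many candidate documents.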
Notes: I was thinking about 2 different versions for the new method signature:
- `listDatasets(tenant: string, subproject: string, path?: string, pagination?: PaginationModel)`
- `listDatasets(dataset: DatasetModel, pagination?: PaginationModel)`

The second one would be more consistent with the `listFolders` method, but in the end I picked the first one to make clear which fields are used and which are mandatory in dataset retrieval, so that future implementations follow the contract and don't break the existing logic.
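
A rough sketch of what the chosen signature expresses as a contract, under several assumptions: the `PaginationModel` and `DatasetModel` shapes are simplified stand-ins, the synchronous array return type is a guess made to keep the example short, and `InMemoryJournal` is a hypothetical implementation used only for illustration (it is not the Azure one).

```typescript
// Simplified, hypothetical model shapes; the real ones likely have more fields.
interface PaginationModel {
  limit: number;
  cursor?: string;
}

interface DatasetModel {
  tenant: string;
  subproject: string;
  path: string;
  name: string;
}

// Chosen signature: mandatory filters (tenant, subproject) and optional ones
// (path, pagination) are visible in the contract itself.
interface IJournal {
  listDatasets(tenant: string, subproject: string, path?: string,
               pagination?: PaginationModel): DatasetModel[];
}

// Illustrative in-memory implementation of the contract (assumption, not from the PR).
class InMemoryJournal implements IJournal {
  private readonly datasets: DatasetModel[];

  constructor(datasets: DatasetModel[]) {
    this.datasets = datasets;
  }

  listDatasets(tenant: string, subproject: string, path?: string,
               pagination?: PaginationModel): DatasetModel[] {
    // Tenant and subproject always apply; path narrows the result only when provided.
    const matches = this.datasets.filter((d) =>
      d.tenant === tenant && d.subproject === subproject &&
      (path === undefined || d.path === path));
    return pagination ? matches.slice(0, pagination.limit) : matches;
  }
}
```

With this form, a caller that omits `tenant` or `subproject` fails at compile time, whereas the `DatasetModel` variant would not show which of the model's fields the query actually reads.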
Edited by Izabela Kulakowska