
feat: use optimised CosmosDB queries to retrieve datasets

Type of change

  • Bug Fix
  • Feature
  • Pipeline
  • Test
  • Documentation

Does this introduce a change in the core logic?

  • No
  • Yes

Does this introduce a change in the cloud provider implementation, if so which cloud?

  • AWS
  • Anthos
  • Azure
  • GCP
  • IBM

Does this follow conventional commits spec?

  • No
  • Yes

Have you set the target Milestone?

  • No
  • Yes

Have you set the no-detached-pipeline label?

  • No
  • Yes

Updates description?

This PR introduces optimised queries to fetch datasets in the Azure provider. Changes:

  • changing the dataset query to filter by tenant, subfolder and path instead of using a regex match on the id
  • introducing a new method listDatasets in the IJournal interface
  • providing a default implementation for listDatasets in the AbstractJournal class
  • a small refactoring in dao.ts that unifies the listDatasets method implementations
  • adding unit tests to TestAzureCosmosDbDAO that cover the listDatasets method
  • removing unused variables from the test class
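
To illustrate the first change, here is a minimal sketch of a parameterised query that filters on individual fields rather than pattern-matching the id. The function name buildListDatasetsQuery and the document field names (c.data.tenant, c.data.subproject, c.data.path) are assumptions made for illustration and may not match the actual schema or the code in this MR. The local SqlQuerySpec type only mirrors the { query, parameters } shape that Cosmos DB query specs use.

```typescript
// Hypothetical sketch: names and document layout are assumptions, not the MR's code.
interface SqlParameter { name: string; value: string; }
interface SqlQuerySpec { query: string; parameters: SqlParameter[]; }

function buildListDatasetsQuery(tenant: string, subproject: string, path?: string): SqlQuerySpec {
    // tenant and subproject are always part of the filter
    const conditions: string[] = ['c.data.tenant = @tenant', 'c.data.subproject = @subproject'];
    const parameters: SqlParameter[] = [
        { name: '@tenant', value: tenant },
        { name: '@subproject', value: subproject },
    ];
    // path narrows the result only when the caller provides it
    if (path !== undefined) {
        conditions.push('c.data.path = @path');
        parameters.push({ name: '@path', value: path });
    }
    return { query: `SELECT * FROM c WHERE ${conditions.join(' AND ')}`, parameters };
}
```

Equality filters on separate fields can be served by the container's indexes, whereas a regex over the id generally cannot, which is the motivation stated above.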

Notes: I considered two versions of the new method signature:

  • listDatasets(tenant: string, subproject: string, path?: string, pagination?: PaginationModel) and
  • listDatasets(dataset: DatasetModel, pagination?: PaginationModel)

The second one would be more consistent with the listFolders method, but in the end I picked the first one. It makes clear which fields are used in the dataset retrieval and which of them are mandatory, so that future implementations follow the contract and don't break the existing logic.
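
As a rough sketch of that trade-off: with the explicit signature the compiler enforces that tenant and subproject are supplied, while path and pagination remain optional. The model shapes below (DatasetModel with four string fields, PaginationModel with only a limit) and the in-memory class are hypothetical stand-ins, not the real types or the default implementation in AbstractJournal.

```typescript
// Hypothetical stand-ins for the real models; field sets are assumptions.
interface DatasetModel { tenant: string; subproject: string; path: string; name: string; }
interface PaginationModel { limit: number; }

// Explicit parameters: the mandatory filter fields are visible in the signature.
function filterDatasets(
    records: DatasetModel[], tenant: string, subproject: string,
    path?: string, pagination?: PaginationModel): DatasetModel[] {
    const matches = records.filter((d) =>
        d.tenant === tenant &&
        d.subproject === subproject &&
        (path === undefined || d.path === path));
    // Apply the page size only when pagination is requested
    return pagination ? matches.slice(0, pagination.limit) : matches;
}

// Illustrative journal that only mirrors the first listDatasets signature.
class InMemoryJournal {
    constructor(private readonly records: DatasetModel[]) {}

    async listDatasets(tenant: string, subproject: string,
        path?: string, pagination?: PaginationModel): Promise<DatasetModel[]> {
        return filterDatasets(this.records, tenant, subproject, path, pagination);
    }
}
```

With the DatasetModel-based variant, a caller could pass an object whose path or subproject is unset without any compile-time signal, which is the ambiguity the first signature avoids.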

Edited by Izabela Kulakowska
