Skip to content
GitLab
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
  • Register
  • Sign in
  • I Indexer
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Locked Files
  • Issues 24
    • Issues 24
    • List
    • Boards
    • Service Desk
    • Milestones
    • Iterations
    • Requirements
  • Merge requests 11
    • Merge requests 11
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
    • Test Cases
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Packages and registries
    • Packages and registries
    • Package Registry
    • Container Registry
    • Infrastructure Registry
  • Monitor
    • Monitor
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Code review
    • Insights
    • Issue
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • Open Subsurface Data Universe SoftwareOpen Subsurface Data Universe Software
  • Platform
  • System
  • Indexer
  • Merge requests
  • !147

Enabling retry for indexer to storage service calls

  • Review changes

  • Download
  • Email patches
  • Plain diff
Merged Muskan Srivastava requested to merge muskans-retry-indexer into master Apr 20, 2021
  • Overview 45
  • Commits 18
  • Pipelines 20
  • Changes 6

All Submissions:


  • I have added an explanation of what changes in this merge do and why we should include it?- Yes
  • Does the MR contain pipeline/ helm chart related changes? - NA
  • I have updated the documentation accordingly. -Yes
  • I have added tests to cover my changes. - Yes
  • All new and existing tests passed. -Yes
  • My code follows the code style of this project. -Yes
  • I ran lint checks locally prior to submission. -Yes

What is the issue or story related to the change?


This MR tries to solve this issue. This intermittent problem of records not being found is manifested by CosmosDb due to replication lag. The indexer service is affected by this issue whenever it(indexer service) makes http://storage/api/storage/v2/query/records:batch call. Hence, retry with maximum number of attempts as 3 and wait duration in between as 1 second is enabled only for the required call.

High level design: This document about CosmosDb mentions The read latency for all consistency levels is always guaranteed to be less than 10 milliseconds at the 99th percentile. The average read latency, at the 50th percentile, is typically 4 milliseconds or less." Therefore, max attempt of 3 and wait duration of 1 sec is thought to be sufficient.

Change details: Our interest area is when indexer service makes http://storage/api/storage/v2/query/records:batch call. This call is handled by UrlFetchService( for more details see here.). This MR adds 2 classes and their corresponding Tests classes: -

  1. RetryPolicy:- Handles the retry logic. The one point to pay attention is that in the above issue when records are not found even then the response status code is 200 and unfound records are listed under "notFound" jsonArray of the response body. Hence, the retry logic is handled accordingly. The retry functionality is enabled by using decorators provided by resilience4j over UrlFetchService(the one residing is core-common not over UrlFetchServiceAzureImpl which is newly added)
  2. UrlFetchServiceAzureImpl:- This class implements IUrlFetchService, the same interface implemented by UrlFetchService. This class simply calls UrlFetchService's methods of all the calls other than http://storage/api/storage/v2/query/records:batch. For this call, it calls the retryFunction of RetryPolicy. This is made primary bean so that this class gets injected instead of UrlFetchService.

The above methodology was used because it made zero changes in any of the existing code.

Test coverage:


test_case

Does this introduce a breaking change?


  • No

Other information


This code has also been tested using FireFly framework. Using perf tool, we have confirmation on true-negative i.e. not re-trying when it is not supposed to but not of true-positive as this issue is not easily reproduced. The issue was not reproduced therefore, the retry functionality was tested locally by making the storage service end-point to not find records randomly. Following things were confirmed from local testing:

  1. Retry was not enabled for any other end-point.
  2. Retry was enabled when "notFound" was not empty.
  3. Retry was not enabled for when "notFound" was empty.
  4. Retry happened only once or twice in case "notFound" was empty in second or third time. The retry functionality is thoroughly unit-tests covered.
Edited May 26, 2021 by Muskan Srivastava
Assignee
Assign to
Reviewers
Request review from
Time tracking
Source branch: muskans-retry-indexer