No, closing it. Thanks @chad
Adding retry, timeout and Circuit breaker to services to improve resiliency.
This is to implement resiliency across all the OSDU services. We want to make the change at the core-common level, in the HttpClientHandler implementation.
For Azure, we have already implemented resiliency features such as retries, timeout, and circuit breaker in some of our services (Legal, Entitlement, Search) by adding a new custom HTTP client handler (HttpClientHandlerAzure) in core-lib-azure, as here.
We want to add this resiliency feature to all our OSDU services, which will require either a change per service in core-lib-azure or a common change in core-common.
If we don't introduce the change in core-common, we will end up creating individual factory classes in core-lib-azure for each service, each with custom code to implement retries, timeout, and circuit breaker.
| Analysis | Core-Lib-Azure | Core-Common |
|---|---|---|
| Pros | 1. Turnaround time for code check-in is faster | 1. Resiliency is implemented for every service and every CSP by default when the feature flag is ON 2. Faster development when adding resiliency to any new service 3. Consistency across CSPs in enabling resiliency |
| Cons | 1. Resiliency must be implemented in each service's factory class 2. Resiliency code for any new service has to be added explicitly | 1. If the resiliency flag is ON, it must be disabled explicitly per service |
In distributed systems, transient failures or latency in remote interactions are inevitable. Timeouts keep systems from hanging unreasonably long, retries can mask those failures, and backoff and jitter can improve utilization and reduce congestion on systems.
We want our services to be resilient enough to anticipate unexpected events and account for them. During downtime, we also want to give pods enough time to recover from an incident by not bombarding them with further requests.
Based on the above trade-off analysis, we are proposing the change in core-common. Maintaining multiple services with similar functionality and responsibilities is additional overhead w.r.t. maintenance.
We will implement resiliency by introducing a new HTTP client, consumed under a feature flag. The default HttpClient will stay the same, and CSPs can opt into the resilient HttpClient. Here's a code snippet that we are using to validate resiliency; going forward, all of it will sit behind the feature flag.
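The referenced snippet is not reproduced here, but as a minimal illustration of the retry part of the idea, here is a hand-rolled sketch of retry with exponential backoff and full jitter (class and method names are hypothetical and not the actual core-common implementation):

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

// Hypothetical sketch: retry a call with exponential backoff and full jitter.
public final class ResilientCall {

    public static <T> T withRetry(Supplier<T> call, int maxAttempts, Duration baseDelay) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) break;
                // Exponential backoff: baseDelay * 2^(attempt-1), then full jitter in [0, backoff].
                long backoffMs = baseDelay.toMillis() * (1L << (attempt - 1));
                long jitterMs = ThreadLocalRandom.current().nextLong(backoffMs + 1);
                try {
                    Thread.sleep(jitterMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
            }
        }
        throw last; // all attempts exhausted
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated transient failure: fails twice, then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient failure");
            return "ok";
        }, 5, Duration.ofMillis(10));
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The jitter spreads retries from many clients over time, which is what keeps synchronized retry storms from congesting a recovering service.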
Currently the re-index API sends a 200 response code whether or not any reindexing activity is performed. Now we send a 500 status code if any kind fails to reindex, and 200 only if the messages are successfully put on the service bus. Reindex-by-kind will show the same behaviour.
Advancement: the response for each kind is saved in a hashmap, which can be returned to the client for better tracking of which kinds were not reindexed.
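A minimal sketch of that idea (hypothetical class, not the actual Indexer code): each kind's outcome goes into a map, and the aggregate status is 500 as soon as any kind failed, otherwise 200:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: track per-kind reindex outcomes and derive one HTTP status.
public final class ReindexResultTracker {
    private final Map<String, Integer> statusByKind = new LinkedHashMap<>();

    public void record(String kind, int status) {
        statusByKind.put(kind, status);
    }

    // 200 only if every kind was queued successfully; 500 if any kind failed.
    public int aggregateStatus() {
        boolean anyFailed = statusByKind.values().stream().anyMatch(s -> s >= 400);
        return anyFailed ? 500 : 200;
    }

    // The per-kind map itself can be returned as the response body for tracking.
    public Map<String, Integer> responseBody() {
        return statusByKind;
    }
}
```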
SHEFFALI JAIN (0f40078a) at 22 Jun 05:30
reindex status code fix
Please provide link to gitlab issue or ADR(Architecture Decision Record)
We are trying to provide a design via which rate limiting can be applied to any service when enabled via a flag; it will be disabled by default. It works by setting a limit on how many requests a consumer is allowed to make in a given unit of time. We reject any requests above the limit with an appropriate response, like HTTP status 429 (Too Many Requests).
Currently, no rate limiting is applied to the service, so there is nothing limiting the number of users accessing it.
The service will have a specific token count that sets a limit restricting the number of requests users can make per cycle.
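As an illustration only (hypothetical names, not the Envoy-based implementation described below), the token-count-per-cycle idea can be sketched as a fixed window that refills at the start of each cycle:

```java
// Hypothetical sketch: a fixed-window token count, refilled each cycle.
public final class TokenLimiter {
    private final int tokensPerCycle;
    private final long cycleMillis;
    private long windowStart;
    private int tokensLeft;

    public TokenLimiter(int tokensPerCycle, long cycleMillis, long nowMillis) {
        this.tokensPerCycle = tokensPerCycle;
        this.cycleMillis = cycleMillis;
        this.windowStart = nowMillis;
        this.tokensLeft = tokensPerCycle;
    }

    // Returns true if the request is allowed; false means it should get HTTP 429.
    public synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStart >= cycleMillis) { // new cycle: refill tokens
            windowStart = nowMillis;
            tokensLeft = tokensPerCycle;
        }
        if (tokensLeft == 0) {
            return false;
        }
        tokensLeft--;
        return true;
    }
}
```

In the actual design this bookkeeping is done by the Envoy filter rather than application code; the sketch only shows the limiting semantics.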
No.
Added an Envoy filter to apply rate limiting, and added support to generate the YAML file via Helm as part of the deployment itself. The rate-limit filter is disabled by default; it can be enabled when installing via the Helm command using the following flag: --set envoyFilter.enabled=true
SHEFFALI JAIN (883e1fda) at 17 May 06:39
SHEFFALI JAIN (5451c0a4) at 17 May 06:39
Merge branch 'indexerREsiliency' into 'Azure/OSDU-Helm-Charts-Azure...
... and 1 more commit
Workflow used:
Timeout:
Referring to the screenshot below: at ~30 rps, the maximum time taken to respond was approximately 1 minute. Based on discussion with the SO and their previous experience, the timeout has been finalised as 180 secs.
Circuit Breaker:
Ejection timeout (time taken by the Register pod to restart itself after a failure, ~2 mins):
Rate limit:
Set to the number of requests that does not lead to 503s.
p.s. Check infra-related changes and prior discussions here: !247. For retries, here: !245
SHEFFALI JAIN (44a4236a) at 11 May 10:16
adding resiliency in Register services
... and 193 more commits
SHEFFALI JAIN (17e2269e) at 11 May 05:39
SHEFFALI JAIN (96355056) at 06 May 09:06
SHEFFALI JAIN (f71a4372) at 06 May 09:06
Merge branch 'indexertimeoutfix' into 'Azure/OSDU-Helm-Charts-Azure...
... and 1 more commit
Adding timeouts to the rest of the Indexer APIs will be taken up in the next sprint.