feat: adding entitlements group caching and retries (!598) · Merge requests · OSDU / OSDU Data Platform / Security and Compliance / Policy

This MR introduces an entitlements group caching implementation and an SPI that each provider can implement, similar to the current object storage interface.

Under heavy search load, we've noticed that requests occasionally fail because:

Every request to the Policy service sends a request to Entitlements, putting it under heavy load and causing delayed responses or failures.
A non-OK result from Entitlements causes the Search request to fail.

This MR addresses both root causes. It adds Entitlements retries, using the Python tenacity library, to improve the fault tolerance of Search/Policy requests, and it adds a cache so that requests to Entitlements are only made when needed.

Notes

The default cache is an in-memory (VM) cache built on the Python cachetools library, and it is available to all CSPs.
An SPI is provided, similar to object storage, so each provider can implement its own caching solution.
AWS has implemented a shared caching mechanism with Redis, similar to the Java services.
The cache key is generated from the Authorization and partition id headers, using an algorithm similar to the one Search uses. Search is hit before Policy, so this should result in more cache hits.
Most of this code should be moved to the OSDU Python SDK repo so other Python services can reuse it, but that can be done later, once another service needs the same capability.
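The notes above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this MR: the class and function names, the key prefix, the SHA-256 hashing, and the size and TTL values are hypothetical, and the real key format used by Search is not reproduced here.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional

from cachetools import TTLCache


class GroupCache(ABC):
    """Hypothetical SPI: a provider subclasses this to plug in its own cache."""

    @abstractmethod
    def get(self, key: str) -> Optional[List[str]]:
        ...

    @abstractmethod
    def put(self, key: str, groups: List[str]) -> None:
        ...


class InMemoryGroupCache(GroupCache):
    """Default per-process cache backed by cachetools.TTLCache."""

    def __init__(self, maxsize: int = 1024, ttl_seconds: float = 60.0):
        # Entries expire after ttl_seconds; the oldest are evicted when full.
        self._cache = TTLCache(maxsize=maxsize, ttl=ttl_seconds)

    def get(self, key: str) -> Optional[List[str]]:
        return self._cache.get(key)

    def put(self, key: str, groups: List[str]) -> None:
        self._cache[key] = groups


def cache_key(authorization: str, partition_id: str) -> str:
    """Derive a key from the two headers; hashing keeps the token out of the key."""
    raw = f"entitlement-groups:{partition_id}:{authorization}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def get_groups(
    cache: GroupCache,
    headers: Dict[str, str],
    fetch_groups: Callable[[Dict[str, str]], List[str]],
) -> List[str]:
    """Return cached groups, calling entitlements only on a cache miss."""
    key = cache_key(headers["Authorization"], headers["data-partition-id"])
    groups = cache.get(key)
    if groups is None:
        groups = fetch_groups(headers)
        cache.put(key, groups)
    return groups
```

A provider-specific implementation, such as one backed by Redis, would subclass `GroupCache` in the same way. Note that `TTLCache` is not thread-safe on its own, so a multi-threaded service would need a lock around `get` and `put`.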

Edited Nov 14, 2025 by Marc Burnie [AWS]
