ADR: remove entitlements implicit quota of how many data groups can be created within a data partition

Status

Context

The Entitlements service implements a quota on the number of memberships that one entity can have. The quota regulates consumption and protects the service. An entity can be a user, a service account, or an entitlements group.

There is a bootstrap group called "users.data.root", and a specific implementation automatically adds this group as a child of every data group that is created. The requirement behind this implementation is that any user or service account belonging to the group "users.data.root" should have access to all data groups within the data partition. It also means that the user or service account has full permission to access all data within the data partition, since storage uses entitlements data groups as the record ACL. We will use the term "full data permission" below to refer to this requirement.
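The effect of this auto-add behavior can be illustrated with a small sketch. It assumes a simplified in-memory model in which each group maps to the set of its direct members (user ids or child group names). The names `members`, `create_data_group`, and `has_access`, as well as the example users and group names, are hypothetical and are not the Entitlements service API.

```python
ROOT_GROUP = "users.data.root"

# Hypothetical in-memory model: group name -> set of direct members.
# A member is either a user id or the name of a child group.
members = {ROOT_GROUP: {"admin@example.com"}}


def create_data_group(name, creator):
    """Create a data group and, as in the current design, add
    users.data.root to it as a child."""
    members[name] = {creator, ROOT_GROUP}


def has_access(user, group, seen=None):
    """Return True if user is a direct or transitive member of group."""
    seen = set() if seen is None else seen
    if group in seen:  # guard against cycles in the hierarchy
        return False
    seen.add(group)
    for member in members.get(group, set()):
        if member == user:
            return True
        # A member that is itself a group is searched recursively.
        if member in members and has_access(user, member, seen):
            return True
    return False


create_data_group("data.wells.viewers", "alice@example.com")
create_data_group("data.seismic.owners", "bob@example.com")

# The root member reaches every data group through the child link.
print(has_access("admin@example.com", "data.wells.viewers"))   # True
print(has_access("admin@example.com", "data.seismic.owners"))  # True
# An ordinary user only reaches the groups they belong to.
print(has_access("alice@example.com", "data.seismic.owners"))  # False
```

In this model the "full data permission" comes entirely from the child link added at creation time, which is also why "users.data.root" gains one membership per data group.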

A quota in itself is a good thing to have. However, because of the membership design of the "users.data.root" group, it introduces an implicit quota that limits how many data groups in total can be created in one data partition.

[Figure: authorization diagram. Red is the current Authz; green is the proposed one.]

Tradeoff Analysis

The advantage of adding the "users.data.root" group to all data groups is that it hides the implementation details of the "full data permission" requirement from other services. Originally, therefore, only the entitlements code needed to change to support this requirement, and all other services that require data authorization got the feature automatically.

On the other hand, it introduces an implicit quota that limits how many data groups can be created within the data partition. This implicit quota constrains group scalability and usability for applications. For example, assume the membership quota is 5000. Because "users.data.root" becomes a member of every data group, only 5000 data groups can be created within the data partition. Once this quota is met, applications cannot create any new group, which blocks application functionality. At that point the average membership count of an individual user or service account may still be small, so the service is not fully utilized.
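The arithmetic behind the implicit quota can be sketched as follows. The quota value of 5000 is the example figure used above, not a real service setting. The `Partition` class and its method names are hypothetical, and the model tracks only the membership count of "users.data.root", assuming it has no other memberships.

```python
MEMBERSHIP_QUOTA = 5000  # example value from the text, not a real setting


class QuotaExceeded(Exception):
    """Raised when an entity would exceed its membership quota."""


class Partition:
    """Toy model of one data partition under the current design."""

    def __init__(self, quota=MEMBERSHIP_QUOTA):
        self.quota = quota
        self.data_groups = []
        # Number of groups that users.data.root is a member of.
        self.root_memberships = 0

    def create_data_group(self, name):
        # users.data.root is added to every new data group, so each
        # creation consumes one unit of its membership quota.
        if self.root_memberships >= self.quota:
            raise QuotaExceeded("users.data.root reached its membership quota")
        self.root_memberships += 1
        self.data_groups.append(name)


partition = Partition()
for i in range(MEMBERSHIP_QUOTA):
    partition.create_data_group(f"data.group{i}.viewers")

try:
    partition.create_data_group("data.one-more.viewers")
    blocked = False
except QuotaExceeded:
    blocked = True

print(len(partition.data_groups), blocked)  # 5000 True
```

The partition-wide limit appears even though no individual user is near the quota: the only entity that hits it is the bootstrap group itself.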

ⓘ Additional tradeoffs: requiring each service to check for a group itself, rather than relying on the entitlements service, breaks:

the non-functional agility requirement, since any change to this dependency requires touching all services;

transparency, since the entitlements service is no longer authoritative;

the single-responsibility principle, since a service is now responsible for authorization in addition to its main function.

Decision

It is not a good idea to support "full data permission" through group hierarchy and ACL-based entitlement. Such a requirement should be implemented with role-based or policy-based entitlement. We propose the following new design:

The Entitlements service drops the implementation that adds "users.data.root" to all data groups. This removes the undesired implicit quota on how many data groups can be created within a data partition.
Any downstream service that performs data authorization by ACL checking should add logic to check whether the caller belongs to "users.data.root"; if so, the service should bypass the ACL check and grant full data permission. Because all services already call the entitlements service for API authorization, they already make this request, so no extra performance overhead is added to the downstream services. Eventually, this logic should be converted to an instance policy when the downstream services integrate with the policy service.
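The bypass check proposed for downstream services could look roughly like the sketch below. It assumes the caller's groups are available as a set of group names from the entitlements call the service already makes for API authorization. The `Record` type, the `acl_viewers` field, and the `can_read` function are hypothetical and do not reflect the actual Storage or Search code.

```python
from dataclasses import dataclass, field

ROOT_GROUP = "users.data.root"


@dataclass
class Record:
    """Hypothetical record carrying the data groups allowed to view it."""
    record_id: str
    acl_viewers: set = field(default_factory=set)


def can_read(caller_groups, record):
    """Decide read access for a caller.

    caller_groups is assumed to be the set of group names already
    returned by the entitlements call made for API authorization.
    """
    # Proposed rule: members of users.data.root bypass the ACL check.
    if ROOT_GROUP in caller_groups:
        return True
    # Otherwise fall back to the normal ACL check: the caller needs
    # at least one group in common with the record's viewers ACL.
    return bool(caller_groups & record.acl_viewers)


record = Record("rec-1", acl_viewers={"data.wells.viewers"})

print(can_read({"users.data.root"}, record))      # True (bypass)
print(can_read({"data.wells.viewers"}, record))   # True (ACL match)
print(can_read({"data.seismic.viewers"}, record)) # False
```

Keeping the rule in one small function per service makes it easier to locate and remove once the equivalent policy is available in the policy service.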

Consequences

All downstream services that perform data authorization by ACL checking need to be reviewed to determine whether they require a code change to support the "full data permission" requirement.

ⓘ This ADR increases the technical debt by adding data authz logic to the Search and Storage services.

The Policy service could address many of the tradeoffs described above, as well as the technical debt, by abstracting these checks out of the services; however:

The Policy service is non-performant for as few as 50 groups, so many will use the hard-coded approach initially.

It is difficult to unwind the implementation of a temporary solution. For instance, our understanding is that the Storage service bypasses the Policy service entirely and calls OPA directly (also for performance reasons).

For these reasons, Google Cloud suggests careful and persistent documentation of the technical debt which will need to be unwound in future.

As a compromise between business feature development and technical debt, this ADR adds data manager authorization logic both to the downstream services and to OPA as a new policy. The hard-coded logic is still needed in all downstream services because the policy service is not yet fully released, and the hard-coded data authorization logic is still used in production. Because the new data manager policy will also be implemented in OPA, this ADR does not add technical debt or logic discrepancies to the policy service. The technical debt described above can only be resolved once the policy service is released to production; as part of that work, all hard-coded data authz logic should be removed from the downstream services.

Identified impacted services

Entitlements (MR: !477 (merged))

Storage (MR: osdu/platform/system/storage!694 (merged))

Search (the logic has already been implemented in MR: osdu/platform/system/search-service!298 (merged))

Seismic DMS

To Be Added

Edited Jul 05, 2023 by Chad Leong