
ADR - Project & Workflow Services - Namespace-per-Kind & Kind-per-Namespace inventory features

This ADR focuses on Namespace-per-Kind & Kind-per-Namespace inventory features

Status

  • Proposed
  • Trialing
  • Under review
  • Approved
  • Retired

Context & Scope

Following on from this ADR, which introduced the idea that the same record instances can live in multiple namespaces or collaborations at the same time.

Two essential features are still missing and are needed for collaboration functionality completeness:

  1. There is no feature to determine which namespaces have records of a given data kind. The Indexer service needs it for data kind reindex operations: currently only SOR namespace records are reindexed, while the upgraded reindex algorithm requires the list of namespaces involved for a given data kind so that records in all of them can be reindexed.

  2. There is no feature to determine which data kinds have records in a given namespace. The P&WS service needs it for retrieving the whole WIP-resources data collection of a given collaboration project: OSDU instances have hundreds of data kinds while a project may have WIP in just a few of them, so scanning all data kinds and looking for records in a particular namespace is barely feasible (slow and expensive).

Decision

Of the five options proposed in the Tradeoff Analysis table below, three are the strongest candidates:

  • #1 or #2, driven by the Storage service. One of these variants would be ideal because it sits closest to the "source of truth", but both would require significant metadata-storage upgrade efforts from the CSP teams;
  • so we promote #4, driven by the P&WS service and using Storage records of special data kinds as the store for the inventory register. Despite its cons and concerns it has a good overall balance: no new data storage is needed, and the feature is localized in the P&WS service, which logically matches its scope of responsibilities.

The following sequence diagram reflects the design of the winning solution and how the inventory works:

(sequence diagram)

The following two schemas should be developed to keep the inventory register data records:

  • work-product-component--KindsPerNamespace - to contain a list of data kinds that have records in a given namespace (the snippet is simplified):
{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "data": {
            "type": "object",
            "properties": {
                "Namespace": {
                    "type": "string"
                },
                "Kinds": {
                    "type": "array",
                    "items": {
                        "type": "string"
                    },
                    "minItems": 1
                }
            },
            "required": ["Namespace", "Kinds"]
        }
    },
    "required": ["data"]
}
  • work-product-component--NamespacesPerKind - to contain a list of namespaces that have records of a given kind (the snippet is simplified):
{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "data": {
            "type": "object",
            "properties": {
                "Kind": {
                    "type": "string"
                },
                "Namespaces": {
                    "type": "array",
                    "items": {
                        "type": "string"
                    },
                    "minItems": 1
                }
            },
            "required": ["Kind", "Namespaces"]
        }
    },
    "required": ["data"]
}
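For illustration, here is a minimal sketch (in Python) of what instances of the two inventory record payloads might look like, checked against the simplified schemas above with a hand-rolled validator. The kind names and namespace IDs are hypothetical placeholders, not actual OSDU identifiers.

```python
# Hypothetical example instances of the two inventory record payloads.
# Kind names and namespace IDs below are illustrative only.

kinds_per_namespace = {
    "data": {
        "Namespace": "collab-project-42",  # a hypothetical collaboration namespace
        "Kinds": [
            "osdu:wks:work-product-component--WellLog:1.0.0",
            "osdu:wks:work-product-component--Document:1.0.0",
        ],
    }
}

namespaces_per_kind = {
    "data": {
        "Kind": "osdu:wks:work-product-component--WellLog:1.0.0",
        "Namespaces": ["SOR", "collab-project-42"],
    }
}

def valid_kinds_per_namespace(record: dict) -> bool:
    """Minimal check mirroring the simplified KindsPerNamespace schema."""
    data = record.get("data")
    if not isinstance(data, dict):
        return False
    return (
        isinstance(data.get("Namespace"), str)
        and isinstance(data.get("Kinds"), list)
        and len(data["Kinds"]) >= 1          # schema requires minItems: 1
        and all(isinstance(k, str) for k in data["Kinds"])
    )

def valid_namespaces_per_kind(record: dict) -> bool:
    """Minimal check mirroring the simplified NamespacesPerKind schema."""
    data = record.get("data")
    if not isinstance(data, dict):
        return False
    return (
        isinstance(data.get("Kind"), str)
        and isinstance(data.get("Namespaces"), list)
        and len(data["Namespaces"]) >= 1     # schema requires minItems: 1
        and all(isinstance(n, str) for n in data["Namespaces"])
    )

print(valid_kinds_per_namespace(kinds_per_namespace))   # True
print(valid_namespaces_per_kind(namespaces_per_kind))   # True
```

In a real deployment the full OSDU schema (with `id`, `kind`, `acl`, `legal`, etc.) would apply; only the simplified `data` block is checked here.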

Rationale

Consequences

  1. The feature to determine which namespaces have records of a given data kind will be implemented and enabled for consumption by the Indexer service in "reindex" tasks, as depicted in the following diagram:

(diagram)

  2. The feature to determine which data kinds have records in a given namespace will be implemented and enabled for consumption by the P&WS service in "get collaboration project WIP resources collection" tasks, as depicted in the following diagram:

(diagram)
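The two consumption flows above can be sketched as follows. This is a toy Python sketch, not the actual service code: the in-memory dictionaries stand in for the special SOR inventory records, and the functions are hypothetical stand-ins for the real Indexer and P&WS operations.

```python
# Toy in-memory inventories standing in for the special SOR inventory records.
# All names and kind identifiers are illustrative assumptions.
NAMESPACES_PER_KIND = {"osdu:wks:WellLog:1.0.0": ["SOR", "ns-a", "ns-b"]}
KINDS_PER_NAMESPACE = {"ns-a": ["osdu:wks:WellLog:1.0.0", "osdu:wks:Document:1.0.0"]}

def reindex_kind(kind: str) -> list[str]:
    """Indexer flow: reindex a kind in every namespace that holds its records,
    instead of reindexing the SOR namespace only."""
    tasks = []
    for namespace in NAMESPACES_PER_KIND.get(kind, []):
        # The real implementation would submit a reindex task per namespace.
        tasks.append(f"reindex {kind} in {namespace}")
    return tasks

def collect_wip_resources(namespace: str) -> list[str]:
    """P&WS flow: query only the kinds known to have records in the namespace,
    avoiding a scan over hundreds of data kinds."""
    queries = []
    for kind in KINDS_PER_NAMESPACE.get(namespace, []):
        # The real implementation would run a per-kind search scoped to the namespace.
        queries.append(f"search {kind} in {namespace}")
    return queries

print(reindex_kind("osdu:wks:WellLog:1.0.0"))
print(collect_wip_resources("ns-a"))
```

The point of both flows is the same: the inventory register turns an infeasible full scan into a short, targeted list of lookups.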

When to revisit


Tradeoff Analysis - Input to decision

Alternatives and implications

Option 1 — Storage svc; records metadata store, same table, new field

  • Trigger: when the svc changes a record in a NS, it also updates the inventory for this NS, synchronously in a single API call
  • Pros:
    - the inventory register location looks native for the svc
    - the inventory update occurs instantly on any record change
    - no eventual-consistency delays: as soon as a record-change transaction is committed, the inventory is in sync
  • Cons:
    - not all CSPs' metadata store implementations are capable of optimally querying "distinct kinds per namespace"
    - need for a new API /namespaces/kinds
    - need for CSP teams to add/bootstrap a new field

Option 2 — Storage svc; records metadata store, separate table(s)

  • Trigger: the same as Option 1
  • Cons:
    - added costs for updating an additional register
    - need for a new API /namespaces/kinds
    - need for CSP teams to add/bootstrap a new table

Option 3 — Indexer (+ Search); Elasticsearch, separate index(es)

  • Trigger: when the svc indexes a record in a NS, it also updates the inventory for this NS
  • Pros:
    - whenever a record changes, it is always reindexed
    - the inventory update occurs asynchronously, so it does not slow down record updates on the Storage svc
    - the inventory information is also needed by the Indexer svc itself for reindexing operations, so it is conveniently located locally to it
  • Cons:
    - the Indexer is not a source of truth
    - need for a new Search API /namespaces/kinds
    - added costs for updating an additional index
    - since indexing occurs with asynchronous delays relative to the moment the storage record is changed, there is a delay before reaching eventual consistency
    - if for some reason a record_changed message is not processed by the Indexer, the state of the inventory register remains inconsistent

Option 4 — P&WS; SOR records of a special kind, per-NS and per-kind

  • Trigger: when the svc receives a record_changed event for a record in a NS, it updates the SOR inventory record for this NS
  • Pros:
    - no need to add and manage a new Cache/DB
    - this svc is one of the main ones interested in namespace-related functionality
  • Cons, group A:
    - need for a new API /namespaces/kinds
    - not all custom namespaces are associated with collaboration projects
    - since record_changed messages arrive asynchronously, there is a delay before reaching eventual consistency
    - collision-prone due to multiple concurrent updates of a single SOR record
  • Cons, group B:
    - need to introduce new OSDU schemas
    - these records themselves become subjects for indexing
    - misuse of the SOR catalog

Option 5 — P&WS; the svc's own Cache/DB

  • Trigger: the same as Option 4
  • Pros:
    - a dedicated inventory register, no data mixing-up
    - easy to clean and rebuild
  • Cons:
    - all of group A from Option 4
    - need for a new Cache/DB
    - need for CSP teams to add/bootstrap it
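The "collision-prone" concern about concurrent updates of a single SOR inventory record can be mitigated with version-based optimistic concurrency. The following is a minimal Python sketch of the pattern; the version field, store layout, and retry loop are assumptions for illustration, not the actual Storage service API.

```python
# Toy store: record id -> (version, payload). Stands in for versioned Storage records.
store = {"inv:ns-a": (1, {"Namespace": "ns-a", "Kinds": ["kind-x"]})}

def update_inventory(record_id: str, new_kind: str, max_retries: int = 5) -> bool:
    """Add a kind to an inventory record using a compare-and-swap on the version.

    If another writer bumps the version between read and write, the attempt
    is retried with a fresh read instead of silently overwriting the change.
    """
    for _ in range(max_retries):
        version, payload = store[record_id]                       # read current state
        updated = dict(payload, Kinds=sorted(set(payload["Kinds"]) | {new_kind}))
        if store[record_id][0] == version:                        # CAS: version unchanged?
            store[record_id] = (version + 1, updated)             # commit with bumped version
            return True
        # else: a concurrent writer won; loop and retry on the new state
    return False

update_inventory("inv:ns-a", "kind-y")
print(store["inv:ns-a"])
```

In a real implementation the compare-and-swap would be an atomic conditional write offered by the underlying store (or the Storage service's record versioning), not an in-process check as in this single-threaded sketch.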

To support migration of existing records into KindsPerNamespace and NamespacesPerKind, it will be necessary to provide an administrative-level API function similar to the /reindex function(s) of the Indexer service.
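The core of such a backfill could look roughly like the following Python sketch: scan all existing records once and derive both registers from the scan. The record shape and helper are hypothetical; the real administrative function would page through the Storage service.

```python
# Hypothetical one-off backfill: derive both inventory registers from a full
# record scan. Record shape and values are illustrative assumptions.

existing_records = [
    {"kind": "kind-x", "namespace": "SOR"},
    {"kind": "kind-x", "namespace": "ns-a"},
    {"kind": "kind-y", "namespace": "ns-a"},
]

def rebuild_registers(records):
    """Build KindsPerNamespace and NamespacesPerKind maps in one pass."""
    kinds_per_ns: dict[str, set[str]] = {}
    ns_per_kind: dict[str, set[str]] = {}
    for r in records:
        kinds_per_ns.setdefault(r["namespace"], set()).add(r["kind"])
        ns_per_kind.setdefault(r["kind"], set()).add(r["namespace"])
    return kinds_per_ns, ns_per_kind

kpn, npk = rebuild_registers(existing_records)
print(kpn)
print(npk)
```

The resulting maps would then be written out as the special inventory SOR records described above.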

Decision criteria and tradeoffs

  • The first criterion is ease of implementation, with no need to add or upgrade storages in the system.
  • The second criterion is the proximity of the solution to the "source of truth", which is the Storage service; this determines the reliability of the design and the low probability of consistency loss.

Decision timeline

Edited by Rostislav Dublin (EPAM)