ADR: Normalized kind indexed field
Status
- Proposed
- Approved
- Retired
Context & Scope
The schema id includes the semantic version and is indexed as "kind" by the OSDU Indexer service. The Indexer creates a separate Elasticsearch index for each "kind". Records from different schema versions therefore have different "kind" values and live in different indices, even when the schemas share the same major version. Search currently exposes no attribute that can be used to group (aggregateBy payload) the data by schema major version. However, application users may want to group all versions of a data type that share a major version, or in some cases may only care about the latest version within a major version. We propose an approach to enable this for OSDU applications.
Requirement
- The proposed solution should solve the major-version grouping issue without significant performance degradation
- The proposed solution should be compatible with the existing business data that upstream OSDU applications store
Approach 1
Elasticsearch allows a script to be passed with a request to define a runtime field, which can then be used in searches or aggregations. The indexed "kind" field already contains all the required information; only the minor and patch versions need to be removed from it. We could therefore solve the problem on the OSDU Search side by building a pre-defined runtime field script for users to consume.
The advantage of this approach is that the existing data does not need to be re-indexed. The cost is that the server has to run the script at query time, which degrades performance. We ran load tests to compare aggregateBy on an indexed field against aggregateBy on a runtime field. The degradation is significant: the runtime field is about 70% slower at both the median and the 90th-percentile latency, so we rejected this approach.
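For illustration, a runtime-field aggregation of the kind described above could look roughly like the following request body, built here in Python. This is a sketch under assumptions, not the script used in the load test: the Painless source, the helper name `build_runtime_aggregation`, the runtime field name, and the aggregation name `by_major_version` are all hypothetical, and it assumes "kind" is mapped as a keyword field.

```python
import json

# Hypothetical Painless script: keep everything in "kind" up to (but not
# including) the first '.' after the last ':', i.e. strip minor and patch.
NORMALIZED_KIND_SCRIPT = (
    "String k = doc['kind'].value; "
    "int dot = k.indexOf('.', k.lastIndexOf(':')); "
    "emit(dot < 0 ? k : k.substring(0, dot));"
)


def build_runtime_aggregation(field_name: str = "normalizedKind") -> dict:
    """Build an Elasticsearch search body that aggregates on a runtime field.

    The field is computed at query time by the script above, which is why
    no re-indexing is needed and why every query pays the script cost.
    """
    return {
        "size": 0,  # only the aggregation buckets are of interest
        "runtime_mappings": {
            field_name: {
                "type": "keyword",
                "script": {"source": NORMALIZED_KIND_SCRIPT},
            }
        },
        "aggs": {
            "by_major_version": {"terms": {"field": field_name}}
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_runtime_aggregation(), indent=2))
```

Because the script is evaluated for every matching document on every request, the cost grows with the size of the result set, which is consistent with the latency penalty observed in the load test.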
Approach 2 (Proposed)
Taking performance into account, we have to physically index the new field. We propose indexing this additional field under the record's tags field, using a new sub-attribute key "normalizedKind". The value of "normalizedKind" is derived from the original "kind" value by removing the minor and patch versions. For example, a master-data--Wellbore record of kind "osdu:wks:master-data--Wellbore:1.1.0" will have a new field tags.normalizedKind with the value "osdu:wks:master-data--Wellbore:1".
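The derivation described above can be sketched as a small function. This is an illustrative sketch, not the Indexer's actual implementation: the function name `normalize_kind` and the regular expression are hypothetical, and it assumes a kind always consists of three colon-separated parts followed by a `major.minor.patch` version.

```python
import re

# Assumed kind layout: <authority>:<source>:<entity>:<major>.<minor>.<patch>
_KIND_PATTERN = re.compile(
    r"^(?P<prefix>[^:]+:[^:]+:[^:]+):(?P<major>\d+)\.\d+\.\d+$"
)


def normalize_kind(kind: str) -> str:
    """Return the kind with its minor and patch versions removed."""
    match = _KIND_PATTERN.match(kind)
    if match is None:
        # Fail loudly rather than index a malformed normalizedKind value.
        raise ValueError(f"unexpected kind format: {kind!r}")
    return f"{match.group('prefix')}:{match.group('major')}"


if __name__ == "__main__":
    print(normalize_kind("osdu:wks:master-data--Wellbore:1.1.0"))
    # prints: osdu:wks:master-data--Wellbore:1
```

Records of kinds "osdu:wks:master-data--Wellbore:1.0.0" and "osdu:wks:master-data--Wellbore:1.1.0" would both map to the same value, which is what makes grouping by major version possible.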
- Example of how to use the new field in a search query:
{
"query": "tags.normalizedKind:\"osdu:wks:master-data--Wellbore:1\""
}
- Example of how to use the new field in a search aggregateBy:
{
"aggregateBy": "tags.normalizedKind"
}
This approach requires a re-indexing operation during deployment for the new field to take effect on existing data.