Indexer issues
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues
2024-03-08T16:00:02Z

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/156
Augmented Index - parent-child use case - trailing colon (":") in relationship field in child record
2024-03-08T16:00:02Z
Debasis Chatterjee

See my test case in AWS/M22/Preship.
https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M22/Test_plan_Results_M22/Core%20Services/M22-AWS-Augmented-Index-parent-child-steps-Debasis.docx
This test case involves "Well" (parent) and "Wellbore" (child).
When troubleshooting with @zhibinmai , we tried removing the trailing colon from the relationship field of the child Wellbore record.
With that change, I got a clean run, and the "virtual field" appears in the search response as expected.
The child Wellbore record that works has -
"WellID": "osdu:master-data--Well:WELL07MARDC"
whereas by convention it should be
"WellID": "osdu:master-data--Well:WELL07MARDC:"
(the trailing colon is the convention when referencing a record of another entity, like a "foreign key" relationship)
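To illustrate the kind of normalization that would make both forms behave the same, here is a minimal sketch. It assumes that a trailing colon adds no identifying information, so the two spellings refer to the same parent. The helper names are hypothetical and are not taken from the Indexer code.

```python
def normalize_reference(ref: str) -> str:
    """Drop a single trailing colon from a relationship reference, if present."""
    return ref[:-1] if ref.endswith(":") else ref


def references_parent(child_ref: str, parent_id: str) -> bool:
    """Compare a child's relationship value with a parent id, ignoring a trailing colon."""
    return normalize_reference(child_ref) == normalize_reference(parent_id)


if __name__ == "__main__":
    parent = "osdu:master-data--Well:WELL07MARDC"
    # Both spellings of the child's WellID match the same parent record.
    print(references_parent("osdu:master-data--Well:WELL07MARDC", parent))
    print(references_parent("osdu:master-data--Well:WELL07MARDC:", parent))
```

With a comparison like this, the augmented index would resolve the parent regardless of which form the child record uses.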
I earlier ran a similar test case in Azure and did not see any impact from the trailing colon there.
@ydzeng - how do we ensure feature parity between the augmented index code in M22/AWS/Preship and the other CSPs?
cc @chad and @sjtomlinson

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/153
Indexer is not supporting 64-bit integer value
2024-03-23T16:00:07Z
An Ngo

A bug was submitted for a case where a seismic volume size did not get indexed. The [AbstractDataset](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/Generated/abstract/AbstractDataset.1.0.0.json?ref_type=heads#L26) definition (and a few more [attributes](https://community.opengroup.org/search?group_id=218&nav_source=navbar&project_id=91&repository_ref=master&scope=blobs&search=convertible+to+a+long+integer+extension:json+path:Authoring&search_code=true)) states that the value must be convertible to a long integer. But it seems the Indexer only handles 32-bit integer values.
Proposal to fix, either:
* declare that the default schema definition for "int" is a 64-bit value, which will increase storage and processing costs and require a re-index of potentially all ingested data; or
* create a new schema type "long int" that supports 64-bit values, update the existing schema definitions for just the attributes that may exceed 32 bits, and re-index only the affected data.
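As a rough illustration of the size boundary involved, the sketch below picks between the Elasticsearch `integer` (signed 32-bit) and `long` (signed 64-bit) field types for a given value. The function name is hypothetical, and this is not how the Indexer currently chooses mappings.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def elasticsearch_integer_type(value: int) -> str:
    """Return the smallest Elasticsearch integer field type that can hold value."""
    if INT32_MIN <= value <= INT32_MAX:
        return "integer"
    if INT64_MIN <= value <= INT64_MAX:
        return "long"
    raise ValueError(f"{value} does not fit in a 64-bit signed integer")


if __name__ == "__main__":
    five_gib = 5 * 1024**3  # an illustrative seismic volume size in bytes
    print(elasticsearch_integer_type(five_gib))
```

Any file size above 2 GiB minus one byte already exceeds the 32-bit range, which is why dataset sizes are a natural place for this problem to show up.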
Screenshots of the error and the data value:
![storage.png](/uploads/0d9a6215309c8164f647bfd7d657bcf8/storage.png)
![indexing_error.png](/uploads/5f8752f09cbb2e5c9c93595805a87600/indexing_error.png)

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/150
records with id longer then 512 bytes cannot be indexed
2024-02-19T18:16:31Z
Neelesh Thakur

Any Storage record with an id longer than 512 bytes is not searchable. The Search/Indexer backend (Elasticsearch) throws an error when the id is longer than 512 bytes. The following message is logged by the Indexer service:
indexer.app Validation Failed: 1: id [xxx-yyy-zzzzzzz-corporation:work-product-component--WellLog:xxx-yyy-zzzzzzz-corporation:work-product-component--WellLog:CRXdAWD15bqEJ8kNtJe6V3RKXmSQzmohsYZDhe7QdR58iFGHOA0b5Otuc96XDgp34TNCk851FsKB95zHx7QazeBIG0NxT3CVDLyWpEe0nyXGgDMY2k1RR1SXzum4IqMajpscNM6kVjRlBjh2Cx2ZGDt7RW0AKYEemm8IpU1kvWRgjYATXJacoDivlQJqJ07Ghzco4MOu2TYFDq31qfnpVP37E2pktUGvHug1qQoVSHaSoT4zQgOiOF1WXMfZWTPIlaRdnaUSbjN2aXgH9zMlSebOkJ4J0SAU9lMs58QJsSvMoL9bjaBmniVNq2os41oyL3gZrBucz2yI67Yzm72y72fb7swlBoiONveLgyTra2fY8q9btfxjGYDPO71dwA1akgNmJerCDdE]
is too long, must be no longer than 512 bytes but was: 517
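A pre-flight check along the following lines could flag such records before they reach Elasticsearch. This is only a sketch: the 512-byte limit is taken from the error message above, the length is measured on the UTF-8 encoding, and the helper names are hypothetical.

```python
MAX_ID_BYTES = 512  # limit reported in the Elasticsearch validation error


def id_byte_length(record_id: str) -> int:
    """Length of the id in bytes once UTF-8 encoded (not the character count)."""
    return len(record_id.encode("utf-8"))


def is_indexable_id(record_id: str) -> bool:
    """True if the id fits within the 512-byte limit."""
    return id_byte_length(record_id) <= MAX_ID_BYTES


if __name__ == "__main__":
    long_id = "opendes:work-product-component--WellLog:" + "x" * 500
    print(id_byte_length(long_id), is_indexable_id(long_id))
```

Note that the limit is in bytes, so an id containing non-ASCII characters can exceed it with fewer than 512 characters.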
This breaks end-user discovery workflows. Core services must resolve this issue so that all ingested records are discoverable by the Search service.
M23 - Release 0.26

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/149
Please check list of required roles for Reindexing task
2024-02-12T06:15:40Z
Debasis Chatterjee

https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/docs/tutorial/IndexerService.md#indexer-api-access
Indexer service requires that users (and service accounts) have dedicated roles in order to use it. Users must be a member of `users.datalake.viewers` or `users.datalake.editors` or `users.datalake.admins` or `users.datalake.ops`.
The "or" indicates that reindexing should work if the user is assigned any one of the above roles.
I am experiencing a failure although the user is a member of the groups -
**users.datalake.viewers**
and
**users.datalake.editors**
Also can you see this role?
service.indexer.admin
Is this a requirement?
POST {{INDEXER_HOST}}/reindex?force_clean=true
Body
```
{
"kind": "osdu:wks:master-data--Play:1.0.0"
}
```
Response
```
{
"code": 401,
"reason": "Unauthorized",
"message": "The user is not authorized to perform this action"
}
```
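For anyone reproducing this outside Postman, the request above can be built roughly as follows. This is a sketch only: the host, token and partition values are placeholders, the `data-partition-id` and `Authorization` headers are assumed to be required as for other OSDU APIs, and the function name is hypothetical. The request is constructed but not sent.

```python
import json
import urllib.request


def build_reindex_request(indexer_host: str, token: str, partition: str,
                          kind: str, force_clean: bool = True) -> urllib.request.Request:
    """Build (but do not send) the POST /reindex request shown above."""
    url = f"{indexer_host.rstrip('/')}/reindex?force_clean={'true' if force_clean else 'false'}"
    body = json.dumps({"kind": kind}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",   # placeholder token
        "Content-Type": "application/json",
        "data-partition-id": partition,        # placeholder partition
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_reindex_request("https://indexer.example.test", "<JWT>", "opendes",
                                "osdu:wks:master-data--Play:1.0.0")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it; a 401 raises urllib.error.HTTPError.
```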
cc @chad for information

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/148
Augmented Index - Provide new endpoint, or add info inside existing endpoint - is the Feature flag enabled or disabled?
2024-02-09T01:01:51Z
Debasis Chatterjee

We need a simple way for end users to know whether the related "feature flag" is turned on in the instance.
Today, there are similar examples in the "Version info" endpoint and the Status/Health endpoints of some DDMSs.
A.
Example of “info” endpoint.
GET {{INDEXER_HOST}}/info
```
{
"groupId": "org.opengroup.osdu.indexer",
"artifactId": "indexer-aws",
"version": "0.25.1",
"buildTime": "2024-01-24T20:28:16.495Z",
"branch": "refs/heads/release/r3-m22",
"commitId": "07ad22b2308a75e018a0f9f72c579afb66f7928a",
"commitMessage": "merge from gitlab tag",
"connectedOuterServices": [
{
"name": "ElasticSearch-osdu",
"version": "7.17.15"
},
{
"name": "ElasticSearch-common",
"version": "7.17.15"
}
]
}
```
B.
In some cases, we also see an SOH (state of health) style option.
Seismic DDMS V4 API
GET {{osduonaws_base_url}}/seistore-svc/api/v4/status/readiness
```
{
"ready": true
}
```
GET {{osduonaws_base_url}}/seistore-svc/api/v4/status
```
{
"status": "running"
}
```
cc @zhibinmai

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/147
Augmented Index - want to provide "id" in ValueExtraction.ValuePath"
2024-02-05T05:53:32Z
Debasis Chatterjee

For reference-data--IndexPropertyPathConfiguration -
Currently, it supports providing fields from within the "data" block only.
In my use case (Wellbore->WellLog : Parent->Child. Hence "RelationshipDirection": "ParentToChildren"),
we want to create a derived field (augmented index) of all child record IDs.
So, if 4 WellLog records are linked to one parent Wellbore, then the derived field should show a value of
[WellLog-record-ID1, WellLog-record-ID2, WellLog-record-ID3, WellLog-record-ID4]
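The intended result can be sketched as a simple grouping of child record ids by parent. This is only an illustration of the desired output, not of how the augmented index is implemented. It assumes each WellLog carries its parent reference in `data.WellboreID`, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def child_ids_by_parent(children: Iterable[dict],
                        relationship_field: str = "WellboreID") -> Dict[str, List[str]]:
    """Group child record ids under the parent id they reference."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for record in children:
        parent_ref = record.get("data", {}).get(relationship_field)
        if parent_ref:
            # Ignore a trailing colon so both reference spellings group together.
            grouped[parent_ref.rstrip(":")].append(record["id"])
    return dict(grouped)


if __name__ == "__main__":
    well_logs = [
        {"id": f"WellLog-record-ID{i}", "data": {"WellboreID": "Wellbore-record-ID1:"}}
        for i in range(1, 5)
    ]
    print(child_ids_by_parent(well_logs))
```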
Let me know if there are any questions about this use case.
Thank you
cc @zhibinmai

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/141
Indexing issue with GeoJSON structure in Spatial block
2024-01-22T15:55:55Z
Jeyakumar Devarajulu

The eds_ingest DAG uses the Search service to get records from the source system (OSDU/non-OSDU) and ingests them into a target OSDU system using osdu_ingest.
The Search service provides the flattened "SpatialLocation.Wgs84Coordinates" and it gets ingested using osdu_ingest, but there is an indexing issue.
Troubleshooting reveals that the spatial data is not indexed because the GeoJSON syntax is incorrect.
**"geo-json shape parsing error: must be a valid FeatureCollection attribute: SpatialLocation.Wgs84Coordinates",**
It looks like flattened string fields are displayed, but array-like fields such as Wgs84Coordinates are not. There may be a problem in how indexing handles array-like structures.
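A structural check like the one below could be run on `SpatialLocation.Wgs84Coordinates` before ingestion to catch this early. It is a minimal sketch based on the GeoJSON FeatureCollection shape named in the error message. It is deliberately stricter than the GeoJSON specification in requiring a geometry on every feature, and the function name is hypothetical.

```python
from typing import Any


def is_feature_collection(value: Any) -> bool:
    """Check that value has the basic shape of a GeoJSON FeatureCollection."""
    if not isinstance(value, dict) or value.get("type") != "FeatureCollection":
        return False
    features = value.get("features")
    if not isinstance(features, list):
        return False
    for feature in features:
        if not isinstance(feature, dict) or feature.get("type") != "Feature":
            return False
        geometry = feature.get("geometry")
        # Require a typed geometry object on each feature.
        if not isinstance(geometry, dict) or "type" not in geometry:
            return False
    return True


if __name__ == "__main__":
    flattened = {"type": "geometrycollection", "geometries": []}
    print(is_feature_collection(flattened))
```

A value that fails this check would need to be converted back to a FeatureCollection before being written to Storage.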
Other flattened fields are displayed in the search query response:
![Search.png](/uploads/ffc79525dc9b51a2a93649a26303dc31/Search.png)
Storage shows the Wgs84Coordinates
![Storage.png](/uploads/f8c1fb09b468de6a4fe33400fe57999b/Storage.png)
CC: @debasisc @AshishSaxenaAccenture @chad
Sample Seismic Trace data
[SeismicTraceData](/uploads/4b69869d4c94d32e6f044b6c320c8b30/SeismicTraceData)

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/140
Indexing Records of same Kind but different case for "kind" results in undesired behaviour.
2024-01-22T15:54:41Z
Sabarish K R E

Assume a new schema is created with Kind: `osdu:wks:reference-data--VelocityAnalysisMethodX1:1.0.0`
Next, a storage record is PUT, with the value for kind as "**OSDU**:wks:reference-data--VelocityAnalysisMethodX1:1.0.0". Notice the Upper case OSDU.
When indexing, the following happens:
- Indexer service does a GET request to the Schema service for the kind "**OSDU**:wks:reference-data--VelocityAnalysisMethodX1:1.0.0". The schema is NOT found (due to the upper case).
- Indexer service still goes ahead, as per design, and creates an Elasticsearch index "**osdu-wks-reference-data--velocityanalysismethodx1-1.0.0**". In this index's mapping, the allowed value for the field **authority** is set to the constant "OSDU" (in uppercase, as derived from the storage record's "kind" field). The record is then indexed in this index, with ``` "trace": ["schema not found"],``` as the reason for the data fields not being indexed.
This causes two major issues:
- Legitimate records with kind **osdu**:wks:reference-data--VelocityAnalysisMethodX1:1.0.0 will not get indexed, because the Elasticsearch index for this kind has already been created with the "authority" field allowed to hold only the constant value **OSDU** (hence it cannot accept **osdu**).
- The mapping was also created with no details about the data fields, because the schema was not found the first time (as discussed earlier). This causes the data fields of the storage record to NOT get indexed, and hence these records won't be searchable.
This happens because elasticsearch index is created by converting kind string to lowercase, so two records with logically the same kinds, but different CASE, will have these conflicts during indexing.
(index name = lowercase(kind), and replace : with - )
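The collision can be reproduced with the naming rule stated above. The function below is a sketch of that rule only (lowercase, then replace ":" with "-"). Its name is hypothetical and it is not the Indexer's actual implementation.

```python
def index_name_for_kind(kind: str) -> str:
    """Derive the Elasticsearch index name: lowercase the kind and replace ':' with '-'."""
    return kind.lower().replace(":", "-")


if __name__ == "__main__":
    schema_kind = "osdu:wks:reference-data--VelocityAnalysisMethodX1:1.0.0"
    record_kind = "OSDU:wks:reference-data--VelocityAnalysisMethodX1:1.0.0"
    # The two kinds differ as strings but map to the same index name.
    print(schema_kind == record_kind)
    print(index_name_for_kind(schema_kind) == index_name_for_kind(record_kind))
    print(index_name_for_kind(record_kind))
```

Because the schema lookup is case-sensitive while the index name is not, whichever casing arrives first decides the mapping for both.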
To solve this, we need to design a strategy to handle different casing of meta attributes like "kind"/"authority" appropriately.
M23 - Release 0.26

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/122
IndexMapping not updated with AsIngestedCoordinates fields
2024-01-15T11:55:41Z
Konrad Krasnodebski

The current AsIngestedCoordinates feature implementation updates the index mapping according to which AsIngestedCoordinates fields occur in a record. The index mapping mechanism has caching functionality that checks whether the mapping was already synced. This mechanism could disrupt mapping updates for different records of the same kind.
Related MR [!650 (merged)](https://community.opengroup.org/osdu/platform/system/indexer-service/-/merge_requests/650)

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/121
Review IndexerMappingServiceImpl
2024-01-08T12:04:59Z
Mark Chance

Review comments from MR https://community.opengroup.org/osdu/platform/system/indexer-service/-/merge_requests/650
M23 - Release 0.26

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/119
ADR: Delete API endpoint to delete index for all kinds.
2024-01-25T13:21:20Z
Akshat Joshi

## Status
- [X] Proposed
- [ ] Under review
- [ ] Approved
- [ ] Retired
## Context & Scope
This ADR adds the capability to delete the Elasticsearch indices for all kinds in a single call to the existing Delete index API in the Indexer service.
## Decision
Currently, the delete API introduced as part of the ADR https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/54 supports deleting only the single index for a given kind. As part of the Replay design (ADR https://community.opengroup.org/osdu/platform/system/storage/-/issues/186), a user may need to delete all the indices before a reindex instead of overwriting them, as shown in this flow - <br>
![replayAll](/uploads/70dd44c84d985e56148ac84930fa3bd9/replayAll.png) <br><br>
## API Details <br>
**API Level Permission** - users.datalake.ops <br>
**Service** – Indexer
<b>delete API in indexer service</b>.
Sample request:
```bash
curl --request DELETE \
--url '/api/indexer/v2/index' \
--header 'authorization: Bearer <JWT>' \
--header 'content-type: application/json' \
--header 'data-partition-id: opendes'
```
<br><br>
**Current Scenario vs New Scenario of Delete Index API in Indexer Service**
<table>
<tr>
<td><strong> </strong>
</td>
<td><strong> Existing Scenario</strong>
</td>
<td><strong> New Scenario</strong>
</td>
</tr>
<tr>
<td><strong> API Method</strong>
</td>
<td> Delete
</td>
<td> Delete
</td>
</tr>
<tr>
<td><strong>Endpoint supported</strong>
</td>
<td><strong>indexer/v2/index?kind="tenant1:public:well:1.0.2"</strong>
<p>
</td>
<td>- <strong>indexer/v2/index?kind="tenant1:public:well:1.0.2"</strong> - deletes a single kind
<p>
- <strong>indexer/v2/index</strong> - deletes all kinds (new endpoint)
</td>
</tr>
<tr>
<td><strong>Backward Compatible</strong>
</td>
<td>NA
</td>
<td>Yes
</td>
</tr>
<tr>
<td><strong>New Functionality</strong>
</td>
<td>NA
</td>
<td>It will allow you to delete all the indices.
</td>
</tr>
<tr>
<td><strong>API level change</strong>
</td>
<td>Currently, kind must be a non-blank parameter
</td>
<td>The non-blank constraint on kind will be removed.
</td>
</tr>
<tr>
<td><strong>Code change required?</strong>
</td>
<td>NA
</td>
<td>Yes, backend code change is required to support the deletion of all kinds of indices.
</td>
</tr>
<tr>
<td><strong>API Response </strong>
</td>
<td> Same
</td>
<td> Same
</td>
</tr>
</table>
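To make the two scenarios in the table concrete, the sketch below builds the URL for each case. It assumes the path `/api/indexer/v2/index` from the sample request and that `kind` is passed as a URL-encoded query parameter. The base URL is a placeholder and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def delete_index_url(base_url: str, kind: Optional[str] = None) -> str:
    """URL for the Delete index API: one kind if given, otherwise all kinds (proposed)."""
    url = base_url.rstrip("/") + "/api/indexer/v2/index"
    if kind:
        url += "?" + urlencode({"kind": kind})
    return url


if __name__ == "__main__":
    base = "https://osdu.example.test"
    print(delete_index_url(base, "tenant1:public:well:1.0.2"))  # existing scenario
    print(delete_index_url(base))                               # new scenario: all kinds
```

Since omitting `kind` becomes destructive under this proposal, callers may want to make the "all kinds" case an explicit choice in their own tooling rather than the result of an empty variable.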
## Consequences
- This will give users the capability to delete the indices for all kinds.
Akshat Joshi

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/117
Indexing error stays in os-indexer log file and is not visible for end users
2023-11-27T13:00:03Z
Debasis Chatterjee

We had a problem with the content of the spatial block in a SeismicAcquisitionSurvey record, and the record was not indexed properly (the spatial information was missing).
We could find the reason only from **os-indexer log file**.
But the usual troubleshooting steps did not show any of the errors.
Tried using Storage service (record batch) and also Search service (returned Field ID, Index).
Do we need to use any other parameter? I tried frame-of-reference with value of
units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;
After we found the reason (the polygon needed to be closed), we fixed the record and re-ingested it. Indexing then went through properly.
The problem is that the average end user does not have access to the Indexer log.
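Until such errors are surfaced to end users, a client-side check of polygon rings before ingestion can catch this particular case. This is a minimal sketch, assuming rings are lists of [x, y] positions as in GeoJSON, where a linear ring needs at least four positions and identical first and last positions. The helper names are hypothetical.

```python
from typing import List, Sequence


def is_closed_ring(ring: Sequence[Sequence[float]]) -> bool:
    """True if the ring has at least four positions and ends where it starts."""
    return len(ring) >= 4 and list(ring[0]) == list(ring[-1])


def close_ring(ring: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return a copy of the ring with the first position appended if it is not closed."""
    closed = [list(position) for position in ring]
    if closed and closed[0] != closed[-1]:
        closed.append(list(closed[0]))
    return closed


if __name__ == "__main__":
    ring = [[2.36, 61.23], [2.10, 61.25], [1.93, 61.23]]
    print(is_closed_ring(ring), is_closed_ring(close_ring(ring)))
```

Whether closing a ring automatically is appropriate depends on the data; in many cases it is safer to report the open ring and let the data owner correct it.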
cc: @zhibinmai for information
2023-11-08 21:48:32.386 WARN 19 --- [io-8080-exec-35] o.o.o.c.common.logging.DefaultLogWriter : indexer.app: 0: elasticsearch bulk service status: BAD_REQUEST | id: devel:master-data--SeismicAcquisitionSurvey:ST12005D12 | message: ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [data.SpatialLocation.Wgs84Coordinates] of type [geo_shape]]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=[1:1037] [geojson] failed to parse field [geometries]]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Failed to build [geojson] after last required field arrived]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=first and last points of the linear ring must be the same (**it must close itself**): x[0]=2.3651712345204428 x[15]=1.929596222841967 y[0]=61.232218587912584 y[15]=61.23097891915874]];
1: elasticsearch bulk service status: BAD_REQUEST | id: devel:master-data--SeismicAcquisitionSurvey:ST12005D17 | message: ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [data.SpatialLocation.Wgs84Coordinates] of type [geo_shape]]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=[1:1037] [geojson] failed to parse field [geometries]]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Failed to build [geojson] after last required field arrived]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=first and last points of the linear ring must be the same (it must close itself): x[0]=2.3651712345204428 x[15]=1.929596222841967 y[0]=61.232218587912584 y[15]=61.23097891915874]];
{correlation-id=b791d3fa-49ae-4eb1-b8c1-9f498451e06a, data-partition-id=devel}

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/116
/reindex/records API returns 500 if all given records are found
2023-10-05T10:17:30Z
Mingyang Zhu

/reindex/records was introduced by the ADR: https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/90.
The code has a shallow-copy bug, which causes an exception when none of the given records is reported as not found, so the API returns 500.
Mingyang Zhu

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/115
normalizedKind tag is not indexed if the storage record already had tags
2023-09-26T09:53:18Z
Mingyang Zhu

normalizedKind tag is not indexed if the storage record already had tags
Mingyang Zhu

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/113
ADR: Bag of Words
2024-03-18T14:07:18Z
Mark Chance

# ADR: Copy all text field to BagOfWords field
<a name="TOC"></a>
[[_TOC_]]
# Status
- [x] Proposed
- [x] Trialing
- [x] Under review
- [x] Approved
- [ ] Retired
# Background
The application development stakeholders want to provide their users a mechanism to search for words in a record regardless of where they appear in the record. Currently, this does not work for nested fields, because the internal mechanism relies on the Elasticsearch `query_string` query, which does not search through nested documents.
# Context & Scope
[Back to TOC](#TOC)
## Requirements
- User is able to find resources by words stored in any field, using a query without explicit field names.
- User is able to find resources referencing a given ID from external systems if this ID is part of the referencing OSDU ID.
- (Additional) The list of all phrases is stored inside a single field so that simple autocompletion can be implemented.
[Back to TOC](#TOC)
# Tradeoff Analysis
## Option 1
All the fields are copied to the word-bag using the copy_to mechanism. We are proposing `bagOfWords` as the internal field name for this use case. This enables the user to find wells through their alias names using a full-text query (name aliases are stored in a nested array, so currently this is not possible without explicitly specifying the field name).
Additionally, we would like to add the ID detail to `bagOfWords`, as IDs often come from external source systems (in "osdu:wks::master-data--Well-1.0.0:43234324", the detail may contain the UWI). So, when users know 43234324 (for example from the source system) but do not know the OSDU internal ID system, they are still able to find records referencing it (for example, find all DS related to a given wellbore).
Such a field is also valuable for implementing search-as-you-type autocompletion: we can create a simple but powerful version of it by just adding a subfield with ES completion indexing and exposing it for searching.
## Option 2
If for some reason Option 1 is too broad, it is suggested to use the indexing hints added to the schema files as described here: https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/adr-elaboration/-/issues/66. A tag such as x-osdu-indexing-copytowordbag could indicate whether the associated field is to be added to the word-bag field, for example:
"x-osdu-indexing-copytowordbag": "enabled"/"disabled"
However, such an approach would make schemas less portable, as every OSDU installation may have different needs.
[Back to TOC](#TOC)
# Proposed solution
For each kind of resource, an index will be created and the value will contain all (normalized) tokens across all other text fields in the mapping.
This will enable a query of the form:
```json
{
"kind": "osdu:*:*:*",
"query": "test"
}
```
which would return
```json
{
"results": [
{
"data": {
"FacilityName": "Example test"
},
"id": "osdu:master-data--Well:1012"
},
{
"data": {
"FacilityNameAlias": "Example test"
},
"id": "osdu:master-data--Well:30142"
}
]
}
```
The Search service queries against the word-bag field (`bagOfWords`), so the two wells are returned even though 'test' occurs in different fields.
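On the mapping side, Option 1 could look roughly like the sketch below, which adds the Elasticsearch `copy_to` parameter to every `text` field and declares the target field. This is an illustration under assumptions, not the Indexer's implementation: it only walks plain `properties` objects, it does not address any special handling that `nested` fields may need, and the function names are hypothetical.

```python
from typing import Any, Dict

BAG_OF_WORDS_FIELD = "bagOfWords"  # field name proposed in this ADR


def add_copy_to(properties: Dict[str, Any], target: str = BAG_OF_WORDS_FIELD) -> Dict[str, Any]:
    """Return a copy of the mapping properties with copy_to set on every text field."""
    result: Dict[str, Any] = {}
    for name, spec in properties.items():
        spec = dict(spec)
        if spec.get("type") == "text":
            spec["copy_to"] = target
        elif isinstance(spec.get("properties"), dict):
            # Recurse into object fields.
            spec["properties"] = add_copy_to(spec["properties"], target)
        result[name] = spec
    return result


def build_mapping(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap the properties in a mapping that also declares the word-bag field."""
    mapped = add_copy_to(properties)
    mapped[BAG_OF_WORDS_FIELD] = {"type": "text"}
    return {"properties": mapped}


if __name__ == "__main__":
    data_fields = {
        "FacilityName": {"type": "text"},
        "VerticalMeasurement": {"properties": {"Remark": {"type": "text"}}},
        "Count": {"type": "integer"},
    }
    print(build_mapping(data_fields))
```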
[Back to TOC](#TOC)
## Accepted Limitations / things to work out
[Back to TOC](#TOC)
# Change Management
* Operators may need to execute reindex with force_clean=true action on indices to enable this feature.
# Decision
# Consequences
* The indexer code changes should have no impact on automated applications, as they use field-specific queries, which are unchanged. Applications where the user controls the top-level query might show additional results (for matches in nested objects and in ID details), but this is expected behavior.
[Back to TOC](#TOC)
#EOF.
M22 - Release 0.25
Mark Chance

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/111
ADR: Search backend (Elasticsearch) Upgrade
2024-02-05T14:31:07Z
Neelesh Thakur

## Status
- [X] Proposed
- [ ] Under review
- [ ] Approved
- [ ] Retired
## Background
Elasticsearch serves as the backend for the Indexer and Search services. To communicate with the Elasticsearch server (deployed and managed independently), these services use the Elasticsearch [Java high level rest API](https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html) client SDK. The officially supported Elasticsearch version for the OSDU Data Platform, for both server and client SDK, is [v7.8.1](https://www.elastic.co/blog/elastic-stack-7-8-1-released). This version is quite old and already beyond [end of life](https://www.elastic.co/support/eol) support. A new major version (v8.x), first released in February 2022, is also available. An upgrade will not only resolve issues and offer new features and capabilities, but also save costs. Here are a few reasons why the Elasticsearch client and server should be updated:
- The [Log4J vulnerability](https://blog.qualys.com/vulnerabilities-threat-research/2021/12/10/apache-log4j2-zero-day-exploited-in-the-wild-log4shell) discovered in December 2021 forced all CSPs to update their Elasticsearch server version. At this time, the CSPs are on different server versions, e.g. Azure v7.17.x, IBM v7.11.x etc. Even though Elasticsearch promises not to introduce breaking changes within a major version, we have found issues in the past. Ideally, all CSPs should be on the same client and server versions to avoid any potential issues.
The community effort on the [Reference Implementation](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/architecture-decision-records/-/blob/main/0006-osdu-will-have-a-reference-implementation.md?ref_type=heads) gives us a good opportunity to upgrade and align the Elasticsearch client and server versions.
- Elasticsearch v7.8.1 reached end of life some time back. The [officially supported](https://www.elastic.co/support/eol) version for Elasticsearch v7 is v7.17.x or higher. If an issue is found with the client SDK or server, then the fix is usually available in the most recent version.
- Elasticsearch has launched many new versions past v7.8.1 with several improvements & new features, some notable ones in v8.x are mentioned below:
- Elasticsearch v8.3.x has [removed](https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html#shard-count-recommendation) the 1K shard count per node limitation. The OSDU Data Definitions team has introduced several new schemas over the course of a few milestone releases. On Elasticsearch, each schema index generates two shards. Currently, a single node in an Elasticsearch instance can hold only up to 1K shards. A small or medium-sized Elasticsearch cluster can quickly run out of shard capacity with so many new schemas.
- Reduced resource requirements via [memory heap reductions](https://www.elastic.co/blog/significantly-decrease-your-elasticsearch-heap-memory-usage). This can lower customers' total cost of ownership. Added support for the [ARM architecture](https://www.elastic.co/blog/whats-new-elasticsearch-7-12-0-put-a-search-box-on-s3), which offers 20% better performance while being 10% cheaper than x86-64. Introduced novel ways to use less storage by decoupling compute from storage with a new [frozen tier and searchable snapshots](https://www.elastic.co/blog/whats-new-elasticsearch-7-10-0-searchable-snapshots-store-more-for-less).
- Improved indexing latency of several data types including [geo-points, geo-shapes](https://www.elastic.co/guide/en/elasticsearch/reference/8.0/release-highlights.html#_faster_indexing_of_geo_point_geo_shape_and_range_fields) etc. [Enhanced error messages](https://issues.apache.org/jira/browse/LUCENE-9538) on invalid geo-shape indexing. It can now provide more meaningful messages capturing issues with the shape, rather than the generic messages in the current version. Several new geo queries (e.g. [geo-grid query](https://www.elastic.co/guide/en/elasticsearch/reference/8.3/release-highlights.html#new_geo_grid_query) etc.) and aggregations (e.g. [cartesian-centroid](https://www.elastic.co/guide/en/elasticsearch/reference/8.6/release-highlights.html#support_cartesian_centroid_aggregation_over_points_shapes), [geo-hex](https://www.elastic.co/guide/en/elasticsearch/reference/8.7/release-highlights.html#geohex_aggregations_on_both_geo_point_geo_shape_fields) aggregation over points and shapes etc.) are also introduced.
- Introduced a new [health API](https://www.elastic.co/guide/en/elasticsearch/reference/8.7/release-highlights.html#health_api_generally_available) designed to report the health of the cluster. The new API offers a detailed report that can include a precise diagnosis and a solution, as well as a high level overview of the cluster health. The operational teams can benefit greatly from this API.
- Released a full suite of native [vector search](https://www.elastic.co/what-is/vector-search) via [kNN search](https://www.elastic.co/guide/en/elasticsearch/reference/8.0/release-highlights.html#_new_knn_search_api). It adds support for natural language processing (NLP) models directly into Elasticsearch. Users can now perform named entity recognition, sentiment analysis, text classification, and more directly in Elasticsearch — without requiring additional components or coding. Elasticsearch v8.x also includes native support for [approximate nearest neighbor (ANN)](http://www.elastic.co/blog/introducing-approximate-nearest-neighbor-search-in-elasticsearch-8-0) search — making it possible to compare vector-based queries with a vector-based document corpus with speed and at scale.
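To put the shard-count point above into numbers, here is a back-of-the-envelope sketch using only the figures stated in this ADR (two shards per schema index, up to 1K shards per node). It ignores any other indices on the cluster, and the function name is hypothetical.

```python
SHARDS_PER_SCHEMA_INDEX = 2    # as stated above: each schema index generates two shards
SHARD_LIMIT_PER_NODE = 1000    # the 1K per-node limit referred to above


def max_schema_indices(node_count: int) -> int:
    """Upper bound on schema indices a cluster can hold under the per-node shard limit."""
    return node_count * SHARD_LIMIT_PER_NODE // SHARDS_PER_SCHEMA_INDEX


if __name__ == "__main__":
    for nodes in (1, 3, 5):
        print(nodes, "node(s):", max_schema_indices(nodes), "schema indices at most")
```

Under these assumptions a three-node cluster tops out at 1500 schema indices per data partition layout, which is why the growing number of schemas is a concern.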
## Proposal
Any Elasticsearch upgrade will require coordination with the community and CSPs. This can be very time consuming. Instead of just upgrading Elasticsearch to the latest v7.17.x, we should upgrade to v8.10.0 (or the highest released v8.x) to minimize disruption and avoid having to repeat this step very soon. Since the last major version of Elasticsearch (v8) was released 18 months ago, once v9 is released, the entire v7 (v7.17.x) family will be deprecated, as stated in the [support documentation](https://www.elastic.co/support/eol).
We should break the upgrade down into two parts:
#### Latest v7.17.x Upgrade
1. Take a backup (snapshot) of the data. We cannot roll back to an earlier version unless we have a snapshot.
1. Upgrade Elasticsearch server to latest v7.17.13 (or highest available v7.17.x).
1. Replace the Indexer and Search services' [Java high level rest API](https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html) client SDK with the new [Java API client SDK](https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html). Elasticsearch has [changed](https://www.elastic.co/pricing/faq/licensing) the **Java high level rest API** client SDK's license in v7.10.2 from Apache 2.0 to [SSPL](https://www.mongodb.com/licensing/server-side-public-license). The new license is not a preferred license for the OSDU Data Platform, as explained in the [issue](https://community.opengroup.org/osdu/platform/system/search-service/-/issues/133).
The [Java API client SDK](https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html) with an [Apache 2.0](https://github.com/elastic/elasticsearch-java/) license is available from v7.15.0 onwards (including v8.x). Along a similar timeline, the [Java high level rest API](https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html) has been [deprecated](https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html) in favor of the [Java API client SDK](https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html).
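The snapshot in step 1 could be requested through the Elasticsearch create-snapshot REST endpoint (`PUT /_snapshot/<repository>/<snapshot>`). The following is a minimal sketch that only builds the request; the host, repository name and snapshot name are hypothetical, and it assumes a snapshot repository has already been registered on the cluster.

```python
import json
from urllib.parse import quote


def build_snapshot_request(base_url, repository, snapshot, indices="*"):
    """Return (method, url, body) for an Elasticsearch create-snapshot call.

    Nothing is sent here; pass the result to the HTTP client of your choice.
    """
    url = (
        f"{base_url.rstrip('/')}/_snapshot/"
        f"{quote(repository, safe='')}/{quote(snapshot, safe='')}"
        "?wait_for_completion=true"
    )
    # Include cluster state so that index templates are part of the backup.
    body = json.dumps({"indices": indices, "include_global_state": True})
    return "PUT", url, body


# Hypothetical host, repository and snapshot names.
method, url, body = build_snapshot_request(
    "https://es.example.internal:9200", "osdu_backups", "pre-upgrade-7-17"
)
print(method, url)
```

Blocking on `wait_for_completion=true` is a design choice for a scripted upgrade; for large clusters it may be preferable to poll the snapshot status instead.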
#### Latest v8.10.x Upgrade
1. Complete client and server upgrade to latest v7.17.x as stated above.
1. Use the Kibana [Upgrade Assistant](https://www.elastic.co/guide/en/kibana/7.17/upgrade-assistant.html) to prepare for upgrade from v7.17 to v8.10.0. The Upgrade Assistant identifies deprecated settings and guides users through resolving issues.
1. Review the deprecation logs from the Upgrade Assistant.
1. Review breaking changes including breaking changes for each minor v8.x release up to v8.10.0.
1. Make the recommended changes to ensure that applications/APIs continue to operate as expected after the upgrade.
1. Take a current snapshot before server upgrade is started.
1. Upgrade Elasticsearch server to latest v8.10.0 (or highest available v8.x).
1. Upgrade Indexer and Search service [Java API client SDK](https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html) to v8.10.0 (or highest available v8.x).
Elasticsearch upgrade recommendations from v7.x to v8.x can be found [here](https://www.elastic.co/guide/en/elastic-stack/8.10/upgrading-elastic-stack.html#prepare-to-upgrade).

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/92
M18 DELETE /api/indexer/v2/index cleanup-indicies-api
2023-06-15T14:17:59Z
Shane Hutchins

Received a response with 5xx status code: 500
{"code":500,"reason":"Unknown error","message":"An unknown error has occurred."}
I was expecting a 4xx error but got 5xx. Thinking maybe this should have been a 401, but at least a 4xx.
Run this curl command to reproduce this failure:
curl -X DELETE -H 'Authorization: Bearer TOKEN' -H 'data-partition-id: osdu' 'https://osdu.r3m18.preshiptesting.osdu.aws/api/indexer/v2/index?kind='
Was able to reproduce this on AWS and Azure.
curl -X DELETE -H 'Authorization: Bearer TOKEN' -H 'data-partition-id: opendes' 'https://osdu-ship.msft-osdu-test.org/api/indexer/v2/index?kind='

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/81
ADR: Configurable Index Extensions and De-Normalizations
2024-02-14T18:00:03Z
Thomas Gehrmann [slb]

<a name="TOC"></a>
[[_TOC_]]
Originally recorded during June 28-30, 2022 F2F as "Hints replacements, multiple index schemas (participation of indexer
& data definition needs to be in charge), content vs catalog, side-car", then renamed to ADR: User-friendly/App-friendly
Index Schemas
in [Enterprise Architecture ADR #66](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/adr-elaboration/-/issues/66)
<details>
<summary markdown="span">Preparation Material</summary>
OSDU Data Definitions conducted a number of sessions in the Core Concepts meetings, which contain supplementary
information:
**2022**
1. [Meeting Minutes 2022-07-05](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2022/2022-07-05-DataDefinitionsCoreConcepts_MeetingMinutes.md#42-user-friendly-schemas-de-normalizations)
2. [Meeting Minutes 2022-07-12](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2022/2022-07-12-DataDefinitionsCoreConcepts_MeetingMinutes.md#43-user-friendly-schemas-aka-index-schemas)
3. [Meeting Minutes 2022-07-19](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2022/2022-07-19-DataDefinitionsCoreConcepts_MeetingMinutes.md#43-user-friendly-schemas-aka-index-schemas)
4. [Meeting Minutes 2022-07-26](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2022/2022-07-26-DataDefinitionsCoreConcepts_MeetingMinutes.md#42-user-friendly-schemas-aka-index-schemas)
**2023**
1. [Meeting Minutes 2023-03-21](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2023/2023-03-21-DataDefinitionsCoreConcepts_MeetingMinutes.md#42-index-extensions-adr-66-configuration)
2. [Meeting Minutes 2023-03-28](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2023/2023-03-28-DataDefinitionsCoreConcepts_MeetingMinutes.md#42-index-extensions-configuration-mechanics-schema-review)
3. [Enterprise Architecture Advice Forum 2023-04-12](https://opensdu.slack.com/archives/C04TPV9CRUP/p1681291140407219?thread_ts=1681217870.084929&cid=C04TPV9CRUP)
</details>
# Status
- [x] Proposed
- [x] Trialing
- [x] Under review
- [x] Approved
- [ ] Retired
# Context & Scope
The entity type schemas delivered by the OSDU Data Definitions subcommittee pose a number of challenges
for consumers. Most of them are due to the normalization of schemas and the friendliness to ingestors, which allows
values to be stored as is and in a less standardized form. The main problem is the usage of arrays of objects, which are
difficult to query and costly to index. So far the issues have been mitigated by decorating arrays of objects
with `x-osdu-indexing` instructions. An umbrella issue has been recorded in
[community DD issue #30](https://community.opengroup.org/osdu/data/data-definitions/-/issues/30), which collects a
number of more detailed requests.
In previous OSDU prototypes, this was addressed by specific workarounds,
see [OSDU R1 Indexing Approach and Specification](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/adr-elaboration/-/wikis/uploads/46b4f84f0903cc385abd147a0175a00a/r1_indexing.pdf).
Here is an attempt to classify the workarounds listed in the R1 document above:
1. Extraction of standardized values from arrays of objects using conditions (e.g., Well UWI, SpudDate).
2. Chasing relationships to parent or related objects in order to de-normalize parent/related object values on children.
3. Offering related object's Name/Code for presentations in applications.
4. Counting children of well-known kinds. (The priority of this is lower compared to 1 and 2. The current Search service
should be capable of querying a particular parent-child relationship.)
The current methods using `x-osdu-virtual-properties`, `x-osdu-is-derived` and `x-osdu-indexing` JSON schema decorations
fall short when the query conditions become dependent on platform operators' usage of, e.g., reference values. In many
cases the reference value lists shipped by OSDU are incomplete or not documented clearly enough to guide global platform
standards.
[Back to TOC](#TOC)
---
## Requirements
* We need a configurable way to define rules for property extraction, either from nested arrays of objects or from
related objects.
* We need OSDU provided standard index schema extensions to extend the entity types schemas with extracted values
(governance for interoperability).
* We need to open the index schema extensions to applications and services to optimize frequently used query patterns.
One of them is the look-up of names or codes of related objects where the source record holds the target record id.
* We need a platform embedded service, which performs the extractions and de-normalizations on demand (data
creation/update events).
* We need platform support to refresh indexes if the indexing schemas change (both for OSDU and application indexing
schemas).
[Back to TOC](#TOC)
---
# Tradeoff Analysis
The original tradeoff analysis was performed and recorded
in [EA ADR #66](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/adr-elaboration/-/issues/66).
The need for performance required further simplification.
* Replicating derived/de-normalized property values in Storage records was discarded as this would create an enormous
stack of versions for each individual record as records would need to be updated if properties derived from parents or
children changed.
* Instead, de-normalization could happen exclusively in the indexer, simultaneously exploiting the already indexed
values of parent and children records. (Preferred option)
* Using configurable index extension rules was already proposed
in [EA ADR #66](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/adr-elaboration/-/issues/66). The
proposed additional index schemas with references to configurations were discarded. All required information can be
encoded in the configurations themselves. Any index extension schema fragments and documentation can be auto-generated
from the configurations.
* Interoperability is achieved by firm governance rules - the configurations are stored and customizable as OPEN
governance reference-data. However, additional governance rules have to be provided to keep interoperability
guaranteed across deployments and to prevent unwanted interference of index extensions with actual schema properties.
[Back to TOC](#TOC)
---
# Solution
## Index Extension, Data Definition
OSDU Standard index extensions are defined by OSDU Data Definition work-streams with the intent to provide
user/application friendly, derived properties. The standard set, together with the OSDU schemas, form the
interoperability foundation. They can contribute to deliver domain specific APIs according to the Domain Driven Design
principles.
The configurations are encoded in OSDU reference-data records, one per major schema version. The proposed type name
is IndexPropertyPathConfiguration. The diagram below shows the decomposition into parts.
![IndexPropertyPathConfiguration](/uploads/7f1330dd7a41903a90174feb7fe2c9d9/IndexPropertyPathConfiguration.png)
* One IndexPropertyPathConfiguration record corresponds to one schema kind's major version, i.e., the
IndexPropertyPathConfiguration record id for all the schema `osdu:wks:master-data--Wellbore:1.*.*` kinds is set
to `partition-id:reference-data--IndexPropertyPathConfiguration:osdu:wks:master-data--Wellbore:1`. Code, Name and
Descriptions are filled with meaningful data as usual for all reference-data types.
* The additional index properties are added with one JSON object each in the `Configurations[]` array. The Name defines
the name of the index 'column', or the name of the property one can search for. The Policy decides, in the current
usage, whether the resulting value is a single value or an array containing the aggregated, derived values.
* Each `Configurations[]` element has at least one element defined in `Paths[]`.
* The `ValueExtraction` object has one mandatory property, `ValuePath`. The two other, optional properties hold value
match conditions, i.e., the property containing the value to be matched and the value to match.
* If no `RelatedObjectsSpec` is present, the value is derived from the object being indexed.
* If `RelatedObjectsSpec` is provided, the value extraction is carried out in related objects. Depending on
the `RelationshipDirection`, these are parent/related objects or children. The property holding the record id to
follow is specified in `RelatedObjectID`, and the expected target kind in `RelatedObjectKind`. As in `ValueExtraction`,
the selection can be filtered by a match condition (`RelatedConditionProperty` and `RelatedConditionMatches`).
With this, the extension properties can be defined as if they were provided by a schema.
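The record id convention described above (one configuration record per major schema version) can be illustrated with a small helper. This is a sketch only: the function name `config_record_id` is hypothetical and not part of any OSDU service API; it simply reproduces the id pattern given in the text.

```python
def config_record_id(partition_id: str, kind: str) -> str:
    """Derive the IndexPropertyPathConfiguration record id for a schema kind.

    kind has the form authority:source:entity-type:major.minor.patch;
    only the major version is kept.
    """
    authority, source, entity_type, version = kind.split(":")
    major = version.split(".")[0]
    return (
        f"{partition_id}:reference-data--IndexPropertyPathConfiguration:"
        f"{authority}:{source}:{entity_type}:{major}"
    )


record_id = config_record_id("partition-id", "osdu:wks:master-data--Wellbore:1.2.0")
print(record_id)
```

All minor and patch versions of `osdu:wks:master-data--Wellbore:1.*.*` map to the same record id, which is why a configuration change affects every kind of that major version.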
Most of the use cases deal with text (string) types. The definition of configurations is however not limited to string
types. As long as the property is known to the indexer, i.e., the source record schema is describing the types, the type
can be inferred by the indexer. This does not work for nested arrays of objects, which have not been indexed
with `"x-osdu-indexing": {"type":"nested"}`. In this case the types unknown to the Indexer Service are
string-serialized; the resulting index type is then of type `string`, still supporting text search.
[Back to TOC](#TOC)
---
### Use Case 1, WellUWI
_As a user I want to discover and match Wells by their UWI. I am aware that this is not globally reliable, however, I am
able to specify a prioritized AliasNameType list to look up value in the NameAliases array._
The configuration demonstrates extractions from the record being indexed itself. With Policy `ExtractFirstMatch`, the
first value that satisfies the condition is extracted, i.e., the first value for which `RelatedConditionProperty` is
equal to one of `RelatedConditionMatches`.
<details><summary>Configuration for Well, extract WellUWI from NameAliases[]</summary>
```json
{
  "data": {
    "Configurations": [
      {
        "Name": "WellUWI",
        "Policy": "ExtractFirstMatch",
        "Paths": [
          {
            "ValueExtraction": {
              "RelatedConditionMatches": [
                "{{data-partition-id}}:reference-data--AliasNameType:UniqueIdentifier:",
                "{{data-partition-id}}:reference-data--AliasNameType:RegulatoryName:",
                "{{data-partition-id}}:reference-data--AliasNameType:PreferredName:",
                "{{data-partition-id}}:reference-data--AliasNameType:CommonName:"
              ],
              "RelatedConditionProperty": "data.NameAliases[].AliasNameTypeID",
              "ValuePath": "data.NameAliases[].AliasName"
            }
          }
        ],
        "UseCase": "As a user I want to discover and match Wells by their UWI. I am aware that this is not globally reliable, however, I am able to specify a prioritized AliasNameType list to look up value in the NameAliases array."
      }
    ]
  }
}
```
</details>
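To make the effect of this configuration concrete, here is a minimal sketch of how an `ExtractFirstMatch` evaluation over `NameAliases[]` might look. It is not the Indexer implementation: the function name and the sample record are invented, and the sketch assumes that elements are checked in record order. Whether the Indexer instead honors the priority order of `RelatedConditionMatches` is not confirmed here.

```python
def extract_first_match(record, array_name, condition_key, matches, value_key):
    """Return value_key of the first array element whose condition_key is in matches."""
    for item in record.get("data", {}).get(array_name, []):
        if item.get(condition_key) in matches and value_key in item:
            return item[value_key]
    return None


# Invented sample record; the partition "osdu" and alias values are illustrative only.
well = {
    "data": {
        "NameAliases": [
            {
                "AliasName": "Lucky Strike 1",
                "AliasNameTypeID": "osdu:reference-data--AliasNameType:ProjectName:",
            },
            {
                "AliasName": "100/01-02-003-04W5/00",
                "AliasNameTypeID": "osdu:reference-data--AliasNameType:UniqueIdentifier:",
            },
        ]
    }
}
matches = [
    "osdu:reference-data--AliasNameType:UniqueIdentifier:",
    "osdu:reference-data--AliasNameType:RegulatoryName:",
]
well_uwi = extract_first_match(
    well, "NameAliases", "AliasNameTypeID", matches, "AliasName"
)
print(well_uwi)
```

The first alias is skipped because its type is not in the match list, so the derived `WellUWI` is taken from the second element.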
[Back to TOC](#TOC)
---
### Use Case 2, CountryNames
_As a user I want to find objects by a country name, with the understanding that an object may extend over country
boundaries._
This configuration demonstrates the extraction from related index objects - here `RelatedObjectKind`
being `osdu:wks:master-data--GeoPoliticalEntity:1.`, which are found via `RelatedObjectID` as
in `data.GeoContexts[].GeoPoliticalEntityID`. The match condition requires that `GeoTypeID` is
`GeoPoliticalEntityType:Country`.
<details><summary>Configuration for Well, extract CountryNames from GeoContexts[]</summary>
```json
{
  "data": {
    "Configurations": [
      {
        "Name": "CountryNames",
        "Policy": "ExtractAllMatches",
        "Paths": [
          {
            "RelatedObjectsSpec": {
              "RelatedObjectID": "data.GeoContexts[].GeoPoliticalEntityID",
              "RelatedObjectKind": "osdu:wks:master-data--GeoPoliticalEntity:1.",
              "RelatedConditionMatches": [
                "{{data-partition-id}}:reference-data--GeoPoliticalEntityType:Country:"
              ],
              "RelatedConditionProperty": "data.GeoContexts[].GeoTypeID"
            },
            "ValueExtraction": {
              "ValuePath": "data.GeoPoliticalEntityName"
            }
          }
        ],
        "UseCase": "As a user I want to find objects by a country name, with the understanding that an object may extend over country boundaries."
      }
    ]
  }
}
```
</details>
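The following sketch illustrates what `ExtractAllMatches` with a `RelatedObjectsSpec` could produce for a Well. It is an illustration under assumptions, not the Indexer code: related records are held in an in-memory dictionary instead of being read from the index, the trailing colon of the reference is stripped before the lookup, duplicate names are dropped, and all sample ids and names are invented.

```python
def extract_country_names(record, related_by_id, country_type_id):
    """Collect GeoPoliticalEntityName of all related entities typed as country."""
    names = []
    for context in record.get("data", {}).get("GeoContexts", []):
        # RelatedConditionProperty / RelatedConditionMatches filter.
        if context.get("GeoTypeID") != country_type_id:
            continue
        # References carry a trailing colon; record ids do not (assumption).
        related_id = (context.get("GeoPoliticalEntityID") or "").rstrip(":")
        related = related_by_id.get(related_id)
        if related is None:
            continue
        name = related.get("data", {}).get("GeoPoliticalEntityName")
        if name is not None and name not in names:
            names.append(name)
    return names


COUNTRY = "osdu:reference-data--GeoPoliticalEntityType:Country:"
QUADRANT = "osdu:reference-data--GeoPoliticalEntityType:Quadrant:"

# Invented related records, keyed by record id.
related_by_id = {
    "osdu:master-data--GeoPoliticalEntity:NO": {"data": {"GeoPoliticalEntityName": "Norway"}},
    "osdu:master-data--GeoPoliticalEntity:GB": {"data": {"GeoPoliticalEntityName": "United Kingdom"}},
    "osdu:master-data--GeoPoliticalEntity:Q15": {"data": {"GeoPoliticalEntityName": "Quadrant 15"}},
}
well = {
    "data": {
        "GeoContexts": [
            {"GeoPoliticalEntityID": "osdu:master-data--GeoPoliticalEntity:NO:", "GeoTypeID": COUNTRY},
            {"GeoPoliticalEntityID": "osdu:master-data--GeoPoliticalEntity:Q15:", "GeoTypeID": QUADRANT},
            {"GeoPoliticalEntityID": "osdu:master-data--GeoPoliticalEntity:GB:", "GeoTypeID": COUNTRY},
        ]
    }
}
country_names = extract_country_names(well, related_by_id, COUNTRY)
print(country_names)
```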
[Back to TOC](#TOC)
---
### Use Case 3, Wellbore Name on WellLog Children
_As a user I want to discover WellLog instances by the wellbore's name value._
A variant of this can be WellUWI from parent Wellbore → Well; in that case the value would be derived from the
already extended index values.
This configuration demonstrates extractions from multiple `Paths[]`.
<details><summary>Configuration for WellLog, extract WellboreName from parent WellboreID</summary>
```json
{
  "data": {
    "Configurations": [
      {
        "Name": "WellboreName",
        "Policy": "ExtractFirstMatch",
        "Paths": [
          {
            "RelatedObjectsSpec": {
              "RelatedObjectKind": "osdu:wks:master-data--Wellbore:1.",
              "RelatedObjectID": "data.WellboreID"
            },
            "ValueExtraction": {
              "ValuePath": "data.VirtualProperties.DefaultName"
            }
          },
          {
            "RelatedObjectsSpec": {
              "RelatedObjectKind": "osdu:wks:master-data--Wellbore:1.",
              "RelatedObjectID": "data.WellboreID"
            },
            "ValueExtraction": {
              "ValuePath": "data.FacilityName"
            }
          }
        ],
        "UseCase": "As a user I want to discover WellLog instances by the wellbore's name value."
      }
    ]
  }
}
```
</details>
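With several `Paths[]` and the `ExtractFirstMatch` policy, a plausible reading is that the paths act as an ordered fallback chain: the first path that yields a value wins. The sketch below shows that reading; the helper names and sample records are invented, and the ordering rule is an assumption inferred from the policy name rather than a confirmed Indexer behavior.

```python
def get_by_path(record, path):
    """Resolve a dotted path such as 'data.FacilityName' in a nested dict."""
    value = record
    for key in path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


def first_available(related_record, value_paths):
    """Return the value of the first path that resolves to a non-empty value."""
    for path in value_paths:
        value = get_by_path(related_record, path)
        if value not in (None, ""):
            return value
    return None


paths = ["data.VirtualProperties.DefaultName", "data.FacilityName"]

# Invented parent Wellbore records.
wellbore_a = {"data": {"FacilityName": "WB-A"}}
wellbore_b = {
    "data": {
        "VirtualProperties": {"DefaultName": "WB-B default"},
        "FacilityName": "WB-B",
    }
}
name_a = first_available(wellbore_a, paths)
name_b = first_available(wellbore_b, paths)
print(name_a, "|", name_b)
```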
[Back to TOC](#TOC)
---
### Use Case 4, Wellbore index WellLogCurveMnemonics
_As a user I want to find Wellbores by well log mnemonics._
This configuration demonstrates the Policy `ExtractAllMatches` with related objects discovered by
RelationshipDirection `ParentToChildren`, i.e., related objects referring to the indexed record.
<details><summary>Configuration for Wellbore, extract WellLogCurveMnemonics from children WellLog</summary>
```json
{
  "data": {
    "Configurations": [
      {
        "Name": "WellLogCurveMnemonics",
        "Policy": "ExtractAllMatches",
        "Paths": [
          {
            "RelatedObjectsSpec": {
              "RelationshipDirection": "ParentToChildren",
              "RelatedObjectID": "WellboreID",
              "RelatedObjectKind": "osdu:wks:work-product-component--WellLog:1."
            },
            "ValueExtraction": {
              "ValuePath": "Curves[].Mnemonic"
            }
          }
        ],
        "UseCase": "As a user I want to find Wellbores by well log mnemonics."
      }
    ]
  }
}
```
</details>
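For the `ParentToChildren` direction, the lookup is inverted: the children are the records whose `RelatedObjectID` property points at the record being indexed. A minimal sketch follows, assuming the children are already available as a list (in practice they would come from an index query), that duplicates are dropped, and that references carry a trailing colon; the function name and sample data are invented.

```python
def collect_child_values(parent_id, children, related_object_id, array_name, value_key):
    """Aggregate value_key from array_name of all children referring to parent_id."""
    values = []
    for child in children:
        data = child.get("data", {})
        reference = (data.get(related_object_id) or "").rstrip(":")
        if reference != parent_id:
            continue
        for item in data.get(array_name, []):
            value = item.get(value_key)
            if value is not None and value not in values:
                values.append(value)
    return values


# Invented WellLog records; two refer to wellbore WB1, one to WB2.
well_logs = [
    {"data": {"WellboreID": "osdu:master-data--Wellbore:WB1:",
              "Curves": [{"Mnemonic": "GR"}, {"Mnemonic": "RHOB"}]}},
    {"data": {"WellboreID": "osdu:master-data--Wellbore:WB1:",
              "Curves": [{"Mnemonic": "GR"}, {"Mnemonic": "NPHI"}]}},
    {"data": {"WellboreID": "osdu:master-data--Wellbore:WB2:",
              "Curves": [{"Mnemonic": "DT"}]}},
]
mnemonics = collect_child_values(
    "osdu:master-data--Wellbore:WB1", well_logs, "WellboreID", "Curves", "Mnemonic"
)
print(mnemonics)
```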
[Back to TOC](#TOC)
---
## Index Extension, Governance
OSDU Data Definition ships reference value list content for all reference-data group-type entities. The type
IndexPropertyPathConfiguration is classified as OPEN governance, which usually means that new records can be added by
platform operators. This rule must be adjusted for IndexPropertyPathConfiguration records.
### Permitted Changes to IndexPropertyPathConfiguration Records
It is permitted to
* customize the conditions for value extractions, notably the matching values in `RelatedConditionMatches`.
* add additional `Paths[]` elements to `Configurations[].Paths[]`.
* add new index property configuration objects to the `Configurations[]` array. To avoid interference with future OSDU
updates it is strongly recommended to add a namespace prefix to the Configurations[].Name, e.g., "OperatorX.WellUWI".
### Prohibited Changes to IndexPropertyPathConfiguration Records
It is not permitted to
* change the target value type of existing, OSDU shipped index extensions. Example: the extraction path to a string
property in the original OSDU `Configurations[].Paths[].ValueExtraction.ValuePath` must not be altered to point to a
number, integer, or array.
* change the meaning of existing, OSDU shipped index extensions.
* remove OSDU shipped extension definitions in Configurations[].
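Some of these rules lend themselves to an automated check when an operator customizes a configuration record. The sketch below covers only two of them (no removal of OSDU shipped definitions, namespace prefix recommended for new names); the function name and the finding texts are hypothetical, and checks on value types or meaning would need schema knowledge that is out of scope here.

```python
def check_customization(shipped_names, customized_names):
    """Compare Configurations[].Name lists of the shipped and customized record."""
    findings = []
    for name in shipped_names:
        if name not in customized_names:
            findings.append(f"error: OSDU shipped extension '{name}' was removed")
    for name in customized_names:
        # A namespace prefix such as "OperatorX." is recommended for new names.
        if name not in shipped_names and "." not in name:
            findings.append(f"warning: custom extension '{name}' has no namespace prefix")
    return findings


findings = check_customization(
    shipped_names=["WellUWI", "CountryNames"],
    customized_names=["WellUWI", "OperatorX.WellUWI", "MyField"],
)
for finding in findings:
    print(finding)
```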
[Back to TOC](#TOC)
---
## Consumption by Indexer Service
### Recursive Index Updates
With the introduction of de-normalizations, record updates can cause infinite recursions. The implementation needs to
address this and avoid situations like in the following diagram:
![Recursions](/uploads/020675583cb7b65560f0d73ffe08fc3c/Recursions.png)
On the left-hand side, Storage records are updated to new versions, which triggers indexing. The update of the index triggers
the index update of related index records due to the derived property values (as defined in the `RelatedObjectsSpec`).
These updates may, in turn, cause a recursion. This must not happen.
The augmenter introduces a new attribute `ancestry_kinds` in the Attributes map of the message payload when sending
messages to update the index of parent/children records. The value of the `ancestry_kinds` attribute can include multiple
kinds separated by commas. This new attribute is used to prevent infinite loops when chasing index updates. The
indexer-queue must pass the attribute back to the indexer when it receives indexing messages.
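One way such a guard could work is sketched below. Only the attribute name `ancestry_kinds` and its comma-separated format come from the text above; the function name and the stopping rule (do not propagate when the current kind already appears in the ancestry) are assumptions made for illustration.

```python
def next_ancestry(attributes, current_kind):
    """Return the attributes for a propagated message, or None to stop chasing."""
    raw = attributes.get("ancestry_kinds", "")
    ancestry = [kind for kind in raw.split(",") if kind]
    if current_kind in ancestry:
        # This kind was already visited in the chain: stop to avoid a loop.
        return None
    updated = dict(attributes)
    updated["ancestry_kinds"] = ",".join(ancestry + [current_kind])
    return updated


well_kind = "osdu:wks:master-data--Well:1.0.0"
wellbore_kind = "osdu:wks:master-data--Wellbore:1.0.0"

first_hop = next_ancestry({}, well_kind)
second_hop = next_ancestry(first_hop, wellbore_kind)
loop = next_ancestry(second_hop, well_kind)
print(first_hop, second_hop, loop)
```

The original attributes map is copied rather than mutated, so the message that triggered the update keeps its own ancestry.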
### Pseudo-Code
1. For each record to be indexed (create/update event from Storage service):
   * Does the record kind have an IndexPropertyPathConfiguration?
     * Yes
       * get or create the internal index schema that combines the schema of the record kind and the schema of the
         extended properties
       * create the index document that combines the properties of the original record and the extended properties
       * call the Elasticsearch service to create or update the index of the record with extended properties
     * No
       * **_No action_** (=default for records without IndexPropertyPathConfiguration)
2. Re-Indexing (create/update event from Storage service for an IndexPropertyPathConfiguration record)<br>
   To update the schema (or say template) of the kind in Elasticsearch when the kind is re-indexed:
   * create the internal index schema derived from the kind (as registered in the Schema service)
   * create the internal index schema derived from IndexPropertyPathConfiguration
   * merge the internal index schemas
   * convert the schema to an Elasticsearch template
   * call the Elasticsearch service to update the index template (schema)
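The "merge the internal index schemas" step can be sketched as follows. The flat name-to-type dictionaries are a simplification invented for this example, not the Indexer's internal schema model, and raising an error on a name clash is an assumed policy that reflects the governance goal of keeping extensions from interfering with actual schema properties.

```python
def merge_index_schemas(kind_schema, extension_schema):
    """Merge extended properties into the index schema derived from the kind."""
    merged = dict(kind_schema)
    for name, field_type in extension_schema.items():
        if name in merged and merged[name] != field_type:
            raise ValueError(
                f"extension '{name}' conflicts with an existing schema property"
            )
        merged[name] = field_type
    return merged


# Simplified, invented schemas: property name -> index type.
kind_schema = {"FacilityName": "text", "SpudDate": "date"}
extension_schema = {"WellUWI": "text", "CountryNames": "text"}

merged_schema = merge_index_schemas(kind_schema, extension_schema)
print(merged_schema)
```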
[Back to TOC](#TOC)
---
## Accepted Limitations
* A change in the configurations requires re-indexing of all the records of a major schema version kind. It is the same
limitation as an in-place schema change for any kind.
* All the extensions defined in the IndexPropertyPathConfiguration records refer to properties in the `data` block,
including `ValuePath`, `RelatedObjectID`, `RelatedConditionProperty`.
* Only properties in the `data` block of records being indexed can be reached by the `ValuePath`; system properties are
out of reach. The prefix `data.` is therefore optional and can be omitted.
* The formats/values of the extended properties are extracted from the formats/values of the related index records. If
the formats of the original properties are unknown in the related index records, the indexer will set the value type
of the extended properties as string or string array. (With additional complexity and schema parsing, this limitation
can be overcome, but currently the added value seems to be marginal.)
* If the extended properties are extracted from arrays of objects indexed with
(`"x-osdu-indexing": {"type":"flattened"}`), the indexer cannot re-construct the object properties to the
nested objects when the policy `ExtractAllMatches` is applied. (The kind of indexing is already a deliberate choice.
With additional complexity, this limitation can be overcome, but currently the added value seems to
be marginal.)
* To simplify the solution, all the related kinds defined in the configuration are kinds with major version only. They
must end with a dot ".". For example: `"RelatedObjectKind": "osdu:wks:work-product-component--WellLog:1."`.
* Index updates may take time. Immediate consistency cannot be expected.
* When a kind derives extended properties from its parent(s), a new data property `data.AssociatedIdentities` is added
on demand by the indexer. The property name `AssociatedIdentities` is therefore reserved by the Indexer and shall not
be used in any OSDU schemas.
Currently, the property name `AssociatedIdentities` is not in use in any of the OSDU well-known schemas. Tests will be
implemented in the OSDU Data Definition pipeline to ensure that this reserved name does not appear as property in
the `data` block.
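Two of the limitations above translate directly into small helpers: matching a record kind against a major-version-only `RelatedObjectKind`, and treating the `data.` prefix as optional. This is a sketch with hypothetical function names; the prefix-match approach is an assumption about how the trailing dot might be used.

```python
def kind_matches(record_kind, related_object_kind):
    """Check whether record_kind belongs to the major version in related_object_kind."""
    if not related_object_kind.endswith("."):
        raise ValueError("RelatedObjectKind must end with a dot")
    # The trailing dot keeps major version 1 from matching 10, 11, ...
    return record_kind.startswith(related_object_kind)


def normalize_path(path):
    """Drop the optional 'data.' prefix from a configured property path."""
    prefix = "data."
    return path[len(prefix):] if path.startswith(prefix) else path


related_kind = "osdu:wks:work-product-component--WellLog:1."
print(kind_matches("osdu:wks:work-product-component--WellLog:1.2.0", related_kind))
print(kind_matches("osdu:wks:work-product-component--WellLog:10.0.0", related_kind))
print(normalize_path("data.Curves[].Mnemonic"))
```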
[Back to TOC](#TOC)
---
# Change Management
1. Configurations are reference-data and need to be ingested/updated.
2. OSDU Data Definitions must take on the task of defining IndexPropertyPathConfiguration records.
3. Updates (extensions) of index extensions must be managed carefully as they cause re-indexing of the kinds involved.
# Decision
# Consequences
* The indexer code changes should have no impact on the system if no IndexPropertyPathConfiguration records are present.
[Back to TOC](#TOC)
---
# ADR Comments Below

M18 - Release 0.21

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/79
Reindexing timing out
2023-06-21T18:03:38Z
Okoun-Ola Fabien Houeto

In addition to the issues in #66, it seems that the reindexer is using the user token, so reindexing will time out when the user token expires. The reindexer should use a mechanism to avoid timing out on the token.
For example, we are getting this error:
"_The user is not authorized to perform this action, errors=null, debuggingInfo=account id: null | user email: admin@testing.com_"

Chad Leong

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/77
Remove number retry attempts for schema not found 404 and remove call to deprecated storage API in indexer service
2022-10-28T11:23:49Z
Harshika Dhoot

The Indexer service makes a number of retry attempts for schemas that do not exist in the Schema service and return 404. After those attempts, it also calls the deprecated Storage service API.
To fix this issue, we have this already merged PR: [indexer-service/-/merge_requests/384](https://community.opengroup.org/osdu/platform/system/indexer-service/-/merge_requests/384)

Harshika Dhoot