Indexer issues
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues
2022-09-29T13:41:06Z

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/44
Add documentation - explain whether all fields from Document Store are replicated to Index Store
2022-09-29T13:41:06Z
Debasis Chatterjee
It would be very useful to have a section providing some background on this subject.
There are times when we see a list of fields from the Storage service (GET), but some of those fields are missing from the Search service (Query).
I recently experienced this when working with custom schema (CSV Ingestion test case).
**Recent thread** with @nthakur
> Hi Debasis,
>
> In general, if there are no indexing errors and a field is defined in the Schema (the schema from the Schema service for the kind), then the Indexer will index it. Fields included in Storage service records may or may not have a definition in the Schema, so you may see many more fields on Storage records. If there are errors, then the index.trace block in the Search service response for the record will tell you which properties were skipped.
>
> https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/docs/tutorial/IndexerService.md#get-indexing-status
>
> Please let me know if you have any additional question.
>
> Regards,
> Neelesh
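A minimal sketch of the rule described above, assuming a flat record where each top-level key of the Storage record's `data` block is a property name and the schema's property names are already available as a set. Real schemas are nested JSON Schema documents and the indexer also applies x-osdu-indexing hints, so this is a simplification; `IndexedFieldCheck` and `fieldsNotInSchema` are hypothetical names.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class IndexedFieldCheck {

    // Hypothetical helper: returns the keys of a Storage record's "data" block
    // that have no matching property in the kind's schema. Under the rule
    // described above, these fields are not expected to appear in Search results.
    static Set<String> fieldsNotInSchema(Map<String, Object> storageData, Set<String> schemaProperties) {
        Set<String> notIndexed = new TreeSet<>(storageData.keySet()); // sorted for stable output
        notIndexed.removeAll(schemaProperties);
        return notIndexed;
    }

    public static void main(String[] args) {
        // Illustrative values only; not taken from a real record or schema.
        Map<String, Object> data = Map.of(
                "ProjectName", "Survey A",
                "ProjectEndDate", "2008-02-01T14:00:00.000+03:00",
                "CustomColumn", 42);
        Set<String> schema = Set.of("ProjectName", "ProjectEndDate");
        System.out.println(fieldsNotInSchema(data, schema)); // [CustomColumn]
    }
}
```

A field that is in the schema but still missing from Search results points to an indexing error, which is where the index.trace block becomes relevant.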
cc - @ChrisZhang, @ethiraj and @sehuboy for information

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/68
Documentation of DateTime normalization while data is indexed
2022-09-29T13:41:07Z
Debasis Chatterjee
It is not obvious what to provide in the meta block and what to provide in the input value so that the Normalizer adjusts DateTime properly, respecting the time shift from UTC.
See this worked example.
https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M12/Test_Plan_Results_M12/Manifest_Ingestion/M12-GCP-Master-FoR-Date-check-Debasis-Kateryna.txt
Please provide suitable documentation.
cc - @nthakur and @Kateryna_Kurach

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/39
Enhance documentation to explain how one can troubleshoot issues in Indexer (Normalizer)
2023-03-09T18:15:53Z
Debasis Chatterjee
Please see related issue #32.
Suggest adding suitable notes to the Core Services (Indexer) documentation.
https://community.opengroup.org/osdu/documentation/-/wikis/Core-Services-Overview
cc - @nthakur for information

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/40
Indexer, Frame of Reference, DateTime conversion
2022-09-29T13:41:07Z
Debasis Chatterjee
Can you please share a working example of DateTime conversion?
Ideally, a load manifest (JSON) showing the actual data and the matching "meta" block for a date field.
**I tried the following in a Load Manifest (AWS R3M8 Preship environment):**
The data has this line:
` "ProjectEndDate": "2008-02-01T14:00:00.000+03:00",`
The meta block has this for DateTime conversion:
```
{
  "kind": "DateTime",
  "name": "UTC-ISO8601",
  "persistableReference": "{\"format\":\"yyyy-MM-ddTHH:mm:ss.fffZ\",\"timeZone\":\"UTC\",\"type\":\"DTM\"}",
  "propertyNames": [
    "ProjectEndDate"
  ]
}
```
**I see this error when I query the Search service for troubleshooting.**
For now, we can simply discuss the DateTime conversion problem.
```
{
  "kind": "osdu:wks:master-data--SeismicAcquisitionSurvey:1.0.0",
  "query": "id: \"osdu:master-data--SeismicAcquisitionSurvey:ST0202R08-DC-23Oct\"",
  "returnedFields": [
    "id",
    "index"
  ]
}
```
Response
```
{
  "results": [
    {
      "index": {
        "trace": [
          "Unit conversion: persistableReference not valid",
          "Unit conversion: persistableReference not valid",
          "DateTime conversion: Frame of reference does not match given data for property ProjectEndDate, no conversion applied."
        ],
        "statusCode": 400,
        "lastUpdateTime": "2021-10-23T10:13:21.496Z"
      },
      "id": "osdu:master-data--SeismicAcquisitionSurvey:ST0202R08-DC-23Oct"
    }
  ],
  "totalCount": 1
}
```
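For reference, this is the conversion the manifest above expects, sketched with `java.time` rather than the indexer's own parser (whose API is not shown here). It assumes the target is UTC and that the `fff` token in the persistableReference format means milliseconds, which Java's `DateTimeFormatter` writes as `SSS`. `UtcNormalizeSketch` and `toUtc` are hypothetical names, and the sketch does not explain why the indexer reported "Frame of reference does not match given data"; it only shows the expected result.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical class; not the indexer's DateTime parser.
public class UtcNormalizeSketch {

    // Assumed Java equivalent of "yyyy-MM-ddTHH:mm:ss.fffZ":
    // literal 'T', milliseconds as SSS, and a literal 'Z' for UTC.
    static final DateTimeFormatter UTC_ISO8601 =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");

    // Parses an ISO-8601 value carrying an offset and re-expresses
    // the same instant in UTC.
    static String toUtc(String value) {
        OffsetDateTime parsed = OffsetDateTime.parse(value); // ISO_OFFSET_DATE_TIME
        return parsed.withOffsetSameInstant(ZoneOffset.UTC).format(UTC_ISO8601);
    }

    public static void main(String[] args) {
        System.out.println(toUtc("2008-02-01T14:00:00.000+03:00")); // 2008-02-01T11:00:00.000Z
    }
}
```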
Thanks for your help.
cc - @gehrmann for information
cc - @jingdongsun, @anujgupta and @shamazum (since some work was done by IBM resources in this area, as far as I remember)

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/57
Indexer not paying attention to updated DEVELOPMENT status schema
2022-08-26T09:22:49Z
Eric Schoen
1. We installed a DEVELOPMENT status schema with an incorrect x-osdu-indexing hint (it was at the wrong level in the schema). We indexed some data with this incorrect schema, and the affected field was not queryable.
2. We then fixed the schema (moved only the x-osdu-indexing extensions) and reinstalled it with the PUT endpoint, but with the same version number. We confirmed by retrieving the schema that the changes had been committed to the schema service.
3. We deleted the records created with the prior version and created new records.
4. The field in question was still not queryable.
5. We installed the schema again, but bumped the SchemaVersionPatch value by 1.
6. We deleted the records created with the second version of the schema and created new records with updated "kind" values.
7. This time, the field in question was queryable.
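One way steps 1-7 could arise, assuming (as the reporter suspects, but unconfirmed) that the indexer caches schema content keyed by kind: a PUT that keeps the same version, and therefore the same kind string, never invalidates the cached entry, while bumping the patch version produces a new kind and a cache miss. The cache below is hypothetical and is not the indexer's actual code; the kind strings are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of a per-kind schema cache without invalidation.
public class SchemaCacheSketch {

    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> schemaService; // stands in for a Schema service GET

    SchemaCacheSketch(Function<String, String> schemaService) {
        this.schemaService = schemaService;
    }

    // Returns the cached schema for a kind, fetching it only on a cache miss.
    // Nothing here invalidates an entry when the schema is updated in place.
    String getSchema(String kind) {
        return cache.computeIfAbsent(kind, schemaService);
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>(); // simulated Schema service storage
        store.put("acme:custom:Example:1.0.0", "hint at wrong level");
        SchemaCacheSketch indexer = new SchemaCacheSketch(store::get);

        System.out.println(indexer.getSchema("acme:custom:Example:1.0.0")); // hint at wrong level

        // Step 2: PUT with the same version; the Schema service now returns the fix...
        store.put("acme:custom:Example:1.0.0", "hint fixed");
        // ...but the cached copy is still served (step 4).
        System.out.println(indexer.getSchema("acme:custom:Example:1.0.0")); // hint at wrong level

        // Step 5: bump the patch version; the new kind is a cache miss (step 7).
        store.put("acme:custom:Example:1.0.1", "hint fixed");
        System.out.println(indexer.getSchema("acme:custom:Example:1.0.1")); // hint fixed
    }
}
```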
In the past, we've been able to install updated DEVELOPMENT status schemas with the same version number, and the indexer appeared to take notice. Is the indexer ignoring changes that are limited to x-osdu-indexing extensions, and using a cached version of the prior schema content?

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/60
index-worker & reindex-worker are exposed
2022-08-24T10:47:12Z
An Ngo
When a record-changed event is triggered by Storage, it results in a Service Bus message from Storage that is handled by the **indexer-queue** service. This service calls standard HTTP endpoints for the **indexer** service via its Kubernetes-internal name, never transiting through the App Gateway (or out of the cluster).
The **indexer** service exposes those same endpoints outside the cluster. Most service endpoints are protected by an Istio **AuthorizationPolicy** that requires a valid token (which the service then uses to extract user information for authorization). However, **indexer-queue** sends no token with these requests, and the **indexer** service's **AuthorizationPolicy** excludes these endpoints from the token requirement.
This means any outside caller can send requests to these endpoints, with no authorization and no restriction. This could be exploited to cause denial-of-service attacks, send forged event messages, or use vulnerabilities within the **indexer** service to compromise the entire OSDU system. Because no token is required, these attacks can be done by anyone. Because the OSDU Community software is open-source, even "security through obscurity" is not effective here.
**Recommended approaches to solve this:**
- Use **indexer** VirtualService to reject external requests to these endpoints, making them reachable only from within Kubernetes.
- Add a token to the requests going to the **indexer** service's index-worker and reindex-worker endpoints
- [additionally] Consider using Istio's mTLS and/or Kubernetes Network Policy to restrict communication to the **indexer** service's index-worker and reindex-worker endpoints to traffic coming from **indexer-queue**
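The second approach could look roughly like this on the **indexer-queue** side, sketched with the JDK's `java.net.http` client. The worker URL, the JSON body, and how the token is obtained (for example, a service-principal token) are assumptions for illustration, and the request is only built here, not sent. A token only helps if the **indexer** **AuthorizationPolicy** exclusion for these endpoints is also removed.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch; class, URL, and token source are illustrative only.
public class WorkerRequestSketch {

    // Hypothetical in-cluster URL; the real host and path may differ.
    static final String INDEX_WORKER_URL =
            "http://indexer/api/indexer/v2/_dps/task-handlers/index-worker";

    // Builds the POST that indexer-queue would send, carrying a bearer token
    // so the endpoint no longer needs to be exempt from token checks.
    static HttpRequest buildIndexWorkerRequest(String messageJson, String accessToken) {
        return HttpRequest.newBuilder()
                .uri(URI.create(INDEX_WORKER_URL))
                .header("Authorization", "Bearer " + accessToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(messageJson))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildIndexWorkerRequest("{}", "example-token");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().firstValue("Authorization").orElse("(none)"));
    }
}
```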
There may be other acceptable solutions.

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/69
Integration test to include input value from timezone different from UTC and test Normalizer
2022-09-29T13:41:06Z
Debasis Chatterjee
Please see the working example from @Kateryna_Kurach:
https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M12/Test_Plan_Results_M12/Manifest_Ingestion/M12-GCP-Master-FoR-Date-check-Debasis-Kateryna.txt
Please include tests with positive and negative time shifts.
https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/indexer-core/src/test/java/org/opengroup/osdu/indexer/util/parser/DateTimeParserTest.java#L31
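A sketch of the requested positive and negative time-shift cases, written against `java.time` instead of the indexer's `DateTimeParser` (whose method signatures are not reproduced here), so the assertions would need adapting to the parser under test. The class name and input values are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical test-case table; not the indexer's DateTimeParserTest.
public class TimeShiftCasesSketch {

    // Each case: {input with offset, expected UTC value}.
    // OffsetDateTime.toString() omits zero seconds, e.g. "2008-02-01T11:00Z".
    static final String[][] CASES = {
            {"2008-02-01T14:00:00+03:00", "2008-02-01T11:00Z"}, // positive shift
            {"2008-02-01T14:00:00-05:00", "2008-02-01T19:00Z"}, // negative shift
            {"2008-02-01T22:30:00-05:00", "2008-02-02T03:30Z"}, // negative shift crossing midnight
            {"2008-02-01T14:00:00Z", "2008-02-01T14:00Z"}        // already UTC
    };

    static String normalizeToUtc(String input) {
        return OffsetDateTime.parse(input).withOffsetSameInstant(ZoneOffset.UTC).toString();
    }

    public static void main(String[] args) {
        for (String[] c : CASES) {
            String actual = normalizeToUtc(c[0]);
            if (!actual.equals(c[1])) {
                throw new AssertionError(c[0] + " -> " + actual + ", expected " + c[1]);
            }
        }
        System.out.println("all time-shift cases passed");
    }
}
```

The midnight-crossing case is worth keeping: a negative offset can move the UTC value onto the next calendar day.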
cc @nthakur

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/66
Reindex API - performance, scalability and reliability issues
2023-12-18T16:13:23Z
Neelesh Thakur
Recent issues on the Schema/Search backend require us to re-index a significant number of kinds/indices. Here are the specifics of these issues:
- M10 schema [hints changes](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/E-R/ChangeReport.md#snapshot-2021-11-09-towards-m10) on Schema service.
- Geoshape queries broke when the Elasticsearch server was upgraded from 7.8.1 to 7.17.x (confirmed by the Elasticsearch Support team; a public issue is not available).
The current implementation of the Reindex API (per kind) has serious performance, scalability and reliability issues. It does not work at all for kinds with a few million records. This is blocking us from adopting the M10 (now M11) schema updates. The following list summarizes the issues with the API:
- API throughput is slow: it can re-index only 250K-300K records per hour. For a partition with 100 million records, this can take over 2 weeks.
- It is not resilient: if the operation fails midway, we have to start over.
- The Reindex operation is not transparent: we cannot tell how much progress has been made.
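The multi-week figure follows from the quoted throughput. A quick estimate with the numbers above, assuming a constant rate and no restarts:

```java
public class ReindexEstimate {

    // Hours needed to reindex `records` at a constant `recordsPerHour`.
    static double hours(long records, long recordsPerHour) {
        return (double) records / recordsPerHour;
    }

    public static void main(String[] args) {
        long records = 100_000_000L; // partition size from the issue
        for (long rate : new long[] {250_000L, 300_000L}) {
            double h = hours(records, rate);
            System.out.printf("%d records/hour: %.0f hours = %.1f days%n", rate, h, h / 24);
        }
    }
}
```

This gives roughly 333-400 hours, or about 14-17 days, and any failure midway adds to that because the operation restarts from the beginning.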
In addition to the above issues, we also cannot recover the Search service in disaster recovery scenarios. In that case, we can use the ReindexAll API, which uses the Reindex API (per kind) behind the scenes, so we run into all of the above issues.

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/58
The "indexing-progress" service bus topic is unused/unconsumed
2022-08-26T09:20:48Z
Gary Murphy
**Summary** There is a topic in Service Bus that was created to support monitoring the progress of indexing jobs. However, this topic has never been used by any consuming app, and it needs to be removed to reduce noise and technical debt in the system.
**Details** The "indexing-progress" service bus topic is not consumed by any subscriber. Also, there is no message expiration/deletion date for this topic. As a result, the topic reaches its capacity, and any further messages pushed to it fail.
This is a carry-over implementation from GA. The failures for this event are logged but ignored. Although this does not cause availability issues for the indexer, it causes a lot of exceptions in the system.
This is the [piece of code](https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/indexer-core/src/main/java/org/opengroup/osdu/indexer/service/IndexerServiceImpl.java#L158) that pushes the changes to the "indexing-progress" SB topic.