Indexer issues · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues · 2020-06-24T20:20:38Z

**Issue #3: Index Everything** · Stephen Whitley (Invited Expert) · 2020-06-24T20:20:38Z · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/3

# Index Everything
Publish relevant data and derived insights to the OSDU index.
## Status
- [ ] Proposed
- [ ] Trialing
- [ ] Under review
- [X] Approved
- [ ] Retired
## Context
In a distributed and varied industry such as ours, the enterprise needs to be able to discover the data that is available to it. As the organization and the industry strive toward data-oriented opportunities to differentiate and find more markets, the domains must participate in, and contribute to, building up a reservoir of data that is universally discoverable.
## Decision
The OSDU index can serve as the landing point for searching this information. It is therefore imperative for the domains to register and publish all relevant data to this index. Along the same lines, any insights and derived data can also be published to this central index.
## Rationale
1. Making the relevant data from domains/subdomains discoverable from a central place helps the enterprise understand the volume and variety of the data it handles as a whole.
2. The OSDU index serves as a marketplace for discovering new workflows and insights that would otherwise be siloed.
## Consequences
1. This principle conflicts with the domain-centric principle, wherein each domain has to accommodate "non-domain" activities such as publishing to OSDU or consuming from it. However, these activities must be viewed in a larger context and justified.

**Issue #5: Index cleanup API support** · Artem Dobrynin (EPAM) · 2020-10-15T08:03:33Z · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/5

## Change Type:
- [X] Feature
- [ ] Bugfix
- [ ] Refactoring
## Context and Scope
There is no functionality in the core module to drop obsolete and stale indices.
## Decision
- Implement `cleanupIndices` endpoint in Indexer service (see https://community.opengroup.org/osdu/platform/system/indexer-service/-/merge_requests/16 as example)
- Add index clean-up to the Storage service when a kind is deleted.
## Rationale
This change will keep our Elasticsearch indices clean and healthy. Without it, we are forced to monitor Elasticsearch and manually delete all test and stale indices.
This is also affecting our performance. Because of frequent tests, a lot of indices are created and never deleted, which increases callback time. With index cleanup functionality, we could avoid that.
## Consequences
We should add support for this functionality in every method that performs an index/kind deletion.

Assignee: Dmitriy Rudko · Due date: 2020-09-22

**Issue #8: Add schema service endpoint** · ethiraj krishnamanaidu · 2021-01-11T21:42:06Z · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/8

The Indexer core logic has been updated to integrate with the Schema service and merged to master.
Please update the following:
- Add the Schema service endpoint (`SCHEMA_HOST`) in application.properties [Example](https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/provider/indexer-azure/src/main/resources/application.properties#L39)
- Update the integration test class path [Example](https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/testing/indexer-test-azure/src/test/java/org/opengroup/osdu/step_definitions/index/record/RunTest.java#L23)

Assignees: ethiraj krishnamanaidu, Dania Kodeih (Microsoft), Wladmir Frazao, Joe, Dmitriy Rudko

**Issue #16: ADR: Array of Objects support by Indexer** · Dmitriy Rudko · 2021-06-14T16:29:33Z · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/16

## Status
- [x] Proposed
- [x] Trialing
- [x] Under review
- [x] Approved
- [ ] Retired
## Context
Currently, the Indexer and Search implementations ignore array-of-objects structures in schemas. One example of such a structure in the DD schemas is [WellLog.Curves](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/blob/master/Examples/work-product-component/WellLog.1.0.0.json#L303).
You can find historical context in the following issues:
- https://community.opengroup.org/osdu/platform/system/home/-/issues/67
- https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/1
- https://community.opengroup.org/osdu/platform/system/schema-service/-/issues/1
- https://community.opengroup.org/osdu/platform/system/search-service/-/issues/1
As a result, users are not able to:
- Retrieve such objects using Search (fixed in MR !114)
- Search against the information in such objects
- Run complex search queries, e.g. range or spatial queries
Elasticsearch has several options and data types to handle such cases. There is no silver bullet and each of these types has pros and cons.
- [object type](https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html) - Individual object queries are not supported ([objects are not searchable individually](https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html))
- [nested type](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) - Has a serious impact on performance and produces separate documents, e.g. 100 WellLog records with 100 Curves each will produce 100 × 100 = 10,000 nested documents in the Elasticsearch index.
- [flattened type](https://www.elastic.co/guide/en/elasticsearch/reference/current/flattened.html) - Does not support complex queries and is available **ONLY** with X-Pack (not an open-source type)
A specific object might be treated in one of these 3 ways according to a custom hint associated with that object (part of the logical schema). The main question is how the initial DD schema enrichment mechanism would work to add the hints. Generally, there are 2 options:
1. Incorporate hints into the DD schema.
2. Separate DD schema and hints and merge them during the Elasticsearch index creation process.
Both options have pros and cons:
1. Combined schema and hints are the easiest and most centralized way to onboard new functionality. The drawbacks are that the "clean" data model gets blended with functional-level attributes, and that the DD schema itself has to change.
2. Separated data model and hints follow the separation of concerns principle, but bring other challenges such as maintaining the conformity of 2 physically separated pieces and the necessity to store hints per provider.
The format of the hints file might be one of the following:
1. Copy the original JSON structure and store hints maintaining the original DD schema hierarchy:
`{ "properties" : { "data": { "allOf" : { "properties" : { "Curves" : "x-osdu-flattened" } } } } }`
2. Maintain a path to the desired field and its type:
`{ "properties/data/allOf/properties/Curves" : "x-osdu-flattened" }`
3. Assuming a given type is always indexed in the same way, the mapping might be done at the object level:
`{ "Curves" : "x-osdu-flattened" }`
In each case, the maintainer should make sure the hints file matches the DD file (e.g. DD schema changes), which adds operational overhead.
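
With the path-based format (option 2), much of that overhead amounts to checking that every hint path still resolves in the current DD schema. A minimal Python sketch, assuming hints are stored as a flat dict keyed by `/`-separated paths exactly as in the example; `resolve_path`, `unresolved_hint_paths` and the toy schema are hypothetical, and the toy schema mirrors the example's object-shaped `allOf` (in standard JSON Schema, `allOf` is an array):

```python
from typing import Any, Optional


def resolve_path(schema: dict, path: str) -> Optional[Any]:
    """Follow a '/'-separated path through nested dicts; None if a step is missing."""
    node: Any = schema
    for step in path.split("/"):
        if not isinstance(node, dict) or step not in node:
            return None
        node = node[step]
    return node


def unresolved_hint_paths(schema: dict, hints: dict) -> list:
    """Return hint paths that no longer resolve to a node in the schema."""
    return [path for path in hints if resolve_path(schema, path) is None]


# Toy schema mirroring the nesting of the hint examples above (hypothetical content).
schema = {
    "properties": {
        "data": {
            "allOf": {
                "properties": {
                    "Curves": {"type": "array", "items": {"type": "object"}}
                }
            }
        }
    }
}
hints = {
    "properties/data/allOf/properties/Curves": "x-osdu-flattened",
    # Stale hint: 'Channels' is not (or no longer) in the schema.
    "properties/data/allOf/properties/Channels": "x-osdu-flattened",
}
print(unresolved_hint_paths(schema, hints))
# prints ['properties/data/allOf/properties/Channels']
```

Running such a check whenever a DD schema changes would flag hints that silently stopped applying.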
## Scope
Implement a general approach on how to handle Arrays of Objects (AO) in Schemas:
- Index
- Search
## Decision
After analyzing the two options, we decided to follow the combined schema and hints approach:
1. Define generic hints (enriched object schema) that let the Indexer know how a specific array of objects should be treated when feeding a schema into Elasticsearch:
   - `"x-osdu-indexing": "x-type-nested"`
   - `"x-osdu-indexing": "x-type-flattened"`
   - `"x-osdu-indexing": "x-type-object"`
2. Review R3 schemas and inject hints where applicable.
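
To illustrate the decision, here is a minimal Python sketch of how an indexer might translate the `x-osdu-indexing` hint on an array-of-objects property into an Elasticsearch field mapping. The function name, the placement of the hint on the array property itself, the mapping of every sub-property to `keyword`, and returning `None` (property skipped, as described in the Context) when no hint is present are all assumptions for illustration, not the Indexer's actual code:

```python
from typing import Any, Optional

# Hint values from the decision above, mapped to Elasticsearch field types.
HINT_TO_ES_TYPE = {
    "x-type-nested": "nested",
    "x-type-flattened": "flattened",
    "x-type-object": "object",
}


def es_mapping_for_array_of_objects(prop: dict) -> Optional[dict]:
    """Build an Elasticsearch mapping for an array-of-objects schema property.

    Returns None when the property carries no hint (the property is skipped).
    """
    hint = prop.get("x-osdu-indexing")
    if hint is None:
        return None
    es_type = HINT_TO_ES_TYPE[hint]  # raises KeyError for an unknown hint
    if es_type == "flattened":
        # A flattened field is mapped as a whole; it takes no sub-property mappings.
        return {"type": "flattened"}
    sub_props = prop.get("items", {}).get("properties", {})
    # Simplification: every sub-property is mapped as a keyword.
    return {
        "type": es_type,
        "properties": {name: {"type": "keyword"} for name in sub_props},
    }


curves = {
    "type": "array",
    "x-osdu-indexing": "x-type-nested",
    "items": {
        "type": "object",
        "properties": {"Mnemonic": {"type": "string"}, "CurveUnit": {"type": "string"}},
    },
}
print(es_mapping_for_array_of_objects(curves))
```

Because the hint travels with the schema, the same function works for any AO property without a separate hints file.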
## Rationale
Elasticsearch does not have a single type that suits all array-of-objects elements.
The decision on a proper data type should be made as part of the data modeling phase.
The following criteria should be taken into account in each DD case:
- Will the object attributes be used in queries?
- What type of queries will be used?
- What is the cardinality of AO?
- Can information be moved out of AO?
## Consequences
- Review R3 schemas
- Update all cases where we have AO with appropriate meta attribute
- Educate the DD community on the pros and cons of each type

**Issue #19: ADR: Indexer to support pattern based indices cleanup for customers.** · Ankit Sharma [Microsoft] · 2021-07-09T16:42:34Z · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/19

## Status
- [x] Proposed
- [ ] Trialing
- [ ] Under review
- [ ] Approved
- [ ] Retired
## Context
In Azure, we were trying to clean up test indices in our environments by adding a cron job.
We can see that there is already code present to clean up indices but there is no API to call this code from outside.
https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/indexer-core/src/main/java/org/opengroup/osdu/indexer/service/CronServiceImpl.java
The Search service documentation also mentions a cron job API, but I guess the code was never brought to OSDU.
https://community.opengroup.org/osdu/platform/system/search-service
We would like to hear the community's view on whether it would be better to add the API here.
We would also like to discuss whether it would be a beneficial feature for customers as well, where **customers can provide test patterns for the indices to be deleted after X amount of time.**
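
A minimal Python sketch of the proposed customer-facing behaviour: given customer-supplied index-name patterns and a retention period ("X amount of time"), select the indices eligible for deletion. The glob-style patterns, the `IndexInfo` record, the `indices_to_delete` function and the index names are hypothetical; a real implementation would read index names and creation dates from Elasticsearch and then delete the selected indices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class IndexInfo:
    name: str
    created: datetime  # timezone-aware creation time


def indices_to_delete(
    indices: Iterable[IndexInfo],
    patterns: Sequence[str],
    max_age: timedelta,
    now: datetime,
) -> List[str]:
    """Names of indices that match any pattern and are older than max_age."""
    cutoff = now - max_age
    return sorted(
        idx.name
        for idx in indices
        if idx.created < cutoff and any(fnmatchcase(idx.name, p) for p in patterns)
    )


now = datetime(2021, 7, 9, tzinfo=timezone.utc)
indices = [
    IndexInfo("opendes-wks-test-1.0.0", now - timedelta(days=3)),
    IndexInfo("opendes-wks-test-2.0.0", now - timedelta(hours=2)),
    IndexInfo("opendes-wks-well-1.0.0", now - timedelta(days=30)),
]
print(indices_to_delete(indices, ["*-test-*"], timedelta(days=1), now))
# prints ['opendes-wks-test-1.0.0']
```

Keeping selection separate from deletion would also make it easy to offer a dry-run mode before anything is dropped.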
## Scope
Add an API to call the Indexer's cron service.
## Decision
## Rationale

Assignee: Ankit Sharma [Microsoft]

**Issue #20: Frame of Reference - Handle creating WGS84 values when indexing Master Data Well (AsIngested -> WGS84)** · Debasis Chatterjee · 2022-11-24T12:22:01Z · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/20

See the enclosed example JSON file from here:
https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/blob/master/Examples/master-data/Well.1.0.0.json
[Well.1.0.0.json](/uploads/c1166860fcd89fc6cf686715d324a62e/Well.1.0.0.json)
For SpatialLocation, we see **AsIngestedCoordinates** and also **Wgs84Coordinates**.
Can the Indexer's Normalizer handle this situation and fill in **Wgs84Coordinates** by applying a suitable CRS conversion to **AsIngestedCoordinates**?
This would avoid human error and also ease the effort of creating JSON payload manifests.

**Issue #39: Enhance documentation to explain how one can troubleshoot issues in Indexer (Normalizer)** · Debasis Chatterjee · 2023-03-09T18:15:53Z · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/39

Please see related issue #32.
I suggest adding suitable notes to the Core Services (Indexer) documentation.
https://community.opengroup.org/osdu/documentation/-/wikis/Core-Services-Overview
cc - @nthakur for information

**Issue #40: Indexer, Frame of Reference, DateTime conversion** · Debasis Chatterjee · 2022-09-29T13:41:07Z · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/40

Can you please share a working example of DateTime conversion?
That is, a load manifest (JSON) showing the actual data along with the matching "meta" block for a date field.
**I tried the following in a load manifest (AWS R3M8 Preship environment):**
The data has this line:
` "ProjectEndDate": "2008-02-01T14:00:00.000+03:00",`
The meta block has this for DateTime conversion:
```
{
  "kind": "DateTime",
  "name": "UTC-ISO8601",
  "persistableReference": "{\"format\":\"yyyy-MM-ddTHH:mm:ss.fffZ\",\"timeZone\":\"UTC\",\"type\":\"DTM\"}",
  "propertyNames": [
    "ProjectEndDate"
  ]
}
```
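
One possible reading of the trace in the response further down is that the value's `+03:00` offset does not match the `Z` (UTC) format declared in this meta block. To make the expected normalization concrete, here is a minimal Python sketch that converts an offset-bearing ISO 8601 value into the declared `yyyy-MM-ddTHH:mm:ss.fffZ` form. It uses only the Python standard library and is an illustration of the intended result, not the Indexer's DateTime conversion logic:

```python
from datetime import datetime, timezone


def to_utc_iso8601(value: str) -> str:
    """Convert an ISO 8601 timestamp with an explicit offset to yyyy-MM-ddTHH:mm:ss.fffZ."""
    # fromisoformat() before Python 3.11 does not accept a trailing 'Z'.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        raise ValueError(f"no UTC offset in {value!r}; cannot normalize")
    utc = parsed.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return f"{utc:%Y-%m-%dT%H:%M:%S}.{millis:03d}Z"


print(to_utc_iso8601("2008-02-01T14:00:00.000+03:00"))
# prints 2008-02-01T11:00:00.000Z
```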
**I see this error when I query the Search service for troubleshooting.**
For now, we can focus on the DateTime conversion problem.
```
{
  "kind": "osdu:wks:master-data--SeismicAcquisitionSurvey:1.0.0",
  "query": "id: \"osdu:master-data--SeismicAcquisitionSurvey:ST0202R08-DC-23Oct\"",
  "returnedFields": [
    "id",
    "index"
  ]
}
```
Response
```
{
  "results": [
    {
      "index": {
        "trace": [
          "Unit conversion: persistableReference not valid",
          "Unit conversion: persistableReference not valid",
          "DateTime conversion: Frame of reference does not match given data for property ProjectEndDate, no conversion applied."
        ],
        "statusCode": 400,
        "lastUpdateTime": "2021-10-23T10:13:21.496Z"
      },
      "id": "osdu:master-data--SeismicAcquisitionSurvey:ST0202R08-DC-23Oct"
    }
  ],
  "totalCount": 1
}
```
Thanks for your help.
cc - @gehrmann for information
cc - @jingdongsun, @anujgupta and @shamazum (since, as far as I remember, IBM resources did some work in this area)

**Issue #42: extensionProperties should be searcheable (again)** · Fabrice Hauy · 2023-08-28T12:53:35Z · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/42

From my perspective, it is critical that the properties inside "ExtensionProperties" can be searched.
This block allows users/companies to add properties that are not part of the standardized section of the schema. To be useful, records should be found when searching for values in these extensionProperties.
Another example of use: OSDU Data Quality Rule/RuleSet schemas have no reference to the data type/schema they apply to. To make use of the Data Quality data types, we would therefore need to add the kind a rule refers to within extensionProperties. Then, to retrieve the rules for a given data type, you would need to search for that data type within the quality records.

**Issue #44: Add documentation - explain whether all fields from Document Store are replicated to Index Store** · Debasis Chatterjee · 2022-09-29T13:41:06Z · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/44

It would be very useful to have a section providing some background on this subject.
Sometimes we see a list of fields from the Storage service (GET), but some of those fields are missing from the Search service (query) results.
I recently experienced this when working with a custom schema (CSV Ingestion test case).
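
Following the explanation in the thread below (only fields defined in the kind's schema get indexed), one quick way to predict which Storage fields will be missing from Search results is to diff the record's `data` keys against the schema's properties. A minimal Python sketch with hypothetical sample data; it compares top-level keys only and ignores nested objects and indexing errors:

```python
def fields_not_in_schema(record_data: dict, schema_properties: dict) -> list:
    """Top-level data fields that the schema does not define (so they are not indexed)."""
    return sorted(set(record_data) - set(schema_properties))


# Hypothetical Storage record 'data' block and schema 'properties'.
record_data = {"WellName": "Example-1", "SpudDate": "2000-01-01", "CsvRowNumber": 17}
schema_properties = {"WellName": {"type": "string"}, "SpudDate": {"type": "string"}}
print(fields_not_in_schema(record_data, schema_properties))
# prints ['CsvRowNumber']
```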
**Recent thread with @nthakur**
> Hi Debasis,
>
> In general, if there are no indexing errors and a field is defined in the schema (the schema from the Schema service for the kind), then the Indexer will index it. Fields included in Storage service records may or may not have a definition in the schema, so you may see many more fields on Storage records. If there are errors, then the index.trace block in the Search service response for the record will tell you which properties were skipped.
>
> https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/docs/tutorial/IndexerService.md#get-indexing-status
>
> Please let me know if you have any additional question.
>
> Regards,
> Neelesh
cc - @ChrisZhang, @ethiraj and @sehuboy for information

**Issue #50: ADR: Search text with special characters** · Neelesh Thakur · 2022-12-09T10:11:48Z · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/50

## Status
- [X] Proposed
- [X] Under review
- [ ] Approved
- [ ] Retired
## Context & Scope
**Principal Motivation**: Currently, due to the way Elasticsearch analyzes documents as they are indexed, it is impossible to perform certain high-value searches on unstructured content. This ADR proposes changing the default Elastic index analyzer for both unstructured and structured types so that those high-value searches can be performed. The main business driver for this change comes from users of unstructured document searches, but because making such changes is a heavy process, it makes sense to include some smaller changes to structured analysis as well. <br/><br/>
**Scope**: The indexing analyzer will be changed to alter the way incoming strings are tokenized. This will not be an out-of-the-box analyzer, but one with small adjustments. Mainly, this will enable search queries to find exact matches on strings that are currently not searchable because special characters in the incoming string are tokenized away during indexing. A quick example is a North Sea well name like "8/3-1". At the search level, this string is indistinguishable from "8-3/1" or "8 3 1" because of the way the special characters "/" and "-" are treated. The proposal is to introduce two new analyzers: one for structured data (essentially all name:value pairs in JSON documents) and one for unstructured data, which is generally the output of an OCR process or other text-extraction process for large source documents.<br/><br/>
**Key Points**
- structured vs. unstructured. The analyzer scope will be determined by the type of the incoming document so that unstructured (generally <= 5% of total data) will be identified by a list of types, and structured will be everything else.
- to implement the new analyzer changes, a re-index of the data will be required. This is heavy-duty and should be done as infrequently as possible.
- the structured analyzer (Approach 2) will use the more generic analyzer which will not require changes for (most) special character cases
- the unstructured analyzer (Approach 1) will be focused on retaining only certain special characters
- the full proposal is to implement both Approach 1 and Approach 2 and re-index just once as that is a heavy-duty operation
#### Approach 1: Generic character-set analyzer
- Replace `\n ( ) { } [ ]` characters with white-space
- Tokenize on white-space
- Applies the following token filters:
  - Lowercase
  - English stemmer
#### Approach 2: Defined character-set analyzer
- Replace `\n ( ) { } [ ]` characters with white-space
- Tokenize on white-space
- Applies the following token filters:
  - Lowercase
  - English stemmer
  - Word delimiter: removes all special characters from tokens except `/ - : .` characters
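
The two approaches can be sketched as Elasticsearch index settings that register a custom analyzer as the index's `default` analyzer. This Python sketch builds the settings dict; the names `brackets_to_space`, `english_stemmer` and `keep_some_delims`, the use of a `pattern_replace` char filter, and the choice of `word_delimiter_graph` with a `type_table` (plus `split_on_numerics: false`, so digits and the retained characters stay in one token) to keep `/ - : .` are assumptions, not the ADR's implementation. The filter order follows the lists above; the exact tokens should be verified with the Elasticsearch `_analyze` API before re-indexing.

```python
import json

# Shared by both approaches: \n ( ) { } [ ] become white-space before tokenization.
BRACKETS_TO_SPACE = {
    "type": "pattern_replace",
    "pattern": r"[\n(){}\[\]]",
    "replacement": " ",
}


def analysis_settings(approach: int) -> dict:
    """Index settings registering Approach 1 or 2 as the index's default analyzer."""
    filters = ["lowercase", "english_stemmer"]
    token_filters = {"english_stemmer": {"type": "stemmer", "language": "english"}}
    if approach == 2:
        # Treat / - : . as word characters so they are kept inside tokens.
        token_filters["keep_some_delims"] = {
            "type": "word_delimiter_graph",
            "split_on_numerics": False,
            "type_table": ["/ => ALPHANUM", "- => ALPHANUM", ": => ALPHANUM", ". => ALPHANUM"],
        }
        filters.append("keep_some_delims")
    elif approach != 1:
        raise ValueError("approach must be 1 or 2")
    return {
        "analysis": {
            "char_filter": {"brackets_to_space": BRACKETS_TO_SPACE},
            "filter": token_filters,
            "analyzer": {
                "default": {
                    "type": "custom",
                    "char_filter": ["brackets_to_space"],
                    "tokenizer": "whitespace",
                    "filter": filters,
                }
            },
        }
    }


print(json.dumps(analysis_settings(2), indent=2))
```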
**Use Case with Input Text Samples**
**Scenario**
The assumption is that the unstructured data below is ingested and indexed using the current, Approach 1, and Approach 2 analyzers. For clarification, the data in the numbered points would land in the "data" block of the JSON metadata record (e.g. `"data":"indexedPage_1"`).
1. As the story has been told many times, exploration efforts in the North Sea were rapidly declining by fall 1969. A total of 32 wells had been drilled in the Norwegian sector since Esso completed the first exploration well (8/3-1) as dry in 1966. It didn't help that Mario had an obsession with the sequences 2-4-1 and 2-8-3 which caused some very strange OCD behavior at a number of review meetings when the exploration wells were discussed. Being an explorationist was much harder than jumping over Goombas.
2. Murphy's first assignment for his new employer, well 2/4-1 (the original name is 2/4-1X), nearly ended up as a terrible disaster. The well was spudded on August 21 with Ocean Viking and Max was there from the very first turn of the drill. He was prepared to describe cuttings through a 3,000 m thick section of boring clay and shale before entering the reservoir. The Quaternary and Tertiary clay is soft and the drilling is fast. After only one week casing had been set at 146m (30’’) and 623m (20’’). Drilling had resumed and at 1,663m, on Sunday morning August 31, the formation pressure increased tremendously and oil flowed into the wellbore and the mud tanks.
3. dev_tools_console.pdf
4. c:\workspace\petrel\gullfaks
5. /petrel/workspace/2017/05/16/gullfaks.pet
6. measurements are 232.113, -999.25 (old LAS sentinel), 3.14159, etc.
7. Som historien har blitt fortalt mange ganger, gikk leteinnsatsen i Nordsjøen raskt ned høsten 1969. Totalt var det boret 32 brønner i norsk sektor siden Esso fullførte den første letebrønnen (8/3-1) som tørr. i 1966. Det hjalp ikke at Mario hadde en besettelse av sekvensene 2-4-1 og 2-8-3 som forårsaket noe veldig merkelig OCD-oppførsel på en rekke gjennomgangsmøter da letebrønnene ble diskutert. Å være en utforsker var mye vanskeligere enn å hoppe over Goombas.
| End User Entry | Standard (Current) | Analyzer I | Analyzer II | Comments |
|--|--|--|--| --|
| `2/4-1X` | ✗ | ✓ | ✓ | |
| `2-4-1` | ✗ | ✓ | ✓ | |
| `-999.25` | ✗ | ✓ | ✓ | |
| `explore` | ✗ | ✓ | ✓ | fuzzy match includes `explore` and `exploration` |
| `murphy` | ✗ | ✓ | ✓ | possessives at the end and search by one token. Search with possessive `murphy's` works for all |
| `o'neil` | ✗ | ✓ | ✓ | possessives at the end and search by one token. Search with possessive `o'neil's` works for all |
| `første letebrønnen` | ✓ | ✓ | ✓ | searching non-English phrases (e.g. Norwegian in text fragment 7) |
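
To see why entries such as `2/4-1X`, `2-4-1` and `-999.25` become matchable, here is a simplified Python simulation of the first steps of Approach 1 (bracket/newline replacement, white-space tokenization, lowercasing) applied to fragments of samples 2 and 6. It deliberately omits the English stemmer, so it does not reproduce rows such as `explore` or `murphy`; it approximates the analyzer rather than reproducing Elasticsearch's output.

```python
import re

BRACKETS_OR_NEWLINE = re.compile(r"[\n(){}\[\]]")


def approach1_tokens(text: str) -> list:
    """Approximate Approach 1 without stemming: replace brackets, split on white-space, lowercase."""
    return [tok.lower() for tok in BRACKETS_OR_NEWLINE.sub(" ", text).split()]


fragment_2 = "well 2/4-1 (the original name is 2/4-1X), nearly ended up"
fragment_6 = "measurements are 232.113, -999.25 (old LAS sentinel), 3.14159, etc."

print(approach1_tokens(fragment_2))
print("-999.25" in approach1_tokens(fragment_6))
```

Note that under this simplified pipeline trailing punctuation stays attached to a token (for example `232.113,` in sample 6), so such a value would not match a query for `232.113`.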
## Trade-off Analysis
1. Do nothing. <br/>
**Pros**: no re-indexing required, no chance of breaking changes
**Cons**: does not solve business problem of correct searches for exact character strings in unstructured and structured data.
2. Approach 1 Only. Alter indexing analyzer for unstructured types only. <br/>
**Pros**: makes exact-match searches for common oilfield terms (e.g. well names) work in unstructured data.
**Cons**: heavy-duty re-indexing is required to recreate index information for the affected types, leaves structured data searches with same problem re: special characters.
3. Approach 1 and Approach 2. Alter indexing analyzer for unstructured and structured types. <br/>
**Pros**: solves search problem for both structured and unstructured data, one-time re-indexing operation
**Cons**: heavy-duty re-indexing.
## Decision
If consumers of OSDU data records agree that it is valuable to be able to enter searches like `data.indexedDoc:"2\/4\-1"` and find the North Sea well with that name, then something has to be done to the analyzer. <br/>
Given the big investment any change to the indexing analyzer would require in re-indexing cycles, the right approach is to implement changes for the structured and unstructured types (Approach 1 and Approach 2). This ensures similar exact search results for things like data.wellname and data.unstructuredText containing the same well name using special characters.
## Consequences
**Known Issues/Limitation/Notes**
- Any change to character-set analyzer **will require re-indexing across all partitions and environments.**
- Support for non-English characters comes out of the box, with limitations. Text is tokenized on white-space and processed by the English stemmer; if white-space is not used as a word boundary (e.g. Japanese), inaccurate or empty results will be returned.

**Issue #57: Indexer not paying attention to updated DEVELOPMENT status schema** · Eric Schoen · 2022-08-26T09:22:49Z · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/57

1. We installed a DEVELOPMENT status schema with an incorrect x-osdu-indexing hint (it was at the wrong level in the schema). We indexed some data with this incorrect schema, and the affected field was not queryable.
2. We then fixed the schema (moved only the x-osdu-indexing extensions) and reinstalled it with the PUT endpoint, but with the same version number. We confirmed by retrieving the schema that the changes had been committed to the schema service.
3. We deleted the records created with the prior version and created new records.
4. The field in question was still not queryable.
5. We installed the schema again, but bumped the SchemaVersionPatch value by 1.
6. We deleted the records created with the second version of the schema and created new records with updated "kind" values.
7. This time, the field in question was queryable.
In the past, we have been able to install updated DEVELOPMENT status schemas with the same version number, and the indexer appeared to take notice. Is the indexer failing to notice changes that are limited to x-osdu-indexing extensions, and using a cached version of the prior schema content?

**Issue #58: The "indexing-progress" service bus topic is unused/unconsumed** · Gary Murphy · 2022-08-26T09:20:48Z · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/58

**Summary** There is a topic in Service Bus that was created to support monitoring the progress of indexing jobs. However, this topic has never been used by any consuming app, and it needs to be removed to reduce noise and technical debt in the system.
**Details** The "indexing-progress" Service Bus topic is not consumed by any subscriber. Also, there is no message expiration/deletion date for this topic. This leads to the topic reaching its capacity, after which any further messages pushed to it fail.
This is a carry-over implementation from GA. The failures for this event are logged but ignored. Although this does not cause availability issues for the indexer, it generates a lot of exceptions in the system.
This is the [piece of code](https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/indexer-core/src/main/java/org/opengroup/osdu/indexer/service/IndexerServiceImpl.java#L158) that pushes the changes to the "indexing-progress" Service Bus topic.

**Issue #60: index-worker & reindex-worker are exposed** · An Ngo · 2022-08-24T10:47:12Z · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/60

When a record-changed event is triggered by Storage, it results in a Service Bus message from Storage that is handled by the **indexer-queue** service. This service calls standard HTTP endpoints of the **indexer** service via its Kubernetes-internal name, never transiting through the App Gateway (or out of the cluster).
The **indexer** service exposes those same endpoints outside the cluster. While most endpoints for services are protected by an Istio **AuthorizationPolicy** to require a valid token (and subsequently use that token to extract user information for authorization within the service), there is no token sent with these requests by the **indexer-queue** and the **indexer** service's **AuthorizationPolicy** excludes these endpoints from the token requirement.
This means any outside caller can send requests to these endpoints, with no authorization and no restriction. This could be exploited to cause denial-of-service attacks, send forged event messages, or use vulnerabilities within the **indexer** service to compromise the entire OSDU system. Because no token is required, these attacks can be done by anyone. Because the OSDU Community software is open-source, even "security through obscurity" is not effective here.
**Recommended approaches to solve this:**
- Use **indexer** VirtualService to reject external requests to these endpoints, making them reachable only from within Kubernetes.
- Add a token to the requests going to the **indexer** service's index-worker and reindex-worker endpoints
- [additionally] Consider using Istio's mTLS and/or Kubernetes Network Policy to restrict communication to the **indexer** service's index-worker and reindex-worker endpoints to traffic coming from **indexer-queue**
There may be other acceptable solutions.

**Issue #65: Unit conversion - misleading error message** · Debasis Chatterjee · 2022-08-23T21:19:14Z · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/65

My meta block had this problem: "**PersistableReference**" (P in uppercase)
```
{
  "kind": "Unit",
  "name": "ms",
  "PersistableReference": "{\"abcd\":{\"a\":0.0,\"b\":0.001,\"c\":1.0,\"d\":0.0},\"symbol\":\"ms\",\"baseMeasurement\":{\"ancestry\":\"T\",\"type\":\"UM\"},\"type\":\"UAD\"}",
  "unitOfMeasureID": "{{data-partition-id}}:reference-data--UnitOfMeasure:ms:",
  "propertyNames": [
    "SampleInterval",
    "RecordLength"
  ]
}
```
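
JSON keys are case-sensitive, so `PersistableReference` is a different key from `persistableReference`, which is consistent with the "persistableReference missing" traces below. A minimal Python sketch of a meta-block check that reports a case-mismatched key and labels its message by the meta block's `kind` (Unit vs. CRS), which is the behaviour the reporter expected; `check_meta_block`, the label table and the message wording are hypothetical, not the Indexer's:

```python
EXPECTED_KEY = "persistableReference"

# Message prefix per meta "kind" (illustrative labels).
CONVERSION_LABEL = {"Unit": "Unit conversion", "CRS": "CRS conversion", "DateTime": "DateTime conversion"}


def check_meta_block(meta: dict) -> list:
    """Return diagnostics for one meta block; empty if persistableReference is present."""
    if EXPECTED_KEY in meta:
        return []
    label = CONVERSION_LABEL.get(meta.get("kind"), "Conversion")
    props = ",".join(meta.get("propertyNames", []))
    near_miss = [k for k in meta if k.lower() == EXPECTED_KEY.lower()]
    if near_miss:
        return [f"{label}: found '{near_miss[0]}' but expected '{EXPECTED_KEY}' "
                f"(keys are case-sensitive). Affected properties: {props}"]
    return [f"{label}: '{EXPECTED_KEY}' missing, no conversion applied. Affected properties: {props}"]


meta = {
    "kind": "Unit",
    "name": "ms",
    "PersistableReference": '{"abcd":{"a":0.0,"b":0.001,"c":1.0,"d":0.0},"symbol":"ms","baseMeasurement":{"ancestry":"T","type":"UM"},"type":"UAD"}',
    "propertyNames": ["SampleInterval", "RecordLength"],
}
print(check_meta_block(meta))
```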
However, the Indexer error message (as seen from the Search service) should not say "CRS conversion"; my JSON payload has nothing to do with CRS conversion.
```
"index": {
  "trace": [
    "CRS conversion: 'persistableReference' missing, no conversion applied. Affected properties: AreaCalculated,AreaNominal",
    "CRS conversion: 'persistableReference' missing, no conversion applied. Affected properties: SampleInterval,RecordLength",
    "Unit conversion: persistableReference missing",
    "Unit conversion: persistableReference missing"
  ],
```

**Issue #66: Reindex API - performance, scalability and reliability issues** · Neelesh Thakur · 2023-12-18T16:13:23Z · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/66

Recent issues on the Schema/Search backend require us to re-index a significant number of kinds/indices. Here are the specifics of these issues:
- M10 schema [hints changes](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/E-R/ChangeReport.md#snapshot-2021-11-09-towards-m10) on Schema service.
- Geoshape queries broke when the Elasticsearch server was upgraded from 7.8.1 to 7.17.x (confirmed by the Elasticsearch support team; no public issue is available).
The current implementation of the Reindex API (per kind) has serious performance, scalability and reliability issues. It does not work at all for a kind with a few million records. This is blocking us from adopting the M10 (now M11) schema updates. The following list summarizes the issues with the API:
- API throughput is slow: it can only re-index 250K-300K records per hour. For a partition with 100 million records, a re-index can take about two weeks or more (roughly 333 to 400 hours).
- It is not resilient: if the operation fails midway, we have to start over.
- There is no visibility into a Reindex operation: we do not know how much progress has been made.
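The resilience and visibility gaps listed above come from a single constraint: progress is not persisted. The sketch below shows one way to address both. It re-indexes a kind in cursor-paged batches and saves a checkpoint after every batch, so a failed run resumes where it stopped and the checkpoint itself reports progress. Every name here is hypothetical (`ReindexCheckpoint`, `fetch_batch`, `index_batch`, `save_checkpoint`). The sketch does not model the actual Storage/Search cursor APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ReindexCheckpoint:
    """Progress state persisted after every batch (hypothetical structure)."""
    kind: str
    cursor: Optional[str] = None  # cursor of the next batch; None means "start"
    processed: int = 0            # records indexed so far, usable for progress reporting
    done: bool = False


def estimate_hours(total_records: int, records_per_hour: int) -> float:
    """Rough duration estimate, e.g. 100M records at 250K/hour -> 400 hours."""
    return total_records / records_per_hour


def reindex_kind(
    checkpoint: ReindexCheckpoint,
    fetch_batch: Callable[[str, Optional[str]], Tuple[List[str], Optional[str]]],
    index_batch: Callable[[List[str]], None],
    save_checkpoint: Callable[[ReindexCheckpoint], None],
) -> ReindexCheckpoint:
    """Re-index one kind batch by batch, resuming from `checkpoint`."""
    while not checkpoint.done:
        record_ids, next_cursor = fetch_batch(checkpoint.kind, checkpoint.cursor)
        if record_ids:
            index_batch(record_ids)  # if this raises, the checkpoint is left unchanged
        checkpoint.processed += len(record_ids)
        checkpoint.cursor = next_cursor
        checkpoint.done = next_cursor is None
        save_checkpoint(checkpoint)  # durable progress: a restart continues from here
    return checkpoint
```

The design gives at-least-once delivery. A crash between `index_batch` and `save_checkpoint` re-indexes one batch, which is harmless only if indexing is an idempotent upsert by record id (an assumption here).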
In addition to the above issues, we also cannot recover the Search service in disaster recovery scenarios. In that case we would use the ReindexAll API, which calls the Reindex API (per kind) behind the scenes, so we run into all of the above issues.
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/68 Documentation of DateTime normalization while data is indexed (2022-09-29T13:41:07Z, Debasis Chatterjee)
It is not obvious what to provide in the meta block and what to provide as the input value so that the Normalizer adjusts the DateTime properly, respecting the time shift from UTC.
See this worked example.
https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M12/Test_Plan_Results_M12/Manifest_Ingestion/M12-GCP-Master-FoR-Date-check-Debasis-Kateryna.txt
Please provide suitable documentation.
cc - @nthakur and @Kateryna_Kurach
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/69 Integration test to include input value from timezone different from UTC and test Normalizer (2022-09-29T13:41:06Z, Debasis Chatterjee)
Please see the worked example from @Kateryna_Kurach:
https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M12/Test_Plan_Results_M12/Manifest_Ingestion/M12-GCP-Master-FoR-Date-check-Debasis-Kateryna.txt
Please include tests with both positive and negative time shifts.
https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/indexer-core/src/test/java/org/opengroup/osdu/indexer/util/parser/DateTimeParserTest.java#L31
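The requested test cases could cover the expected arithmetic as in the sketch below. It normalizes ISO-8601 inputs that carry a UTC offset, using both positive and negative shifts and including day rollover in each direction. This is only an illustration. It does not reproduce the Indexer's `DateTimeParser`, and it does not model the OSDU meta-block format for DateTime. `normalize_to_utc` and `CASES` are hypothetical names.

```python
from datetime import datetime, timezone


def normalize_to_utc(value: str) -> str:
    """Parse an ISO-8601 timestamp that carries a UTC offset and return it in UTC."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Without an offset there is no time shift to apply; reject instead of guessing.
        raise ValueError(f"no UTC offset in {value!r}")
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Positive and negative time shifts, including day rollover in both directions.
CASES = [
    ("2022-03-01T10:00:00+05:30", "2022-03-01T04:30:00Z"),  # positive shift
    ("2022-03-01T02:00:00+05:30", "2022-02-28T20:30:00Z"),  # positive shift, previous day
    ("2022-03-01T10:00:00-07:00", "2022-03-01T17:00:00Z"),  # negative shift
    ("2022-03-01T22:00:00-05:00", "2022-03-02T03:00:00Z"),  # negative shift, next day
    ("2022-03-01T10:00:00+00:00", "2022-03-01T10:00:00Z"),  # already UTC
]

for raw, expected in CASES:
    assert normalize_to_utc(raw) == expected, (raw, normalize_to_utc(raw))
```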
cc @nthakur
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/73 Indexer fails to correctly parse properties with special characters (2022-08-23T15:08:44Z, An Ngo)
For example:
```
"SpatialArea": {
"Wgs84Coordinates": {
"features": [
{
"geometry": {
"type": "Point",
"coordinates": [
2.2863,
61.198685
]
},
"properties": {
"id": "a:b"
},
"type": "Feature"
}
],
"type": "FeatureCollection"
}
}
```
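A quick way to rule out the payload itself is to run it through a standard JSON parser, as in the sketch below. The fragment is wrapped in braces to form a complete document, and the colon in `"a:b"` survives parsing unchanged. The naive `split(":")` at the end is a hypothetical failure mode shown for contrast. It is an assumption about the cause, not something confirmed from the Indexer code.

```python
import json

# The fragment from the issue, wrapped in braces so it is a complete JSON document.
payload = """
{
  "SpatialArea": {
    "Wgs84Coordinates": {
      "type": "FeatureCollection",
      "features": [
        {
          "type": "Feature",
          "geometry": {"type": "Point", "coordinates": [2.2863, 61.198685]},
          "properties": {"id": "a:b"}
        }
      ]
    }
  }
}
"""

data = json.loads(payload)
feature = data["SpatialArea"]["Wgs84Coordinates"]["features"][0]
feature_id = feature["properties"]["id"]
print(feature_id)  # a:b (the colon is preserved by a standard parser)

# Hypothetical failure mode (assumption): code that treats ':' as a separator
# would split the value into two parts.
naive_parts = feature_id.split(":")
print(naive_parts)  # ['a', 'b']
```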
The Indexer fails to parse the `id` property when its value contains a colon.
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/77 Remove number retry attempts for schema not found 404 and remove call to deprecated storage API in indexer service (2022-10-28T11:23:49Z, Harshika Dhoot)
The Indexer service makes a number of retry attempts for schemas that do not exist in the Schema service (which returns 404). After those attempts, it also calls the deprecated Storage service API.
To fix this issue, we have this already-merged MR: [indexer-service/-/merge_requests/384](https://community.opengroup.org/osdu/platform/system/indexer-service/-/merge_requests/384)
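The retry policy this issue describes can be sketched as follows: a 404 from the schema lookup is final, so it gets no retries and no fallback call. Only transient statuses are retried, with exponential backoff. This is not the implementation in MR 384, which is not shown here. The `fetch` callable, the `RETRYABLE_STATUSES` set, and `get_schema_with_retry` are hypothetical.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical set of transient statuses worth retrying; 404 is deliberately absent.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def get_schema_with_retry(
    fetch: Callable[[str], Tuple[int, Optional[dict]]],
    kind: str,
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Fetch a schema; return None on 404 without retrying or falling back."""
    for attempt in range(1, max_attempts + 1):
        status, body = fetch(kind)
        if status == 200:
            return body
        if status == 404:
            return None  # schema does not exist: retrying will not help
        if status not in RETRYABLE_STATUSES or attempt == max_attempts:
            raise RuntimeError(f"schema lookup for {kind} failed with HTTP {status}")
        sleep(base_delay_s * 2 ** (attempt - 1))  # exponential backoff between attempts
    return None
```

The caller can then treat `None` as "no schema for this kind" directly, rather than falling back to the deprecated Storage API.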