Indexer issues
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues
2023-08-28T12:53:35Z

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/42
extensionProperties should be searcheable (again)
2023-08-28T12:53:35Z
Fabrice Hauy

From my perspective, it is critical that the properties inside "ExtensionProperties" can be searched.
This block allows users/companies to add additional properties which are not part of the standardised section of the schema. To be useful, records should be found when searching for values from these extensionProperties.
Another usage example: OSDU Data Quality Rule/RuleSet schemas have no reference to the data type/schema they apply to. So to make use of the Data Quality data types, we would need to add the kind a rule refers to within extensionProperties. And to retrieve the rules for a given data type, you would need to search for that data type within the quality records.

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/111
ADR: Search backend (Elasticsearch) Upgrade
2024-02-05T14:31:07Z
Neelesh Thakur

## Status
- [X] Proposed
- [ ] Under review
- [ ] Approved
- [ ] Retired
## Background
Elasticsearch serves as the backend for the Indexer and Search services. To communicate with the Elasticsearch server (deployed and managed independently), these services use the Elasticsearch [Java high level rest API](https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html) client SDK. The Elasticsearch version officially supported by the OSDU Data Platform for both server and client SDK is [v7.8.1](https://www.elastic.co/blog/elastic-stack-7-8-1-released). This version is quite old and already beyond [end of life](https://www.elastic.co/support/eol) support. A new major version (v8.x), which was released in April 2022, is also available. Furthermore, an upgrade will not only resolve issues and offer new features and capabilities, but also save costs. Here are a few reasons why the Elasticsearch client and server should be updated:
- The [Log4J vulnerability](https://blog.qualys.com/vulnerabilities-threat-research/2021/12/10/apache-log4j2-zero-day-exploited-in-the-wild-log4shell) discovered in December 2021 forced all CSPs to update their Elasticsearch server version. At this time, CSPs are on different server versions, e.g. Azure v7.17.x, IBM v7.11.x etc. Even though Elasticsearch promises not to introduce breaking changes within a major version, we have found issues in the past. Ideally, all CSPs should be on the same client and server versions to avoid any potential issues.
Community effort on [Reference Implementation](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/architecture-decision-records/-/blob/main/0006-osdu-will-have-a-reference-implementation.md?ref_type=heads) gives us a good opportunity to upgrade and align Elasticsearch client and server version.
- Elasticsearch v7.8.1 reached end of life some time ago. The [officially supported](https://www.elastic.co/support/eol) version for Elasticsearch v7 is v7.17.x or higher. If an issue is found with the client SDK or server, then a fix is usually available in the most recent version.
- Elasticsearch has launched many new versions past v7.8.1 with several improvements & new features, some notable ones in v8.x are mentioned below:
- Elasticsearch v8.3.x has [removed](https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html#shard-count-recommendation) the 1K shard count per node limitation. The OSDU DD Definition Team has introduced several new schemas over the course of a few milestone releases. On Elasticsearch, each schema index generates two shards. Currently, a single node in an Elasticsearch instance can only hold up to 1K shards. A small or medium-sized Elasticsearch cluster can quickly run out of shard capacity with so many new schemas.
- Reduced resource requirements via [memory heap reductions](https://www.elastic.co/blog/significantly-decrease-your-elasticsearch-heap-memory-usage). This can lower customers’ total cost of ownership. Added support for the [ARM architecture](https://www.elastic.co/blog/whats-new-elasticsearch-7-12-0-put-a-search-box-on-s3), which offers 20% better performance while being 10% cheaper than x86-64. Introduced novel ways to use less storage by decoupling compute from storage with a new [frozen tier and searchable snapshots](https://www.elastic.co/blog/whats-new-elasticsearch-7-10-0-searchable-snapshots-store-more-for-less).
- Improved indexing latency of several data types including [geo-points, geo-shapes](https://www.elastic.co/guide/en/elasticsearch/reference/8.0/release-highlights.html#_faster_indexing_of_geo_point_geo_shape_and_range_fields) etc. [Enhanced error messages](https://issues.apache.org/jira/browse/LUCENE-9538) on invalid geo-shape indexing: it can now provide more meaningful messages capturing issues with the shape, rather than the generic messages in the current version. Several new geo queries (e.g. [geo-grid query](https://www.elastic.co/guide/en/elasticsearch/reference/8.3/release-highlights.html#new_geo_grid_query) etc.) and aggregations (e.g. [cartesian-centroid](https://www.elastic.co/guide/en/elasticsearch/reference/8.6/release-highlights.html#support_cartesian_centroid_aggregation_over_points_shapes), [geo-hex](https://www.elastic.co/guide/en/elasticsearch/reference/8.7/release-highlights.html#geohex_aggregations_on_both_geo_point_geo_shape_fields) aggregation over points and shapes etc.) are also introduced.
- Introduced a new [health API](https://www.elastic.co/guide/en/elasticsearch/reference/8.7/release-highlights.html#health_api_generally_available) designed to report the health of the cluster. The new API offers a detailed report that can include a precise diagnosis and a solution, as well as a high level overview of the cluster health. The operational teams can benefit greatly from this API.
- Released a full suite of native [vector search](https://www.elastic.co/what-is/vector-search) via [kNN search](https://www.elastic.co/guide/en/elasticsearch/reference/8.0/release-highlights.html#_new_knn_search_api). It adds support for natural language processing (NLP) models directly into Elasticsearch. Users can now perform named entity recognition, sentiment analysis, text classification, and more directly in Elasticsearch — without requiring additional components or coding. Elasticsearch v8.x also includes native support for [approximate nearest neighbor (ANN)](http://www.elastic.co/blog/introducing-approximate-nearest-neighbor-search-in-elasticsearch-8-0) search — making it possible to compare vector-based queries with a vector-based document corpus with speed and at scale.
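
The shard-capacity concern in the list above comes down to simple arithmetic. The sketch below is only a rough estimate using the figures quoted in this ADR (two shards per schema index, a 1K shard cap per node); the function name is illustrative and not part of any OSDU or Elasticsearch API.

```python
def max_indices_per_node(shard_limit: int = 1000, shards_per_index: int = 2) -> int:
    """Return how many schema indices fit on one node under a per-node shard cap.

    Defaults are the figures quoted in the ADR text, not values read from a cluster.
    """
    return shard_limit // shards_per_index


# With the quoted figures, one node can host shards for this many schema indices.
print(max_indices_per_node())
```

A real cluster also has to account for replicas placed on other nodes and for system indices, so this is an upper bound at best.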
## Proposal
Any Elasticsearch upgrade will require coordination with the community and CSPs, which can be very time consuming. Instead of just upgrading Elasticsearch to the latest v7.17.x, we should upgrade to v8.10.0 (or the highest released v8.x) to minimize disruption and avoid repeating this step very soon. Since the last major version of Elasticsearch (v8) was released 18 months ago, once v9 is released, the entire v7 (v7.17.x) family will be deprecated, as stated in the [support documentation](https://www.elastic.co/support/eol).
We should break the upgrade down into two parts:
#### Latest v7.17.x Upgrade
1. Take a backup (snapshot) of the data. We cannot roll back to an earlier version unless we have a snapshot.
1. Upgrade Elasticsearch server to latest v7.17.13 (or highest available v7.17.x).
1. Replace the Indexer & Search services' [Java high level rest API](https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html) client SDK with the new [Java API client SDK](https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html). Elasticsearch has [changed](https://www.elastic.co/pricing/faq/licensing) the **Java high level rest API** client SDK's license in v7.10.2 from Apache 2.0 to [SSPL](https://www.mongodb.com/licensing/server-side-public-license). The new license is not a preferred license for the OSDU Data Platform, as explained in the [issue](https://community.opengroup.org/osdu/platform/system/search-service/-/issues/133).
The [Java API client SDK](https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html) with an [Apache 2.0](https://github.com/elastic/elasticsearch-java/) license is available from v7.15.0 onwards (including v8.x). On a similar timeline, the [Java high level rest API](https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html) has been [deprecated](https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html) in favor of the [Java API client SDK](https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html).
#### Latest v8.10.x Upgrade
1. Complete client and server upgrade to latest v7.17.x as stated above.
1. Use the Kibana [Upgrade Assistant](https://www.elastic.co/guide/en/kibana/7.17/upgrade-assistant.html) to prepare for upgrade from v7.17 to v8.10.0. The Upgrade Assistant identifies deprecated settings and guides users through resolving issues.
1. Review the deprecation logs from the Upgrade Assistant.
1. Review breaking changes including breaking changes for each minor v8.x release up to v8.10.0.
1. Make the recommended changes to ensure that applications/APIs continue to operate as expected after the upgrade.
1. Take a current snapshot before server upgrade is started.
1. Upgrade Elasticsearch server to latest v8.10.0 (or highest available v8.x).
1. Upgrade Indexer and Search service [Java API client SDK](https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html) to v8.10.0 (or highest available v8.x).
Elasticsearch upgrade recommendations from v7.x to v8.x can be found [here](https://www.elastic.co/guide/en/elastic-stack/8.10/upgrading-elastic-stack.html#prepare-to-upgrade).

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/66
Reindex API - performance, scalability and reliability issues
2023-12-18T16:13:23Z
Neelesh Thakur

Recent issues on the Schema/Search backend require us to re-index a significant number of kinds/indices. Here are the specifics of these issues:
- M10 schema [hints changes](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/E-R/ChangeReport.md#snapshot-2021-11-09-towards-m10) on Schema service.
- Geoshape queries are broken when the Elasticsearch server is upgraded from 7.8.1 --> 7.17.x (confirmed by the Elasticsearch Support team; a public issue is not available)
The current implementation of the Reindex API (per kind) has serious performance, scalability and reliability issues. It does not work at all for a kind with a few million records. This is blocking us from adopting M10 (now M11) schema updates. The following list summarizes the issues with the API:
- API throughput is slow: it can only re-index 250K-300K records per hour. For a partition with 100 million records, a run can take about two weeks or longer.
- It’s not resilient: if the operation fails in the middle, we have to start over.
- There is no transparency for the Reindex operation; we don’t know how much progress has been made.
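
The duration estimate in the first bullet can be reproduced with a back-of-the-envelope calculation. This is a minimal sketch that assumes constant throughput and no failures or restarts; the function name is illustrative only.

```python
def reindex_duration_days(record_count: int, records_per_hour: int) -> float:
    """Estimate wall-clock days needed to re-index record_count records
    at a constant throughput of records_per_hour."""
    return record_count / records_per_hour / 24


# Throughput bounds quoted in this issue: 250K to 300K records per hour.
for rate in (250_000, 300_000):
    days = reindex_duration_days(100_000_000, rate)
    print(f"{rate} records/hour: {days:.1f} days")
```

Because a failed run has to start over, the real elapsed time can be considerably longer than this estimate.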
In addition to the above issues, we cannot recover the Search service in disaster recovery scenarios either. In this case, we can use the ReindexAll API, which uses the Reindex API (per kind) behind the scenes, so we run into all of the above issues.

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/153
Indexer is not supporting 64-bit integer value
2024-03-23T16:00:07Z
An Ngo

A bug was submitted for a case where a seismic volume size did not get indexed. The [AbstractDataset](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/Generated/abstract/AbstractDataset.1.0.0.json?ref_type=heads#L26) definition (and a few more [attributes](https://community.opengroup.org/search?group_id=218&nav_source=navbar&project_id=91&repository_ref=master&scope=blobs&search=convertible+to+a+long+integer+extension:json+path:Authoring&search_code=true)) states that the value must be converted to a long integer. But it seems the Indexer only handles 32-bit integer values.
Proposal to fix:
* declare that the default schema definition for "int" is a 64-bit value, which will increase storage and processing and require a re-index on potentially all ingested data.
* create a new schema type "long int" that supports 64-bit value, update the existing schema definition for just the attributes that may exceed 32-bit size, and re-index the affected data.
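
To illustrate the difference between the two options, the sketch below classifies a value by the smallest Elasticsearch integer field type that can hold it (`integer` is 32-bit signed, `long` is 64-bit signed). It is not the Indexer's actual type-mapping code; the helper name is hypothetical.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def smallest_integer_type(value: int) -> str:
    """Return the narrowest Elasticsearch integer field type that can store value."""
    if INT32_MIN <= value <= INT32_MAX:
        return "integer"
    if INT64_MIN <= value <= INT64_MAX:
        return "long"
    raise ValueError(f"{value} does not fit in a 64-bit signed integer")


# A file size of about 5 GB in bytes already overflows a 32-bit field.
print(smallest_integer_type(5_000_000_000))
```

Any size above 2,147,483,647 bytes (roughly 2 GiB) falls outside the 32-bit range, which is easily reached by seismic volumes.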
Screenshots of the error and data value:
![storage.png](/uploads/0d9a6215309c8164f647bfd7d657bcf8/storage.png)
![indexing_error.png](/uploads/5f8752f09cbb2e5c9c93595805a87600/indexing_error.png)

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/148
Augmented Index - Provide new endpoint, or add info inside existing endpoint - is the Feature flag enabled or disabled?
2024-02-09T01:01:51Z
Debasis Chatterjee

We need a simple way for end users to know whether the related "feature flag" is turned on in the instance.
Today, there are similar examples from "Version info" endpoint and Status/Health endpoints of some DDMS's.
A.
Example of “info” endpoint.
GET {{INDEXER_HOST}}/info
```
{
"groupId": "org.opengroup.osdu.indexer",
"artifactId": "indexer-aws",
"version": "0.25.1",
"buildTime": "2024-01-24T20:28:16.495Z",
"branch": "refs/heads/release/r3-m22",
"commitId": "07ad22b2308a75e018a0f9f72c579afb66f7928a",
"commitMessage": "merge from gitlab tag",
"connectedOuterServices": [
{
"name": "ElasticSearch-osdu",
"version": "7.17.15"
},
{
"name": "ElasticSearch-common",
"version": "7.17.15"
}
]
}
```
B.
In some cases, we also see an SOH (state of health) style option.
Seismic DDMS V4 API
GET {{osduonaws_base_url}}/seistore-svc/api/v4/status/readiness
```
{
"ready": true
}
```
GET {{osduonaws_base_url}}/seistore-svc/api/v4/status
```
{
"status": "running"
}
```
cc @zhibinmai

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/141
Indexing issue with GeoJSON structure in Spatial block
2024-01-22T15:55:55Z
Jeyakumar Devarajulu

The eds_ingest DAG uses the Search service to get records from the source system (OSDU/non-OSDU) and ingests them into a target OSDU system using osdu_ingest.
Search service provides the flattened "SpatialLocation.Wgs84Coordinates" and it gets ingested using osdu_ingest, but there is an indexing issue.
Troubleshooting reveals that Spatial data is not indexed since GeoJSON syntax is incorrect.
**"geo-json shape parsing error: must be a valid FeatureCollection attribute: SpatialLocation.Wgs84Coordinates",**
It looks like flattened string fields are displayed, but array-like fields such as Wgs84Coordinates are not. There may be a problem with how indexing handles array-like structures.
There are other flattened fields displayed in the search query,
![Search.png](/uploads/ffc79525dc9b51a2a93649a26303dc31/Search.png)
Storage shows the Wgs84Coordinates
![Storage.png](/uploads/f8c1fb09b468de6a4fe33400fe57999b/Storage.png)
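
The error message above says the attribute must be a valid FeatureCollection. As a rough illustration of what that shape looks like, here is a minimal structural check. It is a sketch only, not the Indexer's validation logic, and the function name and sample values are hypothetical.

```python
def is_feature_collection(value) -> bool:
    """Minimal structural check for a GeoJSON FeatureCollection.

    Only verifies the top-level shape; it does not validate geometries.
    """
    if not isinstance(value, dict) or value.get("type") != "FeatureCollection":
        return False
    features = value.get("features")
    if not isinstance(features, list):
        return False
    return all(
        isinstance(f, dict) and f.get("type") == "Feature" and "geometry" in f
        for f in features
    )


nested = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [2.2863, 61.198685]},
            "properties": {},
        }
    ],
}
# A flattened form (dotted keys) no longer has the nested structure.
flattened = {"features.geometry.type": "Point", "type": "FeatureCollection"}

print(is_feature_collection(nested), is_feature_collection(flattened))
```

If the value returned by Search is re-ingested in a flattened form, a check of this kind would reject it even though the original Storage record was valid.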
CC: @debasisc @AshishSaxenaAccenture @chad
Sample Seismic Trace data
[SeismicTraceData](/uploads/4b69869d4c94d32e6f044b6c320c8b30/SeismicTraceData)

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/79
Reindexing timing out
2023-06-21T18:03:38Z
Okoun-Ola Fabien Houeto

In addition to the issues in #66, it seems that the reindexer is using the user token, and therefore reindexing will time out when the user token expires. The reindexer should use a mechanism that avoids timing out on the token.
For example, we are getting this error
"_The user is not authorized to perform this action, errors=null, debuggingInfo=account id: null | user email: admin@testing.com_"
Chad Leong

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/73
Indexer fails to correctly parse properties with special characters
2022-08-23T15:08:44Z
An Ngo

For example:
```
"SpatialArea": {
"Wgs84Coordinates": {
"features": [
{
"geometry": {
"type": "Point",
"coordinates": [
2.2863,
61.198685
]
},
"properties": {
"id": "a:b"
},
"type": "Feature"
}
],
"type": "FeatureCollection"
}
}
```
Indexer fails to parse the properties id whose value contains a colon.

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/16
ADR: Array of Objects support by Indexer
2021-06-14T16:29:33Z
Dmitriy Rudko

## Status
- [x] Proposed
- [x] Trialing
- [x] Under review
- [x] Approved
- [ ] Retired
## Context
Currently, the Indexer and Search implementation ignores Arrays of Objects structures in schemas. One example of such a structure in the DD schemas is [WellLog.Curves](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/blob/master/Examples/work-product-component/WellLog.1.0.0.json#L303)
You can find historical context in the following issues:
- https://community.opengroup.org/osdu/platform/system/home/-/issues/67
- https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/1
- https://community.opengroup.org/osdu/platform/system/schema-service/-/issues/1
- https://community.opengroup.org/osdu/platform/system/search-service/-/issues/1
As a result, users are not able to:
- Retrieve such objects using Search (fixed in MR !114)
- Search against information in such objects
- Run complex search queries, e.g. range or spatial queries
Elasticsearch has several options and data types to handle such cases. There is no silver bullet and each of these types has pros and cons.
- [object type](https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html) - Individual object queries are not supported ([objects are not searchable individually](https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html))
- [nested type](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) - Has a serious impact on performance and produces separate documents, e.g. 100 WellLog records with 100 Curves each will produce 100x100 documents in the Elasticsearch index.
- [flattened type](https://www.elastic.co/guide/en/elasticsearch/reference/current/flattened.html) - Does not support complex queries and is available **ONLY** with X-Pack (not an open-source type)
A specific object might be treated in one of these 3 ways according to a custom hint associated with that object (part of the logical schema). The main question is how the initial DD schema enrichment mechanism would work to add the hints. Generally, there are 2 options:
1. Incorporate hints into the DD schema.
2. Separate DD schema and hints and merge them during the Elasticsearch index creation process.
Both options have pros and cons:
1. Combined schema and hints are the easiest and most centralized way to onboard new functionality. The drawbacks are that the "clean" data model gets blended with functional-level attributes, and that the DD schema has to change.
2. Separated data model and hints follow the separation of concerns principle, but bring other challenges such as maintaining the conformity of 2 physically separated pieces and the necessity to store hints per provider.
The format of the hints file might be one of the following:
1. Copy the original JSON structure and store hints maintaining the original DD schema hierarchy:
`{ "properties" : { "data": { "allOf" : { "properties" : { "Curves" : "x-osdu-flattened" } } } } }`
2. Maintain a path to the desired field and its type:
`{ "properties/data/allOf/properties/Curves" : "x-osdu-flattened" }`
3. Assuming a type is always indexed in the same way, the mapping might be done at the object level:
`{ "Curves" : "x-osdu-flattened" }`
In each case, the maintainer should make sure the hints file matches the DD file (e.g. DD schema changes), which adds operational overhead.
## Scope
Implement a general approach on how to handle Arrays of Objects (AO) in Schemas:
- Index
- Search
## Decision
After analyzing the 2 options, we decided to follow the combined schema and hints approach:
1. Define generic hints (enriched object schema), which will let the Indexer know how a specific array of objects should be treated when feeding a schema into Elasticsearch:
- `"x-osdu-indexing": "x-type-nested"`
- `"x-osdu-indexing": "x-type-flattened"`
- `"x-osdu-indexing": "x-type-object"`
2. Review R3 schemas and inject hints where applicable.
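
As a rough illustration of the decision above, the sketch below translates the three hint values into the corresponding Elasticsearch field types. It is not the Indexer's implementation: the function name is hypothetical, and the fallback for a missing hint is an assumption made for this example only.

```python
# Hint values as listed in the Decision section, mapped to Elasticsearch field types.
HINT_TO_ES_TYPE = {
    "x-type-nested": "nested",
    "x-type-flattened": "flattened",
    "x-type-object": "object",
}


def es_type_for_array_of_objects(property_schema: dict, default: str = "object") -> str:
    """Pick the Elasticsearch type for an array-of-objects property.

    The default used when no hint is present is an assumption, not documented behavior.
    """
    hint = property_schema.get("x-osdu-indexing")
    return HINT_TO_ES_TYPE.get(hint, default)


curves = {"type": "array", "x-osdu-indexing": "x-type-nested"}
print(es_type_for_array_of_objects(curves))
```

Keeping the translation in one lookup table makes it easy to review which hints exist and what each one costs at index time.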
## Rationale
Elasticsearch doesn't have a type that can be used across all array of objects elements.
The decision on a proper data type should be made as part of the Data Modeling phase.
The following criteria should be taken into account in each DD case:
- Will object attributes be used in queries?
- What type of queries will be used?
- What is the cardinality of AO?
- Can information be moved out of AO?
## Consequences
- Review R3 schemas
- Update all cases where we have AO with appropriate meta attribute
- Educate the DD community on the pros and cons of each type

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/156
Augmented Index - parent-child use case - trailing colon (":") in relationship field in child record
2024-03-08T16:00:02Z
Debasis Chatterjee

See my test case in AWS/M22/Preship.
https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M22/Test_plan_Results_M22/Core%20Services/M22-AWS-Augmented-Index-parent-child-steps-Debasis.docx
This test case involves "Well" (parent) and "Wellbore" (child).
When troubleshooting with @zhibinmai, we tried removing the trailing colon from the relationship field of the child record Wellbore.
With that change, I could get a clean run, and the "virtual field" appears in the search response as expected.
In child Wellbore record -
"WellID": "osdu:master-data--Well:WELL07MARDC"
whereas it should be
"WellID": "osdu:master-data--Well:WELL07MARDC:"
(this is the convention when referencing field from another entity - like "foreign key" relationship)
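
One way to make the two forms of a reference compare equal is to normalize the trailing colon before matching. The helper below is a hypothetical sketch of that idea, not the augmented-index code used by any CSP.

```python
def normalize_record_reference(reference: str) -> str:
    """Drop a single trailing colon so that 'id:' and 'id' compare equal."""
    return reference[:-1] if reference.endswith(":") else reference


with_colon = "osdu:master-data--Well:WELL07MARDC:"
without_colon = "osdu:master-data--Well:WELL07MARDC"
print(normalize_record_reference(with_colon) == normalize_record_reference(without_colon))
```

Normalizing on both sides of the comparison keeps the stored record untouched while making the parent lookup insensitive to the convention used.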
I earlier ran a similar test case on Azure and did not see any impact from the trailing colon there.
@ydzeng - how to ensure feature parity between version of code (for augmented index) in M22/AWS/Preship with other CSPs?
cc @chad and @sjtomlinson

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/150
records with id longer then 512 bytes cannot be indexed
2024-02-19T18:16:31Z
Neelesh Thakur

Any Storage record with an id longer than 512 bytes is not searchable. The Search/Indexer backend (Elasticsearch) throws an error when the id is longer than 512 bytes. The following message is logged by the Indexer service:
indexer.app Validation Failed: 1: id [xxx-yyy-zzzzzzz-corporation:work-product-component--WellLog:xxx-yyy-zzzzzzz-corporation:work-product-component--WellLog:CRXdAWD15bqEJ8kNtJe6V3RKXmSQzmohsYZDhe7QdR58iFGHOA0b5Otuc96XDgp34TNCk851FsKB95zHx7QazeBIG0NxT3CVDLyWpEe0nyXGgDMY2k1RR1SXzum4IqMajpscNM6kVjRlBjh2Cx2ZGDt7RW0AKYEemm8IpU1kvWRgjYATXJacoDivlQJqJ07Ghzco4MOu2TYFDq31qfnpVP37E2pktUGvHug1qQoVSHaSoT4zQgOiOF1WXMfZWTPIlaRdnaUSbjN2aXgH9zMlSebOkJ4J0SAU9lMs58QJsSvMoL9bjaBmniVNq2os41oyL3gZrBucz2yI67Yzm72y72fb7swlBoiONveLgyTra2fY8q9btfxjGYDPO71dwA1akgNmJerCDdE]
is too long, must be no longer than 512 bytes but was: 517
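
The limit in the message above is expressed in bytes, not characters, so ids with non-ASCII characters reach it sooner. A minimal pre-check could look like the sketch below; the constant and function name are illustrative and do not come from the Indexer or Storage code.

```python
MAX_ID_BYTES = 512  # limit quoted in the Elasticsearch error message above


def id_fits_elasticsearch_limit(record_id: str) -> bool:
    """Return True when the UTF-8 encoded id is within the 512-byte limit."""
    return len(record_id.encode("utf-8")) <= MAX_ID_BYTES


print(id_fits_elasticsearch_limit("a" * 512))  # exactly at the limit
print(id_fits_elasticsearch_limit("a" * 517))  # same size as the failing id above
```

A check like this at ingestion time would surface the problem before the record becomes undiscoverable.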
This breaks end user discovery workflows. Core services must resolve this issue so all ingested records are discoverable by Search service.
M23 - Release 0.26

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/149
Please check list of required roles for Reindexing task
2024-02-12T06:15:40Z
Debasis Chatterjee

https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/docs/tutorial/IndexerService.md#indexer-api-access
Indexer service requires that users (and service accounts) have dedicated roles in order to use it. Users must be a member of `users.datalake.viewers` or `users.datalake.editors` or `users.datalake.admins` or `users.datalake.ops`.
The "or" indicates that reindexing should work if the user is assigned any one of the above roles.
I am experiencing a failure although the user is a member of the groups -
**users.datalake.viewers**
and
**users.datalake.editors**
Also can you see this role?
service.indexer.admin
Is this a requirement?
POST {{INDEXER_HOST}}/reindex?force_clean=true
Body
```
{
"kind": "osdu:wks:master-data--Play:1.0.0"
}
```
Response
```
{
"code": 401,
"reason": "Unauthorized",
"message": "The user is not authorized to perform this action"
}
```
cc @chad for information

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/147
Augmented Index - want to provide "id" in ValueExtraction.ValuePath"
2024-02-05T05:53:32Z
Debasis Chatterjee

For reference-data--IndexPropertyPathConfiguration -
Currently, it supports providing fields only from within the "data" block.
In my use case (Wellbore->WellLog : Parent->Child. Hence "RelationshipDirection": "ParentToChildren"),
we want to create a derived field (augmented index) of all child record IDs.
So, if 4 WellLog records are linked to one parent Wellbore, then the derived field should show a value of
[WellLog-record-ID1, WellLog-record-ID2, WellLog-record-ID3, WellLog-record-ID4]
Let me know if there is any question about this use case.
Thank you
cc @zhibinmai

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/140
Indexing Records of same Kind but different case for "kind" results in undesired behaviour.
2024-01-22T15:54:41Z
Sabarish K R E

Assume a new schema is created with Kind: `osdu:wks:reference-data--VelocityAnalysisMethodX1:1.0.0`
Next, a storage record is PUT, with the value for kind as "**OSDU**:wks:reference-data--VelocityAnalysisMethodX1:1.0.0". Notice the Upper case OSDU.
When indexing, the following happens:
- Indexer service does a GET request to schema for the kind "**OSDU**:wks:reference-data--VelocityAnalysisMethodX1:1.0.0". Schema is NOT found. (due to upper case)
- Indexer service still goes ahead, as per design, and creates an elasticsearch index "**osdu-wks-reference-data--velocityanalysismethodx1-1.0.0**". In this index's mapping, the allowed value for the field **authority** is set to the constant "OSDU" (in uppercase, as derived from the storage record's "kind" field). The record is then indexed in this index, with ``` "trace": ["schema not found"],``` as the reason for the data fields not being indexed.
Now this causes two major issues:
- Legitimate records with kind **osdu**:wks:reference-data--VelocityAnalysisMethodX1:1.0.0 will not get indexed, because the elasticsearch index for this kind has already been created with the "authority" field allowed to have only the constant value **OSDU** (hence it can't accept **osdu**).
- The mapping will also have been created with no details about the data fields, because the schema was not found the first time (as discussed earlier). This will cause the data fields of the storage record to NOT get indexed, and hence these records won't be searchable.
This happens because elasticsearch index is created by converting kind string to lowercase, so two records with logically the same kinds, but different CASE, will have these conflicts during indexing.
(index name = lowercase(kind), and replace : with - )
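
The naming rule in parentheses above can be sketched in a couple of lines, which also shows why the two differently cased kinds collide on the same index. The function name is illustrative; the rule itself is the one stated in this issue.

```python
def index_name_for_kind(kind: str) -> str:
    """Derive the index name as described above: lowercase the kind, replace ':' with '-'."""
    return kind.lower().replace(":", "-")


lower = "osdu:wks:reference-data--VelocityAnalysisMethodX1:1.0.0"
upper = "OSDU:wks:reference-data--VelocityAnalysisMethodX1:1.0.0"

# Both kinds resolve to the same index name, so whichever record arrives
# first determines the mapping that the other one has to live with.
print(index_name_for_kind(lower))
print(index_name_for_kind(lower) == index_name_for_kind(upper))
```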
To solve this, we need to design a strategy to handle different casing of the meta attributes like "kind"/"authority" appropriately.M23 - Release 0.26https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/122IndexMapping not updated with AsIngestedCoordinates fields2024-01-15T11:55:41ZKonrad KrasnodebskiIndexMapping not updated with AsIngestedCoordinates fieldsCurrent AsIngestedCoordinates feature implementation updates Index mapping regarding what AsIngestedCoordinates fields occur in record. Index mapping mechanism have caching functionality which check whether mapping was synced. This mecha...Current AsIngestedCoordinates feature implementation updates Index mapping regarding what AsIngestedCoordinates fields occur in record. Index mapping mechanism have caching functionality which check whether mapping was synced. This mechanism could disrupt mapping update for various records with the same kind.
Related MR [!650 (merged)](https://community.opengroup.org/osdu/platform/system/indexer-service/-/merge_requests/650)https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/121Review IndexerMappingServiceImpl2024-01-08T12:04:59ZMark ChanceReview IndexerMappingServiceImplReview comments from MR https://community.opengroup.org/osdu/platform/system/indexer-service/-/merge_requests/650Review comments from MR https://community.opengroup.org/osdu/platform/system/indexer-service/-/merge_requests/650M23 - Release 0.26https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/119ADR: Delete API endpoint to delete index for all kinds.2024-01-25T13:21:20ZAkshat JoshiADR: Delete API endpoint to delete index for all kinds.## Status
- [X] Proposed
- [ ] Under review
- [ ] Approved
- [ ] Retired
## Context & Scope
This ADR adds the capability to delete the Elasticsearch indices for all kinds in a single call to the existing Delete index API in the indexer service.
## Decision
Currently, the delete API introduced as part of ADR https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/54 supports deleting only the single index for a given kind. As part of the Replay design (ADR https://community.opengroup.org/osdu/platform/system/storage/-/issues/186), users may need to delete all indices before a reindex instead of overwriting them, as shown in this flow: <br>
![replayAll](/uploads/70dd44c84d985e56148ac84930fa3bd9/replayAll.png) <br><br>
## API Details <br>
**API Level Permission** - users.datalake.ops <br>
**Service** – Indexer
<b>delete API in indexer service</b>.
Sample request:
```bash
curl --request DELETE \
--url '/api/indexer/v2/index' \
--header 'authorization: Bearer <JWT>' \
--header 'content-type: application/json' \
--header 'data-partition-id: opendes'
```
<br><br>
**Current Scenario vs New Scenario of Delete Index API in Indexer Service**
<table>
<tr>
<td><strong> </strong>
</td>
<td><strong> Existing Scenario</strong>
</td>
<td><strong> New Scenario</strong>
</td>
</tr>
<tr>
<td><strong> API Method</strong>
</td>
<td> Delete
</td>
<td> Delete
</td>
</tr>
<tr>
<td><strong>Endpoint supported</strong>
</td>
<td><strong>indexer/v2/index?kind="tenant1:public:well:1.0.2"</strong>
</td>
<td>- <strong>indexer/v2/index?kind="tenant1:public:well:1.0.2"</strong> - deletes the index for a single kind
<br>
- <strong>indexer/v2/index</strong> - deletes the indices for all kinds (new endpoint)
</td>
</tr>
<tr>
<td><strong>Backward Compatible</strong>
</td>
<td>NA
</td>
<td>Yes
</td>
</tr>
<tr>
<td><strong>New Functionality</strong>
</td>
<td>NA
</td>
<td>It will allow you to delete all the indices.
</td>
</tr>
<tr>
<td><strong>API level change</strong>
</td>
<td>Currently, kind must be a non-blank parameter
</td>
<td> The non-blank constraint on kind will be removed.
</td>
</tr>
<tr>
<td><strong>Code change required?</strong>
</td>
<td>NA
</td>
<td>Yes, a backend code change is required to support deleting the indices for all kinds.
</td>
</tr>
<tr>
<td><strong>API Response </strong>
</td>
<td> Same
</td>
<td> Same
</td>
</tr>
</table>
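
The two request forms compared in the table can be sketched as a small URL builder. This is only an illustration of the proposed behaviour: the helper name `build_delete_index_url` and the host `https://osdu.example.com` are hypothetical, and the path is the one shown in the sample request. The sketch builds the URL but does not send any request.

```python
from typing import Optional
from urllib.parse import urlencode

INDEX_PATH = "/api/indexer/v2/index"  # path taken from the sample request


def build_delete_index_url(base_url: str, kind: Optional[str] = None) -> str:
    """Return the DELETE URL: with ?kind=... for one kind, without it for all kinds."""
    url = base_url.rstrip("/") + INDEX_PATH
    if kind:
        # Existing scenario: delete the index of a single kind.
        url += "?" + urlencode({"kind": kind})
    # New scenario: no kind parameter means "delete the indices of all kinds".
    return url


# Hypothetical host, used only for illustration.
base = "https://osdu.example.com"
print(build_delete_index_url(base, "tenant1:public:well:1.0.2"))
print(build_delete_index_url(base))
```

Since omitting `kind` would remove every index in the partition, a caller may want to make that choice explicit rather than rely on an empty variable.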
## Consequences
- This will provide users with the capability to delete the indices for all kinds.
Akshat Joshi
Akshat Joshi
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/117
Indexing error stays in os-indexer log file and is not visible for end users
2023-11-27T13:00:03Z
Debasis Chatterjee
We had a problem with the content of the spatial block in a SeismicAcquisitionSurvey record, and the record was not getting indexed properly (the spatial information was missing).
We could find the reason only from **os-indexer log file**.
But the usual troubleshooting steps did not show any of the errors.
Tried using Storage service (record batch) and also Search service (returned Field ID, Index).
Do we need to use any other parameter? I tried frame-of-reference with value of
units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;
After we found out the reason (need to close the polygon), we could fix the record and re-ingest. Then indexing went through properly.
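
A pre-ingestion check along the following lines could catch an unclosed polygon before the record reaches the indexer. It is a minimal sketch of the rule quoted in the log message (the first and last points of a linear ring must be the same); the function names and the sample coordinates are hypothetical and are not part of any OSDU service.

```python
from typing import List, Sequence

Point = Sequence[float]


def is_closed_ring(ring: Sequence[Point]) -> bool:
    """A GeoJSON linear ring needs at least 4 positions, with first == last."""
    return len(ring) >= 4 and list(ring[0]) == list(ring[-1])


def close_ring(ring: Sequence[Point]) -> List[List[float]]:
    """Return a copy of the ring, appending the first point if it is not closed."""
    points = [list(p) for p in ring]
    if points and points[0] != points[-1]:
        points.append(list(points[0]))
    return points


# Hypothetical ring whose last point does not repeat the first one.
open_ring = [[2.0, 61.0], [3.0, 61.0], [3.0, 62.0], [2.0, 62.0]]
print(is_closed_ring(open_ring))
print(is_closed_ring(close_ring(open_ring)))
```

Whether appending the first point is the right repair depends on the data; in some cases the ring is unclosed because points are missing, and the record should be corrected at the source instead.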
The problem is that the average end user does not have access to the Indexer log.
cc: @zhibinmai for information
2023-11-08 21:48:32.386 WARN 19 --- [io-8080-exec-35] o.o.o.c.common.logging.DefaultLogWriter : indexer.app: 0: elasticsearch bulk service status: BAD_REQUEST | id: devel:master-data--SeismicAcquisitionSurvey:ST12005D12 | message: ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [data.SpatialLocation.Wgs84Coordinates] of type [geo_shape]]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=[1:1037] [geojson] failed to parse field [geometries]]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Failed to build [geojson] after last required field arrived]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=first and last points of the linear ring must be the same (**it must close itself**): x[0]=2.3651712345204428 x[15]=1.929596222841967 y[0]=61.232218587912584 y[15]=61.23097891915874]];
1: elasticsearch bulk service status: BAD_REQUEST | id: devel:master-data--SeismicAcquisitionSurvey:ST12005D17 | message: ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [data.SpatialLocation.Wgs84Coordinates] of type [geo_shape]]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=[1:1037] [geojson] failed to parse field [geometries]]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Failed to build [geojson] after last required field arrived]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=first and last points of the linear ring must be the same (it must close itself): x[0]=2.3651712345204428 x[15]=1.929596222841967 y[0]=61.232218587912584 y[15]=61.23097891915874]];
{correlation-id=b791d3fa-49ae-4eb1-b8c1-9f498451e06a, data-partition-id=devel}
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/116
/reindex/records API returns 500 if all given records are found
2023-10-05T10:17:30Z
Mingyang Zhu
/reindex/records was introduced by the ADR: https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/90.
The code has a shallow-copy bug that causes an exception when there are no not-found records, so the API returns 500.
Mingyang Zhu
Mingyang Zhu
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/115
normalizedKind tag is not indexed if the storage record already had tags
2023-09-26T09:53:18Z
Mingyang Zhu
normalizedKind tag is not indexed if the storage record already had tags
Mingyang Zhu
Mingyang Zhu