Search issues
https://community.opengroup.org/osdu/platform/system/search-service/-/issues
2024-03-08T10:40:15Z

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/158
Feature 'featureFlag.autocomplete.enabled' does not work.
2024-03-08T10:40:15Z, Riabokon Stanislav(EPAM)[GCP]

The GC team started testing the new feature 'featureFlag.autocomplete.enabled'. Following the documentation, we set the 'featureFlag.bagOfWords.enabled' flag to 'true' on the Indexer Service and set 'featureFlag.autocomplete.enabled' to 'true' as well. However, the integration test did not produce the expected results.
To investigate further, we examined the index in Elasticsearch:
```
{
"osdu-search1709032988256-test-data--integration-1.0.1": {
"aliases": {
"a1632179934": {
},
"a1632185707": {
}
},
"mappings": {
"dynamic": "false",
"properties": {
"acl": {
"properties": {
"owners": {
"type": "keyword"
},
"viewers": {
"type": "keyword"
}
}
},
"ancestry": {
"properties": {
"parents": {
"type": "keyword"
}
}
},
"authority": {
"type": "constant_keyword",
"value": "osdu"
},
"bagOfWords": {
"type": "text",
"store": true,
"fields": {
"autocomplete": {
"type": "completion",
"analyzer": "simple",
"preserve_separators": true,
"preserve_position_increments": true,
"max_input_length": 50
}
}
},
"createTime": {
"type": "date"
},
"createUser": {
"type": "keyword"
},
"data": {
"properties": {
"Basin": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"Center": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"Country": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"County": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"DblArray": {
"type": "double"
},
"EmptyAttribute": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"Established": {
"type": "date"
},
"Field": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"Location": {
"type": "geo_point"
},
"OriginalOperator": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"Rank": {
"type": "integer"
},
"Score": {
"type": "integer"
},
"State": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"WellName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"WellStatus": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"WellType": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
}
}
},
"id": {
"type": "keyword"
},
"index": {
"properties": {
"lastUpdateTime": {
"type": "date"
},
"statusCode": {
"type": "integer"
},
"trace": {
"type": "text"
}
}
},
"kind": {
"type": "keyword"
},
"legal": {
"properties": {
"legaltags": {
"type": "keyword"
},
"otherRelevantDataCountries": {
"type": "keyword"
},
"status": {
"type": "keyword"
}
}
},
"modifyTime": {
"type": "date"
},
"modifyUser": {
"type": "keyword"
},
"namespace": {
"type": "keyword"
},
"source": {
"type": "constant_keyword",
"value": "search1709032988256"
},
"tags": {
"type": "flattened"
},
"type": {
"type": "keyword"
},
"version": {
"type": "long"
},
"x-acl": {
"type": "keyword"
}
}
},
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"refresh_interval": "30s",
"number_of_shards": "1",
"provided_name": "osdu-search1709032988256-test-data--integration-1.0.1",
"creation_date": "1709032992807",
"number_of_replicas": "1",
"uuid": "rpIKCM9NRmm3gb41_7algw",
"version": {
"created": "7171799"
}
}
}
}
}
```
As far as we can determine, the Indexer has introduced a new block:
```
"bagOfWords": {
"type": "text",
"store": true,
"fields": {
"autocomplete": {
"type": "completion",
"analyzer": "simple",
"preserve_separators": true,
"preserve_position_increments": true,
"max_input_length": 50
}
}
}
```
This matches the documentation.
We then reviewed the new request sent to the Search Service:
```
{
"offset":0,
"kind":"osdu:search1709032988256:test-data--Integration:1.0.1",
"limit":0,
"query":"data.OriginalOperator:OFFICE4",
"suggestPhrase":"data",
"returnHighlightedFields":false,
"highlightedFields":[
],
"returnedFields":[
],
"queryAsOwner":false,
"trackTotalCount":false
}
```
The resulting request from the Search Service to Elasticsearch:
`SearchRequest{searchType=QUERY_THEN_FETCH, indices=[osdu-search1709032988256-test-data--integration-1.0.1,-.*,-system-meta-data-*], indicesOptions=IndicesOptions[ignore_unavailable=true, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, expand_wildcards_hidden=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], types=[], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=null, allowPartialSearchResults=null, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"from":0,"size":10,"timeout":"1m","query":{"bool":{"must":[{"bool":{"must":[{"query_string":{"query":"data.OriginalOperator:OFFICE4","fields":[],"type":"best_fields","default_operator":"or","max_determinized_states":10000,"allow_leading_wildcard":false,"enable_position_increments":true,"fuzziness":"AUTO","fuzzy_prefix_length":0,"fuzzy_max_expansions":50,"phrase_slop":0,"escape":false,"auto_generate_synonyms_phrase_query":true,"fuzzy_transpositions":true,"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}}],"filter":[{"terms":{"x-acl":["data.test-users-data-root`
truncated...
Notably, this request from the Search Service to Elasticsearch contains no reference to the 'autocomplete' field.
We also found a method, org.opengroup.osdu.search.util.SuggestionsQueryUtil#getSuggestions, which appears to contain the suggestion logic:
```
public SuggestBuilder getSuggestions(String suggestPhrase) {
    if (!autocompleteFeatureFlag.isFeatureEnabled(AUTOCOMPLETE_FEATURE_NAME) || suggestPhrase == null || suggestPhrase == "") {
        return null;
    }
    SuggestionBuilder suggestionBuilder = SuggestBuilders.completionSuggestion(
        "bagOfWords.autocomplete"
    ).text(suggestPhrase).skipDuplicates(true);
    SuggestBuilder suggestBuilder = new SuggestBuilder();
    suggestBuilder.addSuggestion(SUGGESTION_NAME, suggestionBuilder);
    return suggestBuilder;
}
```
I suppose this method is meant to be used when we build the request to Elasticsearch, but at the moment it is called ONLY from JUnit tests.
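For comparison, a request that actually used the completion field would be expected to carry a `suggest` section next to `query`. The sketch below builds such a fragment with plain string formatting, following the general shape of an Elasticsearch completion suggester request. It is only an illustration: `SuggestSectionSketch` is a hypothetical helper that is not part of the Search Service, and the suggestion name `autocomplete` is a placeholder, since the real value of `SUGGESTION_NAME` is not shown in this issue. The guard uses `isEmpty()` because `==` on Java strings compares references rather than content.

```java
// Hypothetical helper for illustration; not part of the Search Service code base.
final class SuggestSectionSketch {

    // Field name taken from the index mapping quoted in this issue.
    static final String COMPLETION_FIELD = "bagOfWords.autocomplete";
    // Assumed placeholder; the real SUGGESTION_NAME constant is not shown here.
    static final String SUGGESTION_NAME = "autocomplete";

    /** Returns the "suggest" JSON fragment, or null when no suggestion should be requested. */
    static String suggestSection(boolean featureEnabled, String suggestPhrase) {
        // isEmpty() checks content; "==" would only compare object references.
        if (!featureEnabled || suggestPhrase == null || suggestPhrase.isEmpty()) {
            return null;
        }
        // Minimal JSON escaping for backslashes and double quotes only.
        String escaped = suggestPhrase.replace("\\", "\\\\").replace("\"", "\\\"");
        return String.format(
            "\"suggest\":{\"%s\":{\"prefix\":\"%s\",\"completion\":{\"field\":\"%s\",\"skip_duplicates\":true}}}",
            SUGGESTION_NAME, escaped, COMPLETION_FIELD);
    }

    public static void main(String[] args) {
        // Uses the suggestPhrase value from the Search Service request above.
        System.out.println(suggestSection(true, "data"));
    }
}
```

If a fragment like this were present in the `source` of the logged SearchRequest, the feature flag would at least be reaching the query builder.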
To sum up, the feature 'featureFlag.autocomplete' may not have been fully implemented. Please pay attention to it.
M23 - Release 0.26, Mark Chance, Mark Chance

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/127
Wildcard Searching not consistent in nested fields
2024-02-01T15:22:13Z, Mark Chance

```
Various queries of nested fields do not return expected results. For instance, with this context:
{"kind":"osdu:wks:master-data--Well:1.2.0",
"offset":0,"limit":30}
WORKS:
"query":"nested(data.FacilityStates, (FacilityStateTypeID:osdu\\:reference*))",
Returns 72, including:
"data": {
"FacilityStates": [
{
"FacilityStateTypeID": "osdu:reference-data--FacilityStateType:Abandoned:",
"Remark": null
},
{
"FacilityStateTypeID": "osdu:reference-data--FacilityStateType:Planning:",
"Remark": null
}
],
FAILS:
"query":"nested(data.FacilityStates, (FacilityStateTypeID:osdu\\:reference-data*))",
"query":"nested(data.FacilityStates, (FacilityStateTypeID:osdu\\:reference?data*))",
"query":"nested(data.FacilityStates, (FacilityStateTypeID:osdu\\:reference\\-data*))",
For what it's worth, the schema has "FacilityStateTypeID": { "type": "string",...
And another example:
{"kind":"osdu:wks:master-data--Wellbore:1.0.0","offset":0,"limit":30}
returns 2453, including
"data": {
"GeoContexts": [
{
"BasinID": null,
"FieldID": "osdu:master-data--Field:Tietjerksteradeel:",
"PlayID": null,
"GeoPoliticalEntityID": null,
"GeoTypeID": "Field",
"ProspectID": null
}
],
...
Works:
"query":"nested(data.GeoContexts, (FieldID:osdu?master*))",
returns 1140
Fails:
"query":"nested(data.GeoContexts, (FieldID:osdu?master?data*))",
"query":"nested(data.GeoContexts, (FieldID:osdu?master\\-*))",
Additional Examples
WORKS:
"query":"nested(data.GeoContexts, (FieldID:\"osdu:master-data--Field:Tietjerksteradeel\"))",
"query":"nested(data.GeoContexts, (FieldID:\"osdu\\:master\\-data\\-\\-Field\\:Tietjerksteradeel\"))",
"query":"nested(data.GeoContexts, (FieldID:osdu\\:master\\-data\\-\\-Field\\:Tietjerksteradeel))",
returns 8
"query":"nested(data.GeoContexts, (FieldID:osdu*))",
returns 1162
And again, the schema has "FieldID": { "type": "string",
The original AHA Link is https://osdu-community.ideas.aha.io/ideas/IDEA-I-68
These queries have been run on a Shell-deployed instance on AWS:
"artifactId":"search-aws",
"version":"0.19.2",
"buildTime":"2023-03-20T22:58:41.497Z",
"branch":"refs/heads/release/r3-m16",
"commitId":"f8549673fca69422a024c9c980a36b22a445ca1e",
"commitMessage":"Change unit test to use older version",
```
M16 - Release 0.19

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/106
ADR: Additional attribute in Sort query to filter the records
2024-01-15T11:54:09Z, Mandar Kulkarni

Additional attribute in Sort query to filter the records
## Status
- [X] Proposed
- [ ] Trialing
- [ ] Under review
- [X] Approved
- [ ] Retired
## Context & Scope
The search service currently accepts a sort query in which the caller can specify two attributes.
- field : The list of fields to sort the results by.
- order : The list of sort orders, one per field. Each value must be either ASC or DESC.
More can be read from this search [documentation](https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/docs/tutorial/SearchService.md#sort).
The SortQuery model supported in the search can be seen [here](https://community.opengroup.org/osdu/platform/system/lib/core/os-core-common/-/blob/master/src/main/java/org/opengroup/osdu/core/common/model/search/SortQuery.java).
Elasticsearch also supports a filter attribute inside the sort field to filter the objects inside a nested path; please refer to the ES documentation [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/sort-search-results.html#_nested_sorting_examples).
In the absence of support for 'filter' attribute in OSDU search, we have a limitation to sort results of nested fields as mentioned in this [issue](https://community.opengroup.org/osdu/platform/system/search-service/-/issues/101).
## Tradeoff Analysis
This will be a non-breaking change and a feature supported in Elasticsearch will be supported with OSDU search as well.
## Decision
The /query and /query_with_cursor APIs in the search service accept a SortQuery model like the one below:
```
SortQuery:
  type: object
  properties:
    field:
      type: array
      description: 'The list of fields to sort the results.'
      items:
        type: string
    order:
      type: array
      description: 'The list of orders to sort the results. The element must be either ASC or DESC.'
      items:
        type: string
```
The proposal is to add an **optional** new attribute called 'filter' to the SortQuery model in the OSDU search service. The filter string passed by the caller will be translated and passed on to the underlying Elasticsearch.
If the filter string is not passed in the incoming request, existing behavior shall be maintained.
The 'filter' string will be used to filter records while searching and sorting the records based on attributes inside nested objects.
```
SortQuery:
  type: object
  properties:
    field:
      type: array
      description: 'The list of fields to sort the results.'
      items:
        type: string
    order:
      type: array
      description: 'The list of orders to sort the results. The element must be either ASC or DESC.'
      items:
        type: string
    filter:
      type: array
      description: 'A filter that the inner objects inside the nested path should match with, in order for its field values to be taken into account by sorting.'
      items:
        type: string
```
Below is a sample query to sort records of kind osdu:wks:master-data--Wellbore:1.0.0.
The sorting is based on the values from VerticalMeasurements array where the VerticalMeasurementTypeID matches with the value in filter.
```
{
"kind": "osdu:wks:master-data--Wellbore:1.0.0",
"sort": {
"field": [
"nested(data.VerticalMeasurements, VerticalMeasurement, min)"
],
"order": [
"ASC"
],
"filter":["nested(data.VerticalMeasurements, VerticalMeasurementTypeID:\"tenant1:reference-data--VerticalMeasurementType:KB\", match)"]
}
}
```
This query will be transformed into an Elasticsearch query like the one below:
```
"sort" : [
{
"data.VerticalMeasurements.VerticalMeasurement" : {
"mode" : "min",
"order" : "asc",
"nested": {
"path": "data.VerticalMeasurements",
"filter": {
"match" : { "data.VerticalMeasurements.VerticalMeasurementTypeID" : "tenant1:reference-data--VerticalMeasurementType:KB" }
}
}
}
}
]
```
The path for the filter would be taken from the nested field inside the 'filter'.
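The translation described above can be sketched as a small parser. This is only an illustration of the idea, assuming the filter always has the shape `nested(path, field:"value", queryType)` as in the sample; `SortFilterSketch` and its regular expression are hypothetical and are not taken from SortParserUtil. Values containing double quotes are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the real SortParserUtil may parse the filter differently.
final class SortFilterSketch {

    // Groups: 1 = nested path, 2 = field inside the path, 3 = quoted value, 4 = query type.
    private static final Pattern FILTER = Pattern.compile(
        "^nested\\(\\s*([^,\\s]+)\\s*,\\s*([^:\\s]+)\\s*:\\s*\"([^\"]*)\"\\s*,\\s*(\\w+)\\s*\\)$");

    /** Builds the JSON of the "nested" clause used inside an Elasticsearch sort entry. */
    static String toNestedClause(String filter) {
        Matcher m = FILTER.matcher(filter.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported filter expression: " + filter);
        }
        String path = m.group(1);
        // The filter field is qualified with the nested path, as in the ADR example.
        String qualifiedField = path + "." + m.group(2);
        return String.format("{\"path\":\"%s\",\"filter\":{\"%s\":{\"%s\":\"%s\"}}}",
            path, m.group(4), qualifiedField, m.group(3));
    }

    public static void main(String[] args) {
        System.out.println(toNestedClause(
            "nested(data.VerticalMeasurements, "
                + "VerticalMeasurementTypeID:\"tenant1:reference-data--VerticalMeasurementType:KB\", match)"));
    }
}
```

Taking the path from the filter itself keeps the request format unchanged for callers that do not send a filter.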
## Consequences
- Change in core-common to update the [SortQuery](https://community.opengroup.org/osdu/platform/system/lib/core/os-core-common/-/blob/master/src/main/java/org/opengroup/osdu/core/common/model/search/SortQuery.java) model.
- Change in [search-core](https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/search-core/src/main/java/org/opengroup/osdu/search/util/SortParserUtil.java#L94) to set filter in nested sort queries.
- Search service documentation and Open API specs need to be updated.
M20 - Release 0.23, Chad Leong, Mark Chance, Chad Leong

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/153
LQCIndicator.1.0.0 Custom Reference Schema OSDU Search API Issue results in WellLogTypeID field not appearing in search result where data is available at storage
2024-01-09T12:13:57Z, Vedantha Sampekai Sampekai

Hello OSDU Forum Team,
I am from the Shell LQC OSDU Migration Project and am reaching out regarding an OSDU Search issue we are facing with an LQC custom reference schema/data type. Jake from the DP team (Jake.J.Pearce@shell.com) is working on this issue and suggested that we raise it with the OSDU Forum to get some assistance. I have explained the whole issue step by step below; I hope it helps you understand it.
If you need any more information on this issue, I am happy to explain it in detail over a call. Please share the names of the relevant associates and a convenient time to set up a call. My email: vedantha.gowda@shell.com
Please find attached the LQCIndicator schema JSON file for your reference.
[LQCIndicator_Custom_Reference_Schema.json](/uploads/26ecd1320e94666823cfb60a70ebb55b/LQCIndicator_Custom_Reference_Schema.json)
**Quick Summary of the issue**
The field "WellLogTypeID" is returned as "null" for the LQCIndicator:1.0.0 custom reference schema when results for the kind are retrieved via the Search API. The custom schema references OSDU public data.
**Details of the issue:**
- LQCIndicator.1.0.0 is an LQC custom reference schema/data type designed and approved by the Shell Data Architect team.
- The DP team has ingested/registered the schema in all OSDU environments (US instance).
- The LQC team prepared the reference data as per the business requirement and handed it over to the RDM team in an Excel sheet for ingestion into the LQCIndicator schema.
- The RDM team built an Informatica pipeline, prepared a manifest payload, and ingested the reference data into the LQCIndicator schema in the OSDU Acceptance environment (US instance).
- The RDM team used the OSDU Manifest Ingestion API service to ingest the LQCIndicator reference data into OSDU through the Informatica pipeline.
- LQCIndicator has 42 records in total, and all of them were ingested successfully into the OSDU Acceptance environment. However, for 8 records the WellLogTypeID field appears as NULL even though the expected data is available in storage.
- Of the 42 LQCIndicator records, 8 reference LogType:Interpreted and the remaining 34 reference ConveyanceMethod:LoggingWhileDrilling or ConveyanceMethod:ElectricWirelineConveyed. All of the records referencing LogType:Interpreted display "null" in the WellLogTypeID field. The other referenced fields (for ConveyanceMethod:LoggingWhileDrilling and ConveyanceMethod:ElectricWirelineConveyed) display values as they should.
- When we query the LQCIndicator schema kind through the OSDU Search API service, the WellLogTypeID field shows NULL.
- When we query the LQCIndicator schema kind through the OSDU Storage API service, the WellLogTypeID field shows the correct value ("WellLogTypeID": "osdu:reference-data--LogType:Interpreted:").
- The 8 record ids affected by the WellLogTypeID NULL issue are listed at the bottom.
- WellLogTypeID and ConveyanceMethodID are similar fields in the schema design. Both reference OSDU Forum schemas: ConveyanceMethodID references "osdu:wks:reference-data--ConveyanceMethod:1.0.0" and WellLogTypeID references "osdu:wks:reference-data--LogType:1.0.0".
- The ConveyanceMethodID attribute uses the same referencing mechanism as WellLogTypeID, yet it displays values properly when obtained through the Search API ("ConveyanceMethodID": "osdu:reference-data--ConveyanceMethod:LoggingWhileDrilling:"), as per the business requirement (the remaining 34 LQCIndicator records).
- A new OSDU environment was created at Shell, and we successfully reproduced the WellLogTypeID NULL issue in that environment.
- As suggested by the DP team, we are raising this issue with the OSDU Forum and request your help in resolving it.
- I have attached the LQC custom reference schema for your reference.
- I am happy to supply any further information as required.
**Screenshots of the WellLogTypeID NULL issue in the LQCIndicator.1.0.0 schema/data type**
Queried through - OSDU Search API Service:
![image](/uploads/7a1863246448f997a3ba12dd8a318bdd/image.png)
Queried through - OSDU Storage API Service:
![image](/uploads/bfc4e4ab14fe38bef9962ec6be9262bc/image.png)
**Search query to reproduce the issue in the AWS@shell OSDU Acceptance environment (US instance)**
{
"kind": "shell:wks:reference-data--LQCIndicator:1.0.0",
"returnedFields": ["id", "data.WellLogTypeID"]
}
**Below are the 8 LQCIndicator ids (out of 42) affected by the WellLogTypeID NULL issue** (for the remaining 34 LQCIndicator ids, WellLogTypeID is expected to be NULL and the ConveyanceMethodID attribute is populated instead)
osdu:reference-data--LQCIndicator:24
osdu:reference-data--LQCIndicator:25
osdu:reference-data--LQCIndicator:26
osdu:reference-data--LQCIndicator:27
osdu:reference-data--LQCIndicator:28
osdu:reference-data--LQCIndicator:29
osdu:reference-data--LQCIndicator:30
osdu:reference-data--LQCIndicator:31
Looking forward to your guidance and help in resolving this issue. Thank you

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/128
Enable FullText Highlight Feature of Elastic Search
2023-10-26T18:29:05Z, Mark Chance

# ADR: Enable request highlighting
## Status
- [ ] Proposed
- [ ] Under review
- [ ] Approved
- [ ] Retired
## Context & Scope
The Search service is built on top of the Elastic Search open source product. The searching features of that product are used by the OSDU Search Service. ~~This proposal provides for enabling additional such functionality to be made available.~~
## Tradeoff Analysis - Input to decision
This functionality enhances the usefulness of the search service to consuming applications ~~without requiring extensive development in the service itself~~.
## Decision
We propose to extend the search query JSON domain-specific language by adding an optional field to the Query API input: highlightedFields ~~and highlight~~.
This will enable the ElasticSearch highlighting functionality (as documented here: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/highlighting.html).
highlightedFields will list the fields in which search term hits are highlighted in the results.
```json
{
"kind": "osdu:*:dataset--File.Generic:*",
"query": "test*",
"offset": 0,
"limit": 30,
"trackTotalCount": true,
"highlightedFields": ["createUser", "id"]
}
```
When this input field is present, a new field is added to the Query response: highlight
```json
{
"results": [
{
"data": {
...
},
"kind": "osdu:wks:dataset--File.Generic:1.0.0",
"source": "wks",
...
"createUser": "serviceprincipal@testing.com",
"id": "osdu:dataset--File.Generic:autotest8751235",
"highlight": {
"createUser": [ "serviceprincipal@<em>test</em>ing.com" ],
"id": ["osdu:dataset--File.Generic:auto<em>test</em>8751235"]
}
},
]
}
```
In this case, the search term hits in the fields listed in "highlightedFields" are annotated with "em" tags for use in HTML-compatible display. ~~If the user puts "highlight" in the input payload, then whatever ElasticSearch returns is passed back in "highlight".~~
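A consuming application that does not render HTML may want the matched terms or the plain text instead. The following is a minimal client-side sketch, assuming the fragments use plain `<em>`...`</em>` tags exactly as in the sample response; `HighlightSketch` is a hypothetical helper and not part of the proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client-side helper for illustration only.
final class HighlightSketch {

    // Reluctant match so that several hits in one fragment are captured separately.
    private static final Pattern EM = Pattern.compile("<em>(.*?)</em>");

    /** Returns the text of every highlighted hit in a fragment, in order of appearance. */
    static List<String> highlightedTerms(String fragment) {
        List<String> terms = new ArrayList<>();
        Matcher m = EM.matcher(fragment);
        while (m.find()) {
            terms.add(m.group(1));
        }
        return terms;
    }

    /** Removes the "em" tags to recover the original field value. */
    static String stripTags(String fragment) {
        return fragment.replace("<em>", "").replace("</em>", "");
    }

    public static void main(String[] args) {
        String fragment = "serviceprincipal@<em>test</em>ing.com";
        System.out.println(highlightedTerms(fragment));
        System.out.println(stripTags(fragment));
    }
}
```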
## Rationale
The field added enables a simple use case.
## Consequences
There are no impacts to existing applications. The complexity of the search query input is increased very slightly. The performance of existing queries will not be affected.
## When to revisit
## Alternatives and implications
## Decision criteria and tradeoffs
## Decision timeline
M20 - Release 0.23, Mark Chance, Mark Chance

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/129
Extend "sort on nested text fields" feature with possibility to add nested filter
2023-10-26T12:21:48Z, Mark Chance

## Change existing nested sorting signature
From:
nested(path, field, mode)
To:
nested(path, field, mode, _nested_filter_) - (the filter argument may be optional to maintain backwards compatibility)
This allows the user to specify a filter that the inner objects inside the nested path must match in order for their field values to be taken into account by sorting (this sounds complex, but it is the exact definition in the Elasticsearch docs).
## Example use case which is enabled by this feature
Below is the data model where we need to sort based on the AliasName where AliasNameTypeID= "osdu:reference-data--AliasNameType:UWI:"
```json
{
  "data": {
    "NameAliases": [
      {
        "AliasName": "714100044935",
        "TerminationDateTime": "2020-02-13T09:13:15.55+0000",
        "AliasNameTypeID": "osdu:reference-data--AliasNameType:UWI:",
        "EffectiveDateTime": "2020-02-13T09:13:15.55+0000",
        "DefinitionOrganisationID": null
      }
    ]
  }
}
```
## Proposed syntax for new argument
The same syntax as the top-level 'query' argument of the endpoint. Alternatively, the raw Elasticsearch filter syntax can be used.
Example request that will accomplish desired functionality after this implementation:
```json
{
"kind": "{{data-partition-id}}:*:*:*",
"sort": {
"field": [
"nested(data.NameAliases, AliasName, min, (AliasNameTypeID:\"osdu:reference-data--AliasNameType:UWI:\"))"
],
"order": [
"ASC"
]
}
}
```
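Because the proposed filter argument contains its own parentheses, colons, and quoted text, the sort expression can no longer be split on every comma. The sketch below shows one possible way to separate the three or four arguments by splitting only on commas that are outside parentheses and quotes. `NestedSortArgsSketch` is a hypothetical class and is not the actual SortParserUtil logic; escaped quotes inside values are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the actual SortParserUtil implementation may differ.
final class NestedSortArgsSketch {

    private static final String PREFIX = "nested(";

    /** Splits nested(path, field, mode[, filter]) into its top-level arguments. */
    static List<String> parse(String expression) {
        String s = expression.trim();
        if (!s.startsWith(PREFIX) || !s.endsWith(")")) {
            throw new IllegalArgumentException("Not a nested sort expression: " + expression);
        }
        String body = s.substring(PREFIX.length(), s.length() - 1);
        List<String> args = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // parenthesis nesting inside the argument list
        boolean inQuotes = false; // true while inside a double-quoted value
        for (char c : body.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (!inQuotes) {
                if (c == '(') {
                    depth++;
                } else if (c == ')') {
                    depth--;
                } else if (c == ',' && depth == 0) {
                    // Top-level comma: close the current argument.
                    args.add(current.toString().trim());
                    current.setLength(0);
                    continue;
                }
            }
            current.append(c);
        }
        args.add(current.toString().trim());
        if (args.size() < 3 || args.size() > 4) {
            throw new IllegalArgumentException("Expected 3 or 4 arguments: " + expression);
        }
        return args;
    }

    public static void main(String[] args) {
        System.out.println(parse(
            "nested(data.NameAliases, AliasName, min, "
                + "(AliasNameTypeID:\"osdu:reference-data--AliasNameType:UWI:\"))"));
    }
}
```

Accepting both three and four arguments keeps existing three-argument sort requests working unchanged.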
## Scope of required work
Changes only in Search Service, passing new argument from API layer to SortParserUtil that has to be modified. https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/search-core/src/main/java/org/opengroup/osdu/search/util/SortParserUtil.java#L118
This was originally requested as https://osdu-community.ideas.aha.io/ideas/IDEA-I-66
M20 - Release 0.23, Mark Chance, Mark Chance

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/126
Update integration with Policy service.
2023-10-26T03:36:11Z, Rustam Lotsmanenko (EPAM), rustam_lotsmanenko@epam.com

Currently, the Search service does not provide a valid payload in requests to the Policy service. <br/>
https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/search-core/src/main/java/org/opengroup/osdu/search/policy/service/PolicyServiceImpl.java#L61 <br/>
It uses global policies (instance policies) and a hardcoded Policy id; partition-based policies should be used instead. <br/>
Integration should be updated to utilize the updated API: https://community.opengroup.org/osdu/platform/security-and-compliance/policy/-/merge_requests/340

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/131
ADR: Exclude indices of the system/meta data from the search results unless the indices (kinds) of the system/meta data are explicitly specified in the search query
2023-09-14T19:45:45Z, Zhibin Mai

It is quite likely that applications or systems need their system/meta data to be searchable via OSDU search, while that data is not expected to appear in the results of a normal keyword search.
For example, an application stores its system data in the storage under kind "xyz" _(please ignore the kind syntax in this example)_:
- When users try to search data with keyword "wellbore", the data from kind "xyz" should not be included in the search result if users do search as below:
##### Case 1:
```
{
"kind": "*:*:*:*",
"query": "wellbore"
}
```
- When application (workflow) tries to search its system data with keyword "wellbore", the data from kind "xyz" should be included in the search result if the kind "xyz" is explicitly specified in the search query, e.g.
##### Case 2:
```
{
"kind": "xyz",
"query": "wellbore"
}
```
To achieve this objective and provide a general solution, we propose to use a reserved name in the "authority" or "source" field for kinds of the system/metadata.
- If those kinds are not explicitly specified in the search query as the **Case 1** above, the data from those kinds won't be included in the search result
- If those kinds are explicitly specified in the search query as the **Case 2** above, the data from those kinds will be included in the search result
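The two cases can be sketched as index selection logic. This is only one possible reading of the proposal: it assumes the reserved name is `system-meta-data` and that it is placed in the "authority" part of the kind (both are still open questions in this ADR), and it assumes an index name is the lower-cased kind with ':' replaced by '-'. `SystemIndexFilterSketch` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the proposed behavior; names and rules are assumptions.
final class SystemIndexFilterSketch {

    // Assumed reserved authority; the ADR lists it as one of several candidates.
    static final String RESERVED_AUTHORITY = "system-meta-data";

    /** Assumed kind-to-index convention: lower case, ':' replaced by '-'. */
    static String toIndexPattern(String kind) {
        return kind.toLowerCase(Locale.ROOT).replace(':', '-');
    }

    /** Returns the index patterns to search, excluding system indices unless requested. */
    static List<String> resolveIndices(String kind) {
        List<String> indices = new ArrayList<>();
        indices.add(toIndexPattern(kind));
        String authority = kind.split(":", -1)[0];
        if (!RESERVED_AUTHORITY.equalsIgnoreCase(authority)) {
            // Case 1: system kinds are not named explicitly, so exclude their indices.
            indices.add("-" + RESERVED_AUTHORITY + "-*");
        }
        // Case 2: the reserved authority is named explicitly, so nothing is excluded.
        return indices;
    }

    public static void main(String[] args) {
        System.out.println(resolveIndices("*:*:*:*"));
        System.out.println(resolveIndices("system-meta-data:wks:xyz:1.0.0"));
    }
}
```

Filtering on the authority prefix is what makes a single exclusion pattern sufficient here; with the reserved name in "source" the pattern would need to match the middle of the index name.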
The reserved name should be meaningful and odd (weird) enough to avoid naming conflict with the existing schema. It is an open question what it should be. Here a few proposals about the reserved name:
- "system" -- it may be too common
- "system-meta"
- "system-meta-data" -- should not be common if it is used as "authority"
Whether the reserved name belongs in "authority" or "source" is another open question. Here is what we think:
| Field | Pro | Con |
|:--------------|:------------------------------------------------------|:----------------------------------------------------------------|
| authority | those indices can be filtered precisely | it could cause name conflicts among tenants in a multi-tenant environment when they share the same services |
| source | it should not cause name conflicts among tenants in a multi-tenant environment if each tenant has its own authority for its kinds | those indices may be impossible to filter precisely: if the entity type field contains the same keyword, those indices will be filtered out too |
Any input is welcomed before finalizing the solution.
Once we have a conclusion, Thomas will include this reserved keyword in the schema guide.M20 - Release 0.23Thomas Gehrmann [slb]Zhibin MaiThomas Gehrmann [slb]https://community.opengroup.org/osdu/platform/system/search-service/-/issues/1[Search] Searching Hierarchies with Nested Arrays of Objects2023-09-08T10:45:22ZGary Murphy[Search] Searching Hierarchies with Nested Arrays of ObjectsThe current Search service supports searching indexed documents with nested structures, but not nested arrays of objects. The ability to search such document structures is important for a number of data types, especially those with ind...The current Search service supports searching indexed documents with nested structures, but not nested arrays of objects. The ability to search such document structures is important for a number of data types, especially those with indeterminate numbers of members (e.g. events associated with an activity generator, various acquisition data types, and tags on entities with things like data quality tags, etc.
It should be possible to execute search queries against such documents once they are indexed, and to use values in the nested arrays of objects in both queries and responses.

M1 - Release 0.1
Joe
Joe

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/69
ADR: Common discovery within and across kinds
2023-07-13T09:46:54Z
ashley kelham

## Status
- [X] Proposed
- [X] Under review
- [X] Approved
- [ ] Retired
## Context & Scope
Today a single schema can define multiple properties for geospatial data. For example Wellbore schema defines both the _GeographicBottomHoleLocation_ and _ProjectedBottomHoleLocation_ properties.
The json key used for spatial data is also not consistent across schemas.
This complicates common consumption workflows such as finding all entities that exist within a given area, because a consumer does not know which property to query for each type.
Looking beyond spatial data this is a common problem across different data types, for instance in a Wellbore schema the name is represented by the property 'FacilityName' however this key is not used for the name in other schemas.
We want to define a standard to allow indexing properties in a common way across types. This will provide
- A common property(s) to be searchable against across Kinds
- A priority list of schema properties that this can be populated from
- A way for these common properties to define relationships
## Trade-off Analysis
We could declare a single property to use on each schema to use as the common property. However there are schemas where multiple properties could be used and instances of entities where a specific property is not defined and another one is. Therefore no single property will ever be correct.
We could re-use the property key defined in the schema for indexing. However, this causes problems for consumers, as they have to understand which property to use for each schema when discovering or running analytics across kinds. Defining a common property between schemas that consumers can use solves this concern.
We could define the standard directly in the schema only. This follows existing patterns with the indexing hints used [here](https://community.opengroup.org/search?search=x-osdu-indexing&group_id=218&project_id=91&scope=&search_code=true&snippets=false&repository_ref=master&nav_source=navbar). However this solution is inflexible to clients being able to provide their own mappings for OSDU schemas.
It does however allow for the standards to be maintained in the schema allowing control to be maintained by the schema authority. Therefore a solution that supports this whilst also providing flexibility to clients to provide their own mappings is preferable.
A separate ADR is proposed to allow for Schema extensions using the virtual property defined in this ADR.
## Decision
We are proposing a new optional attribute in schemas to define a common property mapping.
For OSDU schemas we propose to introduce a new property `x-osdu-virtual-properties`, a dictionary that currently has only one key, `DefaultLocation`. It lists the paths to candidate properties, and the order defines the priority. The first item in the list has the highest priority. If that property does not exist or is not populated, the next one takes precedence.
`x-osdu-virtual-properties` can be used to map any properties to a new property name that can be used for consumption. Schemas can then declare the same virtual property to allow easier cross schema consumption.
The decision is backed by OSDU Data Definitions as per [Core Concepts meeting July 6, 2021](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2021/DataDefinitionsCoreConcepts_MeetingMinutes-2021-07-06.md#1-decisions).
The declared virtual property is never added to the record; however, consumption services like indexer/search use it to create an indexed entry and so make the data discoverable based on this property.
#### Example use case: Assigning virtual properties within a schema
```json
{
"x-osdu-virtual-properties":{
"data.VirtualProperties.DefaultLocation": {
"type": "object",
"priority": [
{ "path": "data.ProjectedBottomHoleLocation" },
{ "path": "data.GeographicBottomHoleLocation" },
{ "path": "data.SpatialLocation" }
]}
}
}
```
The above example is prepared for Wellbore, which comes with three potential shapes. The projected representation is preferred over the geographic coordinates. Last priority is the standard shape contributed by the `AbstractFacility`.
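The priority rule can be illustrated with a minimal Python sketch. It is not the indexer's actual implementation: the helper names `get_by_path` and `resolve_virtual_property` are hypothetical, and "populated" is assumed here to mean "present and not empty".

```python
from typing import Any, List, Optional


def get_by_path(record: dict, path: str) -> Optional[Any]:
    """Walk a dotted path such as 'data.SpatialLocation'; return None if a segment is missing."""
    node: Any = record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def resolve_virtual_property(record: dict, priority: List[dict]) -> Optional[Any]:
    """Return the value at the first priority path that exists and is populated."""
    for entry in priority:
        value = get_by_path(record, entry["path"])
        # Assumption: empty containers and empty strings count as "not populated".
        if value not in (None, {}, [], ""):
            return value
    return None


# Priority list copied from the Wellbore example above.
PRIORITY = [
    {"path": "data.ProjectedBottomHoleLocation"},
    {"path": "data.GeographicBottomHoleLocation"},
    {"path": "data.SpatialLocation"},
]

# Hypothetical record that lacks the projected location.
record = {
    "data": {
        "GeographicBottomHoleLocation": {"Wgs84Coordinates": {"type": "Point"}},
        "SpatialLocation": {"Wgs84Coordinates": {"type": "Polygon"}},
    }
}

default_location = resolve_virtual_property(record, PRIORITY)
print(default_location)
```

In this sketch the record has no projected location, so the geographic location is selected as the second-priority source.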
For now we should restrict it so every key created through this must be prefixed with the following
```data.VirtualProperties.```
The `DefaultLocation` key name does not clash with any existing entity type property. It becomes relevant in generic search queries across different types including spatial conditions, for example:
```json
{
"kind": "*:*:*:*",
"spatialFilter": {
"field": "data.VirtualProperties.DefaultLocation",
"byGeoPolygon": {
"points": [
{"longitude":-90.65, "latitude":28.56},
{"longitude":-90.65, "latitude":35.56},
{"longitude":-85.65, "latitude":35.56},
{"longitude":-85.65, "latitude":28.56},
{"longitude":-90.65, "latitude":28.56}
]
}
}
}
```
There's also an _optional_ `isType` key you can apply to the priorities object. This restricts the selection based on the type of data the property points to which can be different per Record instance.
For example, datasets and artifacts referenced by a record use generic schemas, so their type depends on the record instance. In the example below, the `data.dataset[].filepath` property is mapped only if it points to a GeoJson type; otherwise the next entry checks whether it is a Raster file type. The `isType` value is not restricted.
```json
{
"x-osdu-virtual-properties":{
"data.VirtualProperties.MyLocation": {
"type": "object",
"priority": [
{
"path": "data.dataset[].filepath",
"isType": "GeoJson"
},
{
"path": "data.dataset[].filepath",
"isType": "Raster"
}
]}
}
}
```
The ```x-osdu-virtual-properties``` section also supports an _optional_ ```x-osdu-relationship``` block to describe a relationship this virtual property may have. See the example below.
The OSDU Data Definitions team ensures that canonical, well-known schemas contain a populated `x-osdu-virtual-properties`.
The report will then look like:
|Kind|Default Priority|Comment|
|----|----|----|
|→ [osdu:wks:master-data--SeismicProcessingProject:1.0.0](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/blob/237-ambiguous-locations/E-R/master-data/SeismicProcessingProject.1.0.0.md) | data.SpatialLocation | Undefined x-osdu-virtual-properties definition; Unique Location |
|→ [osdu:wks:master-data--Well:1.0.0](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/blob/237-ambiguous-locations/E-R/master-data/Well.1.0.0.md) | data.SpatialLocation | Undefined x-osdu-virtual-properties definition; Unique Location |
|→ [osdu:wks:master-data--Wellbore:1.0.0](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/blob/237-ambiguous-locations/E-R/master-data/Wellbore.1.0.0.md) | 1: data.ProjectedBottomHoleLocation<br>2: data.GeographicBottomHoleLocation<br>3: data.SpatialLocation | Schema Controlled Order|
The first two kinds are reported as undefined, the third reports a proper order definition via the schema.
Keeping the x-osdu-virtual-properties mapping within the schema allows the data definitions team in OSDU to maintain control and order of how properties are mapped. However we still need to allow flexibility for specific client consumption workflows. This will be provided by Schema extensions.
#### Example use case: Describing relationships with virtual properties
It is also possible to tag virtual properties as relationships to achieve specific processing/indexing of relationships. The tagging is performed exactly the same as on standard OSDU schemas using the `x-osdu-` custom tags.
Here is a simple relationship 'replication' example: the property `PetrelProjectID` refers to the record id of a record of kind `slb:petrel:master-data--PetrelProject:*.*.*`. As a result, the property previously not visible to the indexer becomes declared and visible.
```
{
"kind": "osdu:wks:master-data--Well:1.0.0",
"x-osdu-extensions": {
"authority": "SLB",
"x-osdu-virtual-properties": {
"data.ExtensionProperties.PetrelProjectID": {
"type": "object",
"priority": [
{
"path": "data.ExtensionProperties.PetrelProjectID",
"isType": "string",
"type": "string",
"x-osdu-relationship": [
{
"GroupType": "master-data",
"EntityType": "PetrelProject"
}
]
}
]
}
}
}
}
```
Unconstrained or open relationships to unspecified types are declared as `"x-osdu-relationship": []`.
The next example demonstrates a new relationship by means of a virtual property with prioritized sources:
```
{
"kind": "osdu:wks:master-data--Well:1.0.0",
"x-osdu-extensions": {
"authority": "SLB",
"x-osdu-virtual-properties": {
"data.VirtualProperties.ApplicationProjectID": {
"type": "object",
"priority": [
{
"path": "data.ExtensionProperties.TechlogExtensions.TechlogProjectID",
"isType": "string",
"type": "string",
"x-osdu-relationship": [
{
"EntityType": "TechlogProject"
}
]
},
{
"path": "data.ExtensionProperties.PetrelProjectID",
"isType": "string",
"type": "string",
"x-osdu-relationship": [
{
"GroupType": "master-data",
"EntityType": "PetrelProject"
}
]
}
]
}
}
}
}
```
It demonstrates the 'virtual merge' of a relationship for a given record. The `data.VirtualProperties.ApplicationProjectID` property is expected to carry a relationship to either a Petrel project (kind `*:*:master-data--PetrelProject:*`) or a `*:*:*TechlogProject:*`. Should the record contain both property values as defined in the two `path` values, the first one, the `TechlogProjectID`, is taken.
## Consequences
- All existing OSDU schemas that define spatial data should be updated with a new ```DefaultLocation``` virtual property
- Data Definitions team validates that all spatial entity types are properly tagged with `"x-osdu-virtual-properties"`.
- Indexer needs to support `"x-osdu-virtual-properties"`
- Indexer needs to re-index based on all schema creation/change notifications

M10 - Release 0.13

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/99
Fix the search and indexing performance issues when the geometry of the document is large
2023-07-10T16:17:26Z
Zhibin Mai

##### Background:
Today the geometries (also called shapes) in the indexed records are not decimated. The geometry data can be large, reaching tens if not hundreds of MB. The geometry in the search index can be used to support spatial queries, data preview, or data discovery.
However, large geometries in the indexed records can significantly slow the retrieval of search results and prevent the results from being used efficiently in some utilities, such as a GIS map. In O&G applications, the GIS map is a critical component that users rely on to render the shapes in a given region as a tool for data discovery. It may need to retrieve and render thousands or even millions of shapes from the OSDU index. If tens of thousands of shapes have to be retrieved and rendered, performance won't be good enough even if the shapes are decimated. At the other end, it is unnecessary to show the detail of the shapes when tens of thousands of indexed records are returned from the search.
##### Proposal:
We propose to decimate the geometry of the following GeoJSON geometry types by implementing the Ramer–Douglas–Peucker algorithm, both for the original shape attribute and for the shape attribute "data.VirtualProperties.DefaultLocation.Wgs84Coordinates" if it exists.
- LineString
- MultiLineString
- Polygon
- MultiPolygon
Regarding shape attribute "data.VirtualProperties.DefaultLocation", please refer to ADR [Common discovery within and across kinds](https://community.opengroup.org/osdu/platform/system/search-service/-/issues/69)
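For readers unfamiliar with the algorithm, here is a minimal Python sketch of Ramer–Douglas–Peucker applied to a single line of (longitude, latitude) points. It is an illustration only, not the prototype's code: the function names are hypothetical, coordinates are treated as planar degrees (an assumption), and the tolerance of 0.0001 degree is taken from the evaluation described in this issue.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees, treated as planar


def _perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the line through a and b (or to a itself if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)


def rdp(points: List[Point], epsilon: float) -> List[Point]:
    """Drop points that lie within epsilon of the simplified line; keep the end points."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _perpendicular_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist > epsilon:
        # Keep the farthest point and simplify both halves recursively.
        left = rdp(points[: index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right
    return [first, last]


line = [(0.0, 0.0), (1.0, 0.00001), (2.0, 0.0), (3.0, 1.0), (4.0, 0.0)]
simplified = rdp(line, epsilon=0.0001)
print(simplified)
```

A `LineString` could be passed directly; for `MultiLineString`, `Polygon` and `MultiPolygon` the same function would be applied to each line or ring. The recursive form is kept for readability; very long lines may call for an iterative variant.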
##### Performance Evaluation:
We did some performance evaluation with the prototype to decimate the original shape attribute and shape attribute "data.VirtualProperties.DefaultLocation.Wgs84Coordinates" using some seismic 2D surveys. The tolerance or epsilon is about 10 meters which is about 0.0001 degree around the equator.
The information of the test dataset and summary of the test report are attached below:
- [performance_test_summary.txt](/uploads/dc913a11d5cead3a1b5b54529c5449de/performance_test_summary.txt)
- [test_dataset.csv](/uploads/0263b8e976526c246e4dd8074a8c52f2/test_dataset.csv)
##### Summary:
1. The decimation of the shape attributes significantly improves the end-to-end search performance (search and data retrieval from Elasticsearch to the test client).
2. The extra overhead of the decimation during indexing is offset by the time saved on Elasticsearch indexing of the geo-shapes. The test result indicates that it reduced the overall indexing time by 58%.

M14 - Release 0.17
Zhibin Mai
Zhibin Mai

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/65
nested search expects space between , and (
2023-06-23T08:20:40Z
An Ngo

Consider these two queries:
"nested(data.VerticalMeasurements, (VerticalMeasurement:(>15)))"
"nested(data.VerticalMeasurements,(VerticalMeasurement:(>15)))"
The first one works; the second one does not.

M18 - Release 0.21
Chad Leong
Chad Leong

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/73
ADR: Search data across multiple kinds in one search
2023-06-23T08:08:59Z
Zhibin Mai

## Status
- [X] Proposed
- [x] Under review
- [x] Approved
- [ ] Retired
## Context & Scope
It is quite common for users or applications to search data across multiple kinds in one search. In OSDU search, each kind is mapped to one index. That means that users may need to search data across multi-indices in Elasticsearch. Elasticsearch supports search across multi-indices by specifying either index names as wildcard or a list of index names.
Currently, OSDU search only exposes the wildcard solution (e.g. "kind": "\*:\*:\*:\*") to support search across multiple indices.
There may be hundreds if not thousands of kinds in one tenant data partition. We found that using a wildcard to search across multiple indices introduces significant performance overhead compared with a list of index names. The more indices there are in Elasticsearch, the bigger the overhead can be. The attached diagram shows our observation:
![image](/uploads/e84ac4851dd5d19c280b75e4b602d3ad/image.png)
## Trade-off Analysis
Here is the relevant API spec: https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/search-core/src/main/java/org/opengroup/osdu/search/api/SearchApi.java
Without introducing a new field in the search API, we propose to concatenate the index (kind) names with commas in the existing "kind" property, e.g.
``````````````````````````
e.g. I have kinds in the system
a:b:c:d
a:e:c:d
a:f:c:d
a:g:c:d
I want to search keyword "well" against only 2 kinds
a:b:c:d
a:e:c:d
today I can only do this by forming a query
{
"kind": "a:*:c:d",
"query": "(\"kind\": \"a:b:c:d\" OR \"kind\": \"a:e:c:d\") AND well"
}
This still makes my query slower because the search is performed against all indexes the wildcard matches i.e.
a:b:c:d
a:e:c:d
a:f:c:d
a:g:c:d
even though I know I only want to search against 2 of the indexes. The proposed solution will allow me to change this to
{
"kind": "a:b:c:d,a:e:c:d",
"query": "well"
}
This makes my query easier to write and potentially a lot more performant, as it targets only the indexes I want to search against
``````````````````````````
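As a small illustration of the proposal, a client could assemble the comma-separated "kind" value as sketched below in Python. The helper name `build_multi_kind_query` is hypothetical, and the sketch only builds the request body shown above; it does not call the Search service.

```python
import json
from typing import Iterable


def build_multi_kind_query(kinds: Iterable[str], query: str) -> dict:
    """Join the kinds with commas in the existing "kind" property, as proposed above."""
    cleaned = [k.strip() for k in kinds if k and k.strip()]
    if not cleaned:
        raise ValueError("at least one kind is required")
    return {"kind": ",".join(cleaned), "query": query}


body = build_multi_kind_query(["a:b:c:d", "a:e:c:d"], "well")
print(json.dumps(body))
```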
Here are the Pros and Cons of the proposal:
| Pros| Cons|
| ------ | ------ |
| - Non-breaking change. No API change required. | - Does not follow the JSON pattern for encoding multiple items |
| - It is consistent with Elasticsearch's pattern on coding multi-indices for search. | |
| - Change only on "Common Code" in both "OSDU Core Common" and "Search Service". | |
## Decision
The proposal is a non-breaking change. Its implementation is pretty simple and safe. Prototype of the implementation in OSDU Core Common and Search Service can be found in MRs:
- [Change on Core Common](https://community.opengroup.org/osdu/platform/system/lib/core/os-core-common/-/merge_requests/127)
- [Change on Search service](https://community.opengroup.org/osdu/platform/system/search-service/-/merge_requests/190)
## Consequences
This is a non-breaking change with a big performance gain when searching across multiple indices.

M11 - Release 0.14
Zhibin Mai
Zhibin Mai
2022-01-14

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/38
ADR: Nested query search
2023-06-07T10:34:15Z
Michael Tarasov (EPAM)

## Status
- [x] Proposed
- [x] Trialing
- [x] Under review
- [x] Approved
- [ ] Retired
## Context
With the recent changes of incorporating search hints into the data definition schema (https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/16), some arrays might be marked as nested objects (https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html). The nested type is a specialized version of the object data type that allows arrays of objects to be indexed in a way that they can be queried independently of each other.
With the implementation of https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/16, nested objects are now indexed in Elasticsearch and can be queried directly (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html#nested-query-ex-query).
However, the Search service is currently unable to interpret such requests.
The Search service must be modified so that queries can include conditions on the data of arrays indexed with the "nested" hint. At the same time, it must remain possible to compose complex queries that combine several conditions, some of which refer to the array and some to the main structure of the indexed document.
## Scope
Extend Search service to support searching through arrays indexed with the "nested" hint:
- Extend the API query format for including search conditions for such arrays in the API
- Implement the interpretation of such conditions when translating a request into the final Elasticsearch format
## Decision
To show what a complex search query will look like in its final Elasticsearch form, here is an example field mapping for an index based on a simple fictional schema:
```
PUT /tools
{
"mappings": {
"properties": {
"tool": {"type": "text"},
"properties": {"type": "nested"}
}
}
}
```
Here you can see one text field "tool" and one array "properties". Let's index two documents:
```
POST /_bulk
{"index": {"_index": "tools", "_id": "1"}}
{"tool": "hammer", "properties": [{"brand": "ABC", "country": "USA"}, {"weight": 1000}]}
{"index": {"_index": "tools", "_id": "2"}}
{"tool": "screwdriver", "properties": [{"brand": "XYZ", "country": "USSR"}, {"weight": 500}]}
```
The simplest query on the fields of these documents, which we could make through the Search API, would look like this:
```
POST {{SEARCH_HOST}}/query
{
"kind": "osdu:toolmarket:tools:1.0.0",
"query": "data.Tool:\"hammer\""
}
```
And as a result, during the transformation in the Search code, the following request would be sent to Elasticsearch:
```
GET /tools/_search
{
"query": {
"bool": {
"must": [
{"match": {"tool": { "query": "hammer" }}}
]
}
}
}
```
But to search through an array indexed with the "nested" hint, Elasticsearch needs a special syntax. This is what the composite query for the "hammer of the US-registered ABC brand" looks like:
```
GET /tools/_search
{
"query": {
"bool": {
"must": [
{"match": {"tool": { "query": "hammer" }}},
{"nested": {
"path": "properties",
"query": {
"bool": {
"must": [
{"match": {"properties.brand": {"query": "ABC"}}} ,
{"match": {"properties.country": {"query": "USA"}}}
]
}
}
}
}
]
}
}
}
```
Note the "nested" node and the additional "path" hint, which specifies which of the "nested" arrays the subquery should be executed against.
The Search service API does not currently support this kind of condition.
It should be added in a way that keeps request composition as simple as possible for the end user.
The following format is suggested:
```
POST {{SEARCH_HOST}}/query
{
"kind": "osdu:toolmarket:tools:1.0.0",
"query": "(data.Tool:\"hammer\") AND nested(data.Properties, (brand:\"ABC\" AND country: \"USA\"))"
}
```
As you can see, this format introduces a function ```nested(path, query)```.
The first argument specifies the path to the "nested" search array, and the second specifies the request body, where field names are truncated by removing the path specified in the first argument. This construction is easy to understand and easy to parse inside the Search service code.
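To make the translation concrete, the Python sketch below builds the Elasticsearch `nested` clause from a path and a set of truncated field names. It is an illustration under assumptions, not the Search service code: `build_nested_match` is a hypothetical helper, the path is passed as it appears in the index (`properties` in the fictional example), and only AND-combined `match` conditions are covered.

```python
from typing import Dict


def build_nested_match(path: str, conditions: Dict[str, str]) -> dict:
    """Build a 'nested' clause; each truncated field name is prefixed with the nested path."""
    return {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "must": [
                        {"match": {f"{path}.{field}": {"query": value}}}
                        for field, value in conditions.items()
                    ]
                }
            },
        }
    }


# Corresponds to: nested(data.Properties, (brand:"ABC" AND country:"USA"))
clause = build_nested_match("properties", {"brand": "ABC", "country": "USA"})
print(clause)
```

The resulting dictionary has the same shape as the `nested` node in the Elasticsearch request shown earlier in this section.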
Now let's complicate the query by adding a "range" condition to search by tool weight. Because the "weight" property is defined in a separate array item object, we need two subqueries of the "nested" type:
```
POST {{SEARCH_HOST}}/query
{
"kind": "osdu:toolmarket:tools:1.0.0",
"query": "(data.Tool:\"hammer\")
AND nested(data.Properties, (brand:\"ABC\" AND country: \"USA\"))
AND nested(data.Properties, (weight:\">500\"))"
}
```
The resulting query for Elasticsearch will be like this:
```
{
"query": {
"bool": {
"must": [
{"match": {"tool": { "query": "hammer" }}},
{"nested": {
"path": "properties",
"query": {
"bool": {
"must": [
{"match": {"properties.brand": {"query": "ABC"}}},
{"match": {"properties.country": {"query": "USA"}}}
]
}
}
}
},
{"nested": {
"path": "properties",
"query": {
"bool": {
"must": [
{"range": {"properties.weight": {"gt": "500"}}}
]
}
}
}
}
]
}
}
}
```
### Outcome of the decision:
#### - format for QUERY section:
##### one level nesting:
```nested(path, query)```
##### multi level nesting:
`...nested(path1, (...nested(path12, (...nested(path123, (...)...)...)...)...)...)`
##### example:
```
"query": "(data.Tool:\"hammer\")
AND nested(data.Properties, (brand:\"ABC\" AND country: \"USA\"))
AND nested(data.Properties, (weight:\">500\"))"
```
#### - format for SORT section:
##### format:
`nested(path, field, mode)`
##### example:
```
"sort": {
"field": ["nested(data.Properties, brand, min)", "nested(data.Properties, country, min)"],
"order": ["ASC", "ASC"]
}
```
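As an illustration of the sort format, the Python sketch below parses a `nested(path, field, mode)` expression and maps it to the sort clause shape that Elasticsearch uses for nested fields. The helper names and the regular expression are hypothetical, and the path is used as written; how the Search service maps API paths to index paths is not covered here. The pattern tolerates optional whitespace around the commas.

```python
import re
from typing import Tuple

# Optional whitespace is allowed after "(" and around each comma.
_NESTED_SORT = re.compile(r"^nested\(\s*([\w.]+)\s*,\s*([\w.]+)\s*,\s*(\w+)\s*\)$")


def parse_nested_sort(expr: str) -> Tuple[str, str, str]:
    """Split 'nested(path, field, mode)' into its three parts."""
    match = _NESTED_SORT.match(expr.strip())
    if match is None:
        raise ValueError(f"not a nested sort expression: {expr!r}")
    path, field, mode = match.groups()
    return path, field, mode


def to_es_sort(expr: str, order: str) -> dict:
    """Build an Elasticsearch sort clause with a 'nested' block for the given expression."""
    path, field, mode = parse_nested_sort(expr)
    return {
        f"{path}.{field}": {
            "order": order.lower(),
            "mode": mode,
            "nested": {"path": path},
        }
    }


sort_clause = to_es_sort("nested(data.Properties, brand, min)", "ASC")
print(sort_clause)
```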
#### - format for AGGREGATION section:
##### format:
`nested(path, field)`
##### example:
```
"aggregateBy": "nested(data.Properties, brand)",
```
## Rationale
Nested object queries are a valuable type of search. Queries may be very sophisticated and include multiple AND/OR conditions addressing different parts of the indexed document's data structure, including multiple references to the same or different "nested" arrays. Only arrays indexed as "nested" allow a truly accurate search by the set of properties of each array item object.
The proposed API query format makes it possible to describe all these complicated composite conditions.
## Consequences
- Educate the DD community on the pros and cons of using nested type vs flattened.

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/6
Fields shouldn't be represented as hierarchical strings within a Search platform
2023-06-02T16:26:04Z
Alex Close

Some fields within the Search Mappings are not search friendly and are counter intuitive for search from both a performance and usability/UX perspective.
Two examples come to mind: SRN & Kinds
**Example: SRN ("data.Data.IndividualTypeProperties.TrajectoryTypeID": "srn:reference-data/WellboreTrajectoryType:Deviated:")**
There are plenty of strong, valid arguments for having the SRN data format when describing a data schema such as the "Well Known Schema" in OSDU. However, when it comes to Search it adds a lot of confusion, complexity and redundancy. It feels as if the current process is to take the "Well Known Schema" and give that to the Search Service. It seems as if there is a missing layer that transforms the "Well Known Schema" into a "Search Schema".
Whenever considering the naming and structure of fields and values for Search, one should always ask "What value does this give the user?". Take the example field name "data.Data.IndividualTypeProperties.TrajectoryTypeID". Here 'data' appears twice, once with an uppercase first letter and once in lowercase, followed by the generic field name "IndividualTypeProperties". Having `data` appear twice would be confusing for the user, especially with the change in capitalisation. Does "IndividualTypeProperties" add value for the user? It provides no context about which child fields exist, and therefore adds a barrier to entry. An alternative representation is shown below. If the intermediate "IndividualTypeProperties" level is important, I would recommend renaming it to something more relevant.
```
"data.TrajectoryType": "Deviated"
```
or, following a different example, `"data.Data.IndividualTypeProperties.DataSourceOrganisationID": "srn:master-data/Organisation:TNO:"` becomes:
```
"data.OrganisationID": "TNO"
```
The limitations of the long string srn format can be broken down into:
* Requires extensive wildcard searches which are computationally complex
* Relying on the incorrect field type for the kind of search that is being promoted - this should use Keyword, not Text
* It is complicated for end users to understand what the child values of generic field names are
* It is complicated for end users to understand the expected value for generic field names
* Can't rely on auto complete or fieldname discovery
* Creates a barrier to adoption for developers building apps on top of OSDU, as context is missing due to generic field naming
Migrating to a more specific field name structure as suggested provides the following
* Simple and intuitive for both user and application driven search
* Can easily leverage autodiscovery & autocomplete of fields and values
* Can leverage both text and keyword data types for fields to get any desired behaviour
**Example: Kind ("kind" : "opendes:osdu:wellbore-master:0.2.1")**
As with the SRN example above, the searchability of kinds today is quite limited. The kind field is a single keyword mapping that contains a host of information. Being a keyword field, it is only available for exact term matching.
To enable a stronger search experience, I would recommend breaking `"kind": "opendes:osdu:wellbore-master:0.2.1"` up to look similar to:
```
"kind": {
"raw": "opendes:osdu:wellbore-master:0.2.1",
"level1": "opendes",
"level2": "osdu",
"level3": "wellbore-master",
"version": {
"raw": "0.2.1",
"major": 0,
"minor": 2,
"patch": 1
}
}
```
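A split like the one above could be produced at indexing time with a few lines of code. The Python sketch below is illustrative only: `split_kind` is a hypothetical helper, and it assumes the kind always has four colon-separated parts with a three-part numeric version.

```python
def split_kind(kind: str) -> dict:
    """Break 'authority:source:entity:major.minor.patch' into the structure shown above."""
    parts = kind.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated parts, got {len(parts)}: {kind!r}")
    level1, level2, level3, version = parts
    major, minor, patch = (int(n) for n in version.split("."))
    return {
        "raw": kind,
        "level1": level1,
        "level2": level2,
        "level3": level3,
        "version": {"raw": version, "major": major, "minor": minor, "patch": patch},
    }


kind_doc = split_kind("opendes:osdu:wellbore-master:0.2.1")
print(kind_doc)
```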
By breaking this out, folks can interact in a more intuitive way and have more flexibility and control. This concept isn't applicable only to kinds; however, they are a good candidate.

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/116
Search Query Response does not adhere to given filter string
2023-05-25T09:08:09Z
Debasis Chatterjee

Source - @nisha.thakran
While testing the Search API, we observed that the response does not match the query filter.
For example, according to the linked tutorial (docs/tutorial/SearchService.md · master · Open Subsurface Data Universe Software / Platform / System / Search · GitLab (opengroup.org)), the query below should return only the records whose data.Source is exactly "test":
Query :
```
{
"kind": "osdu:wks:master-data--Well:1.0.0",
"query": "data.Source:\"test\""
}
```
But according to the attached log file, it returns records whose data.Source is exactly "test" as well as records whose data.Source merely contains "test".
[search_api_response.txt](/uploads/f0beb2d19285ba53c23e04de003f318e/search_api_response.txt)
Also, we have noticed the same behaviour on AWS-Preship-M15, AWS-Preship-M14 and Azure Preship.
cc @chad , @AshishSaxenaAccenture

M18 - Release 0.21

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/122
Enhance documentation for multi-kind search feature
2023-05-25T09:06:07Z
An Ngo

In multi-kind search, note the behavior when the index of any kind does not exist.
Previously, we returned a 200 with 0 results. The new behavior is to return the results for the valid indices.
(Do we list the kinds that failed?)
Also document the length limit, best practices, workarounds, etc.

M18 - Release 0.21

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/117
Improve documentation about search with exact string
2023-05-25T09:05:30Z
Debasis Chatterjee

See recent issue #116 from @nisha.thakran.
Response from Thomas Griener in Slack channel #1_1_2_osdu-search
I have run a few tests myself on this issue, and by using ":" as the query operator, for example
```
{
"kind": "osdu:wks:master-data--Well:1.1.0",
"query": "data.Source: \"test\""
}
```
I get exact matches back in my tests. However, if I change the query operator to "=", i.e.,
```
{
"kind": "osdu:wks:master-data--Well:1.1.0",
"query": "data.Source= \"test\""
}
```
I do not get exact matches back.
Note to @chad - I think we need to plan for suitable update of documentation here.
https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/docs/tutorial/SearchService.md

M18 - Release 0.21

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/115
Enhance Search Documentation
2023-05-25T09:05:15Z
Michael

Looking at the search source code, I noticed there is an option for a SpatialFilter byIntersection search query; however, I do not see it referenced anywhere in the search documentation.
This search feature appears to be present in M15 for all CSPs.
Should this feature be documented, or is there a reason it is not documented? In which milestone version was this feature made available?

M18 - Release 0.21

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/125
Improve docs on spatial search
2023-05-25T09:04:26Z
Adam Cheng

Spatial search on OSDU using byPolygon or byBoundingbox is evaluated as "inside", which is different from Elasticsearch's "intersects". Suggest highlighting this in the documentation and providing a "byIntersection" example.