# [GCP] Local running of unit tests is unsuccessful.
https://community.opengroup.org/osdu/platform/system/search-service/-/issues/160
2024-03-12 · Riabokon Stanislav (EPAM)

When attempting to execute the JUnit tests locally for the core part of the Search service, we observe 11 failing tests:
```
Results :
Tests in error:
testQueryIndex_whenNoCursorInSearchQueryAndSearchHitsIsEmpty(org.opengroup.osdu.search.provider.impl.ScrollCoreQueryServiceImplTest): Error processing search request
testQueryIndex_whenSearchGives500_thenThrowException(org.opengroup.osdu.search.provider.impl.ScrollCoreQueryServiceImplTest): Unexpected exception, expected<org.opengroup.osdu.core.common.model.http.AppException> but was<org.junit.ComparisonFailure>
testQueryBase_whenClientSearchResultsInElasticsearchStatusException_statusServiceUnavailable_throwsException(org.opengroup.osdu.search.provider.impl.CoreQueryServiceImplTest): Unexpected exception, expected<org.opengroup.osdu.core.common.model.http.AppException> but was<java.lang.AssertionError>
testQueryBase_whenClientSearchResultsInElasticsearchStatusException_statusNotFound_throwsException(org.opengroup.osdu.search.provider.impl.CoreQueryServiceImplTest): Unexpected exception, expected<org.opengroup.osdu.core.common.model.http.AppException> but was<java.lang.AssertionError>
testQueryBase_whenClientSearchResultsInElasticsearchStatusException_statusBadRequest_throwsException(org.opengroup.osdu.search.provider.impl.CoreQueryServiceImplTest): Unexpected exception, expected<org.opengroup.osdu.core.common.model.http.AppException> but was<java.lang.AssertionError>
testQueryBase_whenClientSearchResultsInElasticsearchStatusException_statusTooManyRequests_throwsException(org.opengroup.osdu.search.provider.impl.CoreQueryServiceImplTest): Unexpected exception, expected<org.opengroup.osdu.core.common.model.http.AppException> but was<java.lang.AssertionError>
testQueryBase_SocketTimeoutException_ListenerTimeout_throwsException(org.opengroup.osdu.search.provider.impl.CoreQueryServiceImplTest): Unexpected exception, expected<org.opengroup.osdu.core.common.model.http.AppException> but was<java.lang.AssertionError>
testQueryBase_whenUnsupportedSortRequested_statusBadRequest_throwsException(org.opengroup.osdu.search.provider.impl.CoreQueryServiceImplTest): Unexpected exception, expected<org.opengroup.osdu.core.common.model.http.AppException> but was<java.lang.AssertionError>
testQueryBase_IOException_ListenerTimeout_throwsException(org.opengroup.osdu.search.provider.impl.CoreQueryServiceImplTest): Unexpected exception, expected<org.opengroup.osdu.core.common.model.http.AppException> but was<java.lang.AssertionError>
testQueryBase_IOException_RespopnseTooLong_throwsException(org.opengroup.osdu.search.provider.impl.CoreQueryServiceImplTest): Unexpected exception, expected<org.opengroup.osdu.core.common.model.http.AppException> but was<java.lang.AssertionError>
should_return_CorrectQueryResponseforIntersectionSpatialFilter(org.opengroup.osdu.search.provider.impl.CoreQueryServiceImplTest): Error processing search request
Tests run: 200, Failures: 0, Errors: 11, Skipped: 10
```
It seems likely that these issues share a common underlying cause:
```
if (!autocompleteFeatureFlag.isFeatureEnabled(AUTOCOMPLETE_FEATURE_NAME) || suggestPhrase == null || suggestPhrase == "") {
    return null;
}
```
For example, `should_return_CorrectQueryResponseforIntersectionSpatialFilter` fails with "Error processing search request", and `testQueryBase_IOException_RespopnseTooLong_throwsException` is defined as follows:
```
@Test(expected = AppException.class)
public void testQueryBase_IOException_RespopnseTooLong_throwsException() throws IOException {
    IOException exception = mock(IOException.class);
    doReturn(new ContentTooLongException(null)).when(exception).getCause();
    doReturn("dummyMessage").when(exception).getMessage();
    doThrow(exception).when(client).search(any(), any(RequestOptions.class));
    try {
        sut.queryIndex(searchRequest);
    } catch (AppException e) {
        int errorCode = 413;
        String errorMessage = "Elasticsearch response is too long, max is 100Mb";
        validateAppException(e, errorCode, errorMessage);
        throw (e);
    }
}
```
It appears that a property or stub for the 'AUTOCOMPLETE_FEATURE_NAME' feature flag is missing in the test setup.
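To illustrate (all names below are hypothetical; the real tests use Mockito against the service's feature-flag bean, where an un-stubbed boolean mock defaults to `false`), the failures are consistent with the guard short-circuiting when nothing makes the flag return `true`. A minimal stdlib-only sketch of that behavior:

```java
// Hypothetical, simplified model of the feature-flag guard. In the real tests
// a Mockito stub such as
//   when(autocompleteFeatureFlag.isFeatureEnabled(AUTOCOMPLETE_FEATURE_NAME)).thenReturn(true);
// would be needed; here a plain interface stands in for the flag service.
interface FeatureFlag {
    boolean isFeatureEnabled(String name);
}

public class AutocompleteGuardSketch {
    static final String AUTOCOMPLETE_FEATURE_NAME = "autocomplete";

    // Mirrors the guard: returns null (no suggestions) unless the flag is on
    // and a non-empty phrase is supplied.
    static String getSuggestions(FeatureFlag flag, String suggestPhrase) {
        if (!flag.isFeatureEnabled(AUTOCOMPLETE_FEATURE_NAME)
                || suggestPhrase == null || suggestPhrase.isEmpty()) {
            return null;
        }
        return "suggestion for: " + suggestPhrase;
    }

    public static void main(String[] args) {
        FeatureFlag unstubbed = name -> false; // what an un-stubbed mock returns
        FeatureFlag stubbed = name -> true;    // what an explicit stub would return
        System.out.println(getSuggestions(unstubbed, "well")); // null
        System.out.println(getSuggestions(stubbed, "well"));   // non-null
    }
}
```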
Besides, I could not find tests in https://community.opengroup.org/osdu/platform/system/search-service/-/merge_requests/624/

Milestone: M23 - Release 0.26

---

# [GCP] Feature 'featureFlag.autocomplete.enabled' does not work.
https://community.opengroup.org/osdu/platform/system/search-service/-/issues/158
2024-03-08 · Riabokon Stanislav (EPAM)

The GC team initiated testing of the new feature 'featureFlag.autocomplete.enabled'. Following the documentation guidelines, we configured the 'featureFlag.bagOfWords.enabled' flag with a value of 'true' on the Indexer Service and set 'featureFlag.autocomplete.enabled' to 'true' as well. Unfortunately, the integration test did not yield the expected results.
To investigate the issue further, we examined the index mapping in Elasticsearch:
```
{
"osdu-search1709032988256-test-data--integration-1.0.1": {
"aliases": {
"a1632179934": {
},
"a1632185707": {
}
},
"mappings": {
"dynamic": "false",
"properties": {
"acl": {
"properties": {
"owners": {
"type": "keyword"
},
"viewers": {
"type": "keyword"
}
}
},
"ancestry": {
"properties": {
"parents": {
"type": "keyword"
}
}
},
"authority": {
"type": "constant_keyword",
"value": "osdu"
},
"bagOfWords": {
"type": "text",
"store": true,
"fields": {
"autocomplete": {
"type": "completion",
"analyzer": "simple",
"preserve_separators": true,
"preserve_position_increments": true,
"max_input_length": 50
}
}
},
"createTime": {
"type": "date"
},
"createUser": {
"type": "keyword"
},
"data": {
"properties": {
"Basin": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"Center": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"Country": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"County": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"DblArray": {
"type": "double"
},
"EmptyAttribute": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"Established": {
"type": "date"
},
"Field": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"Location": {
"type": "geo_point"
},
"OriginalOperator": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"Rank": {
"type": "integer"
},
"Score": {
"type": "integer"
},
"State": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"WellName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"WellStatus": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
},
"WellType": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256,
"normalizer": "lowercase"
}
},
"copy_to": [
"bagOfWords"
]
}
}
},
"id": {
"type": "keyword"
},
"index": {
"properties": {
"lastUpdateTime": {
"type": "date"
},
"statusCode": {
"type": "integer"
},
"trace": {
"type": "text"
}
}
},
"kind": {
"type": "keyword"
},
"legal": {
"properties": {
"legaltags": {
"type": "keyword"
},
"otherRelevantDataCountries": {
"type": "keyword"
},
"status": {
"type": "keyword"
}
}
},
"modifyTime": {
"type": "date"
},
"modifyUser": {
"type": "keyword"
},
"namespace": {
"type": "keyword"
},
"source": {
"type": "constant_keyword",
"value": "search1709032988256"
},
"tags": {
"type": "flattened"
},
"type": {
"type": "keyword"
},
"version": {
"type": "long"
},
"x-acl": {
"type": "keyword"
}
}
},
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"refresh_interval": "30s",
"number_of_shards": "1",
"provided_name": "osdu-search1709032988256-test-data--integration-1.0.1",
"creation_date": "1709032992807",
"number_of_replicas": "1",
"uuid": "rpIKCM9NRmm3gb41_7algw",
"version": {
"created": "7171799"
}
}
}
}
}
```
As far as we can determine, the Indexer has introduced a new block:
```
"bagOfWords": {
  "type": "text",
  "store": true,
  "fields": {
    "autocomplete": {
      "type": "completion",
      "analyzer": "simple",
      "preserve_separators": true,
      "preserve_position_increments": true,
      "max_input_length": 50
    }
  }
}
```
We acknowledge that this part of the implementation is in accordance with the documentation.
We then reviewed the new request sent to the Search Service:
```
{
  "offset": 0,
  "kind": "osdu:search1709032988256:test-data--Integration:1.0.1",
  "limit": 0,
  "query": "data.OriginalOperator:OFFICE4",
  "suggestPhrase": "data",
  "returnHighlightedFields": false,
  "highlightedFields": [],
  "returnedFields": [],
  "queryAsOwner": false,
  "trackTotalCount": false
}
```
The corresponding request from the Search Service to Elasticsearch:
`SearchRequest{searchType=QUERY_THEN_FETCH, indices=[osdu-search1709032988256-test-data--integration-1.0.1,-.*,-system-meta-data-*], indicesOptions=IndicesOptions[ignore_unavailable=true, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, expand_wildcards_hidden=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], types=[], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=null, allowPartialSearchResults=null, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"from":0,"size":10,"timeout":"1m","query":{"bool":{"must":[{"bool":{"must":[{"query_string":{"query":"data.OriginalOperator:OFFICE4","fields":[],"type":"best_fields","default_operator":"or","max_determinized_states":10000,"allow_leading_wildcard":false,"enable_position_increments":true,"fuzziness":"AUTO","fuzzy_prefix_length":0,"fuzzy_max_expansions":50,"phrase_slop":0,"escape":false,"auto_generate_synonyms_phrase_query":true,"fuzzy_transpositions":true,"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}}],"filter":[{"terms":{"x-acl":["data.test-users-data-root`
truncated...
Notably, this request from the Search Service to Elasticsearch contains no 'suggest' section and no reference to the 'autocomplete' field.
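For comparison, if a completion suggester against the `bagOfWords.autocomplete` field from the mapping above were attached, the Elasticsearch request body would be expected to carry a suggest clause along these lines (a sketch only; the suggestion name is illustrative):

```json
{
  "suggest": {
    "autocomplete-suggestions": {
      "text": "data",
      "completion": {
        "field": "bagOfWords.autocomplete",
        "skip_duplicates": true
      }
    }
  }
}
```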
Additionally, we have identified a method, org.opengroup.osdu.search.util.SuggestionsQueryUtil#getSuggestions, which seems to contain the logic related to suggestions.
```
public SuggestBuilder getSuggestions(String suggestPhrase) {
    if (!autocompleteFeatureFlag.isFeatureEnabled(AUTOCOMPLETE_FEATURE_NAME) || suggestPhrase == null || suggestPhrase == "") {
        return null;
    }
    SuggestionBuilder suggestionBuilder = SuggestBuilders.completionSuggestion(
        "bagOfWords.autocomplete"
    ).text(suggestPhrase).skipDuplicates(true);
    SuggestBuilder suggestBuilder = new SuggestBuilder();
    suggestBuilder.addSuggestion(SUGGESTION_NAME, suggestionBuilder);
    return suggestBuilder;
}
```
I suppose this method was intended to be called when building the request to Elasticsearch, but currently it appears to be invoked ONLY from the JUnit tests.
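Incidentally, the guard above compares `suggestPhrase == ""`, which in Java tests reference identity rather than content: a phrase built at runtime can be empty yet not identical to the interned literal. A small stdlib-only sketch of the difference (hypothetical helper names):

```java
public class StringEqualitySketch {
    // Returns true only when the argument is the very same object as the
    // interned literal "" — not when it merely has zero length.
    static boolean guardedByReference(String s) {
        return s == "";
    }

    // Content-based emptiness check, which is presumably what the guard intends.
    static boolean guardedByContent(String s) {
        return s != null && s.isEmpty();
    }

    public static void main(String[] args) {
        String runtimeEmpty = new StringBuilder().toString(); // "" built at runtime
        System.out.println(guardedByReference(runtimeEmpty)); // false
        System.out.println(guardedByContent(runtimeEmpty));   // true
    }
}
```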
To sum up, it seems the 'featureFlag.autocomplete' feature has not been fully implemented. Please pay attention to it.

Milestone: M23 - Release 0.26 · Assignee: Mark Chance

---

# ADR: Pagination Query API
https://community.opengroup.org/osdu/platform/system/search-service/-/issues/157
2024-02-27 · Neelesh Thakur

<a name="TOC"></a>
[[_TOC_]]
# Status
- [x] Proposed
- [ ] Trialing
- [ ] Under review
- [ ] Approved
- [ ] Retired
# Background
Paginating over a large query result is a common discovery workflow. The Search service query API can return at most 10K records; anything higher than this requires use of the Search service's `query_with_cursor` API (`POST /api/search/v2/query_with_cursor`). As OSDU Data Platform adoption has increased over milestone releases, users have repeatedly complained (Issues: [130](https://community.opengroup.org/osdu/platform/system/search-service/-/issues/130), [156](https://community.opengroup.org/osdu/platform/system/search-service/-/issues/156) etc.) about the `query_with_cursor` API's reliability and performance. Some of the most common issues reported:
- During deep pagination over a large result set, the API may throw an error in the middle and users have to start over. This can be a very time-consuming and costly exercise.
- By default, each data partition can have a maximum of `500` active cursors; if this limit is reached, the API throws an exception. Users have repeatedly complained that even with light usage this quota gets exhausted and they cannot make new cursor API calls.
- The cursor count consumed per Search service request is opaque. One Search service cursor request can potentially consume many cursors on the Search backend (Elasticsearch). It is very hard to give users any guidance on how many concurrent cursor requests can be made on a data partition.
- The cursor quota is a soft limit and can potentially be increased to mitigate the issue. A quota increase will impact Search backend resource usage, which can then degrade search and indexing latencies. Any resolution to the latency problem requires Search backend resource scaling, thus increasing infrastructure and licensing cost.
# Context & Scope
As we looked for solutions to the issues reported in the earlier section, we found there are only two choices:
- Accept that we cannot reliably scroll over a large result set, and drop support for scrolling over more than 10K records.
- Provide a new Search service API that utilizes the [search_after](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/scroll-api.html) API of the Search backend (Elasticsearch).
We cannot limit the maximum number of records that can be fetched from the Search service, as it may break existing consumer workflows. The Search service must provide a reliable and performant API that allows scrolling over all records in a response, irrespective of their count.
The [search_after](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/scroll-api.html) API does not suffer from the reliability issues that users have reported and is recommended by Elasticsearch to be used in place of the cursor/scroll API. The Search service should add a new API that makes use of Elasticsearch's [search_after](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/scroll-api.html) API.
[Back to TOC](#TOC)
# Proposed solution
The Search service should add two new endpoints to support pagination:
- New endpoint to paginate via [search_after](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/scroll-api.html) API from Elasticsearch.
- New endpoint to free up pagination resources if next page is not needed.
<details>
<summary>API specification</summary>
```yaml
openapi: 3.0.0
info:
  description: Search service
  version: 2.0.0
  title: Search Service APIs
tags:
  - name: Search
    description: Service endpoints to search data in OSDU Data Platform
security:
  - bearer: []
paths:
  /pagination-query:
    post:
      tags:
        - Search
      summary: Queries using the input request criteria.
      description: "The API supports full text search on string fields, range queries
        on date, numeric or string fields, along with geo-spatial search. Required
        roles: 'users.datalake.viewers' or 'users.datalake.editors' or 'users.datalake.admins'.
        In addition, users must be a member of data groups to access the data. It
        can be used to retrieve large numbers of results (or even all results) from
        a single search request, in much the same way as you would use a cursor on
        a traditional database. The API will respond with `nextCursor` if results
        are higher than the maximum page size (1K). To request the next page, another
        request to the same API that includes the `nextCursor` value from the last
        response must be supplied. All other fields on the next pagination-query
        request must be the same and must be received by the service before the
        cursor expires (defaults to 60s)."
      operationId: Pagination query
      parameters:
        - $ref: "#/components/parameters/data-partition-id"
      requestBody:
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/PaginationQueryRequest"
      responses:
        "200":
          description: Success
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/PaginationQueryResponse"
        "400":
          description: Invalid parameters were given on request
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/AppError"
        "401":
          description: Unauthorized
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/AppError"
        "403":
          description: User not authorized to perform the action
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/AppError"
        "502":
          description: Search service scale-up is taking longer than expected. Wait 10
            seconds and retry.
          content:
            application/json:
              schema:
                type: string
      security:
        - bearer: []
  /pagination-query-cursor:
    delete:
      tags:
        - Search
      summary: Deletes the pagination query cursor and frees up resources. Pagination
        resources should be freed up if they are not used anymore.
      description: "Required roles: 'users.datalake.viewers' or 'users.datalake.editors'
        or 'users.datalake.admins'."
      operationId: Delete pagination query cursor
      parameters:
        - $ref: "#/components/parameters/data-partition-id"
      requestBody:
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/PaginationQueryCursorDeleteRequest"
      responses:
        "200":
          description: Success
        "400":
          description: Invalid parameters were given on request
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/AppError"
        "401":
          description: Unauthorized
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/AppError"
        "403":
          description: User not authorized to perform the action
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/AppError"
        "404":
          description: Pagination query cursor not found
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/AppError"
        "502":
          description: Search service scale-up is taking longer than expected. Wait 10
            seconds and retry.
          content:
            application/json:
              schema:
                type: string
      security:
        - bearer: []
components:
  parameters:
    data-partition-id:
      name: data-partition-id
      in: header
      description: desired data partition id
      required: true
      schema:
        type: string
  securitySchemes:
    bearer:
      type: apiKey
      name: Authorization
      in: header
  schemas:
    PaginationQueryRequest:
      type: object
      required:
        - kind
      properties:
        kind:
          type: object
          example: The kind of the record to query e.g. "tenant1:test:well:1.0.0" or ["tenant1:test:well:1.0.0", "tenant1:test:well:2.0.0"].
          description: "'kind' to search"
        query:
          type: string
          description: The query string in Lucene query string syntax.
        returnedFields:
          type: array
          description: The fields on which to project the results.
          items:
            type: string
        sort:
          $ref: "#/components/schemas/SortQuery"
        queryAsOwner:
          type: boolean
          example: false
          description: The queryAsOwner switches between viewer and owner to return
            results that you are entitled to view or results you are the owner of.
        spatialFilter:
          $ref: "#/components/schemas/SpatialFilter"
        cursor:
          type: string
          description: Search context to retrieve the next batch of results. It must
            be empty for the first request and subsequent requests must provide a
            valid 'cursor'.
        trackTotalCount:
          type: boolean
          description: Tracks the accurate record count matching the query if 'true',
            partial count otherwise. Partial count queries are more performant.
            Default is 'false' and returns 10000 if matching records are higher
            than 10000.
      example:
        kind: osdu:welldb:wellbore:1.0.0
        limit: 30
        query: data.Basin:"Ft. Worth"
        returnedFields:
          - data.kind
        queryAsOwner: false
        cursor: <put a valid cursor or leave it blank for the first request>
    PaginationQueryResponse:
      type: object
      properties:
        nextCursor:
          type: string
          description: Search context to retrieve the next batch of results. It is
            valid for 60s. The next pagination request must be received before it
            expires.
        results:
          type: array
          items:
            type: object
            additionalProperties:
              type: object
        totalCount:
          type: integer
          format: int64
          description: Returns the accurate count if 'trackTotalCount' is 'true',
            partial count otherwise. Returns 10000 if matching records are higher
            than 10000 and partial count is requested.
    PaginationQueryCursorDeleteRequest:
      type: object
      properties:
        cursor:
          type: string
          description: Valid cursor for clean-up. The request must be received before
            the cursor expires.
    ByBoundingBox:
      type: object
      required:
        - bottomRight
        - topLeft
      properties:
        topLeft:
          $ref: "#/components/schemas/Point"
        bottomRight:
          $ref: "#/components/schemas/Point"
    ByDistance:
      type: object
      required:
        - point
      properties:
        distance:
          type: number
          format: double
          example: 1500
          description: The radius of the circle centered on the specified location.
            Points which fall into this circle are considered to be matches.
          minimum: 0
          maximum: 9223372036854776000
        point:
          $ref: "#/components/schemas/Point"
    ByGeoPolygon:
      type: object
      properties:
        points:
          type: array
          description: Polygon defined by a set of points.
          items:
            $ref: "#/components/schemas/Point"
    Point:
      type: object
      properties:
        latitude:
          type: number
          format: double
          example: 37.450727
          description: Latitude of point.
          minimum: -90
          maximum: 90
        longitude:
          type: number
          format: double
          example: -122.174762
          description: Longitude of point.
          minimum: -180
          maximum: 180
    SortQuery:
      type: object
      properties:
        field:
          type: array
          description: The list of fields to sort the results.
          items:
            type: string
        order:
          type: array
          description: The list of orders to sort the results. The element must be
            either ASC or DESC.
          items:
            type: string
    SpatialFilter:
      type: object
      properties:
        field:
          type: string
          description: geo-point field in the index on which filtering will be
            performed. Use the GET schema API to find which fields support spatial
            search.
        byBoundingBox:
          $ref: "#/components/schemas/ByBoundingBox"
        byDistance:
          $ref: "#/components/schemas/ByDistance"
        byGeoPolygon:
          $ref: "#/components/schemas/ByGeoPolygon"
    AppError:
      type: object
      properties:
        code:
          type: integer
          format: int32
        reason:
          type: string
        message:
          type: string
```
</details>
### Implementation details on Pagination Query API
First, [search_after](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/scroll-api.html) usage requires a [PIT](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/point-in-time-api.html) id to be created ahead of time and supplied on the [search_after](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/scroll-api.html) call to the Elasticsearch cluster. The Pagination Query API should wrap both of these calls in the first pagination request.
If there is more than one page, the [search_after](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/scroll-api.html) call will respond with the PIT id of the next page and the sort values, along with the results. The PIT id and sort values are required to fetch the next page. The Pagination Query API response's `nextCursor` attribute should be set to a value that is a combination of both. The PIT id is quite long; it can be shortened and cached using the [existing hashing function](https://community.opengroup.org/osdu/platform/system/search-service/-/blame/7b522a79df7b4c23fabe61e5026671c31fae876a/provider/search-azure/src/main/java/org/opengroup/osdu/search/provider/azure/provider/impl/ScrollQueryServiceImpl.java#L190) before returning the response to the end user. The `nextCursor` attribute can then be set to: shortened(PIT id) + base64.encode(sort values).
When Search receives a next-page request, the pagination-query API will break down the PIT id and sort values via the above-mentioned mechanism and make the next [search_after](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/scroll-api.html) call.
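The `nextCursor` composition described above could be sketched as follows (helper names are hypothetical; the real service would use its existing hashing/caching function in place of the toy in-memory map):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of building and parsing the proposed nextCursor value:
// shortened(PIT id) + "|" + base64(sort values). The real implementation would
// reuse the service's existing hashing function and cache instead of this map.
public class CursorCodecSketch {
    private static final Map<String, String> pitByShortId = new ConcurrentHashMap<>();
    private static final AtomicLong counter = new AtomicLong();

    static String encode(String pitId, String sortValuesJson) {
        String shortId = "c" + counter.incrementAndGet();
        pitByShortId.put(shortId, pitId); // cache the long PIT id under a short key
        String sortPart = Base64.getUrlEncoder()
                .encodeToString(sortValuesJson.getBytes(StandardCharsets.UTF_8));
        return shortId + "|" + sortPart;
    }

    // Returns { PIT id, sort values JSON } recovered from the opaque cursor.
    static String[] decode(String cursor) {
        int sep = cursor.indexOf('|');
        String pitId = pitByShortId.get(cursor.substring(0, sep));
        String sortValuesJson = new String(
                Base64.getUrlDecoder().decode(cursor.substring(sep + 1)),
                StandardCharsets.UTF_8);
        return new String[] { pitId, sortValuesJson };
    }
}
```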
[Back to TOC](#TOC)
# Consequences
- Existing `query_with_cursor` API (POST /api/search/v2/query_with_cursor) should be deprecated.
- New Pagination Query API using [search_after](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/scroll-api.html) API on Elasticsearch should be introduced.
- New Delete Pagination Query Cursor API should be implemented.
- Search service tutorial should be updated with:
- New APIs documentation
- Introduction of a 'Best Practices' section with following suggestions:
- Migrate users from query_with_cursor API to new pagination-query API
- Remind users to call `DELETE /api/search/v2/pagination-query-cursor` API to avoid overloading system if cursor is no longer in use or next page is not needed.
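The recommended client pattern, paging until done and explicitly deleting the cursor when abandoning early, could be sketched against a hypothetical client interface (endpoint mapping follows the proposal; the shape of `Page` and the fetch helper are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side pattern for the proposed pagination-query API:
// keep requesting pages while nextCursor is present, and call the delete
// endpoint if pagination is abandoned before the last page.
public class PaginationClientSketch {
    interface PaginationClient {
        Page query(String cursor);        // POST /api/search/v2/pagination-query
        void deleteCursor(String cursor); // DELETE /api/search/v2/pagination-query-cursor
    }

    record Page(List<String> results, String nextCursor) {}

    // Collect up to maxRecords results, freeing the cursor if we stop early.
    static List<String> fetch(PaginationClient client, int maxRecords) {
        List<String> all = new ArrayList<>();
        String cursor = null; // empty cursor on the first request
        while (true) {
            Page page = client.query(cursor);
            all.addAll(page.results());
            cursor = page.nextCursor();
            if (cursor == null) {
                return all; // last page: no cursor left to clean up
            }
            if (all.size() >= maxRecords) {
                client.deleteCursor(cursor); // abandoning early: free resources
                return all;
            }
        }
    }
}
```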
[Back to TOC](#TOC)

---

# query_with_cursor quota exhausts too easily
https://community.opengroup.org/osdu/platform/system/search-service/-/issues/156
2024-02-22 · An Ngo

When making a few query_with_cursor requests to the Search service, it was too easy to reach the ES scroll contexts quota (500 scroll contexts).
```
curl -X POST \
  '/search/v2/query_with_cursor' \
  --header 'accept: */*' \
  --header 'data-partition-id: <partitionid>' \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data-raw '{"kind": "*:*:*:*", "limit": 1}'
```
The above request returns a 429 error code on the third call.
```
{
  "code": 429,
  "reason": "Too many requests",
  "message": "Too many cursor requests, please re-try after some time."
}
```

---

# Align the Search Service Code Base with OSDU Platform Development Principles.
https://community.opengroup.org/osdu/platform/system/search-service/-/issues/154
2024-01-28 · Rustam Lotsmanenko (EPAM) <rustam_lotsmanenko@epam.com>

# ADR: Move Code Duplicates from CSP Modules to the Core Module.
Enhance Search service maintenance, align with the future ElasticSearch 8 migration, and minimize the effort needed for introducing Community implementation by reducing code duplication in CSPs modules.
## Status
- [x] Proposed
- [ ] Trialing
- [ ] Under review
- [ ] Approved
- [ ] Retired
## Context & Scope
The Search service contains duplicated code for constructing Elasticsearch queries within CSP modules, in classes such as QueryBase.java and QueryServiceImpl.java. These redundancies add complexity to code maintenance without offering visible benefits. The query builders contain no CSP-specific code; additionally, differences have emerged between these classes over time:
https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/provider/search-azure/src/main/java/org/opengroup/osdu/search/provider/azure/provider/impl/QueryBase.java
https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/provider/search-aws/src/main/java/org/opengroup/osdu/search/provider/aws/provider/impl/QueryBase.java
https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/provider/search-gc/src/main/java/org/opengroup/osdu/search/provider/gcp/provider/impl/QueryBase.java
https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/provider/search-ibm/src/main/java/org/opengroup/osdu/search/provider/ibm/provider/impl/QueryBase.java?ref_type=heads
## Decision
Identify the delta in the search service across the different providers. Where significant differences exist, prioritize the most advanced version and move it to the Core module. For instance, we previously migrated the optimized geo query builders from the Azure provider to the core: https://community.opengroup.org/osdu/platform/system/search-service/-/merge_requests/556 Following the same principle, we can eliminate the other existing duplicates.
## Rationale
Aside from the current increased cost and complexity of maintenance, we have at least two major tasks pending in the Search service:
- The migration to ElasticSearch 8 will require migrating all Elasticsearch query builders. Currently, the required effort will increase proportionally with the number of providers. https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/111
- For the Community Implementation of the Search service, selecting a version of Query Builders will be required. It could be a copy of the GC provider, which diverges from OSDU Development principles (like all providers currently), or a robust, reusable solution. https://gitlab.opengroup.org/osdu/pmc/community-implementation/-/issues/9
## Consequences
* Removal of code duplicates in provider modules.
* Introduction of a consolidated ElasticSearch query builder in the core module.
* Potential impact on features currently in development due to substantial codebase changes.
## Tradeoff Analysis
While this won't break API behavior, it could be seen as disruptive in the development process due to significant codebase changes. Tweaks and improvements made by CSPs to their modules might be overlooked during refactoring if not captured through integration testing.
## Alternatives and implications
An alternative to the current ADR involves relocating the code duplicates to a Community Implementation module rather than the Core, designated for use in the Community implementation of the Search service. However, this would require developers to support five modules if new features are introduced or if the migration to ElasticSearch 8 begins.

Milestone: M23 - Release 0.26 · Assignee: Rustam Lotsmanenko (EPAM)

---

# LQCIndicator.1.0.0 Custom Reference Schema OSDU Search API Issue results in WellLogTypeID field not appearing in search result where data is available at storage
https://community.opengroup.org/osdu/platform/system/search-service/-/issues/153
2024-01-09 · Vedantha Sampekai

Hello OSDU Forum Team,
I from Shell LQC OSDU Migration Project reaching out to you regarding an OSDU Search Issue facing with an LQC Custom Reference Schema/Data type. Jake from DP team (Jake.J.Pearce@shell.com) is working on this issue...Hello OSDU Forum Team,
I, from the Shell LQC OSDU Migration Project, am reaching out to you regarding an OSDU Search issue we are facing with an LQC custom reference schema/data type. Jake from the DP team (Jake.J.Pearce@shell.com) is working on this issue; he suggested we raise it with the OSDU Forum to get some assistance. I have explained the whole issue step by step below; I hope it helps you understand it.
If you need any more information on this issue, I am happy to explain it in detail over a call. Please share the required associates' names and a convenient time to set up a call. My email id: vedantha.gowda@shell.com
Please find attached the LQCIndicator schema JSON file for your reference.
[LQCIndicator_Custom_Reference_Schema.json](/uploads/26ecd1320e94666823cfb60a70ebb55b/LQCIndicator_Custom_Reference_Schema.json)
**Quick Summary of the issue**
The field "WellLogTypeID" displays "null" for the LQCIndicator:1.0.0 custom reference schema when results for the kind are returned via the Search API. The custom schema references OSDU public data.
**Please find below the details regarding the issue:**
- LQCIndicator.1.0.0 is an LQC custom reference schema/data type designed and approved by the Shell Data Architect team.
- The DP team has ingested/registered the schema in all OSDU environments – US instance.
- The LQC team prepared the reference data as per the business requirement and handed it over to the RDM team in an Excel sheet to ingest into the LQCIndicator schema.
- The RDM team built an Informatica pipeline, prepared a manifest payload, and ingested the reference data into the LQCIndicator schema in the OSDU Acceptance environment, US instance.
- The RDM team used the OSDU Manifest Ingestion API service to ingest the LQCIndicator reference data into OSDU through the Informatica pipeline.
- LQCIndicator has 42 records in total, and all of them were successfully ingested into the OSDU Acc env, but for 8 of the records the WellLogTypeID field appears as NULL, even though the expected data is available in storage.
- Out of the 42 LQCIndicator records, 8 reference LogType:Interpreted and the remaining 34 reference ConveyanceMethod:LoggingWhileDrilling or ConveyanceMethod:ElectricWirelineConveyed. All of the records referencing LogType:Interpreted display "null" in the WellLogTypeID field. The other referenced fields (for ConveyanceMethod:LoggingWhileDrilling and ConveyanceMethod:ElectricWirelineConveyed) display values as they should.
- When we query the LQCIndicator schema kind through the OSDU Search API service, the WellLogTypeID field shows NULL.
- When we query the LQCIndicator schema kind through the OSDU Storage API service, the WellLogTypeID field shows the correct data value ("WellLogTypeID": "osdu:reference-data--LogType:Interpreted:").
- The 8 record ids where we face the WellLogTypeID NULL issue are listed at the bottom.
- WellLogTypeID and ConveyanceMethodID are similar fields as per the schema design. Both reference OSDU Forum schemas: ConveyanceMethodID references the schema “osdu:wks:reference-data--ConveyanceMethod:1.0.0” and WellLogTypeID references the schema “osdu:wks:reference-data--LogType:1.0.0”.
- The ConveyanceMethodID attribute works off the same referencing mechanism as WellLogTypeID; however, this field displays values properly when obtained through the Search API ("ConveyanceMethodID": "osdu:reference-data--ConveyanceMethod:LoggingWhileDrilling:"), as per the business requirement (the remaining 34 LQCIndicator records).
- A new OSDU environment was created at Shell, and we successfully re-created the LQCIndicator WellLogTypeID NULL issue in the new environment.
- As suggested by the DP team, we are raising this issue with the OSDU Forum and requesting your help in resolving it.
- I have attached the LQC custom reference schema for your reference.
- I am happy to supply any further information as required.
**Screenshots of the WellLogTypeID NULL issue in the LQCIndicator.1.0.0 schema/data type**
Queried through - OSDU Search API Service:
![image](/uploads/7a1863246448f997a3ba12dd8a318bdd/image.png)
Queried through - OSDU Storage API Service:
![image](/uploads/bfc4e4ab14fe38bef9962ec6be9262bc/image.png)
**Below is the search query to find it in the AWS@Shell OSDU Acceptance Env - US instance**
```
{
  "kind": "shell:wks:reference-data--LQCIndicator:1.0.0",
  "returnedFields": ["id", "data.WellLogTypeID"]
}
```
**Below are the 8 LQCIndicator ids (out of 42) with the WellLogTypeID NULL issue** (for the remaining 34 LQCIndicator ids the WellLogTypeID should be NULL, and the ConveyanceMethodID attribute is populated instead)
osdu:reference-data--LQCIndicator:24
osdu:reference-data--LQCIndicator:25
osdu:reference-data--LQCIndicator:26
osdu:reference-data--LQCIndicator:27
osdu:reference-data--LQCIndicator:28
osdu:reference-data--LQCIndicator:29
osdu:reference-data--LQCIndicator:30
osdu:reference-data--LQCIndicator:31
Looking forward to your guidance and help in resolving this issue. Thank youhttps://community.opengroup.org/osdu/platform/system/search-service/-/issues/152ADR: Ability to get all the records of a given Persisted Collection from search2024-01-11T09:20:16ZJuilee PaluskarADR: Ability to get all the records of a given Persisted Collection from search
## Status
* [x] Proposed
* [ ] Trialing
* [ ] Under review
* [ ] Approved
* [ ] Retired
## Context & Scope
A persisted collection can aggregate objects of different natures, including master data, work-product-components, and reference data. It may contain a collection of records of heterogeneous kinds. At any given point, the MemberIDs field of a PersistedCollection maintains the list of objects that are part of the collection.
More can be read from this [schema](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/Generated/work-product-component/PersistedCollection.1.2.0.json) .
Problem:
Today there is no way to get all the records that belong to a particular Persisted Collection; the user has to perform at least 2 search queries to get the records of a Persisted Collection.
1st query - get the Persisted Collection record and retrieve the record IDs from its MemberIDs field.
2nd query - get the actual records for the IDs retrieved in the 1st query.
For the 2nd query, to get multiple records in one search query, the user has to form a query with the **OR** operator, e.g.
```
{
  "query": "recordId-1 OR recordId-2 OR recordId-3 ... OR recordId-1000"
}
```
Elasticsearch limits the maximum number of **OR** conditions in one query, so if a PersistedCollection contains more than 1000 records, the user has to invoke multiple search queries to get all of them.
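The two-step pattern above can be sketched as follows. This is only an illustration: `chunk`, `build_id_query`, the example record ids, and the 1000-id batch size are assumptions, and the queries produced would still need to be POSTed to `/api/search/v2/query`.

```python
# Sketch of today's workaround: split the MemberIDs of a PersistedCollection
# into batches below the OR-clause limit and build one search query per batch.
# Names and the batch size are illustrative, not the actual service API.

def chunk(ids, size=1000):
    """Yield the MemberIDs list in batches of at most `size`."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

def build_id_query(kind, ids):
    """Build one search request body OR-ing a batch of record ids."""
    clause = " OR ".join(f'"{record_id}"' for record_id in ids)
    return {"kind": kind, "query": f"id:({clause})"}

# e.g. a collection with 2500 members needs three separate queries:
member_ids = [f"osdu:wpc--Record:{n}" for n in range(2500)]
queries = [build_id_query("*:*:*:*", batch) for batch in chunk(member_ids)]
```

This is exactly the multi-query burden the proposed solution below would remove.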
## Possible Solution
One possible solution to address this requirement is to add the Persisted Collection record id to the member record's data: whenever a record is added to a Persisted Collection, the record's data is updated with the Persisted Collection id.
This could be done by listening to the record change event for the PersistedCollection kind.
## Consequences
* This will help users to get the records of Persisted Collection in a single go.
* This will help users to get the records to which he/she has access.
* This will help users to form queries that get desired records from a PersistedCollection, such as “give all the records of a persisted collection where data.\<someproperty\> is \<xyz\>“, in one go.
* This will help users to make filters based on different objects in the collection.https://community.opengroup.org/osdu/platform/system/search-service/-/issues/142Search service does not return all null objects when the key object datatype ...2023-12-20T10:39:59ZNaufal Mohamed NooriSearch service does not return all null objects when the key object datatype has value not equal to stringThe Search service should return all object keys, returning null for any key not populated during ingestion/storage insertion. However, we found that keys with a non-string value type in the schema definition are omitted from the search result entirely, as opposed to keys with value type string, where a non-populated key returns 'null'.
For example, I ingested the following payload in the R3M20 AWS pre-shipping environment:
`{
"runId": "{{$guid}}",
"executionContext": {
"acl": {
"viewers": [
"data.default.viewers@osdu.example.com"
],
"owners": [
"data.default.owners@osdu.example.com"
]
},
"legal": {
"legaltags": [
"osdu-public-usa-dataset"
],
"otherRelevantDataCountries": [
"US"
]
},
"Payload": {
"AppKey": "test-app",
"data-partition-id": "osdu"
},
"manifest": {
"kind": "osdu:wks:Manifest:1.0.0",
"Data": {
"WorkProductComponents": [
{
"id": "osdu:work-product-component--SeismicTraceData:TEST_ISSUE_1",
"kind": "osdu:wks:work-product-component--SeismicTraceData:1.4.0",
"acl": {
"viewers": [
"data.default.viewers@osdu.example.com"
],
"owners": [
"data.default.owners@osdu.example.com"
]
},
"legal": {
"legaltags": [
"osdu-public-usa-dataset"
],
"otherRelevantDataCountries": [
"US"
]
},
"data": {
"Name": "TESTT_ISSUE",
"StartTime": 0,
"EndTime": 10000
},
"meta": [
{
"kind": "Unit",
"name": "ms",
"persistableReference": "{\"abcd\":{\"a\":0.0,\"b\":0.001,\"c\":1.0,\"d\":0.0},\"symbol\":\"ms\",\"baseMeasurement\":{\"ancestry\":\"T\",\"type\":\"UM\"},\"type\":\"UAD\"}",
"propertyNames": [
"StartTime",
"EndTime",
"SampleCount"
],
"unitOfMeasureID": "osdu:reference-data--UnitOfMeasure:ms:"
}
]
}
]
}
}
}
}`
During the search query, data.SampleCount and other non-string keys do not show up in the search response. In the sample below, TraceDomainUOM (type: string) is visible as null, but SampleCount (type: number) is not visible:
`{
"results": [
{
"data": {
"SpatialArea.QuantitativeAccuracyBandID": null,
"VirtualProperties.DefaultLocation.QuantitativeAccuracyBandID": null,
"LiveTraceOutline.CoordinateQualityCheckPerformedBy": null,
"SpatialArea.SpatialParameterTypeID": null,
"Difference": null,
"ResourceCurationStatus": null,
"SpatialArea.SpatialGeometryTypeID": null,
"SortOrderID": null,
"IsExtendedLoad": null,
"Name": "TESTT_ISSUE",
"SeismicFilteringTypeID": null,
"VirtualProperties.DefaultName": "TESTT_ISSUE",
"VirtualProperties.DefaultLocation.CoordinateQualityCheckPerformedBy": null,
"ResourceSecurityClassification": null,
"VerticalMeasurementTypeID": null,
"SeismicStackingTypeID": null,
"ExistenceKind": null,
"ProcessingProjectID": null,
"SeismicDomainTypeID": null,
"Preferred2DInterpretationSetID": null,
"HorizontalCRSID": null,
"SeismicAttributeTypeID": null,
"BinGridID": null,
"SeismicProcessingStageTypeID": null,
"StartTime": 0.0,
"LiveTraceOutline.SpatialParameterTypeID": null,
"SpatialArea.QualitativeSpatialAccuracyTypeID": null,
"SpatialPoint.SpatialGeometryTypeID": null,
"LiveTraceOutline.QuantitativeAccuracyBandID": null,
"LiveTraceOutline.SpatialGeometryTypeID": null,
"IsDiscoverable": null,
"SeismicWaveTypeID": null,
"Precision.WordFormat": null,
"VirtualProperties.DefaultLocation.QualitativeSpatialAccuracyTypeID": null,
"SubmitterName": null,
"TraceDomainUOM": null,
"SeismicPhaseID": null,
"SpatialPoint.QualitativeSpatialAccuracyTypeID": null,
"PrincipalAcquisitionProjectID": null,
"GatherTypeID": null,
"Description": null,
"Phase": null,
"EndTime": 10.0,
"TimeLapse.TimeSeriesID": null,
"ResourceLifecycleStatus": null,
"SeismicTraceDataDimensionalityTypeID": null,
"TechnicalAssuranceID": null,
"VirtualProperties.DefaultLocation.SpatialGeometryTypeID": null,
"Source": null,
"SeismicLineGeometryID": null,
"LiveTraceOutline.QualitativeSpatialAccuracyTypeID": null,
"SpatialPoint.CoordinateQualityCheckPerformedBy": null,
"TraceRelationFileID": null,
"Polarity": null,
"SpatialPoint.SpatialParameterTypeID": null,
"SpatialPoint.QuantitativeAccuracyBandID": null,
"SpatialArea.CoordinateQualityCheckPerformedBy": null,
"SeismicPolarityID": null,
"Seismic2DName": null,
"VirtualProperties.DefaultLocation.SpatialParameterTypeID": null,
"ResourceHomeRegionID": null,
"Preferred3DInterpretationSetID": null,
"SeismicMigrationTypeID": null
},
"kind": "osdu:wks:work-product-component--SeismicTraceData:1.4.0",
"source": "wks",
"acl": {
"viewers": [
"data.default.viewers@osdu.example.com"
],
"owners": [
"data.default.owners@osdu.example.com"
]
},
"type": "work-product-component--SeismicTraceData",
"version": 1703067759404129,
"tags": {
"normalizedKind": "osdu:wks:work-product-component--SeismicTraceData:1"
},
"modifyUser": "admin-main@testing.com",
"modifyTime": "2023-12-20T10:22:40.387Z",
"createTime": "2023-12-20T10:17:21.256Z",
"authority": "osdu",
"namespace": "osdu:wks",
"legal": {
"legaltags": [
"osdu-public-usa-dataset"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"createUser": "serviceprincipal-main@testing.com",
"id": "osdu:work-product-component--SeismicTraceData:TEST_ISSUE_1"
}
],
"aggregations": null,
"totalCount": 1
}`
We hope the search index will return keys consistently regardless of the key's value type: string, boolean, number, or integer.
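Until that happens, a client-side workaround is to backfill the expected keys so that numeric fields come back looking like the string ones do. A minimal sketch, assuming the full property list would come from the Schema service (the abbreviated `EXPECTED` list here is illustrative):

```python
# Client-side workaround sketch: make every expected schema property present
# in a search hit's `data`, defaulting missing ones (e.g. SampleCount) to
# None, the way string-typed keys like TraceDomainUOM already appear as null.

EXPECTED = ["Name", "StartTime", "EndTime", "SampleCount", "TraceDomainUOM"]

def normalize(data, expected=EXPECTED):
    """Return a copy of `data` with every expected key present."""
    out = dict(data)
    for key in expected:
        out.setdefault(key, None)   # missing numeric keys become explicit nulls
    return out
```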
cc @debasiscM20 - Release 0.23https://community.opengroup.org/osdu/platform/system/search-service/-/issues/141What is the best way to figure out "unit of measure" of any specific field fr...2023-12-13T06:49:18ZDebasis ChatterjeeWhat is the best way to figure out "unit of measure" of any specific field from Search response?Please see this test case from an earlier release.
https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M16/Test_Plan_Results_M16/Manifest_Ingestion/M16-AWS-Manifest-Ingestion-Unit-convert-Debasis.txt
Here the field CableLength had a value of 6000 ft in the ingestion payload (JSON file).
We added this field to the meta block and wanted conversion from ft to metres.
Hence the Search response correctly shows the converted value of 1828.8 metres (as expected).
This becomes clear when we see initial JSON payload by using Storage service and later when we retrieve the record by using Search service.
But the question is - if someone looks at Search service alone, what is the clue?
- To know that a field has undergone conversion?
- And to know what it has been converted to and what is the matching unit of measure for its current value?
Checking Schema service, I can find out this field is using "length" unit of measure.
[schema-SAS.txt](/uploads/6552b357ddc855c7120e3abc63d7c155/schema-SAS.txt)
```
"CableLength": {
"description": "Total length of receiver array",
"x-osdu-frame-of-reference": "UOM:length",
"type": "number"
},
```
From the UnitOfMeasure reference data, I can also find out that the base unit of measure for "length" is "metre" (IsBaseUnit=true, UnitDimensionName="length").
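One place the conversion clue does live is the record's `meta` block (visible in the Storage response): its `persistableReference` string encodes the unit the stored value is now in. A minimal parsing sketch; the payload shape below is illustrative, not the exact record from the test case:

```python
import json

# Sketch: recover the current unit symbol of a converted field from a
# record's `meta` block. The record literal is an illustrative assumption
# modelled on frame-of-reference payloads, not the actual test-case record.

record = {
    "data": {"CableLength": 1828.8},
    "meta": [
        {
            "kind": "Unit",
            "name": "m",
            "persistableReference": json.dumps(
                {"scaleOffset": {"scale": 1.0, "offset": 0.0},
                 "symbol": "m",
                 "baseMeasurement": {"ancestry": "Length", "type": "UM"},
                 "type": "USO"}
            ),
            "propertyNames": ["CableLength"],
        }
    ],
}

def unit_of(record, field):
    """Return the unit symbol governing `field`, or None if no Unit frame covers it."""
    for frame in record.get("meta", []):
        if frame.get("kind") == "Unit" and field in frame.get("propertyNames", []):
            return json.loads(frame["persistableReference"])["symbol"]
    return None
```

If the user never opted into conversion, the field would simply have no covering Unit frame, which is one observable difference, though it still does not distinguish "left as feet deliberately" from "never had a unit at all".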
From Search, I may include the field "index" to get more information.
That trick is handy when something goes wrong with indexing.
But otherwise it shows status=200.
How do we find out whether the user opted to leave values in the original unit (foot) and did not convert to the SI unit (metre)?
Copying to Mark Chance ( @Java1Guy ) as he is currently working on Search service enhancements
Also copying to @nthakur and @gehrmann for their inputs.https://community.opengroup.org/osdu/platform/system/search-service/-/issues/140Tutorial: Search by kind Guidance2024-01-08T12:17:56ZThomas Gehrmann [slb]Tutorial: Search by kind Guidance# [Query by kind](https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/docs/tutorial/SearchService.md?ref_type=heads#query-by-kind)
The tutorial promotes searching for specific **_versions_** of schemas, which is not a good idea. In recent milestones the number of minor schema versions, as well as patch versions, has grown considerably.
* The tutorial should recommend wildcards for minor and patch versions.
* Using specific schema versions in query by `kind` will cause serious trouble when data records are schema-migrated, updated, or newly ingested using, e.g., the [preferred schema version recommendation (Schema Usage Guide)](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/Guides/Chapters/93-OSDU-Schemas.md#appendix-d5-schema-version-managementconfiguration).
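The recommendation can be illustrated as follows; the kind strings are examples, and the glob check is only a crude client-side approximation of the service's wildcard semantics:

```python
import fnmatch

# Illustration of the recommended pattern: pin the major version only and
# wildcard minor and patch, so a query keeps matching records after schema
# updates. Kind strings below are illustrative examples.

pinned = {"kind": "osdu:wks:master-data--Wellbore:1.0.0"}       # breaks once 1.0.1 records exist
recommended = {"kind": "osdu:wks:master-data--Wellbore:1.*.*"}  # survives 1.0.1, 1.1.0, ...

def kind_matches(kind, pattern):
    """Crude client-side approximation of the wildcard matching."""
    return fnmatch.fnmatchcase(kind, pattern)
```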
CC @nthakur, @chadM23 - Release 0.26Thomas Gehrmann [slb]Thomas Gehrmann [slb]https://community.opengroup.org/osdu/platform/system/search-service/-/issues/138Search APIs don't return content-type in response headers2023-11-08T14:04:16ZShane HutchinsSearch APIs don't return content-type in response headersMinor issue:
Most search APIs don't return the Content-Type in the response headers
When they do return it, it doesn't match or is missing in the openapi.jsonhttps://community.opengroup.org/osdu/platform/system/search-service/-/issues/137Search with offset returns duplicates2023-10-05T10:18:24ZBert KampesSearch with offset returns duplicatesSee attached [Search_query_API_issue.docx](/uploads/976b5aca3cdd625f1fa59822b6319415/Search_query_API_issue.docx).
(from attachment)
Search query API issue:
The Search query API has a limit of retrieving only 1000 records. To retrieve all the records from the DB we need to call the Search API multiple times until we get all the records.
When the Search API is invoked for the first time (the first iteration), it returns 1000 records.
For the second iteration we pass the offset value, indicating that retrieval should continue from that point.
The iterations go on until all the records are retrieved, with the offset value increased on every subsequent iteration.
But we observed that the Search query API does not work as intended with the offset value: the order of retrieval of records is not maintained.
It retrieves records that have already been returned, giving duplicate records.
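The failure mode described above can be worked around on the client side by de-duplicating on record id (the intended fix is the `query_with_cursor` endpoint, which keeps a stable scroll context). A sketch, where `fetch_page` is a stand-in for a POST to `/api/search/v2/query` with an offset:

```python
# Sketch: offset paging over a backend whose page ordering is not stable can
# return the same record twice; dropping duplicates by id keeps the result
# set correct (at the cost of possibly needing extra pages for missed rows).
# `fetch_page(offset, limit)` is an assumed stand-in for the Search API call.

def collect_all(fetch_page, limit=1000):
    seen, records, offset = set(), [], 0
    while True:
        page = fetch_page(offset=offset, limit=limit)
        if not page:
            break
        for rec in page:
            if rec["id"] not in seen:       # drop duplicates across pages
                seen.add(rec["id"])
                records.append(rec)
        offset += limit
    return records
```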
For Example, below when we are trying to retrieve the CRS data using the Search query API...https://community.opengroup.org/osdu/platform/system/search-service/-/issues/136OpenAPI documentation should specify array of string or string instead of typ...2023-10-05T10:53:40ZHåkon TønnessenOpenAPI documentation should specify array of string or string instead of typeless schema for `kind`Current documentation does not specify the possible types for the property `kind` in `CursorQueryRequest`and `QueryRequests`:
Currently:
```
CursorQueryRequest:
description: Json object to query the Search API
type: object
required:
- kind
properties:
cursor:
.....
kind:
type: object
description: The kind of the record to query e.g. "tenant1:test:well:1.0.0" or "tenant1:test:well:1.0.0,tenant1:test:well:2.0.0" or ["tenant1:test:well:1.0.0", "tenant1:test:well:2.0.0"].
```
This causes issues when creating data models based on the OpenAPI documentation, as the typeless schema will be interpreted as a dictionary type, while the description specifies that both string and array of strings are valid parameters.
Specifying the types explicitly using `oneOf` instead gives a correct specification:
```
CursorQueryRequest:
description: Json object to query the Search API
type: object
required:
- kind
properties:
cursor:
.....
kind:
type: object
additionalProperties:
oneOf:
- type: string
- type: array
items:
type: string
description: The kind of the record to query e.g. "tenant1:test:well:1.0.0" or "tenant1:test:well:1.0.0,tenant1:test:well:2.0.0" or ["tenant1:test:well:1.0.0", "tenant1:test:well:2.0.0"].
```
This makes the models unambiguous.
This affects `CursorQueryRequest` and `QueryRequests`, as both need the `kind` parameter.https://community.opengroup.org/osdu/platform/system/search-service/-/issues/135ADR Provide suggestions for auto-complete of input2024-01-15T11:56:08ZMark ChanceADR Provide suggestions for auto-complete of input# ADR: Autocomplete
<a name="TOC"></a>
[[_TOC_]]
# Status
- [x] Proposed
- [x] Trialing
- [ ] Under review
- [ ] Approved
- [ ] Retired
# Background
Shell application developer stakeholders want to offer their users auto-complete suggestions based on partial input.
# Context & Scope
Partial input is compared against the text tokens occurring in all fields of OSDU platform records. For this case we propose using the bagOfWords field described in the indexer [ADR](https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/113).
[Back to TOC](#TOC)
## Requirements
The partial input is passed to the search service and a list of suggestions is returned.
To be useful, the response time must be under 2 seconds.
[Back to TOC](#TOC)
# Tradeoff Analysis
[Back to TOC](#TOC)
# Proposed solution
The search query json will support this syntax:
```json
{
"suggestPhrase": "united"
}
```
Which would return something of the form:
```json
{
"phraseSuggestions": [
"United States",
"United States therm",
"United Kingdom",
"United Kingdom British thermal unit",
"United Kingdom term",
"United Kingdom nautical mile"
]
}
```
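Conceptually, the behaviour is prefix matching of the partial input against indexed phrases. The sketch below is purely illustrative (a real implementation would be backed by the indexer's bagOfWords field and an Elasticsearch suggester, not an in-memory list):

```python
# Conceptual sketch only: what `suggestPhrase` might do, modelled as a
# case-insensitive prefix match over a phrase list. PHRASES is illustrative.

PHRASES = ["United States", "United States therm", "United Kingdom",
           "United Kingdom British thermal unit", "Untied Knot"]

def suggest(prefix, phrases=PHRASES):
    """Return phrases starting with `prefix`, ignoring case."""
    p = prefix.lower()
    return [ph for ph in phrases if ph.lower().startswith(p)]
```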
[Back to TOC](#TOC)
# Change Management
* Operators may need to execute reindex with force_clean=true action on indices to enable this feature.
# Decision
# Consequences
* The search code changes will not impact any existing queries or functionality since this is a new field.
[Back to TOC](#TOC)
#EOF.M23 - Release 0.26Mark ChanceMark Chancehttps://community.opengroup.org/osdu/platform/system/search-service/-/issues/134Search should not return 404 in case there are no matching data in Elasticsearch2023-11-08T14:07:37ZDenis Karpenok (EPAM)Search should not return 404 in case there are no matching data in Elasticsearch**The expected result:**
- When no data matches the query response is 200 OK with an empty list.
**Actual results are:**
- Inconsistent, sometimes it's 200 OK sometimes it's 400.
**Reason:**
- Not all requests to ElasticSearch have...**The expected result:**
- When no data matches the query, the response is 200 OK with an empty list.
**Actual results are:**
- Inconsistent: sometimes it's 200 OK, sometimes it's 400.
**Reason:**
- Not all requests to Elasticsearch carry parameters to ignore user errors; usually these are preliminary requests that fetch details for further search queries, for example: https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/search-core/src/main/java/org/opengroup/osdu/search/service/FieldMappingTypeService.java#L49
**Solution:**
- Suppress all 400 errors from Elasticsearch and respond to the end user only with 200 OK.
**Pros:**
- More consistent workflow for client applications.
- Reduced error handling for client applications.
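The proposed behaviour can be sketched as a thin wrapper around the backend call; `SearchBackendError` and `raw_search` are illustrative stand-ins, not the service's actual types:

```python
# Sketch of the proposal: translate Elasticsearch user-level (4xx) failures
# into an ordinary empty 200 payload, while still propagating server errors.
# The exception class and `raw_search` callable are illustrative assumptions.

class SearchBackendError(Exception):
    def __init__(self, status):
        super().__init__(f"backend returned {status}")
        self.status = status

EMPTY = {"results": [], "aggregations": None, "totalCount": 0}

def query(raw_search, request):
    try:
        return raw_search(request)
    except SearchBackendError as e:
        if 400 <= e.status < 500:   # user-level error: behave as "no data matched"
            return dict(EMPTY)
        raise                        # 5xx still surfaces to the caller
```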
More details are in the attached CSV files:
[test_results_2023-08-29_11-34-31.csv](/uploads/03bf18c852387f4da493aa13b97ad5d3/test_results_2023-08-29_11-34-31.csv)
[test_results_2023-08-29_11-51-20.csv](/uploads/6071b35ea688e57bdf24112198a9ddd7/test_results_2023-08-29_11-51-20.csv)https://community.opengroup.org/osdu/platform/system/search-service/-/issues/133Elasticsearch licensing2024-01-18T07:50:02ZChad LeongElasticsearch licensing# Problem Statement
Currently, Search service is using Elasticsearch [7.8.1](https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/pom.xml?ref_type=heads#L33). There is a need to upgrade the version to provide stability, features and performance improvement.
Specifically following the release of version 7.10.2 https://mvnrepository.com/artifact/org.elasticsearch/elasticsearch-core/7.10.2 , Elastic has since transitioned its licensing from the Apache 2.0 license to the Server Side Public License (SSPL) for any future versions https://mvnrepository.com/artifact/org.elasticsearch/elasticsearch-core.
OSDU software needs to be licensed using Apache 2.0. This change has raised concerns, particularly regarding compatibility issues with client bindings and providing future updates to Elasticsearch.
## Impact
There are 2 components to the search - Elastic client bindings and server.
- Client bindings are integral components of applications that facilitate seamless communication with our Elastic Search Service. These bindings have traditionally been Apache 2.0 compatible. The shift to SSPL raises compatibility concerns, potentially preventing the upgrade of client bindings.
- The Elastic server itself is used as a tool, so we don’t need to worry about Apache compatibility. Server-side upgrades are possible but may encounter a future technical barrier without client-binding upgrades.
## Objective
We need to address this licensing challenge and find an alternative that allows for a smooth transition. We are actively exploring options for an elastic alternative that can bridge the gap between client bindings and server upgrades.
**Option №1** is https://opensearch.org/docs/latest/clients/java/
- https://aws.amazon.com/blogs/opensource/keeping-clients-of-opensearch-and-elasticsearch-compatible-with-open-source/
Pros:
- OpenSearch is an ElasticSearch fork, and fully compatible with v 7.10 see https://opensearch.org/faq/#q1.8. Thus refactoring should be more or less straightforward.
- Easier to preserve existing features.
- It's possible to change clients in the services and keep ElasticSearch as the backend server.
Cons:
- Following-up releases do not guarantee compatibility with ElasticSearch API: https://opensearch.org/faq/#q1.9
Action items:
- Potentially could bind CSPs to ElasticSearch server v 7.10 or force them to switch to OpenSearch server.
- Switch Indexer and Search to use OpenSearch clients.
**Option №2** is an Elasticsearch client with an Apache license https://github.com/elastic/elasticsearch-java/
Pros:
- Possible to keep Elasticsearch as a backend.
- Later we could migrate to Elasticsearch 8.
Cons:
- Could require a bit more thorough migration for Search and Indexer, unlike OpenSearch. Since it's a different lib with different interfaces, we may need to rewrite a lot of code. In the meantime, OpenSearch has a fork of High-level-rest-client https://opensearch.org/docs/latest/clients/java-rest-high-level/ which could simplify migration to just swapping imports.
- Additionally, we should be aware that the Elasticsearch server's licensing could still pose an issue.
Action items:
- Migrate Indexer and Search to use Elasticsearch Apache client.
## Decisions
Option 2 seems to be a better long-term solution with the possibility of keeping Elasticsearch as a backend. A separate migration strategy has been written here https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/111https://community.opengroup.org/osdu/platform/system/search-service/-/issues/132Follow-up from "Add filter to nested sort"2023-08-17T16:07:05ZMark ChanceFollow-up from "Add filter to nested sort"The following discussion from !535 should be addressed:
- [ ] @nthakur started a [discussion](https://community.opengroup.org/osdu/platform/system/search-service/-/merge_requests/535#note_241458): (+2 comments)
> Will filter context work for non-nested scenario? If it does, can you please update non-nested section as well? If it does not, then we should add this in limitation documentation.M20 - Release 0.23Mark ChanceMark Chancehttps://community.opengroup.org/osdu/platform/system/search-service/-/issues/131ADR: Exclude indices of the system/meta data from the search results unless t...2023-09-14T19:45:45ZZhibin MaiADR: Exclude indices of the system/meta data from the search results unless the indices (kinds) of the system/meta data are explicitly specified in the search queryIt is quite likely that applications or systems may need to have their system/meta data searchable via OSDU search, but that system/meta data is not expected to be included in the results of a normal keyword search,
for example, an application stores its system data in the storage under kind "xyz" _(please ignore the kind syntax in this example)_
- When users try to search data with keyword "wellbore", the data from kind "xyz" should not be included in the search result if users do search as below:
##### Case 1:
```
{
"kind": "*:*:*:*",
"query": "wellbore"
}
```
- When application (workflow) tries to search its system data with keyword "wellbore", the data from kind "xyz" should be included in the search result if the kind "xyz" is explicitly specified in the search query, e.g.
##### Case 2:
```
{
"kind": "xyz",
"query": "wellbore"
}
```
To achieve this objective and provide a general solution, we propose to use a reserved name in the "authority" or "source" field for kinds of the system/metadata.
- If those kinds are not explicitly specified in the search query as the **Case 1** above, the data from those kinds won't be included in the search result
- If those kinds are explicitly specified in the search query as the **Case 2** above, the data from those kinds will be included in the search result
The reserved name should be meaningful, yet odd (weird) enough to avoid naming conflicts with existing schemas. What it should be is an open question. Here are a few proposals for the reserved name:
- "system" -- it may be too common
- "system-meta"
- "system-meta-data" -- should not be common if it is used in as "authority"
Whether the reserved name goes in "authority" or "source" is another open question. Here is what we think:
| Field | Pro | Con |
|:--------------|:------------------------------------------------------|:----------------------------------------------------------------|
| authority | those indices can be filtered precisely | it could cause name conflicts among tenants in a multi-tenant environment when they share the same services |
| source | it should not cause name conflicts among tenants in a multi-tenant environment if each tenant has its own authority for its kinds | it may be impossible to filter those indices precisely: if the entity-type field contains the same keyword, those indices will be filtered out too |
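The precision difference in the table can be illustrated with wildcard index patterns. These are hypothetical index names using the "system" proposal, and they assume the common "authority-source-entitytype-version" index naming, where a wildcard pattern can only be anchored at the start of the name:

```python
import fnmatch

# Hypothetical index names of the form "authority-source-entitytype-version".
indices = [
    "osdu-wks-wellbore-1.0.0",    # normal data
    "osdu-system-config-1.0.0",   # reserved keyword in the *source* slot
    "osdu-wks-system-1.0.0",      # same keyword in the *entity type* slot
    "system-wks-wellbore-1.0.0",  # reserved keyword in the *authority* slot
]

# The authority is the start of the index name, so it can be matched precisely:
print(fnmatch.filter(indices, "system-*"))
# -> ['system-wks-wellbore-1.0.0']

# The source sits mid-name; a wildcard pattern also catches indices whose
# entity type happens to carry the same keyword:
print(fnmatch.filter(indices, "*-system-*"))
# -> ['osdu-system-config-1.0.0', 'osdu-wks-system-1.0.0']
```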
Any input is welcome before we finalize the solution.
Once we have a conclusion, Thomas will include this reserved keyword in the schema guide.

_(Milestone: M20 - Release 0.23; Thomas Gehrmann [slb], Zhibin Mai)_

---

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/130
**Search Service does not work with cursors** (Riabokon Stanislav (EPAM) [GCP], 2024-02-22)

This issue was observed when the GC team was running various requests on Search Service.
For example,
```
curl --location 'https://community.gcp.gnrg-osdu.projects.epam.com/api/search/v2/query_with_cursor' \
--header 'Content-Type: application/json' \
--header 'data-partition-id: osdu' \
--header 'accept: application/json' \
--header 'Authorization: Bearer ey' \
--data '{
"kind": "*:*:*:*",
"query": "data.DatasetProperties.FileSourceInfo.PreloadFilePath: (\"s3://osdu-seismic-test-data*\")",
"trackTotalCount": true
}'
```
with the answer
```
{
"cursor": null,
"results": [],
"totalCount": 0
}
```
Investigation:
**Search Service** creates the following request on Elasticsearch:
`POST /*-*-*-*,-.*/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=true&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&scroll=90s&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true`
The parameter **scroll=90s** means the Scroll API is used, with a cursor **lifetime of 90 seconds**.
However, a scroll context is created for every index, and we can get the following error:
`Trying to create too many scroll contexts. Must be less than or equal to: [500]. This limit can be set by changing the [search.max_open_scroll_context] setting.`
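A toy calculation shows how quickly a wildcard query exhausts this limit. The index names are hypothetical; the point is only that one query_with_cursor call opens a scroll context per matching index:

```python
import fnmatch

DEFAULT_MAX_OPEN_SCROLL_CONTEXTS = 500  # Elasticsearch default limit

def scroll_contexts_opened(indices: list, index_pattern: str) -> int:
    # one scroll context is opened per index matched by the pattern
    return len(fnmatch.filter(indices, index_pattern))

# A kind of "*:*:*:*" resolves to the index pattern "*-*-*-*":
indices = [f"osdu-wks-type{i}-1.0.0" for i in range(1193)]
opened = scroll_contexts_opened(indices, "*-*-*-*")
print(opened, opened > DEFAULT_MAX_OPEN_SCROLL_CONTEXTS)  # 1193 True
```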
After a while, we decided to investigate this issue more deeply.
When we checked the node stats with the following request:
https://gc_elastic_search:9243/_nodes/stats/indices/search
the answer was
```
"indices": {
"search": {
"open_contexts": 1,
"query_total": 15271488,
"query_time_in_millis": 9385974,
"query_current": 0,
"fetch_total": 9567767,
"fetch_time_in_millis": 590770,
"fetch_current": 0,
"scroll_total": 4399252,
"scroll_time_in_millis": 7768131243,
"scroll_current": 1,
"suggest_total": 0,
"suggest_time_in_millis": 0,
"suggest_current": 0
}
}
```
As we can see, Elasticsearch has **1 scroll_current**.
Let's run our request again
```
{
"kind": "*:*:*:*",
"query": "data.DatasetProperties.FileSourceInfo.PreloadFilePath: (\"s3://osdu-seismic-test-data*\")",
"trackTotalCount": true
}
```
The answer from node stats was
```
"indices": {
"search": {
"open_contexts": 1193,
"query_total": 15272901,
"query_time_in_millis": 9386238,
"query_current": 0,
"fetch_total": 9567932,
"fetch_time_in_millis": 590779,
"fetch_current": 0,
"scroll_total": 4399329,
"scroll_time_in_millis": 7768231132,
"scroll_current": 1193,
"suggest_total": 0,
"suggest_time_in_millis": 0,
"suggest_current": 0
}
}
```
We get **"scroll_current": 1193**. Thus, our single request created 1193 scroll contexts, one for every index, all sharing the same cursor ID.
Solution:
- Try to avoid such requests when we want to use search_with_cursors.
- According to the official Elasticsearch documentation, we should use
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/paginate-search-results.html#search-after instead of the Scroll API:
`We no longer recommend using the scroll API for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the search_after parameter with a point in time (PIT).`
More details: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/scroll-api.html.
In our case, we have to support the **search_after** parameter in the Search Service.
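A rough sketch of what search_after pagination could look like. The request shapes follow the Elasticsearch 7.17 documentation; the PIT id and query below are placeholders, and this is not the Search Service implementation:

```python
# Sketch of search_after request bodies (shapes from the Elasticsearch 7.17
# docs; pit_id and the query are placeholders).
def first_page(pit_id: str, size: int = 1000) -> dict:
    return {
        "size": size,
        "query": {"query_string": {"query": "*"}},  # placeholder query
        "pit": {"id": pit_id, "keep_alive": "1m"},
        # a tiebreaker sort so each hit's 'sort' values uniquely identify it
        "sort": [{"_shard_doc": "asc"}],
    }

def next_page(prev_page: dict, last_hit_sort: list) -> dict:
    # resume after the last hit of the previous page instead of keeping
    # a scroll context open for every index
    page = dict(prev_page)
    page["search_after"] = last_hit_sort
    return page

page1 = first_page("pit-id-placeholder")
page2 = next_page(page1, [42])
```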
- Run the following request
```
curl -i -X PUT \
  -H "Authorization: Basic ****" \
  -H "data-partition-id: osdu" \
  -H "Content-Type: application/json" \
  -d '{
        "persistent": { "search.max_open_scroll_context": 5000 },
        "transient": { "search.max_open_scroll_context": 5000 }
      }' \
  'https://elastic_search:9243/_cluster/settings'
```
to increase **search.max_open_scroll_context**.

---

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/129
**Extend "sort on nested text fields" feature with possibility to add nested filter** (Mark Chance, 2023-10-26)

## Change existing nested sorting signature
From:
nested(path, field, mode)
To:
nested(path, field, mode, _nested_filter_) - (the filter argument may be optional to maintain backwards compatibility)
to allow the user to specify a filter that the inner objects inside the nested path should match in order for their field values to be taken into account by sorting (this sounds complex, but it is the exact definition in the Elasticsearch docs)
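One possible way to parse the extended signature while keeping the three-argument form valid. This is a hypothetical sketch in Python, not the actual SortParserUtil change:

```python
import re

# Hypothetical parser for "nested(path, field, mode[, (filter)])".
NESTED_RE = re.compile(
    r"nested\((?P<path>[^,]+),\s*(?P<field>[^,]+),\s*(?P<mode>[^,()]+)"
    r"(?:,\s*\((?P<filter>.*)\))?\)$"
)

def parse_nested_sort(expr: str) -> dict:
    match = NESTED_RE.match(expr.strip())
    if not match:
        raise ValueError(f"not a nested sort expression: {expr}")
    # the 'filter' group is None for the legacy 3-argument form
    return {k: v.strip() if v else None for k, v in match.groupdict().items()}

legacy = parse_nested_sort("nested(data.NameAliases, AliasName, min)")
extended = parse_nested_sort(
    'nested(data.NameAliases, AliasName, min, (AliasNameTypeID:"x"))')
```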
## Example use case which is enabled by this feature
Below is the data model, where we need to sort based on AliasName where AliasNameTypeID = "osdu:reference-data--AliasNameType:UWI:":
```json
{
  "data": {
    "NameAliases": [
      {
        "AliasName": "714100044935",
        "TerminationDateTime": "2020-02-13T09:13:15.55+0000",
        "AliasNameTypeID": "osdu:reference-data--AliasNameType:UWI:",
        "EffectiveDateTime": "2020-02-13T09:13:15.55+0000",
        "DefinitionOrganisationID": null
      }
    ]
  }
}
```
## Proposed syntax for new argument
The same syntax as the top-level 'query' endpoint argument. Alternatively, raw Elasticsearch filter syntax can be used.
An example request that will accomplish the desired functionality after this implementation:
```json
{
"kind": "{{data-partition-id}}:*:*:*",
"sort": {
"field": [
"nested(data.NameAliases, AliasName, min, (AliasNameTypeID:\"osdu:reference-data--AliasNameType:UWI:\"))"
],
"order": [
"ASC"
]
}
}
```
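Under the hood, such a request would map to Elasticsearch's nested sort with a filter clause. The builder below is a hypothetical sketch of the target ES syntax, not the actual SortParserUtil code, and a simple `term` filter is assumed:

```python
# Hypothetical builder mapping the proposed sort argument onto Elasticsearch's
# nested-sort-with-filter syntax (a simple `term` filter is assumed).
def build_nested_sort(path: str, field: str, mode: str,
                      filter_field: str, filter_value: str, order: str) -> dict:
    return {
        f"{path}.{field}": {
            "mode": mode,
            "order": order.lower(),
            "nested": {
                "path": path,
                # only inner objects matching this filter contribute their
                # field values to the sort
                "filter": {"term": {f"{path}.{filter_field}": filter_value}},
            },
        }
    }

clause = build_nested_sort(
    "data.NameAliases", "AliasName", "min",
    "AliasNameTypeID", "osdu:reference-data--AliasNameType:UWI:", "ASC")
```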
## Scope of required work
Changes are needed only in the Search Service: the new argument must be passed from the API layer to SortParserUtil, which has to be modified. https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/search-core/src/main/java/org/opengroup/osdu/search/util/SortParserUtil.java#L118
This was originally requested as https://osdu-community.ideas.aha.io/ideas/IDEA-I-66

_(Milestone: M20 - Release 0.23; Mark Chance)_