Search Service does not work with cursors

This issue was observed while the GC team was running various requests against Search Service, for example:

curl --location 'https://community.gcp.gnrg-osdu.projects.epam.com/api/search/v2/query_with_cursor' \
--header 'Content-Type: application/json' \
--header 'data-partition-id: osdu' \
--header 'accept: application/json' \
--header 'Authorization: Bearer ey' \
--data '{
    "kind":  "*:*:*:*",
    "query": "data.DatasetProperties.FileSourceInfo.PreloadFilePath: (\"s3://osdu-seismic-test-data*\")",
    "trackTotalCount": true
}'

The response was empty:

{
    "cursor": null,
    "results": [],
    "totalCount": 0
}

Investigation:

Search Service translates this into the following Elasticsearch request:

POST /*-*-*-*,-.*/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=true&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&scroll=90s&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true

The scroll=90s parameter means that the Scroll API is used, with a cursor lifetime of 90 seconds. However, a scroll context is created for every index the pattern matches, so such a request can fail with the following error:

Trying to create too many scroll contexts. Must be less than or equal to: [500]. This limit can be set by changing the [search.max_open_scroll_context] setting.
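To make the reasoning above concrete, here is a minimal sketch that estimates how many scroll contexts a wildcard kind could open. All function names are hypothetical. The kind-to-index mapping is inferred only from the single request shown above, not from the Search Service code, and fnmatch globbing only approximates how Elasticsearch resolves wildcards. It assumes one scroll context per shard searched, which for single-shard indices equals one per index.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def kind_to_index_pattern(kind: str) -> str:
    # Hypothetical mapping, inferred from the request above:
    # ':' becomes '-' and dot-prefixed (system) indices are excluded.
    return kind.replace(":", "-") + ",-.*"


def matching_indices(pattern: str, index_names: List[str]) -> List[str]:
    # Split the comma-separated expression into include and exclude globs.
    parts = pattern.split(",")
    includes = [p for p in parts if not p.startswith("-")]
    excludes = [p[1:] for p in parts if p.startswith("-")]
    return [
        name
        for name in index_names
        if any(fnmatchcase(name, p) for p in includes)
        and not any(fnmatchcase(name, p) for p in excludes)
    ]


def estimated_scroll_contexts(pattern: str, shards_per_index: Dict[str, int]) -> int:
    # Assumption: one scroll context per shard of every matched index.
    matched = matching_indices(pattern, list(shards_per_index))
    return sum(shards_per_index[name] for name in matched)
```

Comparing the result of estimated_scroll_contexts against search.max_open_scroll_context (500 in the error above) gives a rough idea of whether a given kind is safe to use with a cursor. The limit applies per node, so this is a worst-case estimate.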

We then investigated the issue further. We checked node stats with the following request:

https://gc_elastic_search:9243/_nodes/stats/indices/search

The response was:

"indices": {
    "search": {
        "open_contexts": 1,
        "query_total": 15271488,
        "query_time_in_millis": 9385974,
        "query_current": 0,
        "fetch_total": 9567767,
        "fetch_time_in_millis": 590770,
        "fetch_current": 0,
        "scroll_total": 4399252,
        "scroll_time_in_millis": 7768131243,
        "scroll_current": 1,
        "suggest_total": 0,
        "suggest_time_in_millis": 0,
        "suggest_current": 0
    }
}

As we can see, Elasticsearch reports only 1 open scroll context (scroll_current). We then ran our request again:

{
  "kind":  "*:*:*:*",
  "query": "data.DatasetProperties.FileSourceInfo.PreloadFilePath: (\"s3://osdu-seismic-test-data*\")",
  "trackTotalCount": true
}

After that, node stats returned:

"indices": {
    "search": {
        "open_contexts": 1193,
        "query_total": 15272901,
        "query_time_in_millis": 9386238,
        "query_current": 0,
        "fetch_total": 9567932,
        "fetch_time_in_millis": 590779,
        "fetch_current": 0,
        "scroll_total": 4399329,
        "scroll_time_in_millis": 7768231132,
        "scroll_current": 1193,
        "suggest_total": 0,
        "suggest_time_in_millis": 0,
        "suggest_current": 0
    }
}

Now "scroll_current" is 1193. In other words, this single request raised the number of open scroll contexts from 1 to 1193: one context per matched index, all belonging to the same cursor ID.
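The before/after comparison above can be automated with a small sketch like the one below. The function names are hypothetical. It assumes the full _nodes/stats/indices/search response has the usual layout, in which each node's "indices" fragment (as shown above) sits under a top-level "nodes" object keyed by node ID.

```python
from typing import Any, Dict


def open_scroll_contexts(node_stats: Dict[str, Any]) -> int:
    # Sum scroll_current over every node in the stats response.
    return sum(
        node["indices"]["search"]["scroll_current"]
        for node in node_stats["nodes"].values()
    )


def scroll_contexts_opened(before: Dict[str, Any], after: Dict[str, Any]) -> int:
    # Difference between two snapshots taken around a single request.
    return open_scroll_contexts(after) - open_scroll_contexts(before)
```

With the two snapshots from this issue (scroll_current of 1 and 1193 on one node), the difference is 1192 contexts opened by one query_with_cursor call.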

Solution:

Try to avoid such broad wildcard requests when using cursor-based search (query_with_cursor).
According to the official Elasticsearch documentation, we should use search_after (https://www.elastic.co/guide/en/elasticsearch/reference/7.17/paginate-search-results.html#search-after) instead of the Scroll API:

We no longer recommend using the scroll API for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the search_after parameter with a point in time (PIT).

More details: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/scroll-api.html. In our case, this means adding support for the search_after parameter to Search Service.
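As an illustration of what search_after with a point in time (PIT) could look like, here is a minimal sketch. It is not the Search Service implementation. The function names and the injected search callable are hypothetical: search stands for whatever sends the body to POST /_search and returns the parsed JSON. The sketch assumes the PIT was already opened with POST /<index>/_pit?keep_alive=1m and is closed by the caller afterwards.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def build_page_request(
    query: Dict[str, Any],
    pit_id: str,
    search_after: Optional[List[Any]] = None,
    size: int = 1000,
    keep_alive: str = "1m",
) -> Dict[str, Any]:
    # The body is sent to /_search without an index in the path,
    # because the PIT already identifies the target indices.
    body: Dict[str, Any] = {
        "size": size,
        "query": query,
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        "sort": [{"_shard_doc": "asc"}],
    }
    if search_after is not None:
        body["search_after"] = search_after
    return body


def iterate_hits(
    search: SearchFn, query: Dict[str, Any], pit_id: str, size: int = 1000
) -> Iterator[Dict[str, Any]]:
    search_after: Optional[List[Any]] = None
    while True:
        response = search(build_page_request(query, pit_id, search_after, size))
        hits = response["hits"]["hits"]
        if not hits:
            return  # no more pages
        yield from hits
        # The sort values of the last hit become the cursor for the next page.
        search_after = hits[-1]["sort"]
        # Elasticsearch may return an updated PIT id; always use the latest one.
        pit_id = response.get("pit_id", pit_id)
```

In this scheme the cursor handed back to the client would carry the PIT id together with the last sort values, rather than a scroll ID.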

Run the following request to increase search.max_open_scroll_context:

curl -i -X PUT \
-H "Authorization: Basic ****" \
-H "data-partition-id: osdu" \
-H "Content-Type: application/json" \
-d '{
    "persistent": { "search.max_open_scroll_context": 5000 },
    "transient": { "search.max_open_scroll_context": 5000 }
}' \
'https://elastic_search:9243/_cluster/settings'

Edited Jun 21, 2023 by Riabokon Stanislav(EPAM)[GCP]