Pagination query with cursor

Zhibin Mai requested to merge pagination_query_with_cursor into master

Type of change

  • Bug Fix
  • Feature

Please provide a link to the GitLab issue or ADR (Architecture Decision Record)
ADR Pagination Query API

Does this introduce a change in the core logic?

  • [YES]

Does this introduce a change in the cloud provider implementation, if so which cloud?

  • AWS
  • Azure
  • Google Cloud
  • IBM

Does this introduce a breaking change?

  • [NO]

What is the current behavior?

What is the new/expected behavior?

The new feature uses Elasticsearch search_after to support pagination. Its API behaves the same as query with cursor, but it removes the maximum cursor limitation and is a lightweight way to paginate.
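As a rough illustration of the search_after mechanism, the sketch below builds the request bodies for consecutive pages. It is not the code in this MR: the helper names `build_page_request` and `next_search_after`, the `match_all` query, and the sort field `id` are assumptions made for the example. The body layout (`pit`, `sort`, `search_after`) and the per-hit `sort` values follow the Elasticsearch search API.

```python
from typing import Optional


def build_page_request(
    pit_id: str,
    page_size: int,
    sort: list,
    search_after: Optional[list] = None,
    keep_alive: str = "1m",
) -> dict:
    """Build a search body for one page (hypothetical helper)."""
    body = {
        "size": page_size,
        "query": {"match_all": {}},  # placeholder query for the example
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        "sort": sort,
    }
    # The first page has no search_after; later pages pass the sort
    # values of the last hit from the previous page.
    if search_after is not None:
        body["search_after"] = search_after
    return body


def next_search_after(response: dict) -> Optional[list]:
    """Return the sort values of the last hit, or None if the page is empty."""
    hits = response["hits"]["hits"]
    return hits[-1]["sort"] if hits else None
```

The caller only has to carry the last hit's sort values and the PIT id between requests, which is why the feature can present the same cursor-style API to clients.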

Have you added/updated Unit Tests and Integration Tests?

Because the new feature has the same API behavior as query with cursor, the integration tests for query with cursor should also work for it. We therefore duplicated the features and steps from the query with cursor tests to serve as the integration tests for the new feature. The only changes are the namespace and the endpoint.

Any other useful information

  1. The new feature (implemented with search_after and PIT) behaves like query with cursor, as confirmed by local functional tests and integration tests. The Elasticsearch documentation describes search_after as a lightweight way to support pagination and recommends it instead of query with cursor.
  2. Our tests show that the Elasticsearch high-level API fully supports search_after starting with 7.14.1. We tried to upgrade the library to the latest version (7.17.22), the same as the indexer, but some critical utilities (e.g. org.elasticsearch.common.geo.builders.*) are not supported in 7.16.1 and above. To avoid mixing two different issues in one MR, we upgraded the Elasticsearch high-level API to 7.15.2.
  3. The new feature uses Point In Time (PIT) to keep the search results consistent. Like query with cursor, it creates a snapshot of the indices. The PIT is set with a keep_alive value so that Elasticsearch can release the resources after it times out. Compared with the query with cursor solution, there is one observation and one enhancement:
     • Observation: the keep_alive value must be a multiple of 1 minute. It can be set in a different unit, but the converted value must still be a whole number of minutes. Otherwise, an exception is thrown with a message like [1:1639] [pit] failed to parse field [keep_alive].
     • Enhancement: the new feature automatically deletes (closes) the PIT as soon as it detects that the current page is the last page, which releases the resources early. In this case it uses cached information to simulate the same behavior as query with cursor.
  4. Although the new feature works like query with cursor and its approach is recommended by Elasticsearch, we do not yet know whether it has significant side effects. I recommend marking it as a preview (or beta) feature in M24 and letting clients and service providers test it further, especially with performance tests and stress tests. If no significant side effects are found, we can make it a public/official release feature and document it.
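The keep_alive observation and the early PIT close in item 3 can be sketched as follows. This is an illustrative simulation only, not the code in this MR. The helper names are hypothetical, rounding up to whole minutes is one possible way to satisfy the constraint observed above, and treating a page with fewer hits than the page size as the last page is an assumption, since this description does not state how the last page is detected.

```python
import math
from bisect import bisect_right


def keep_alive_in_whole_minutes(seconds: int) -> str:
    """Round a timeout up to a whole number of minutes, e.g. 90 -> '2m'."""
    if seconds <= 0:
        raise ValueError("keep_alive must be positive")
    return f"{math.ceil(seconds / 60)}m"


def is_last_page(hit_count: int, page_size: int) -> bool:
    """Assumed heuristic: a short page means there is nothing left to fetch."""
    return hit_count < page_size


def paginate(docs: list, page_size: int):
    """Simulate search_after over an in-memory list of unique sort keys."""
    ordered = sorted(docs)   # stands in for the PIT snapshot plus sort order
    search_after = None
    pit_open = True
    while pit_open:
        # Resume strictly after the last key of the previous page.
        start = 0 if search_after is None else bisect_right(ordered, search_after)
        page = ordered[start:start + page_size]
        if is_last_page(len(page), page_size):
            pit_open = False  # stands in for deleting (closing) the PIT early
        else:
            search_after = page[-1]
        yield page
```

With this heuristic, a result set whose size is an exact multiple of the page size ends with one empty page before the PIT is closed.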
Edited by Zhibin Mai