ADR: Pagination Query API

Status
Background
Context & Scope
Proposed solution
- Implementation details on Pagination Query API
Consequences

Status

Background

Paginating over large query results is a common discovery workflow. The Search service query API can return at most 10K records; fetching more than that requires the Search service's query_with_cursor API (POST /api/search/v2/query_with_cursor). As OSDU Data Platform adoption has grown over milestone releases, users have repeatedly complained (issues 130, 156, etc.) about the reliability and performance of the query_with_cursor API. The most commonly reported issues are:

- During deep pagination over a large result set, the API may fail partway through and users have to start over. This can be a very time-consuming and costly exercise.
- By default, each data partition can have at most 500 active cursors; once this limit is reached, the API throws an exception. Users have repeatedly reported that this quota gets exhausted even under light usage, after which they cannot make new cursor API calls.
- How many backend cursors a single Search service request consumes is opaque: one Search service cursor request can consume many cursors on the Search backend (Elasticsearch). This makes it very hard to give users guidance on how many concurrent cursor requests a data partition can handle.
- The cursor quota is a soft limit and could be raised to mitigate the issue. However, raising it increases Search backend resource usage, which can degrade search and indexing latencies. Resolving those latencies then requires scaling Search backend resources, which increases infrastructure and licensing costs.

Context & Scope

After evaluating solutions to the issues reported in the previous section, we found only two options:

- Since we cannot reliably scroll over large result sets, drop support for scrolling beyond 10K records.
- Provide a new Search service API that uses the search_after API of the Search backend (Elasticsearch).

We cannot cap the number of records that can be fetched from the Search service, as that may break existing consumer workflows. The Search service must provide a reliable and performant API that allows scrolling over all records in a response, irrespective of their count.

The search_after API does not suffer from the reliability issues users have reported, and Elasticsearch recommends it in place of the scroll API for deep pagination. The Search service should therefore add a new API built on Elasticsearch's search_after API.

Proposed solution

The Search service should add two new endpoints to support pagination:

- A new endpoint to paginate using Elasticsearch's search_after API.
- A new endpoint to free up pagination resources when the next page is not needed.
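
As a sketch of how a consumer might use the first endpoint, the loop below pages through every matching record by feeding each `nextCursor` back as `cursor` until no `nextCursor` is returned. `post` is a hypothetical callable that sends the JSON body to POST /pagination-query (with the `data-partition-id` and `Authorization` headers) and returns the parsed response; it is injected so the sketch stays transport-agnostic.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport: POSTs the JSON body to /pagination-query, returns parsed JSON.
PostFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iterate_all(post: PostFn, query: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every record matching `query`, following `nextCursor` page by page."""
    body = {k: v for k, v in query.items() if k != "cursor"}  # first request: no cursor
    while True:
        response = post(body)
        yield from response.get("results", [])
        next_cursor = response.get("nextCursor")
        if not next_cursor:
            return  # no nextCursor means this was the last page
        # Only the cursor changes; every other field must stay identical.
        body = {**body, "cursor": next_cursor}
```

Keeping every other request field unchanged between calls matters because the service expects follow-up requests to repeat the original query exactly.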

API specification

openapi: 3.0.0
info:
  description: Search service
  version: 2.0.0
  title: Search Service APIs
tags:
  - name: Search
    description: Service endpoints to search data in OSDU Data Platform
security:
  - bearer: []
paths:
  /pagination-query:
    post:
      tags:
        - Search
      summary: Queries using the input request criteria.
      description: "The API supports full text search on string fields, range queries on date, numeric or string fields, along with geo-spatial search. Required
        roles: 'users.datalake.viewers' or 'users.datalake.editors' or 'users.datalake.admins'. In addition, users must be members of data
        groups to access the data. It can be used to retrieve large numbers of results (or even all results) from a single search request, in much the
        same way as you would use a cursor on a traditional database. The API responds with `nextCursor` if there are more results than the maximum page size (1K). To request
        the next page, send another request to the same API that includes the `nextCursor` value from the last response as `cursor`. All other fields of the next pagination-query
        request must be identical, and the request must be received by the service before the cursor expires (defaults to 60s)."
      operationId: Pagination query
      parameters:
        - $ref: "#/components/parameters/data-partition-id"
      requestBody:
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/PaginationQueryRequest"
      responses:
        "200":
          description: Success
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/PaginationQueryResponse"
        "400":
          description: Invalid parameters were given on request
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/AppError"
        "401":
          description: Unauthorized
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/AppError"
        "403":
          description: User not authorized to perform the action
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/AppError"
        "502":
          description: Search service scale-up is taking longer than expected. Wait 10
            seconds and retry.
          content:
            application/json:
              schema:
                type: string
      security:
        - bearer: []
  /pagination-query-cursor:
    delete:
      tags:
        - Search
      summary: Deletes a pagination query cursor and frees up its resources. Call it when the cursor is no longer needed.
      description: "Required roles: 'users.datalake.viewers' or 'users.datalake.editors' or 'users.datalake.admins'."
      operationId: Delete pagination query cursor
      parameters:
        - $ref: "#/components/parameters/data-partition-id"
      requestBody:
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/PaginationQueryCursorDeleteRequest"
      responses:
        "200":
          description: Success
        "400":
          description: Invalid parameters were given on request
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/AppError"
        "401":
          description: Unauthorized
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/AppError"
        "403":
          description: User not authorized to perform the action
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/AppError"
        "404":
          description: Pagination query cursor not found
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/AppError"
        "502":
          description: Search service scale-up is taking longer than expected. Wait 10
            seconds and retry.
          content:
            application/json:
              schema:
                type: string
      security:
        - bearer: []
components:
  parameters:
    data-partition-id:
      name: data-partition-id
      in: header
      description: desired data partition id
      required: true
      schema:
        type: string
  securitySchemes:
    bearer:
      type: apiKey
      name: Authorization
      in: header
  schemas:
    PaginationQueryRequest:
      type: object
      required:
        - kind
      properties:
        kind:
          type: object
          description: The kind of the record to query, either a single kind such as "tenant1:test:well:1.0.0" or a list such as ["tenant1:test:well:1.0.0", "tenant1:test:well:2.0.0"].
          example: tenant1:test:well:1.0.0
        query:
          type: string
          description: The query string in Lucene query string syntax.
        returnedFields:
          type: array
          description: The fields on which to project the results.
          items:
            type: string
        sort:
          $ref: "#/components/schemas/SortQuery"
        queryAsOwner:
          type: boolean
          example: false
          description: The queryAsOwner switches between viewer and owner to return results
            that you are entitled to view or results you are the owner of.
        spatialFilter:
          $ref: "#/components/schemas/SpatialFilter"
        cursor:
          type: string
          description: Search context to retrieve the next batch of results. It must be empty for the first request, and subsequent requests must provide the 'nextCursor' value returned by the previous response.
        trackTotalCount:
          type: boolean
          description: Tracks the exact count of records matching the query if 'true', a partial count otherwise. Partial-count queries are more performant. Defaults to 'false', in which case 10000 is returned when more than 10000 records match.
      example:
        kind: osdu:welldb:wellbore:1.0.0
        limit: 30
        query: data.Basin:"Ft. Worth"
        returnedFields:
          - data.kind
        queryAsOwner: false
        cursor: <put a valid cursor or leave it blank for the first request>
    PaginationQueryResponse:
      type: object
      properties:
        nextCursor:
          type: string
          description: Search context to retrieve the next batch of results. It is valid for 60s; the next pagination request must be received before it expires.
        results:
          type: array
          items:
            type: object
            additionalProperties:
              type: object
        totalCount:
          type: integer
          format: int64
          description: Returns an exact count if 'trackTotalCount' is 'true', a partial count otherwise. With a partial count, 10000 is returned when more than 10000 records match.
    PaginationQueryCursorDeleteRequest:
      type: object
      properties:
        cursor:
          type: string
          description: A valid cursor to clean up. The request must be received before the cursor expires.
    ByBoundingBox:
      type: object
      required:
        - bottomRight
        - topLeft
      properties:
        topLeft:
          $ref: "#/components/schemas/Point"
        bottomRight:
          $ref: "#/components/schemas/Point"
    ByDistance:
      type: object
      required:
        - point
      properties:
        distance:
          type: number
          format: double
          example: 1500
          description: The radius of the circle centered on the specified location. Points
            which fall into this circle are considered to be matches.
          minimum: 0
          maximum: 9223372036854776000
        point:
          $ref: "#/components/schemas/Point"
    ByGeoPolygon:
      type: object
      properties:
        points:
          type: array
          description: Polygon defined by a set of points.
          items:
            $ref: "#/components/schemas/Point"
    Point:
      type: object
      properties:
        latitude:
          type: number
          format: double
          example: 37.450727
          description: Latitude of point.
          minimum: -90
          maximum: 90
        longitude:
          type: number
          format: double
          example: -122.174762
          description: Longitude of point.
          minimum: -180
          maximum: 180
    SortQuery:
      type: object
      properties:
        field:
          type: array
          description: The list of fields to sort the results.
          items:
            type: string
        order:
          type: array
          description: The list of orders to sort the results. The element must be either
            ASC or DESC.
          items:
            type: string
    SpatialFilter:
      type: object
      properties:
        field:
          type: string
          description: Geo-point field in the index on which filtering will be performed.
            Use the GET schema API to find which fields support spatial search.
        byBoundingBox:
          $ref: "#/components/schemas/ByBoundingBox"
        byDistance:
          $ref: "#/components/schemas/ByDistance"
        byGeoPolygon:
          $ref: "#/components/schemas/ByGeoPolygon"
    AppError:
      type: object
      properties:
        code:
          type: integer
          format: int32
        reason:
          type: string
        message:
          type: string
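
One possible translation of a PaginationQueryRequest into an Elasticsearch search body is sketched below. It assumes `query` becomes a `query_string` query (matching the Lucene syntax noted in the schema), `returnedFields` becomes `_source` filtering, `sort` becomes an Elasticsearch sort list, and `trackTotalCount` maps to `track_total_hits` (Elasticsearch counts exactly when it is `true` and stops counting at 10000 by default). None of this mapping is prescribed by the specification; `build_search_body` is a hypothetical name, and `kind`, `queryAsOwner` and `spatialFilter` are omitted because their translation depends on Search service internals.

```python
from typing import Any, Dict, List

MAX_PAGE_SIZE = 1000         # maximum page size stated in the endpoint description
PARTIAL_COUNT_LIMIT = 10000  # Elasticsearch's default track_total_hits threshold


def build_search_body(request: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical mapping of a PaginationQueryRequest onto an Elasticsearch search body."""
    body: Dict[str, Any] = {
        "size": MAX_PAGE_SIZE,
        # Exact count only when asked for; otherwise Elasticsearch stops at 10000.
        "track_total_hits": True if request.get("trackTotalCount") else PARTIAL_COUNT_LIMIT,
    }
    if request.get("query"):
        body["query"] = {"query_string": {"query": request["query"]}}
    if request.get("returnedFields"):
        body["_source"] = list(request["returnedFields"])
    sort = request.get("sort") or {}
    fields: List[str] = sort.get("field", [])
    orders: List[str] = sort.get("order", [])
    if len(fields) != len(orders):
        # Assumed validation rule: one ASC/DESC entry per sort field.
        raise ValueError("sort.field and sort.order must have the same length")
    if fields:
        body["sort"] = [{f: {"order": o.lower()}} for f, o in zip(fields, orders)]
    return body


def total_count(es_response: Dict[str, Any]) -> int:
    """'totalCount' is hits.total.value: exact ("eq") or capped at 10000 ("gte")."""
    return int(es_response["hits"]["total"]["value"])
```

Because Elasticsearch reports `hits.total` as `{"value": 10000, "relation": "gte"}` once the threshold is reached, `totalCount` can be copied straight from `hits.total.value` in both modes.
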

Implementation details on Pagination Query API

The first search_after call requires a point-in-time (PIT) id to be created ahead of time and supplied on the search_after call to the Elasticsearch cluster. The Pagination Query API should wrap both of these API calls in the first pagination request.
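
The two calls wrapped in the first pagination request can be sketched as follows: open a PIT on the target index with `POST /<index>/_pit?keep_alive=...`, then search against that PIT via `POST /_search` without repeating the index in the path. `transport` and `first_page` are hypothetical, the 60s keep-alive is assumed to mirror the cursor expiry in the API description, and treating a page with fewer hits than `size` as the last page is a simplification of how the service might detect that there is no next page.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical low-level HTTP helper: (method, path, query params, JSON body) -> parsed JSON.
Transport = Callable[[str, str, Dict[str, str], Optional[Dict[str, Any]]], Dict[str, Any]]

KEEP_ALIVE = "60s"  # assumed to mirror the 60s cursor expiry


def first_page(transport: Transport, index: str, search_body: Dict[str, Any]
               ) -> Tuple[Dict[str, Any], Optional[Tuple[str, List[Any]]]]:
    """Open a PIT, fetch the first page, and return (response, next-page state or None)."""
    # Call 1: create a point in time on the index the query targets.
    pit = transport("POST", f"/{index}/_pit", {"keep_alive": KEEP_ALIVE}, None)
    # Call 2: search against the PIT; the index must not be repeated in the path.
    body = dict(search_body)
    body["pit"] = {"id": pit["id"], "keep_alive": KEEP_ALIVE}
    body["sort"] = list(search_body.get("sort", [])) + [{"_shard_doc": "asc"}]
    response = transport("POST", "/_search", {}, body)
    hits = response["hits"]["hits"]
    if len(hits) < body.get("size", 10):
        return response, None  # simplification: a short page is treated as the last page
    # The (possibly refreshed) PIT id plus the last hit's sort values locate the next page.
    return response, (response["pit_id"], hits[-1]["sort"])
```

An explicit `_shard_doc` tiebreaker is appended so that every hit has a unique sort position; Elasticsearch adds one implicitly for PIT searches, but spelling it out keeps the sort identical across pages.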

If there is more than one page, the search_after response returns, along with the results, the PIT id to use for the next page and the sort values of each hit. The PIT id and the sort values of the last hit are both required to fetch the next page, so the Pagination Query API response's nextCursor attribute should be set to a value that combines them. Because the PIT id is long, it can be shortened and cached using the existing hashing function before the response is returned to the end user. nextCursor can then be set to: shortened(PIT id) + base64.encode(sort values).
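
A minimal sketch of this cursor format, assuming SHA-256 as a stand-in for the existing hashing function (which this ADR does not name) and an in-memory dict as a stand-in for the cache; `CursorCodec` is a hypothetical name. Because a hex digest has a fixed length, the cursor can be split back into its two parts without a separator. Sort values are JSON-encoded before base64 because Elasticsearch sort values mix types such as strings and numbers.

```python
import base64
import hashlib
import json
from typing import Any, Dict, List, Tuple

HASH_LEN = 64  # length of a SHA-256 hex digest; a fixed length makes the cursor splittable


class CursorCodec:
    """Builds nextCursor = shortened(PIT id) + base64(sort values), and splits it back."""

    def __init__(self) -> None:
        # Stand-in for the service cache, mapping shortened id -> full PIT id.
        self._pit_cache: Dict[str, str] = {}

    def encode(self, pit_id: str, sort_values: List[Any]) -> str:
        short = hashlib.sha256(pit_id.encode("utf-8")).hexdigest()
        self._pit_cache[short] = pit_id  # cached so the full PIT id can be recovered
        encoded_sort = base64.urlsafe_b64encode(json.dumps(sort_values).encode("utf-8"))
        return short + encoded_sort.decode("ascii")

    def decode(self, cursor: str) -> Tuple[str, List[Any]]:
        short, encoded_sort = cursor[:HASH_LEN], cursor[HASH_LEN:]
        pit_id = self._pit_cache.get(short)
        if pit_id is None:
            raise KeyError("unknown or expired cursor")
        sort_values = json.loads(base64.urlsafe_b64decode(encoded_sort.encode("ascii")))
        return pit_id, sort_values
```

A production cache would also need an expiry aligned with the 60s cursor lifetime, so that stale entries disappear along with their PITs.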

When the Search service receives a next-page request, the pagination-query API splits the cursor back into the PIT id (recovered from the cache) and the sort values using the mechanism above, and makes the next search_after call.
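
Once the cursor has been split into the PIT id and sort values, the next search_after request can be built as sketched here. It assumes the first page was issued with the same sort plus an explicit `_shard_doc` tiebreaker; `next_page_body` is a hypothetical helper name, and the 60s keep-alive is assumed to match the cursor expiry. Passing `keep_alive` on each search also extends the PIT's lifetime, which is what lets the cursor be renewed page by page.

```python
from typing import Any, Dict, List

KEEP_ALIVE = "60s"  # assumed to mirror the 60s cursor expiry


def next_page_body(search_body: Dict[str, Any], pit_id: str,
                   sort_values: List[Any]) -> Dict[str, Any]:
    """Hypothetical helper: build the search_after request for the next page.

    `pit_id` and `sort_values` are the two parts recovered from `cursor`.
    """
    body = dict(search_body)  # shallow copy; the original query is left untouched
    # Search against the PIT (no index in the path); keep_alive also extends its lifetime.
    body["pit"] = {"id": pit_id, "keep_alive": KEEP_ALIVE}
    # Resume strictly after the last hit returned on the previous page.
    body["search_after"] = list(sort_values)
    # The sort must match the previous page exactly, tiebreaker included,
    # so that each search_after value lines up with its sort key.
    body["sort"] = list(search_body.get("sort", [])) + [{"_shard_doc": "asc"}]
    return body
```

The `pit_id` in each search response may differ from the one sent, so the most recently received id is the one to encode into the next `nextCursor`.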

Consequences

- The existing query_with_cursor API (POST /api/search/v2/query_with_cursor) should be deprecated.
- A new Pagination Query API using Elasticsearch's search_after API should be introduced.
- A new Delete Pagination Query Cursor API should be implemented.
- The Search service tutorial should be updated with:
  - Documentation for the new APIs
  - A new 'Best Practices' section with the following suggestions:
    - Migrate users from the query_with_cursor API to the new pagination-query API.
    - Remind users to call the DELETE /api/search/v2/pagination-query-cursor API when a cursor is no longer in use or the next page is not needed, to avoid overloading the system.
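
The 'Best Practices' guidance on releasing cursors could be illustrated with a client sketch like the following, which stops once enough records have been read and frees the cursor in a `finally` block. `post` and `delete` are hypothetical wrappers around POST /pagination-query and DELETE /pagination-query-cursor, and `first_n` is a hypothetical name; cleanup is best-effort because the cursor also expires on its own.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical wrappers around the two endpoints.
PostFn = Callable[[Dict[str, Any]], Dict[str, Any]]
DeleteFn = Callable[[Dict[str, Any]], None]


def first_n(post: PostFn, delete: DeleteFn, query: Dict[str, Any],
            n: int) -> List[Dict[str, Any]]:
    """Fetch at most n records, releasing the server-side cursor if paging stops early."""
    records: List[Dict[str, Any]] = []
    cursor: Optional[str] = None
    try:
        while True:
            body = dict(query) if cursor is None else {**query, "cursor": cursor}
            response = post(body)
            cursor = response.get("nextCursor") or None
            records.extend(response.get("results", []))
            if cursor is None or len(records) >= n:
                return records[:n]
    finally:
        if cursor is not None:
            # Next page not needed (or an error occurred): free pagination resources.
            try:
                delete({"cursor": cursor})
            except Exception:
                pass  # best effort: the cursor expires on its own after ~60s anyway
```

When the result set is fully drained no `nextCursor` remains, so no DELETE call is made.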

Edited Aug 21, 2024 by Zhibin Mai