    ## Indexer service
    
    ### Table of contents <a name="TOC"></a>
    
    - [Indexer service](#indexer-service)
    - [Introduction](#introduction)
    - [Features](#features)
      - [Geoshape Decimation](#geoshape-decimation)
    - [Indexer API access](#indexer-api-access)
    - [API Reference](#api-reference)
      - [Version info endpoint](#version-info-endpoint)
      - [Reindex](#reindex)
      - [Data Partition provision](#data-partition-provision)
      - [Schema change](#schema-change)
    - [Troubleshoot Indexing Issues](#troubleshoot-indexing-issues)
      - [Get indexing status](#get-indexing-status)
    
    ## Introduction <a name="introduction"></a>
    
    The Indexer API provides a mechanism for indexing documents that contain structured or unstructured data. Documents and
    indices are saved in a separate persistent store optimized for search operations. The indexer API can index any number
    of documents.
    
    The indexer indexes the attributes defined in the schema. A schema can be created at the time of record ingestion in the OSDU Data Platform
    via the Schema Service. The Indexer service also adds a number of OSDU Data Platform meta attributes, such as id, kind,
    parent, acl, namespace, type, version, legaltags, and index, to each record at the time of indexing.
    
    ## Features <a name="features"></a>
    
    ### Geoshape Decimation <a name="geoshape-decimation"></a>
    
    To improve indexing and search performance for documents with large geometries, the Indexer decimates geo-shapes of the following
    GeoJSON types, both in the original shape attribute and in the virtual shape attribute (if it exists), using the
    Ramer–Douglas–Peucker algorithm:
    - LineString
    - MultiLineString
    - Polygon
    - MultiPolygon
    
    The feature is enabled for all data partitions since M19.
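
For readers unfamiliar with the algorithm, the following is a minimal Python sketch of Ramer–Douglas–Peucker simplification applied to the coordinate list of a single `LineString`. It is an illustration only and not the Indexer service's implementation; the function names, the sample coordinates, and the `epsilon` tolerance are assumptions chosen for the example.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from `point` to the line through `start` and `end`."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def decimate(points, epsilon):
    """Simplify a polyline with Ramer-Douglas-Peucker, keeping both endpoints."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord between the endpoints.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = perpendicular_distance(points[i], points[0], points[-1])
        if dist > max_dist:
            index, max_dist = i, dist
    if max_dist <= epsilon:
        # Every interior point is within tolerance: keep only the endpoints.
        return [points[0], points[-1]]
    # Otherwise keep the farthest point and simplify both halves recursively.
    left = decimate(points[: index + 1], epsilon)
    right = decimate(points[index:], epsilon)
    return left[:-1] + right


# Hypothetical coordinates, not taken from any OSDU record.
line_string = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 5.0), (4.0, 6.0), (5.0, 7.0)]
simplified = decimate(line_string, epsilon=0.5)
```

A larger `epsilon` removes more vertices; the tolerance the service actually applies is not stated here.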
    
    
    ## Indexer API access <a name="indexer-api-access"></a>
    
    - Required roles
    
      The Indexer service requires users (and service accounts) to have dedicated roles. Users must be a member of `users.datalake.viewers`, `users.datalake.editors`, or `users.datalake.admins`. Roles can be assigned using the [Entitlements Service](/solutions/osdu/tutorials/core-services/entitlementsservice). See the API documentation for the specific requirements of each endpoint.
    
      In addition to service roles, users __must__ be a member of data groups to access the data.
    
    - Required headers
    
      The OSDU Data Platform stores data in different partitions, depending on the different accounts in the OSDU system.
    
      A user may belong to more than one account. As a user, after logging into the OSDU portal, you need to select the account you wish to be active.
      Likewise, when calling the APIs, you need to specify the active account in the `data-partition-id` header. The correct `data-partition-id` can be obtained from the CFS services. The `data-partition-id` restricts the request to the mapped partition. For example:
    
      ```
      data-partition-id: opendes
      ```

    - Optional headers
    
      The `correlation-id` is a traceable ID used to track the journey of a single request. It can be a GUID passed in the `correlation-id` header. It is best practice to provide the `correlation-id` so that the request can be tracked through all the services.
    
      ```
      correlation-id: 1e0fef08-22fd-49b1-a5cc-dffa21bc0b70
      ```
    
    If a service is initiating the request, it should generate an ID. If the `correlation-id` is not provided, the service generates a new ID so that the request remains traceable.
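
The header rules above can be sketched as a small Python helper. This is an illustrative client-side example, not part of the service; the function name `build_headers` is an assumption, and the `authorization` header format is taken from the curl examples later in this document.

```python
import uuid


def build_headers(token, data_partition_id, correlation_id=None):
    """Return the request headers described above as a plain dict."""
    return {
        "authorization": f"Bearer {token}",
        "content-type": "application/json",
        "data-partition-id": data_partition_id,
        # Generate a GUID when the caller does not supply one.
        "correlation-id": correlation_id or str(uuid.uuid4()),
    }


headers = build_headers("<JWT>", "opendes")
```

Reusing the same `correlation-id` across related calls makes it easier to follow one logical operation through the logs of several services.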
    
    
    [Back to table of contents](#TOC)
    
    
    ## API Reference
    
    ### Version info endpoint
    
    Returns build and Git-related information.
    
    #### Request
    
    ```http
    GET /api/indexer/v2/info HTTP/1.1
    ```
    
    
    
    #### Example response
    
    ```json
    {
      "groupId": "org.opengroup.osdu",
      "artifactId": "indexer-gc",
      "version": "0.10.0-SNAPSHOT",
      "buildTime": "2021-07-09T14:29:51.584Z",
      "branch": "feature/GONRG-2681_Build_info",
      "commitId": "7777",
      "commitMessage": "Added copyright to version info properties file",
      "connectedOuterServices": [
        {
          "name": "elasticSearch",
          "version": "..."
        },
        {
          "name": "redis",
          "version": "..."
        }
      ]
    }
    ```
    
    This endpoint reads its information from files generated by the `spring-boot-maven-plugin` and `git-commit-id-plugin` plugins. The paths to the generated files must be specified in the matching properties:

    - `version.info.buildPropertiesPath`
    - `version.info.gitPropertiesPath`
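
As an illustration of consuming the version info response, the sketch below condenses a response body into a one-line summary. The helper name `summarize_version_info` is hypothetical, and the code assumes the response carries the fields shown in the example response.

```python
import json


def summarize_version_info(payload):
    """Condense a version info response body into a single line."""
    info = json.loads(payload)
    services = ", ".join(
        f"{service['name']} {service['version']}"
        for service in info.get("connectedOuterServices", [])
    )
    return (
        f"{info['artifactId']} {info['version']} "
        f"({info['branch']}@{info['commitId']}); connected: {services}"
    )
```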
    
    
    [Back to table of contents](#TOC)
    
    ### Reindex <a name="reindex"></a>
    
    #### Reindex a 'kind'
    
    The reindex kind API allows users to re-index a `kind` without re-ingesting the records via the Storage API. Reindexing a kind is an asynchronous operation: when a user calls this API, it responds with HTTP 200 if it can launch the re-indexing, or with an appropriate error code if it cannot. The current status of the indexing can be tracked by calling the Search API and querying this particular kind. Re-indexing may take anywhere from a few seconds to a few hours, because multiple factors contribute to latency, such as the number of records in the kind and the current load on the Indexer service.
    
    #### Request
    
    ```http
    POST /api/indexer/v2/reindex HTTP/1.1
    {
      "kind": "opendes:welldb:wellbore:1.0.0"
    }
    ```
    
    <details><summary>**Curl**</summary>
    
    
    ```bash
    curl --request POST \
      --url '/api/indexer/v2/reindex' \
      --header 'accept: application/json' \
      --header 'authorization: Bearer <JWT>' \
      --header 'content-type: application/json' \
      --header 'data-partition-id: opendes' \
      --data '{
      "kind": "opendes:welldb:wellbore:1.0.0"
    }'
    ```

    </details><br>
    
    #### Prerequisite
    
    Users must be a member of `users.datalake.admins` or `users.datalake.ops` group.
    
    #### Query parameters
    
    `force_clean` <br />
    
    &emsp;&emsp;(optional, Boolean) If there is any inconsistency between the storage records and the index records, you can use this query parameter to synchronize them. If `true`, the API drops the current index data, applies the latest schema changes, and re-indexes the records. If `false`, the API applies the latest schema and overwrites records with the same ids. The default value is `false`.
    
    
    #### Request body
    
    `kind` <br />
    
    &emsp;&emsp;(required, String) Kind to be re-indexed.
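
The request above, including the optional `force_clean` query parameter, could be assembled in Python roughly as follows. This is a sketch under assumptions: the helper name `build_reindex_request` and the base URL `https://osdu.example.com` are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_reindex_request(base_url, kind, token, data_partition_id, force_clean=False):
    """Build (but do not send) a POST request to the reindex endpoint."""
    query = urllib.parse.urlencode({"force_clean": str(force_clean).lower()})
    return urllib.request.Request(
        url=f"{base_url}/api/indexer/v2/reindex?{query}",
        data=json.dumps({"kind": kind}).encode("utf-8"),
        method="POST",
        headers={
            "accept": "application/json",
            "authorization": f"Bearer {token}",
            "content-type": "application/json",
            "data-partition-id": data_partition_id,
        },
    )


# Hypothetical host; replace with the address of your deployment.
request = build_reindex_request(
    "https://osdu.example.com",
    "opendes:welldb:wellbore:1.0.0",
    "<JWT>",
    "opendes",
    force_clean=True,
)
# To send it: urllib.request.urlopen(request)
```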
    
    #### Reindex given records
    
    The reindex records API allows users to re-index specific records by providing their record ids, without re-ingesting the records via the Storage API. Reindexing records is an asynchronous operation: when a user calls this API, it responds with HTTP 202 if it can launch the re-indexing, or with an appropriate error code if it cannot. The response body indicates which of the given records were re-indexed and which were not found in storage. The API supports up to 1000 records per call.
    
    #### Request
    
    ```http
    POST /api/indexer/v2/reindex/records HTTP/1.1
    {
      "recordIds": ["opendes:work-product-component--WellLog:17763fcc18864f4f8eab62e320f8913d", "opendes:work-product-component--WellLog:566edebc-1a9f-4f4d-9a30-ed458e959ac7"]
    }
    ```
    
    <details><summary>**Curl**</summary>
    
    ```bash
    curl --request POST \
      --url '/api/indexer/v2/reindex/records' \
      --header 'accept: application/json' \
      --header 'authorization: Bearer <JWT>' \
      --header 'content-type: application/json' \
      --header 'data-partition-id: opendes' \
      --data '{
      "recordIds": ["opendes:work-product-component--WellLog:17763fcc18864f4f8eab62e320f8913d", "opendes:work-product-component--WellLog:566edebc-1a9f-4f4d-9a30-ed458e959ac7"]
    }'
    ```
    
    </details><br>
    
    #### Prerequisite
    
    Users must be a member of `users.datalake.admins` or `users.datalake.ops` group.
    
    #### Request body
    
    `recordIds` <br />
    &emsp;&emsp;(required, Array of String) Storage records to be re-indexed.
    
    #### Example response
    
    ```json
    {
      "reIndexedRecords": [
        "opendes:work-product-component--WellLog:566edebc-1a9f-4f4d-9a30-ed458e959ac7"
      ],
      "notFoundRecords": [
        "opendes:work-product-component--WellLog:17763fcc18864f4f8eab62e320f8913d"
      ]
    }
    ```
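
Because each call accepts at most 1000 record ids, a client that needs to re-index more records has to split them across several calls. The sketch below shows one way to do that and to merge the per-call responses; both helper names are hypothetical, and sending the requests is left out.

```python
def batch_record_ids(record_ids, batch_size=1000):
    """Split record ids into request bodies of at most `batch_size` ids each."""
    return [
        {"recordIds": record_ids[i : i + batch_size]}
        for i in range(0, len(record_ids), batch_size)
    ]


def merge_responses(responses):
    """Combine the per-call response bodies into one summary."""
    merged = {"reIndexedRecords": [], "notFoundRecords": []}
    for body in responses:
        for key in merged:
            merged[key].extend(body.get(key, []))
    return merged
```

Each element returned by `batch_record_ids` can be serialized as the body of one `POST /api/indexer/v2/reindex/records` call.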
    
    
    [Back to table of contents](#TOC)
    
    ## Delete API <a name="delete"></a>
    
    Delete API is used to delete an index for a specific kind.
    
    Only users who belong to the `users.datalake.ops` entitlement group can call this API.
    
    ```
    DELETE /api/indexer/v2/index?kind=opendes:welldb:wellbore:1.0.0
    ```
    
    <details><summary>**Curl**</summary>
    
    ```bash
    curl --request DELETE \
      --url '/api/indexer/v2/index?kind=opendes:welldb:wellbore:1.0.0' \
      --header 'authorization: Bearer <JWT>' \
      --header 'content-type: application/json' \
      --header 'data-partition-id: opendes' 
    ```
    
    </details><br>
    
    ### Data Partition provision <a name="data-partition-provision"></a>
    
    Configures the search backend for a data partition.
    
    ```http
    PUT /api/indexer/v2/partitions/provision HTTP/1.1
    ```
    
    <details><summary>**Curl**</summary>
    
    ```bash
    curl --request PUT \
      --url '/api/indexer/v2/partitions/provision' \
      --header 'accept: application/json' \
      --header 'authorization: Bearer <JWT>' \
      --header 'content-type: application/json' \
      --header 'data-partition-id: opendes'
    ```
    
    </details><br>
    
    #### Prerequisite
    
    Users must be a member of `users.datalake.ops` group.
    
    
    > __NOTE__: This API should be run at least once when a data partition is provisioned, to configure the required resources and settings.
    
    [Back to table of contents](#TOC)
    
    
    ### Schema change <a name="schema-change"></a>
    
    Schema change event listener endpoint.
    
    > __Note:__ This is an internal API and should not be exposed publicly.
    
    #### Request
    
    ```http
    POST /api/indexer/v2/_dps/task-handlers/schema-worker HTTP/1.1
    {
        "messageId": "676894654",
        "publishTime": "2017-03-19T00:00:00",
        "attributes": {
            "data-partition-id": "opendes",
            "correlation-id": "b5a281bd-f59d-4db2-9939-b2d85036fc7e"
        },
        "data": "[{\"kind\":\"slb:indexer:test-data--SchemaEventIntegration:1.0.0\",\"op\":\"create\"}]"
    }
    ```
    
    #### Request body
    
    `messageId` <br />
    &emsp;&emsp;(optional, String) Event message id.
    
    `publishTime` <br />
    &emsp;&emsp;(optional, String) Event publish time.
    
    `attributes.data-partition-id` <br />
    &emsp;&emsp;(required, String) Data partition id for which this message is targeted.
    
    `attributes.correlation-id` <br />
    &emsp;&emsp;(optional, String) Correlation-id to enable tracing.
    
    `data` <br />
    &emsp;&emsp;(required, String) Schema change event message, as a JSON string. Only `create` and `update` events are supported.
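
Note that `data` is a JSON document serialized into a string rather than a nested object. The following sketch shows how such a message could be assembled for testing purposes; the helper name `build_schema_event_message` is hypothetical, and the optional `messageId` and `publishTime` fields are omitted.

```python
import json


def build_schema_event_message(data_partition_id, events, correlation_id=None):
    """Wrap schema change events in the message envelope shown above."""
    for event in events:
        # Only create and update events are supported by the endpoint.
        if event["op"] not in ("create", "update"):
            raise ValueError(f"unsupported schema event op: {event['op']}")
    attributes = {"data-partition-id": data_partition_id}
    if correlation_id is not None:
        attributes["correlation-id"] = correlation_id
    return {
        "attributes": attributes,
        # `data` is a JSON document serialized to a string, not a nested object.
        "data": json.dumps(events, separators=(",", ":")),
    }


message = build_schema_event_message(
    "opendes",
    [{"kind": "slb:indexer:test-data--SchemaEventIntegration:1.0.0", "op": "create"}],
)
```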
    
    
    [Back to table of contents](#TOC)
    
    
    # Troubleshoot Indexing Issues <a name="troubleshoot-indexing-issues"></a>
    
    ## Get indexing status <a name="get-indexing-status"></a>
    
    The Indexer service adds internal metadata to each record that registers the status of the indexing. The metadata includes
    the status and the last indexing date and time. This additional meta block helps you see the details of indexing. The
    format of the index meta block is as follows:
    
    
    ```json
    {
      "index": {
        "trace": [
          String,
          String
        ],
        "statusCode": Integer,
        "lastUpdateTime": Datetime
      }
    }
    ```

    Example:
    
    ```json
    {
      "results": [
        {
          "index": {
            "trace": [
              "datetime parsing error: unknown format for attribute: endDate | value: 9000-01-01T00:00:00.0000000",
              "datetime parsing error: unknown format for attribute: startDate | value: 1990-01-01T00:00:00.0000000"
            ],
            "statusCode": 400,
            "lastUpdateTime": "2018-11-16T01:44:08.687Z"
          }
        }
      ],
      "totalCount": 31895
    } 
    ```
    
    Details of the index block:
    
    
    1. `trace`: This field collects all the issues related to the indexing and concatenates them using '|'. This is a string field.

    2. `statusCode`: This field determines the category of the error. This is an integer field. It can have the following values:
       - 200 - All OK
       - 404 - Schema is missing in Schema service.
       - 400 - Some fields were not properly mapped to the defined schema, for example when the schema defines a field as `int` but the input record has a `text` value for that attribute.

    3. `lastUpdateTime`: This field captures the last time the record was updated by the Indexer service. This is a datetime field, so you can do range queries on this field.
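
The status codes listed above can be turned into a readable report on the client side. The sketch below assumes a Search response shaped like the example in this section, with `id` and `index` included in `returnedFields`; the names `STATUS_MEANINGS` and `summarize_index_status` are hypothetical.

```python
STATUS_MEANINGS = {
    200: "indexed successfully",
    404: "schema is missing in the Schema service",
    400: "some fields did not match the schema",
}


def summarize_index_status(search_response):
    """Return (record id, status description, trace messages) per result."""
    summary = []
    for record in search_response.get("results", []):
        index = record.get("index", {})
        code = index.get("statusCode")
        meaning = STATUS_MEANINGS.get(code, f"unrecognized status {code}")
        summary.append((record.get("id"), meaning, index.get("trace", [])))
    return summary
```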
    
    
    You can query the index status using the following example query:
    
    
    ```http
    POST /search/v2/query HTTP/1.1
    {
      "kind": "*:*:*:*",
      "query": "index.statusCode:404",
      "limit": 1000,
      "returnedFields": [ "id", "index" ]
    }
    ```
    
    <details><summary>**Curl**</summary>
    
    
    ```bash
    curl --request POST \
      --url /search/v2/query \
      --header 'Authorization: Token' \
      --header 'Content-Type: application/json' \
      --header 'Data-Partition-Id: opendes' \
      --data '{"kind": "*:*:*:*","query": "index.statusCode:404","returnedFields": ["index"]}'
    ```
    
    
    </details><br>
    
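    __Note__: By default, the API response excludes the `index` attribute block. You must specify the `index` field in `returnedFields` in order to see it in the response.

    The above query returns all records that could not be indexed because their schema is missing in the Schema service (status code 404). To find records with field mismatches, query for `index.statusCode:400` instead.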
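
To check several status codes in turn, the request body can be generated rather than written by hand. The helper below is a hypothetical sketch that mirrors the example query; it always requests `id` and `index` so that the status block is present in the response.

```python
import json


def build_index_status_query(status_code, kind="*:*:*:*", limit=1000):
    """Build the Search request body that selects records by indexing status."""
    return {
        "kind": kind,
        "query": f"index.statusCode:{status_code}",
        "limit": limit,
        # `index` must be requested explicitly; it is excluded by default.
        "returnedFields": ["id", "index"],
    }


body = json.dumps(build_index_status_query(400))
```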
    
    
    Please refer to the [Search service](https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/docs/api/search_openapi.yaml#L28) documentation for examples on different kinds of search
    queries.
    
    
    [Back to table of contents](#TOC)