Enable Full-Text Highlight Feature of Elasticsearch

ADR: Enable request highlighting

Status

  • Proposed
  • Under review
  • Approved
  • Retired

Context & Scope

The Search service is built on top of the open source Elasticsearch product, and the OSDU Search Service uses that product's search features. This proposal makes one additional such feature, highlighting of search term hits, available through the Search service.

Tradeoff Analysis - Input to decision

This functionality enhances the usefulness of the search service to consuming applications without requiring extensive development in the service itself.

Decision

We propose to extend the search query JSON domain-specific language by adding two optional fields to the Query API input: highlightedFields and highlight. These enable the Elasticsearch highlighting functionality (documented here: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/highlighting.html).

highlightedFields will list the fields in which search term hits are highlighted in the results.

{
  "kind": "osdu:*:dataset--File.Generic:*",
  "query": "test*",
  "offset": 0,
  "limit": 30,
  "trackTotalCount": true,
  "highlightedFields": ["createUser", "id"]
}
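
As a rough illustration of how the service might translate this input, the Python sketch below maps highlightedFields to the "highlight" clause of an Elasticsearch search request body. The helper name build_highlight_clause is hypothetical and is not part of the Search service; the sketch assumes default highlighter settings are acceptable for every listed field.

```python
from typing import Any, Dict, List, Optional


def build_highlight_clause(
    highlighted_fields: Optional[List[str]],
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: map `highlightedFields` to an Elasticsearch
    `highlight` clause. Returns None when no highlighting is requested."""
    if not highlighted_fields:
        return None  # field absent or empty: leave the query unchanged
    # An empty options object per field asks Elasticsearch to apply its
    # default highlighter settings, which wrap hits in <em>...</em> tags.
    return {"fields": {name: {} for name in highlighted_fields}}


# Query API input from the example above (trimmed to the relevant keys).
query_input = {
    "kind": "osdu:*:dataset--File.Generic:*",
    "query": "test*",
    "highlightedFields": ["createUser", "id"],
}

clause = build_highlight_clause(query_input.get("highlightedFields"))
print(clause)  # {'fields': {'createUser': {}, 'id': {}}}
```

Because the helper returns None when the field is missing, a request without highlightedFields would produce the same Elasticsearch query as it does today.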

When this input field is present, a new field, highlight, is added to the results in the Query response:

{
  "results": [
    {
      "data": {
        ...
      },
      "kind": "osdu:wks:dataset--File.Generic:1.0.0",
      "source": "wks",
      ...
      "createUser": "serviceprincipal@testing.com",
      "id": "osdu:dataset--File.Generic:autotest8751235",
      "highlight": {
        "createUser": ["serviceprincipal@<em>test</em>ing.com"],
        "id": ["osdu:dataset--File.Generic:auto<em>test</em>8751235"]
      }
    }
  ]
}

In this case, the search term hits in the fields listed in "highlightedFields" are annotated with "em" tags for use in HTML-compatible display. If the user puts "highlight" in the input payload, then whatever Elasticsearch returns is passed back in "highlight".
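
To show how a consuming application might use the new response field, the Python sketch below prefers the highlighted fragments for a field and falls back to the raw value when no highlight was returned. The function name display_value and the " ... " separator used to join multiple fragments are illustrative assumptions, not part of the proposal.

```python
from typing import Any, Dict


def display_value(record: Dict[str, Any], field: str) -> str:
    """Hypothetical client-side helper: return the highlighted text for
    `field` if the record carries one, otherwise the plain field value."""
    fragments = record.get("highlight", {}).get(field)
    if fragments:
        # Several fragments may be returned for one field; the separator
        # used to join them here is an arbitrary choice.
        return " ... ".join(fragments)
    return str(record.get(field, ""))


# One result record shaped like the response example above.
record = {
    "kind": "osdu:wks:dataset--File.Generic:1.0.0",
    "createUser": "serviceprincipal@testing.com",
    "id": "osdu:dataset--File.Generic:autotest8751235",
    "highlight": {
        "createUser": ["serviceprincipal@<em>test</em>ing.com"],
        "id": ["osdu:dataset--File.Generic:auto<em>test</em>8751235"],
    },
}

print(display_value(record, "createUser"))  # serviceprincipal@<em>test</em>ing.com
print(display_value(record, "kind"))        # osdu:wks:dataset--File.Generic:1.0.0
```

The fallback keeps such a client working unchanged against responses that contain no "highlight" field.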

Rationale

The added highlightedFields field enables a simple use case: highlighting search term hits in a list of named fields.

Consequences

Existing applications are not impacted. The complexity of the search query input increases very slightly. The performance of existing queries will not be affected.

When to revisit

Alternatives and implications

Decision criteria and tradeoffs

Decision timeline

Edited by Mark Chance