ADR: Precise/exact match query
Status
- Proposed
- Trialing
- Under review
- Approved
- Retired
Context
The current Search service query request parameter cannot restrict results to exact/precise matches on fields whose values contain special characters.
Consider the following scenario:
Users ingested records with the following WellID values via the Storage service:
- "MS WELLBORE 250321"
- "MS=[WELLBORE]{250321}~?123"
- "MS^&WELLBORE(+250321:|123"
A user performs an exact match query via the Search service:
{
"query": "data.WellID:\"MS WELLBORE 250321\""
}
The Search service responds with all three records.
The Indexer service analyzes text input when it indexes a record, using Elasticsearch's standard analyzer for text processing. This analyzer tokenizes the input and removes most punctuation and special characters before indexing. The same analysis is applied when a user queries text fields/attributes via the Search service query endpoints.
Going back to the example above, all special characters are removed during indexing, so when a user searches for WellID MS WELLBORE 250321, all three records match because each of them contains the tokens 'MS', 'WELLBORE' and '250321'.
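The effect described above can be reproduced with a rough sketch. The tokenizer below is only an approximation of the standard analyzer (it splits on anything that is not an ASCII letter or digit and lowercases the result; the real analyzer follows Unicode text segmentation rules), and the function names are hypothetical, chosen for illustration:

```python
import re

# The three WellID values from the scenario above.
WELL_IDS = [
    "MS WELLBORE 250321",
    "MS=[WELLBORE]{250321}~?123",
    "MS^&WELLBORE(+250321:|123",
]


def approx_standard_tokens(text):
    """Approximate the standard analyzer: keep alphanumeric runs, lowercased.

    Assumption: this is a simplification for ASCII input, not the real
    Elasticsearch tokenizer.
    """
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def phrase_matches(query, value):
    """Return True if the query tokens occur consecutively in the value tokens."""
    q = approx_standard_tokens(query)
    v = approx_standard_tokens(value)
    return any(v[i:i + len(q)] == q for i in range(len(v) - len(q) + 1))


matched = [w for w in WELL_IDS if phrase_matches("MS WELLBORE 250321", w)]
print(matched)  # lists every WellID whose analyzed tokens contain the phrase
```

Under this approximation all three values reduce to token sequences that begin with 'ms', 'wellbore', '250321', which is why the quoted phrase query cannot tell them apart.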
Proposed Solution
As of the OSDU Mercury release, the Indexer service adds a non-analyzed field (.keyword) for each text field to enable aggregations. In this field, incoming text is indexed as-is and no special characters are dropped, so it can be used for the exact/precise match use case.
A new query syntax is required for exact match queries so that non-analyzed fields can be targeted.
We can extend the query syntax along the lines of the nested() query and introduce a similar function for exact match:
{
"query":"exact(data.WellID, \"MS WELLBORE 250321\")"
}
This returns only records whose WellID is exactly MS WELLBORE 250321, because the match uses only the non-analyzed field.
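One way the proposed function could be handled is sketched below, purely as an illustration. It assumes the Search service would rewrite exact(field, "value") into an Elasticsearch term query on the field's .keyword sub-field; the parser, the function name exact_to_term_query, and the rewrite itself are assumptions, not the service's actual implementation:

```python
import json
import re

# Hypothetical grammar: exact(<field path>, "<double-quoted value>")
_EXACT = re.compile(r'^\s*exact\(\s*([\w.]+)\s*,\s*("(?:[^"\\]|\\.)*")\s*\)\s*$')


def exact_to_term_query(expr):
    """Translate an exact() expression into a term query on <field>.keyword."""
    match = _EXACT.match(expr)
    if match is None:
        raise ValueError(f"not an exact() expression: {expr!r}")
    field, quoted = match.groups()
    value = json.loads(quoted)  # strips the quotes and resolves escapes
    # A term query is not analyzed, so the value is compared as-is.
    return {"term": {f"{field}.keyword": value}}


term_query = exact_to_term_query('exact(data.WellID, "MS WELLBORE 250321")')

# Simulate the non-analyzed comparison against the three ingested values.
WELL_IDS = [
    "MS WELLBORE 250321",
    "MS=[WELLBORE]{250321}~?123",
    "MS^&WELLBORE(+250321:|123",
]
wanted = term_query["term"]["data.WellID.keyword"]
exact_hits = [w for w in WELL_IDS if w == wanted]
print(term_query)
print(exact_hits)
```

Because the comparison is made against the stored value rather than its tokens, only the first record is selected in this simulation.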
The same non-analyzed text fields are also used to capture null values, so this syntax can also be used to query records with null-valued attributes:
{
"query":"exact(data.WellID, \"null\")"
}
Decision
Rationale
Special characters are very common in the names of O&G domain entities/objects, e.g. North Sea wellbore names (25/8‑20 C, 25/8‑21 S). The Search service must be enhanced to support queries that match precise/exact values.
Tradeoff Analysis - Input to decision
- Partial matches in these scenarios make searching records/datasets very time consuming, as users have to scroll through a lot of results.