ADR: Search data across multiple kinds in one search
Status
- Proposed
- Under review
- Approved
- Retired
Context & Scope
Users and applications commonly need to search data across multiple kinds in a single request. In OSDU search, each kind is mapped to one index, so such a search must span multiple indices in Elasticsearch. Elasticsearch supports this by accepting either a wildcard index pattern or a list of index names.
Currently, OSDU search only exposes the wildcard option (e.g. "kind": "*:*:*:*") for searching across multiple indices.
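To make the two addressing styles concrete, here is a minimal Java sketch of how an Elasticsearch search request path can target either a wildcard pattern or an explicit, comma-separated list of indices. The class name, method name, and index names (`a-b-c-d` and so on) are hypothetical and chosen only for illustration; how OSDU actually derives index names from kinds is not covered here.

```java
import java.util.List;

class SearchTargets {
    // Builds the path of an Elasticsearch search request.
    // Each entry may be a concrete index name or a wildcard pattern;
    // multiple entries are joined with commas, which is how
    // Elasticsearch addresses several indices in one request.
    public static String searchPath(List<String> targets) {
        if (targets == null || targets.isEmpty()) {
            throw new IllegalArgumentException("at least one target is required");
        }
        return "/" + String.join(",", targets) + "/_search";
    }

    public static void main(String[] args) {
        // Wildcard: every index matching the pattern is searched.
        System.out.println(searchPath(List.of("a-*-c-d")));
        // prints "/a-*-c-d/_search"

        // Explicit list: only the named indices are searched.
        System.out.println(searchPath(List.of("a-b-c-d", "a-e-c-d")));
        // prints "/a-b-c-d,a-e-c-d/_search"
    }
}
```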
A single tenant data partition may contain hundreds, if not thousands, of kinds. We found that using a wildcard to search across multiple indices adds significant performance overhead compared with an explicit list of index names, and the overhead grows with the number of indices in Elasticsearch. The attached diagram shows our observation:
Trade-off Analysis
Here is the relevant API spec: https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/search-core/src/main/java/org/opengroup/osdu/search/api/SearchApi.java
Without introducing a new field in the search API, we propose to allow multiple index (kind) names, separated by commas, in the existing "kind" property.
For example, suppose I have these kinds in the system:
a:b:c:d
a:e:c:d
a:f:c:d
a:g:c:d
I want to search for the keyword "well" against only 2 kinds:
a:b:c:d
a:e:c:d
Today I can only do this by forming a query like:
{
"kind": "a:*:c:d",
"query": "(\"kind\": \"a:b:c:d\" OR \"kind\": \"a:e:c:d\") AND well"
}
This still slows my query down, because the search is performed against all indices the wildcard matches, i.e.
a:b:c:d
a:e:c:d
a:f:c:d
a:g:c:d
even though I know I only want to search against 2 of them. The proposed solution will allow me to change this to
{
"kind": "a:b:c:d,a:e:c:d",
"query": "well"
}
This makes my query easier to write and potentially a lot more performant, as it targets only the indices I want to search against.
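As an illustration of the proposed behavior, the following Java sketch splits a comma-separated "kind" value into individual kinds. It is not the prototype implementation: the class and method names are hypothetical, and the choices to trim whitespace, skip empty entries, drop duplicates, and require four colon-separated segments per kind are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class KindParser {
    // Splits the "kind" property on commas and returns the distinct kinds
    // in their original order. A single kind (with or without wildcards)
    // still yields a one-element list, so existing requests keep working.
    public static List<String> parseKinds(String kind) {
        if (kind == null) {
            throw new IllegalArgumentException("kind must not be null");
        }
        Set<String> kinds = new LinkedHashSet<>();
        for (String part : kind.split(",")) {
            String candidate = part.trim();
            if (candidate.isEmpty()) {
                continue; // assumption: tolerate stray commas
            }
            // Assumption: a kind has four colon-separated segments,
            // as in "a:b:c:d" or "*:*:*:*".
            if (candidate.split(":", -1).length != 4) {
                throw new IllegalArgumentException("invalid kind: " + candidate);
            }
            kinds.add(candidate);
        }
        if (kinds.isEmpty()) {
            throw new IllegalArgumentException("kind must not be empty");
        }
        return new ArrayList<>(kinds);
    }

    public static void main(String[] args) {
        System.out.println(parseKinds("a:b:c:d,a:e:c:d")); // prints "[a:b:c:d, a:e:c:d]"
        System.out.println(parseKinds("a:*:c:d"));         // prints "[a:*:c:d]"
    }
}
```

Each resulting kind would then be resolved to its index, and the search would run only against that explicit list rather than against every index a wildcard matches.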
Here are the Pros and Cons of the proposal:
Pros | Cons |
---|---|
- Non-breaking change. No API change required. | - Does not follow the JSON convention of encoding multiple items as an array. |
- Consistent with Elasticsearch's pattern for specifying multiple indices in a search. | |
- Changes are limited to common code in "OSDU Core Common" and "Search Service". | |
Decision
The proposal is a non-breaking change, and its implementation is simple and safe. Prototypes of the implementation in OSDU Core Common and Search Service can be found in the following MRs:
Consequences
This is a non-breaking change that yields a significant performance gain when searching across multiple indices.