Fields shouldn't be represented as hierarchical strings within a Search platform
Some fields within the Search Mappings are not search-friendly and are counter-intuitive for search, from both a performance and a usability/UX perspective.
Two examples come to mind: SRN & Kinds
Example: SRN ("data.Data.IndividualTypeProperties.TrajectoryTypeID": "srn:reference-data/WellboreTrajectoryType:Deviated:")
There are plenty of strong, valid arguments for using the SRN format when describing a data schema such as the "Well Known Schema" in OSDU. When it comes to Search, however, it adds a lot of confusion, complexity and redundancy. The current process appears to be to take the "Well Known Schema" and hand it directly to the Search Service, which suggests there is a missing layer that transforms the "Well Known Schema" into a "Search Schema".
Whenever considering the naming and structure of fields and values for Search, one should always ask "What value does this give the user?". To take that further, consider the example field name "data.Data.IndividualTypeProperties.TrajectoryTypeID". Here 'data' appears twice, once in lowercase and once with the first letter uppercase, followed by the generic field name "IndividualTypeProperties". Having data appear twice, with different capitalisation, is confusing for the user. Does "IndividualTypeProperties" add value to the user? It provides no context about which child fields exist, so it adds a barrier to entry. An alternative way to represent this is shown below. If the intermediate "IndividualTypeProperties" level is genuinely important, I would recommend renaming it to something more relevant.
"data.TrajectoryType": "Deviated"
Or, following a different example, "data.Data.IndividualTypeProperties.DataSourceOrganisationID": "srn:master-data/Organisation:TNO:"
becomes:
"data.OrganisationID": "TNO"
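The missing transformation layer described above could look roughly like the following Python sketch. It assumes SRN values always follow the "srn:<namespace>/<Type>:<Value>:" shape seen in the two examples; the `FIELD_RENAMES` table, `srn_value` and `to_search_document` are hypothetical names for illustration, not part of any OSDU service.

```python
import re
from typing import Optional

# Assumption: SRNs look like "srn:<namespace>/<Type>:<Value>:" (based only on the examples above).
SRN_PATTERN = re.compile(r"^srn:(?P<namespace>[^/]+)/(?P<type>[^:]+):(?P<value>[^:]+):")

# Hypothetical mapping from Well Known Schema paths to search-friendly field names.
FIELD_RENAMES = {
    "data.Data.IndividualTypeProperties.TrajectoryTypeID": "data.TrajectoryType",
    "data.Data.IndividualTypeProperties.DataSourceOrganisationID": "data.OrganisationID",
}


def srn_value(srn: str) -> Optional[str]:
    """Return the human-meaningful value segment of an SRN, or None if it doesn't match."""
    match = SRN_PATTERN.match(srn)
    return match.group("value") if match else None


def to_search_document(record: dict) -> dict:
    """Rename known fields and replace SRN strings with their value segment."""
    search_doc = {}
    for path, value in record.items():
        new_path = FIELD_RENAMES.get(path, path)  # unknown fields pass through unchanged
        if isinstance(value, str) and value.startswith("srn:"):
            value = srn_value(value) or value  # keep the original if it doesn't parse
        search_doc[new_path] = value
    return search_doc


if __name__ == "__main__":
    print(to_search_document({
        "data.Data.IndividualTypeProperties.TrajectoryTypeID": "srn:reference-data/WellboreTrajectoryType:Deviated:",
        "data.Data.IndividualTypeProperties.DataSourceOrganisationID": "srn:master-data/Organisation:TNO:",
    }))
    # {'data.TrajectoryType': 'Deviated', 'data.OrganisationID': 'TNO'}
```

Keeping the rename table explicit makes the Search Schema a deliberate, reviewable artefact rather than a by-product of the Well Known Schema.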
The limitations of the long-string SRN format can be broken down into:
- Requires extensive wildcard searches, which are computationally expensive
- Relying on the incorrect field type for the kind of search that is being promoted - this should use Keyword, not Text
- It is complicated for end users to understand what the child values of generic field names are
- It is complicated for end users to understand the expected value for generic field names
- Can't rely on autocomplete or field-name discovery
- Creates a barrier to adoption for developers building apps on top of OSDU, as generic field naming leaves out important context
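To make the first two limitations concrete, here is a minimal sketch of the two query shapes, assuming an Elasticsearch-style query DSL expressed as Python dicts; the helper names are hypothetical. Finding "Deviated" trajectories today needs a leading-wildcard pattern over the whole SRN string, whereas the proposed field needs only an exact term lookup.

```python
def wildcard_query_on_srn(field: str, value: str) -> dict:
    """Current approach: match a value buried inside an SRN string with a wildcard.

    The pattern starts with '*', which forces the engine to examine many indexed
    terms; Elasticsearch documentation advises against leading wildcards for this reason.
    """
    return {"query": {"wildcard": {field: {"value": f"*:{value}:*"}}}}


def term_query_on_keyword(field: str, value: str) -> dict:
    """Proposed approach: an exact term lookup against a dedicated keyword field."""
    return {"query": {"term": {field: {"value": value}}}}


if __name__ == "__main__":
    print(wildcard_query_on_srn("data.Data.IndividualTypeProperties.TrajectoryTypeID", "Deviated"))
    print(term_query_on_keyword("data.TrajectoryType", "Deviated"))
```

The wildcard form also only behaves as intended on a keyword-mapped field; on an analysed text field the SRN is split into tokens, which is the field-type mismatch noted above.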
Migrating to a more specific field-name structure, as suggested, provides the following benefits:
- Simple and intuitive for both user and application driven search
- Can easily leverage autodiscovery & autocomplete of fields and values
- Can leverage both text and keyword data types for fields to get any desired behaviour
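The last point typically maps onto Elasticsearch multi-fields: a value is indexed as `text` for full-text search, with a `keyword` sub-field for exact matching, sorting and aggregations. The sketch below builds such a mapping for the proposed field names; it assumes an Elasticsearch backend, and `build_mapping` and `text_with_keyword` are hypothetical helpers.

```python
from typing import Iterable


def text_with_keyword() -> dict:
    """Full-text search on the field itself, exact matching via its '.keyword' sub-field."""
    return {"type": "text", "fields": {"keyword": {"type": "keyword"}}}


def build_mapping(field_paths: Iterable[str]) -> dict:
    """Build nested mapping properties from dotted paths such as 'data.TrajectoryType'."""
    root: dict = {}
    for path in field_paths:
        *parents, leaf = path.split(".")
        node = root
        for parent in parents:
            # Each intermediate segment becomes an object with its own 'properties'.
            node = node.setdefault(parent, {"properties": {}})["properties"]
        node[leaf] = text_with_keyword()
    return {"mappings": {"properties": root}}


if __name__ == "__main__":
    print(build_mapping(["data.TrajectoryType", "data.OrganisationID"]))
```

With this mapping, a user can run a relaxed match query on `data.TrajectoryType` or an exact term query on `data.TrajectoryType.keyword`, choosing the behaviour per query rather than per index.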
Example: Kind ("kind" : "opendes:osdu:wellbore-master:0.2.1")
As with the SRN example above, the searchability of Kinds today is quite limited. The kind field is a single keyword mapping that packs a host of information into one value. Being a keyword field, it is only available for exact term matching.
To enable a stronger search experience, I would recommend breaking "kind": "opendes:osdu:wellbore-master:0.2.1"
up into a structure similar to:
"kind": {
  "raw": "opendes:osdu:wellbore-master:0.2.1",
  "level1": "opendes",
  "level2": "osdu",
  "level3": "wellbore-master",
  "version": {
    "raw": "0.2.1",
    "major": 0,
    "minor": 2,
    "patch": 1
  }
}
By breaking this out, users can interact with kinds in a more intuitive way and gain more flexibility and control. The concept applies to more than just kinds, but kinds are a good candidate.
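The kind breakdown proposed above could be produced at index time. The sketch below assumes every kind has exactly four colon-separated segments and a three-part numeric version, raising `ValueError` otherwise; `parse_kind` and `kinds_with_min_version` are hypothetical names, and the query assumes `kind.level3` is mapped as a keyword and the version parts as integers in an Elasticsearch-style backend.

```python
def parse_kind(kind: str) -> dict:
    """Split 'level1:level2:level3:major.minor.patch' into the proposed structure."""
    # Unpacking raises ValueError if the segment counts differ from the assumption.
    level1, level2, level3, version = kind.split(":")
    major, minor, patch = (int(part) for part in version.split("."))
    return {
        "raw": kind,
        "level1": level1,
        "level2": level2,
        "level3": level3,
        "version": {"raw": version, "major": major, "minor": minor, "patch": patch},
    }


def kinds_with_min_version(level3: str, major: int, min_minor: int) -> dict:
    """Query for a given entity type at a fixed major version and minor >= min_minor."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"kind.level3": level3}},
                    {"term": {"kind.version.major": major}},
                    {"range": {"kind.version.minor": {"gte": min_minor}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(parse_kind("opendes:osdu:wellbore-master:0.2.1"))
    print(kinds_with_min_version("wellbore-master", 0, 2))
```

With only the raw string, a version-range question like this one would need pattern matching over the whole kind value; numeric sub-fields turn it into a plain range filter.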