ADR: Common discovery within and across kinds
Status
- Proposed
- Under review
- Approved
- Retired
Context & Scope
Today a single schema can define multiple properties for geospatial data. For example, the Wellbore schema defines both the GeographicBottomHoleLocation and ProjectedBottomHoleLocation properties.
The JSON key used for spatial data is also not consistent across schemas.
This causes issues for common consumption workflows, such as finding all entities that exist within a given area: a consumer does not know which property to query for each type, so finding all entities in a given area is complicated.
Looking beyond spatial data, this is a common problem across data types. For instance, in the Wellbore schema the name is represented by the property 'FacilityName', but this key is not used for the name in other schemas.
We want to define a standard that allows properties to be indexed in a common way across types. This will provide:
- One or more common properties that can be searched across kinds
- A priority list of schema properties from which these common properties are populated
- A way for these common properties to define relationships
Trade-off Analysis
We could declare a single property on each schema to serve as the common property. However, some schemas have multiple properties that could be used, and some entity instances leave one property undefined while populating another. Therefore, no single property will always be correct.
We could reuse the property key defined in the schema for indexing. However, this causes problems for consumers, who have to understand which property to use for each schema when discovering data or running analytics across kinds. Defining a common property shared between schemas, which consumers can rely on, solves this concern.
We could define the standard directly in the schema only. This follows existing patterns, such as the indexing hints used here. However, this solution is inflexible because it does not let clients provide their own mappings for OSDU schemas.
It does, however, allow the standards to be maintained in the schema, so control stays with the schema authority. Therefore, a solution that supports this whilst also letting clients provide their own mappings is preferable.
A separate ADR is proposed to allow for Schema extensions using the virtual property defined in this ADR.
Decision
We are proposing a new optional attribute in schemas to define a common property mapping.
For OSDU schemas, we propose to introduce a new property, x-osdu-virtual-properties, holding a dictionary that currently contains only one key, DefaultLocation (declared as data.VirtualProperties.DefaultLocation). Each entry lists the paths to the source properties, and the order defines the priority: the first item in the list has the highest priority. If that property does not exist or is not populated, the next one takes precedence.
x-osdu-virtual-properties can be used to map any properties to a new property name that can be used for consumption. Schemas can then declare the same virtual property to make cross-schema consumption easier.
The decision is backed by OSDU Data Definitions, as per the Core Concepts meeting of July 6, 2021.
The declared virtual property is never added to the record. Instead, consumption services such as the indexer and search use it to create an indexed entry, which makes the data discoverable by this property.
Example use case: Assigning virtual properties within a schema
{
"x-osdu-virtual-properties":{
"data.VirtualProperties.DefaultLocation": {
"type": "object",
"priority": [
{ "path": "data.ProjectedBottomHoleLocation" },
{ "path": "data.GeographicBottomHoleLocation" },
{ "path": "data.SpatialLocation" }
]}
}
}
The example above is prepared for Wellbore, which comes with three potential location shapes. The projected representation is preferred over the geographic coordinates. The last priority is the standard shape, data.SpatialLocation, contributed by AbstractFacility.
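The priority rule can be sketched in Python as follows. This is a minimal illustration, not the indexer's implementation: it assumes records are plain dictionaries, that a path such as data.SpatialLocation means record["data"]["SpatialLocation"], and that "not populated" means missing, None, or empty. The function names get_path, is_populated and resolve_virtual_property are hypothetical.

```python
from typing import Any, Optional


def get_path(record: dict, path: str) -> Any:
    """Follow a dot-separated path through nested dicts; return None if a segment is missing."""
    node: Any = record
    for segment in path.split("."):
        if not isinstance(node, dict) or segment not in node:
            return None
        node = node[segment]
    return node


def is_populated(value: Any) -> bool:
    # Assumption: None, empty strings and empty containers count as "not populated".
    return value is not None and value not in ("", [], {})


def resolve_virtual_property(record: dict, definition: dict) -> Optional[Any]:
    """Return the value of the highest-priority populated path, or None if none is populated."""
    for entry in definition["priority"]:
        value = get_path(record, entry["path"])
        if is_populated(value):
            return value
    return None


# The Wellbore DefaultLocation definition from the example above.
default_location = {
    "type": "object",
    "priority": [
        {"path": "data.ProjectedBottomHoleLocation"},
        {"path": "data.GeographicBottomHoleLocation"},
        {"path": "data.SpatialLocation"},
    ],
}

# Hypothetical record without a projected location: the geographic one is selected.
record = {
    "data": {
        "GeographicBottomHoleLocation": {"note": "geographic"},
        "SpatialLocation": {"note": "spatial"},
    }
}
print(resolve_virtual_property(record, default_location))  # → {'note': 'geographic'}
```

Because the list order alone encodes priority, the schema authority can change precedence by reordering entries without touching any consumer.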
For now, we restrict this so that every key created through this mechanism must be prefixed with the following:
data.VirtualProperties.
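A schema authority could enforce the prefix rule with a check like the one below. It is a hypothetical sketch: it assumes x-osdu-virtual-properties sits at the top level of the schema, as in the example above, and additionally assumes (beyond what this ADR states) that a non-empty name must follow the prefix.

```python
VIRTUAL_PREFIX = "data.VirtualProperties."


def invalid_virtual_keys(schema: dict) -> list[str]:
    """Return the x-osdu-virtual-properties keys that violate the prefix rule."""
    virtual = schema.get("x-osdu-virtual-properties", {})
    return [
        key
        for key in virtual
        # Assumption: a non-empty name must follow the prefix.
        if not key.startswith(VIRTUAL_PREFIX) or key == VIRTUAL_PREFIX
    ]


schema = {
    "x-osdu-virtual-properties": {
        "data.VirtualProperties.DefaultLocation": {"type": "object", "priority": []},
        "data.DefaultLocation": {"type": "object", "priority": []},
    }
}
print(invalid_virtual_keys(schema))  # → ['data.DefaultLocation']
```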
The DefaultLocation key name does not clash with any existing entity type property. It becomes relevant in generic search queries across different types that include spatial conditions, for example:
{
"kind": "*:*:*:*",
"spatialFilter": {
"field": "data.VirtualProperties.DefaultLocation",
"byGeoPolygon": {
"points": [
{"longitude":-90.65, "latitude":28.56},
{"longitude":-90.65, "latitude":35.56},
{"longitude":-85.65, "latitude":35.56},
{"longitude":-85.65, "latitude":28.56},
{"longitude":-90.65, "latitude":28.56}
]
}
}
}
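The query body above can also be generated programmatically. The sketch below builds exactly that shape and closes the polygon ring automatically; the helper name build_area_query is hypothetical, and sending the body to the search service (endpoint, authentication, headers) is deliberately left out.

```python
import json


def build_area_query(ring: list[tuple[float, float]], kind: str = "*:*:*:*") -> dict:
    """Build a cross-kind search body filtering on the DefaultLocation virtual property.

    ring is a list of (longitude, latitude) pairs; it is closed automatically
    if the last point does not repeat the first.
    """
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three points")
    points = list(ring)
    if points[0] != points[-1]:
        points.append(points[0])
    return {
        "kind": kind,
        "spatialFilter": {
            "field": "data.VirtualProperties.DefaultLocation",
            "byGeoPolygon": {
                "points": [{"longitude": lon, "latitude": lat} for lon, lat in points]
            },
        },
    }


body = build_area_query([(-90.65, 28.56), (-90.65, 35.56), (-85.65, 35.56), (-85.65, 28.56)])
print(json.dumps(body, indent=2))
```

The wildcard kind together with a single field name is the point of the design: one query covers every kind that declares DefaultLocation.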
There is also an optional isType key that you can apply to a priority entry. It restricts the selection based on the type of data the property points to, which can differ per record instance.
For example, datasets and artifacts referenced by a record use generic schemas, so the type of data depends on the record instance. In the example below, the data.dataset[].filepath property is only mapped if it points to a GeoJson type; otherwise, the next entry checks whether it points to a Raster file type. The isType value is not restricted to a fixed set of values.
{
"x-osdu-virtual-properties":{
"data.VirtualProperties.MyLocation": {
"type": "object",
"priority": [
{
"path": "data.dataset[].filepath",
"isType": "GeoJson"
},
{
"path": "data.dataset[].filepath",
"isType": "Raster"
}
]}
}
}
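The isType filtering can be sketched as follows. This ADR does not say how the type of a referenced file is detected, so the sketch takes a caller-supplied type_of function; the guess_type helper, which simply looks at the file extension, and the s3:// paths are hypothetical stand-ins. A path segment ending in [] is assumed to fan out over every element of a list.

```python
from typing import Any, Callable, Optional


def get_values(record: dict, path: str) -> list[Any]:
    """Collect all values at a dotted path; a segment ending in [] fans out over a list."""
    nodes: list[Any] = [record]
    for segment in path.split("."):
        is_array = segment.endswith("[]")
        key = segment[:-2] if is_array else segment
        next_nodes: list[Any] = []
        for node in nodes:
            if not isinstance(node, dict) or key not in node:
                continue
            value = node[key]
            if is_array:
                if isinstance(value, list):
                    next_nodes.extend(value)
            else:
                next_nodes.append(value)
        nodes = next_nodes
    return [n for n in nodes if n is not None]


def resolve_with_type(record: dict, definition: dict,
                      type_of: Callable[[Any], Optional[str]]) -> Optional[Any]:
    """Return the first value whose detected type matches the entry's isType (if given)."""
    for entry in definition["priority"]:
        wanted = entry.get("isType")
        for value in get_values(record, entry["path"]):
            if wanted is None or type_of(value) == wanted:
                return value
    return None


def guess_type(value: Any) -> Optional[str]:
    # Hypothetical: a real implementation would inspect the referenced dataset, not its name.
    if isinstance(value, str):
        if value.endswith(".geojson"):
            return "GeoJson"
        if value.endswith(".tif"):
            return "Raster"
    return None


my_location = {
    "type": "object",
    "priority": [
        {"path": "data.dataset[].filepath", "isType": "GeoJson"},
        {"path": "data.dataset[].filepath", "isType": "Raster"},
    ],
}
record = {"data": {"dataset": [{"filepath": "s3://bucket/a.tif"},
                               {"filepath": "s3://bucket/b.geojson"}]}}
print(resolve_with_type(record, my_location, guess_type))  # → s3://bucket/b.geojson
```

Note that the GeoJson file wins even though the Raster file appears first in the dataset list: priority order, not array order, decides.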
The x-osdu-virtual-properties section also supports an optional x-osdu-relationship block to describe a relationship that a virtual property may have; see "Example use case: Describing relationships with virtual properties" below.
The OSDU Data Definitions team ensures that canonical, well-known schemas contain a populated x-osdu-virtual-properties section.
A report of the DefaultLocation priorities per kind will then look like:
| Kind | Default Priority | Comment |
|---|---|---|
| osdu:wks:master-data--SeismicProcessingProject:1.0.0 | data.SpatialLocation | Undefined x-osdu-virtual-properties definition; Unique Location |
| osdu:wks:master-data--Well:1.0.0 | data.SpatialLocation | Undefined x-osdu-virtual-properties definition; Unique Location |
| osdu:wks:master-data--Wellbore:1.0.0 | 1: data.ProjectedBottomHoleLocation 2: data.GeographicBottomHoleLocation 3: data.SpatialLocation | Schema Controlled Order |
The first two kinds are reported as undefined; the third reports an explicit priority order defined in the schema.
Keeping the x-osdu-virtual-properties mapping within the schema allows the OSDU Data Definitions team to maintain control over how properties are mapped and in which order. However, we still need to allow flexibility for specific client consumption workflows. This will be provided by Schema extensions.
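A report like the one above could be produced from the schemas themselves. The sketch below is hypothetical: the schema stubs contain only the virtual-property section, and the fallback to data.SpatialLocation for kinds without a definition is an assumption that mirrors the report rows rather than a documented rule.

```python
DEFAULT_KEY = "data.VirtualProperties.DefaultLocation"


def location_report_row(kind: str, schema: dict) -> tuple[str, str, str]:
    """Summarize how a kind's DefaultLocation is populated."""
    definition = schema.get("x-osdu-virtual-properties", {}).get(DEFAULT_KEY)
    if definition:
        order = " ".join(
            f"{rank}: {entry['path']}"
            for rank, entry in enumerate(definition["priority"], start=1)
        )
        return kind, order, "Schema Controlled Order"
    # Assumption: without a definition, report the AbstractFacility SpatialLocation.
    return kind, "data.SpatialLocation", "Undefined x-osdu-virtual-properties definition; Unique Location"


# Hypothetical schema stubs; only the virtual-property section matters here.
schemas = {
    "osdu:wks:master-data--SeismicProcessingProject:1.0.0": {},
    "osdu:wks:master-data--Well:1.0.0": {},
    "osdu:wks:master-data--Wellbore:1.0.0": {
        "x-osdu-virtual-properties": {
            DEFAULT_KEY: {
                "type": "object",
                "priority": [
                    {"path": "data.ProjectedBottomHoleLocation"},
                    {"path": "data.GeographicBottomHoleLocation"},
                    {"path": "data.SpatialLocation"},
                ],
            }
        }
    },
}

rows = [location_report_row(kind, schema) for kind, schema in schemas.items()]
for row in rows:
    print(" | ".join(row))
```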
Example use case: Describing relationships with virtual properties
It is also possible to tag virtual properties as relationships to achieve specific processing/indexing of relationships. The tagging is performed exactly as on standard OSDU schemas, using the x-osdu-prefixed custom tags.
Here is a simple relationship 'replication' example: the property PetrelProjectID refers to the id of a record of kind slb:petrel:master-data--PetrelProject:*.*.*. As a result, the property, which was previously not visible to the indexer, becomes declared and visible.
{
"kind": "osdu:wks:master-data--Well:1.0.0",
"x-osdu-extensions": {
"authority": "SLB",
"x-osdu-virtual-properties": {
"data.ExtensionProperties.PetrelProjectID": {
"type": "object",
"priority": [
{
"path": "data.ExtensionProperties.PetrelProjectID",
"isType": "string",
"type": "string",
"x-osdu-relationship": [
{
"GroupType": "master-data",
"EntityType": "PetrelProject"
}
]
}
]
}
}
}
}
Unconstrained or open relationships to unspecified types are declared as "x-osdu-relationship": [].
The next example demonstrates a new relationship by means of a virtual property with prioritized sources:
{
"kind": "osdu:wks:master-data--Well:1.0.0",
"x-osdu-extensions": {
"authority": "SLB",
"x-osdu-virtual-properties": {
"data.VirtualProperties.ApplicationProjectID": {
"type": "object",
"priority": [
{
"path": "data.ExtensionProperties.TechlogExtensions.TechlogProjectID",
"isType": "string",
"type": "string",
"x-osdu-relationship": [
{
"EntityType": "TechlogProject"
}
]
},
{
"path": "data.ExtensionProperties.PetrelProjectID",
"isType": "string",
"type": "string",
"x-osdu-relationship": [
{
"GroupType": "master-data",
"EntityType": "PetrelProject"
}
]
}
]
}
}
}
}
It demonstrates the 'virtual merge' of a relationship for a given record. The data.VirtualProperties.ApplicationProjectID property is expected to carry a relationship to either a Petrel project (kind *:*:master-data--PetrelProject:*) or a Techlog project (kind *:*:*TechlogProject:*). Should the Well record contain both property values, as defined by the two path values, the first one, TechlogProjectID, is taken.
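The virtual merge can be sketched by extending the priority rule so that the winning entry also carries its relationship declaration. This is an illustrative sketch only; merge_relationship, the output shape, and the ID values tl-001 and pp-042 are hypothetical.

```python
from typing import Any, Optional


def get_path(record: dict, path: str) -> Any:
    """Follow a dot-separated path through nested dicts; return None if a segment is missing."""
    node: Any = record
    for segment in path.split("."):
        if not isinstance(node, dict) or segment not in node:
            return None
        node = node[segment]
    return node


def merge_relationship(record: dict, definition: dict) -> Optional[dict]:
    """Pick the first populated source path and carry its relationship declaration along."""
    for entry in definition["priority"]:
        value = get_path(record, entry["path"])
        if value not in (None, ""):
            return {
                "value": value,
                "source": entry["path"],
                "x-osdu-relationship": entry.get("x-osdu-relationship", []),
            }
    return None


application_project_id = {
    "type": "object",
    "priority": [
        {"path": "data.ExtensionProperties.TechlogExtensions.TechlogProjectID",
         "x-osdu-relationship": [{"EntityType": "TechlogProject"}]},
        {"path": "data.ExtensionProperties.PetrelProjectID",
         "x-osdu-relationship": [{"GroupType": "master-data", "EntityType": "PetrelProject"}]},
    ],
}

# Hypothetical Well record carrying both IDs: TechlogProjectID wins.
well = {"data": {"ExtensionProperties": {
    "TechlogExtensions": {"TechlogProjectID": "tl-001"},
    "PetrelProjectID": "pp-042",
}}}
print(merge_relationship(well, application_project_id)["value"])  # → tl-001
```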
Consequences
- All existing OSDU schemas that define spatial data should be updated with a new DefaultLocation virtual property.
- The Data Definitions team validates that all spatial entity types are properly tagged with "x-osdu-virtual-properties".
- The indexer needs to support "x-osdu-virtual-properties".
- The indexer needs to re-index based on all schema creation/change notifications.