Problem - 1
- OpenDES supports a schema-on-read pattern, allowing data from sources to be ingested in their original format
- OpenDES users need a way to discover data in OpenDES in a unified manner, independent of the consumption or source models above
Solution - WKS
- Translate the data to a Well Known Structure (WKS), so a user can search for, say, "Wells operated by Chevron in Eagleford" without worrying about whether the Well came from IHS, ProSource, or Petrel, or what the source structures looked like for the Well, including the operator and basin attributes needed for this query.
- Note: WKS is the schema definition, but what we are actually creating here are WKS-structured data instances. These were once referred to as Well Known Instances (WKI), but over time WKS has come to signify both the well known structure and the ingested data that has been enriched to this structure. Each DOMS will provide APIs to GET/POST domain objects in the WKS structure/schema. Additionally, they can provide APIs to GET/POST domain objects in the domain-native structure/schema. Hope that clears up any confusion!
- An enrichment pipeline performs a set of transformations on the source data to bring it to the consumable schema for discovery, i.e., the WKS. This can include, but is not limited to:
- taking multiple source entities and merging them,
- renaming attributes to the standard schema,
- adding missing attributes through calculations, classification, etc.,
- translating values to a standard dictionary of values, e.g., for well status, field type, completion type
- Note: The normalization API is called when we create a cache in Elastic, MapLarge, or Spotfire (only Spotfire is done today; the others will follow ASAP). This avoids repeating the transformation in multiple places, which leads to data integrity issues. (Discussed and agreed with Thomas)
- performing frame-of-reference conversions to homogenize them for queries
- chasing related entities from this or other sources and providing relationship navigation
- Here is a very simple example where two sources have wells W1 and W11, respectively, with different attributes; these need to be translated to the well known schema by mapping the source attributes to the WKS nomenclature.
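The attribute-mapping step above can be sketched as follows. This is a minimal, hypothetical illustration: the source names, attribute names (`WELL_NM`, `WellName`, `Operator`, `Basin`), and the `to_wks` helper are all assumptions for the example, not the actual WKS definitions.

```python
# Hypothetical sketch: mapping two source well records to a common WKS shape.
# All attribute names here are illustrative, not the real WKS nomenclature.

# Per-source mapping from source attribute name to WKS attribute name.
SOURCE_MAPPINGS = {
    "source1": {"WELL_NM": "WellName", "OPER": "Operator", "BASIN_NM": "Basin"},
    "source2": {"Name": "WellName", "OperatedBy": "Operator", "Play": "Basin"},
}

def to_wks(source: str, record: dict) -> dict:
    """Rename a source record's attributes to the WKS nomenclature."""
    mapping = SOURCE_MAPPINGS[source]
    wks = {wks_attr: record[src_attr]
           for src_attr, wks_attr in mapping.items() if src_attr in record}
    wks["SourceSystem"] = source  # retain provenance for later match/merge
    return wks

w1 = to_wks("source1", {"WELL_NM": "W1", "OPER": "Chevron", "BASIN_NM": "Eagleford"})
w11 = to_wks("source2", {"Name": "W11", "OperatedBy": "Chevron", "Play": "Eagleford"})
```

Once both records share the same attribute names, a query such as "Wells operated by Chevron in Eagleford" can be answered without knowing the source structures.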
Problem - 2
- OpenDES would end up with multiple instances of data from the same source or from other sources, and users need a reliable version of the data, curated by OpenDES, for use in workflows
Solution - WKE or precisely Discovery WKE
- Apply different algorithms to determine how a final instance can be created that complies with the WKS schema. This definitive instance of the item, curated in the OpenDES enrichment pipeline, is called a Well Known Entity (WKE)
- Identity therefore becomes key to identifying duplicates and merging them effectively. It is also required to associate children from one or more sources correctly, as shown below.
- Here is an example where two sources containing the same well have been merged to create a definitive WKE instance. Two options are shown below for the final WKE depending on source priority: on the left, source 1 takes priority; on the right, source 2 takes higher priority.
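The source-priority merge described above can be sketched like this. It is a simplified, hypothetical illustration (attribute names and the `merge_by_priority` helper are assumptions): for each attribute, the highest-priority source wins, and lower-priority sources only fill in attributes that are still missing.

```python
# Hypothetical sketch: merging duplicate normalized WKS instances of the same
# well into one WKE using a configurable source priority.

def merge_by_priority(instances: dict, priority: list) -> dict:
    """instances maps source name -> normalized WKS record.
    Attributes from higher-priority sources win; lower-priority sources
    only contribute attributes that are still missing."""
    wke = {}
    for source in priority:
        for attr, value in instances.get(source, {}).items():
            wke.setdefault(attr, value)
    return wke

instances = {
    "source1": {"WellName": "W1", "Operator": "Chevron"},
    "source2": {"WellName": "Well-1", "Operator": "Chevron", "Basin": "Eagleford"},
}
# Left option: source 1 takes priority; right option: source 2 takes priority.
left = merge_by_priority(instances, ["source1", "source2"])
right = merge_by_priority(instances, ["source2", "source1"])
```

Note how both options keep `Basin` from source 2 (source 1 never had it), while the conflicting `WellName` follows whichever source was given priority.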
- As you can see, the enrichment pipeline performs a variety of steps, shown below, to convert source entity structures to normalized WKS-based instances. It then applies additional conversions, including frame-of-reference, curates multiple instances down to one at the source level, and finally resorts to cross-source match/merge with source priority, both to create the definitive instance, i.e., the WKE, and to hang the right cross-source relationships off the definitive entity, as shown.
- Note: Merging is only applicable to duplicates from multiple sources, and it requires transformation to a normalized unit and CRS system (SI and WGS84). For interpretation entities we need to use the preferred flag defined in Thomas’s schema to attach ONLY the preferred WKS entities to the WKE.
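The unit-normalization prerequisite above can be illustrated with a tiny sketch. The attribute names and conversion table are hypothetical; a real implementation would also reproject coordinates to WGS84, which needs a proper CRS library (e.g., pyproj) and is out of scope here.

```python
# Hypothetical sketch of the normalization step that merging depends on:
# express measured values in SI units so records from different sources
# become directly comparable. Attribute names are illustrative.

TO_METRES = {"ft": 0.3048, "m": 1.0}  # length unit -> metres

def normalize_depth(record: dict) -> dict:
    """Return a copy of the record with TotalDepth expressed in metres."""
    out = dict(record)
    out["TotalDepth"] = record["TotalDepth"] * TO_METRES[record["TotalDepthUnit"]]
    out["TotalDepthUnit"] = "m"
    return out

# Two records of the "same" well from different sources, in different units:
a = normalize_depth({"TotalDepth": 10000.0, "TotalDepthUnit": "ft"})
b = normalize_depth({"TotalDepth": 3048.0, "TotalDepthUnit": "m"})
# After normalization the two depths can be compared for match/merge.
```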
Enrichment Services
- All of these enrichment steps are performed by independent components that listen to events emitted by storage, so the system overall is loosely coupled and follows an event-based choreography pattern.
- Here is an example of how survey data from different sources can be used to generate curated, trustworthy WKE deviation surveys that can be used in consumption workflows.
- Here is another perspective on how these enrichment steps are invoked by events
- As a further illustration of this loosely coupled enrichment-services model, you can see how other data flows can be achieved by plugging the right enrichment services into the pipeline
- These enrichment services do not necessarily have to operate on structured data with the typical match, merge, group, transpose, and filter type operations; they can also operate on unstructured and semi-structured data, performing suitable operations such as OCR, classification, and fact extraction to derive value-enriched data and feed it back into the data ecosystem. Here is an example shown below.
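A trivial fact-extraction enrichment over semi-structured text might look like the sketch below. The pattern, input text, and extracted attribute names are all hypothetical; real extraction would use far more robust NLP or OCR tooling.

```python
# Hypothetical sketch of a fact-extraction enrichment service: pull simple
# well facts out of a free-text report line using a regex with named groups.
import re

PATTERN = re.compile(r"Well\s+(?P<well>\S+)\s+operated by\s+(?P<operator>[A-Za-z ]+)")

def extract_facts(text: str) -> dict:
    """Return extracted facts as a dict, or an empty dict if none found."""
    match = PATTERN.search(text)
    return match.groupdict() if match else {}

facts = extract_facts("Daily report: Well W1 operated by Chevron, status producing.")
```

The extracted facts could then be emitted as a normal WKS-shaped record and fed back into the same enrichment pipeline as structured data.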