Words defined within an OSDU Context
The Open Subsurface Data Universe (OSDU) is a transformational set of computing, data, and networking capabilities that enables a step change for energy companies in conducting the business processes of exploration, development, drilling, production, and optimization. OSDU also supports advanced data handling and AI/ML workloads.
The OSDU Data Platform provides:
- Universal, typed data repository for all relevant data, including safety through encryption, continuity through versioning and lineage relationships, integrity through quality assertions, and coherence as the single data distribution source.
- Data Loading, including loading data content that results from work activities; characterizing data content with disciplined metadata; integrity through data immutability; assessing the data content's fitness for purpose for various uses; and adding value through the derivation of related data items and the intelligent application of languages, dialects, and taxonomies.
- Data Search and Discovery, including contextual search with immediate data discovery, qualification, and preview; intelligent value-added deep criteria, such as sentiment analysis; and multiple forms of search criteria, including geospatial search, full-text search, and faceted search.
- Data Delivery, including security through entitlement checking; access efficiency through standard and high-performance means of access; highly usable, configurable convenience functionality with complex transformations; and extensibility through support for externally managed data and data virtualization.
- Business and technology agnostic basic framework and functionality enhanced by configurable business data context, internal data platform workflows, and event-driven data platform capabilities.
A collective name for information items created and managed as OSDU operates. OSDU Resources are typed at two levels: Group Type and Individual Type (within Group Type). Prominent OSDU Resource Group Types are: master-data, reference-data, work-product, work-product-component, and [logical] file.
Mapping to OpenDES words and concepts An OSDU Resource corresponds to an OpenDES Record across all Group Types of OSDU. Many OpenDES Records (representing ENTITIES by KIND) correspond to OSDU Resources of type work-product-component. For these, the OSDU Resource and OpenDES Record represent the metadata (catalogue information) for an instance of usable business data, which is known in OSDU as the 'data content'.
Some OpenDES Records associate with the OSDU Resource Types: master-data and reference-data.
There is no existing OpenDES behavior that corresponds to the OSDU work-product Group Type.
OSDU Group Type / Individual Type correspond to the subset of OpenDES KINDs that represent Well-Known Entities and their types. See notes about OpenDES KIND below. OSDU types will be extended to account for non-well-known OpenDES KINDs to address source/target, and schema authority specific types and consumption specific types. [Open Question: Does the expanded type system include the specification of the overall format of the content (where applicable) or is this separate from schema authority. Assumption is that schema authority addresses both.]
An OSDU formal identifier ("Subsurface Data Universe Resource Name") for an instance of OSDU data. An SRN has the form: srn:&lt;namespace&gt;:&lt;type&gt;:&lt;unique-key&gt;:&lt;version&gt;. [Note: &lt;namespace&gt; is defined in the architecture, but has not yet been implemented.]
An SRN can be used to identify OSDU Resources as follows:
- Namespace identifies an OSDU implementation instance.
- Type identifies an OSDU Type consisting of a Group Type and Individual Type, e.g. master-data/Well
- Unique key
Namespace, Type and Unique Key identify an OSDU Resource Object. Adding the Version identifies an OSDU Resource Object Version.
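Assuming the colon-separated SRN elements listed above (with the `srn:` prefix seen in the examples elsewhere in this glossary), a minimal sketch of constructing and parsing SRNs might look like the following. The class and field names are illustrative, not part of the official OSDU implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SRN:
    """Hypothetical helper for srn:<namespace>:<type>:<unique-key>:<version>."""
    namespace: str
    type: str                      # e.g. "master-data/Well" (Group Type / Individual Type)
    unique_key: str
    version: Optional[str] = None  # omitted -> identifies the Resource Object (any version)

    def __str__(self) -> str:
        return f"srn:{self.namespace}:{self.type}:{self.unique_key}:{self.version or ''}"

    @classmethod
    def parse(cls, text: str) -> "SRN":
        prefix, namespace, type_, key, version = text.split(":", 4)
        if prefix != "srn":
            raise ValueError(f"not an SRN: {text!r}")
        return cls(namespace, type_, key, version or None)

srn = SRN("acme-osdu", "master-data/Well", "well-1234", "1")
assert str(srn) == "srn:acme-osdu:master-data/Well:well-1234:1"
assert SRN.parse(str(srn)) == srn          # round-trips
```

Omitting the version, as in the text above, yields a reference to the Resource Object rather than a specific Resource Object Version.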
Mapping to OpenDES words and concepts OSDU does not have a concept of multi-tenant operation; however, the namespace could be expanded to specify the implementation instance and, optionally, a tenant data partition.
An SRN can be used to identify OSDU Resource Types as:
- Namespace identifies an OSDU implementation instance.
- Type is the fixed value 'type'.
- Unique key identifies an OSDU Type consisting of a Group Type and Individual Type, e.g. master-data/Well
Namespace, Type and Unique Key identify an OSDU Resource Type. Adding the Version identifies an OSDU Resource Type Version.
WORK PRODUCT
A transfer of data to OSDU containing one or more items of data, each known as a Work Product Component.
Mapping to OpenDES words and concepts There is no corresponding formalization of this concept in OpenDES. Implicitly, each ENTITY (data instance) transferred to OpenDES corresponds to a one-component work product.
WORK PRODUCT COMPONENT
A typed unit of business data content: the smallest independently usable unit transferred to OSDU as part of a Work Product. Each Work Product Component consists of one or more data content units known as OSDU Files.
Mapping to OpenDES words and concepts An OSDU Work Product Component correlates with an OpenDES Record / ENTITY. Special consideration must be given to ENTITIES of KINDs that correlate to master data and reference data in terms of data platform behavior.
WORK PRODUCT COMPONENT Type
A classification of an OSDU Work Product Component. The Work Product Component Type is defined in a JSON property named ResourceType that contains an element for the Group Type ("work-product-component") and an element for the Individual Type. Examples include WellLog, with an SRN of the form srn:&lt;namespace&gt;:type:work-product-component/WellLog:&lt;version&gt;. Each Work Product Component Type (version) has a defined metadata model expressed as a JSON schema.
Mapping to OpenDES words and concepts In OpenDES, the classification of a Record (ENTITY) is defined in a JSON Property named KIND. Work Product Component Type corresponds to ENTITY KIND. As stated above, this correspondence applies to Well Known Entities/Schemas. With the extensions mentioned above to identify source/target/etc. and schema authority, there can be full correspondence with ENTITY KIND. This applies for this Group Type and other OSDU Group Types.
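As a minimal illustration of the typed metadata described above, a check that a metadata record carries a work-product-component ResourceType could look like the following. The record layout here is illustrative, not the official OSDU schema.

```python
# Hypothetical check on a Work Product Component metadata record: verify
# that its ResourceType names the "work-product-component" Group Type and
# return the Individual Type (e.g. "WellLog").

def check_wpc_resource_type(record: dict) -> str:
    resource_type = record["ResourceType"]            # e.g. "work-product-component/WellLog"
    group_type, _, individual_type = resource_type.partition("/")
    if group_type != "work-product-component":
        raise ValueError(f"not a work-product-component: {resource_type}")
    if not individual_type:
        raise ValueError(f"missing Individual Type in: {resource_type}")
    return individual_type

record = {"ResourceType": "work-product-component/WellLog"}
assert check_wpc_resource_type(record) == "WellLog"
```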
The first element of the OSDU Resource Type that specifies the implementation instance data scope. This was not implemented in OSDU R0 Demo or R1. The namespace distinguishes data resident in a single instance of OSDU, e.g. by an energy company or for a specific joint venture.
Mapping to OpenDES words and concepts The first element of the OpenDES Record property 'id', called 'namespace', correlates with OSDU's 'namespace'. Note: there is a need to disambiguate between two meanings of 'namespace'. One meaning is the multi-tenant data partition identification; the other is the implementation instance designation as used for OSDU. The approach to accomplish this is to make namespace a two-part identification: an implementation instance and, optionally, a tenant (data partition) identification.
DATA COLLECTION (or COLLECTION)
A set of OSDU Resource items, each identified by an OSDU SRN. Data Collections can be registered (stored) and shared for collaboration by a client through the OSDU Collection API.
Mapping to OpenDES words and concepts The relationship between OSDU Data Collection and OpenDES Collection has not yet been determined.
Subsurface Master Data Store (or Services). A historical name for the OSDU data platform capabilities related to master data and reference data resource types.
Subsurface Work Product Store (or Services). A historical name for the OSDU data platform capabilities related to work product, work product component, and file resource types.
Words defined by OpenDES
A service that internally persists data of a specific domain. All data is exposed through service APIs. It is governed by the Data Ecosystem but not necessarily owned by it. It uses Data Ecosystem services to enforce contracts on data entry and use.
The DDMS has recently been decomposed into:
DMS (Data Management Service)
- Generic Data Infrastructure for a specific shape of data
- Examples: File DMS, Object DMS, Tabular DMS, Timeseries DMS
- DOMS (Domain Object Management Service)
DATA DOMAIN
One of a small number of business subject areas that partition data (classified by the 'type' in KIND) for data object handling. Examples: seismic-related data (surveys, trace data, faults, etc.) and wellbore-related data (well logs, trajectories, etc.).
Mapping to OSDU words and concepts There is no formal OSDU concept that corresponds to OpenDES Data Domain. However, the OSDU architecture does provide for categories of data (sets of types) to be supported by specifically chosen storage technologies, similar to the OpenDES Data Management Service.
DATA DOMAIN OBJECT [or DOMAIN OBJECT]
A reference to instances of OpenDES data Records (ENTITIES) in the context of a Data Domain.
DATA DOMAIN TYPE [or DOMAIN TYPE]
A classification of data domain objects, expressed as the 'type' element of the KIND property in the associated Record.
A service that internally persists data of a specific domain. All data is exposed through service APIs. It is governed by OpenDES, but not necessarily owned by it. It uses OpenDES services to enforce contracts on data entry and use.
The DDMS has recently been decomposed into:
DMS (Data Management Service)
- Generic Data Infrastructure for a specific shape of data
- Examples: File DMS, Object DMS, Tabular DMS, Timeseries DMS
DOMS (Domain Object Management Service)
- They have APIs that expose domain objects [expose?]
- Expose setters and getters in standard exchange schema [relate WKS with this point]
- They can expose additional protocols/schemas for domain objects [Elaborate on protocols]
- Example: Wellbore DOMS, Seismic DOMS
OpenDES is a vision, an organization, and a technology.
VISION: Provide decentralized, loosely coupled collection of enterprise data via domain optimized data solutions, enabling cross-domain workflows implemented in a way that enables enterprise and DELFI to scale, where architectural principles are systematically enforced.
ORGANIZATION: It is a team delivering solutions for cross-cutting data concerns, governing and enforcing data architectural principles.
TECHNOLOGY: An integrated collection of Domain Data Management Services, Core Services, Mandatory Services, and Helper Services.
An instance of data or metadata classified by a KIND. An ENTITY is identified by an "id" of the form &lt;namespace&gt;:&lt;source&gt;:&lt;unique-identifier&gt;, where:
- Namespace defines the data population in which the ENTITY exists. Note: Two meanings are used and may have to be distinguished: implementation instance and tenant within implementation instance.
- Source. Note: The second element usage is to-be-defined. (Some usage copies the value from the KIND's source.)
- Unique Identifier. (Some usage prefixes the unique identifier value with a copy of the 'type' of KIND.)
NOTE: Uniqueness is given by the "id" and "version" (where version is a granular, monotonically increasing time value)
Mapping to OSDU words and concepts Note: An OSDU Resource corresponds with an ENTITY, and a ResourceType corresponds with a KIND. See comments above about Well Known Entity/Schema correlation with existing OSDU and extensions to the OSDU type system to fully accommodate the correspondence with ENTITY type by KIND.
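The uniqueness rule noted above, that a Record is keyed by "id" plus "version" with version a granular, monotonically increasing time value, can be sketched as follows. The class and method names are illustrative, not the OpenDES implementation.

```python
import time

# Sketch: an entity store keyed by (id, version). Versions are taken from a
# nanosecond clock and bumped when needed so they strictly increase.

class EntityStore:
    def __init__(self):
        self._records = {}
        self._last_version = 0

    def put(self, entity_id: str, body: dict) -> int:
        version = max(time.time_ns(), self._last_version + 1)
        self._last_version = version
        self._records[(entity_id, version)] = body   # uniqueness = (id, version)
        return version

    def get(self, entity_id: str, version: int) -> dict:
        return self._records[(entity_id, version)]

store = EntityStore()
v1 = store.put("namespace:source:well-1", {"name": "Well 1"})
v2 = store.put("namespace:source:well-1", {"name": "Well 1 (renamed)"})
assert v2 > v1                                       # monotonically increasing
assert store.get("namespace:source:well-1", v1)["name"] == "Well 1"
```

Older versions remain addressable, which is what enables the continuity-through-versioning behavior described for the OSDU data platform.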
A Kind of Data, which is represented in OpenDES as &lt;namespace&gt;:&lt;source&gt;:&lt;type&gt;:&lt;version&gt;, where:
- Namespace defines the schema authority as different companies often have specialized implementation of a data/metadata model (e.g. Shell, Chevron, SLB)
- Source is the source where the data comes from - this often expresses a reference data model (format), e.g. WITSML, LAS, Petrel, ... .
- Type is the type of ENTITY (Well, Log, Seismic 3D Survey, ...)
- Version is the version of the KIND
Note: Namespace in KIND has different usage from namespace in Record 'id'. This element name will be changed to remove the usage overload. Candidate name = schema authority. The name of the Source element may be changed to clarify usage as to the format [authority]. Candidate name = format authority. Nominal values in the OpenDES Well Known world should reflect the same values as for the id's namespace.
A data/metadata model expressed as JSON for a type of data, i.e. a KIND. The phrase 'standardized schema' is used to emphasize that all instances of a given type of data content associate with (conform with) the same schema. In other words, data content can be read and understood based on the associated schema, in order to understand and/or process data for loading, ingestion, and indexing, as well as for direct search/discovery and delivery/consumption.
Mapping to OSDU words and concepts OSDU also uses the word 'schema' for a data/metadata model's JSON representation. These are used for master-data and reference-data data models as well as for metadata models for work-products, work-product-components, and metadata files. For OSDU, schemas are standardized in the same sense as for OpenDES.
A standardized schema associated with a 'type' of the data, i.e. KIND [within a data domain]. The schema will be most accommodative for that data domain type. It is a schema that is used to understand and/or process data in particular for discovery, exchange [please define] and default consumption.
Mapping to OSDU words and concepts OSDU also provides 'standardized schemas' associated with every 'type' of data in the categories of the OSDU Group Types. The most prevalent case is for OpenDES KINDs corresponding with OSDU types of work-product-components. In OSDU, these schemas accommodate the associated data types and are used for data platform functions from loading, ingestion, and indexing through to search/discovery and delivery/consumption. All existing OSDU schemas correspond with Well Known Schemas in support of Well Known Entities.
For OSDU, merging / blending from origin data types (with schemas) is expected to be accomplished prior to data coming to OSDU. OSDU can be configured to define multiple origin types (with schemas) so that these may be loaded to OSDU and so that the merging / blending to WKE instances (with a WKS schema) can be accomplished. This can be enabled in the future with the type extensions described above.
An entity which best represents that entity type. E.g., a Wellbore created by merging the most correct information from the various sources available for that Wellbore, or a log tagged as the best log by an SME. A WKE has the same schema as the WKS for that domain data type.
Mapping to OSDU words and concepts See entry above. Tagging of best logs by an SME is accomplished in OSDU with a property value that is found in a master-data entity, such as the Wellbore master-data property PreferredLog with a value that references the best log. There is no tagging directly on the log (work-product-component) entity.
Externally Defined Words
API, or Application Programming Interface: (1) the specification of how a programmer writing an application accesses the behavior and state of services, classes, and objects; (2) a set of calling conventions defining how a service is invoked through a software package.
Big Data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.
Big data has many definitions. Common to all these is a paradigm shift in how we capture and process information.
- In the past, we would capture and save some of the data – with big data we capture it all.
- In the past, we would structure the data to fit pre-conceived notions of utility – with big data we accept heterogeneity and transform for consumption on the fly.
- In the past, we would analyze data to test pre-existing ideas – with big data we focus on discovery of new ideas and patterns – with the intention of utilizing these patterns later in real-time.
The CAP theorem states that there are three desirable system properties for the successful design, implementation, and deployment of applications in distributed computing systems. It is not, however, possible to attain all three at once; a system can guarantee at most two. The three are:
- Consistency refers to predictability and reliability of data in a database across all nodes. This is the same idea of consistency described in ACID.
- Availability means the given system is available when needed.
- Partition Tolerance refers to whether a given system continues to operate even when messages between nodes are lost or parts of the system fail or are temporarily interrupted. Ideally, a single node failure will not cause the entire system to stop functioning.
A Composite Application is a software application built by combining multiple existing services into a new application. In the world of micro-services, this requires identifying a configuration of services which might reside in multiple containers.
Software Containers are lightweight runtime environments with many of the core components of a virtual machine and isolated services of an operating system. They are designed to make packaging simple and the execution of software & microservices simple and scalable.
Software systems that manage the lifecycle of containers and microservices across a distributed system.
Continuous integration (CI) systems provide automation of the software build and validation process driven in a continuous way by running a configured sequence of operations every time a software change is checked into the source code management repository. These are closely associated with agile development practices and closely related to the emerging DevOps toolsets.
A service that exposes data from multiple sources as coming from a single logical repository.
A service that unifies aggregated data such that logically identical entities from multiple sources are exposed as a single entity.
Data mastering is the process by which an un-mastered data source record is linked or merged with another master data record. The data mastering process can either result in the creation of a new master data record, or the source data record is linked to an existing master data record.
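The link-or-create flow described above can be sketched as follows. The record fields and the exact-name match rule are placeholders for real matching logic, not a reference implementation.

```python
# Hypothetical data-mastering step: a source record is either linked to an
# existing master record or becomes a new master record.

def master(source_record: dict, masters: list) -> dict:
    for m in masters:
        if m["name"] == source_record["name"]:       # placeholder match rule
            m["linked_sources"].append(source_record["source_id"])
            return m                                 # linked to existing master
    new_master = {"name": source_record["name"],
                  "linked_sources": [source_record["source_id"]]}
    masters.append(new_master)
    return new_master                                # new master created

masters = []
m1 = master({"source_id": "s1", "name": "Well A"}, masters)
m2 = master({"source_id": "s2", "name": "Well A"}, masters)
assert m1 is m2                                      # second source linked, not duplicated
assert m1["linked_sources"] == ["s1", "s2"]
```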
All employees are Data Stewards in relation to the data they use, produce, or define within the context of their existing job responsibilities.
DevOps represents a change in IT culture, focusing on rapid IT service delivery through the adoption of agile, lean practices in the context of a system-oriented approach. DevOps emphasizes people (and culture), and seeks to improve collaboration between operations and development teams. DevOps implementations utilize technology — especially automation tools that can leverage an increasingly programmable and dynamic infrastructure from a life cycle perspective.
Docker is an open-source project which automates the deployment of applications inside portable containers that are independent of hardware, host operating system, and language.
A Document Store is a document-oriented database designed for storing, retrieving, and managing semi-structured data.
Domain APIs define the data types (domain objects) and services available for a particular domain. There will be several Domain APIs, and a product may choose to expose more than one.
A service that models a specific domain and implements its business logic. Usually leverages one or more domain data management services. The main difference to domain data management service is that it focuses on rich and elaborate consumption patterns as opposed to domain specific storage.
The Event Sourcing pattern defines an approach to handling operations on data that is driven by a sequence of events, each of which is recorded in an append-only store. Application code sends a series of events that imperatively describe each action that has occurred on the data to the event store, where they are persisted. Each event represents a set of changes to the data. The events are persisted in an event store that acts as the source of truth or system of record (the authoritative data source for a given data element or piece of information) about the current state of the data. The event store typically publishes these events so that consumers can be notified and can handle them if needed. Consumers could, for example, initiate tasks that apply the operations in the events to other systems, or perform any other associated action that is required to complete the operation. (more at http://martinfowler.com/eaaDev/EventSourcing.html)
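A minimal sketch of the pattern described above: events are appended to an append-only store, and current state is derived by replaying them in order. The event shape (`op`/`key`/`value`) is illustrative.

```python
# Event sourcing in miniature: the append-only event log is the system of
# record; state is a pure function of the replayed events.

class EventStore:
    def __init__(self):
        self._events = []                 # append-only log

    def append(self, event: dict):
        self._events.append(event)

    def replay(self) -> dict:
        """Derive current state by applying every event in order."""
        state = {}
        for event in self._events:
            if event["op"] == "set":
                state[event["key"]] = event["value"]
            elif event["op"] == "delete":
                state.pop(event["key"], None)
        return state

store = EventStore()
store.append({"op": "set", "key": "status", "value": "drilling"})
store.append({"op": "set", "key": "status", "value": "producing"})
assert store.replay() == {"status": "producing"}   # latest event wins on replay
```

Consumers subscribed to the store could apply the same events to other systems, which is how the pattern supports the downstream tasks mentioned above.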
Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.  Eventual consistency is widely deployed in distributed systems, often under the moniker of optimistic replication, and has origins in early mobile computing projects. A system that has achieved eventual consistency is often said to have converged, or achieved replica convergence.
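Replica convergence can be illustrated with a last-write-wins sketch: each replica applies timestamped updates in whatever order they arrive, and once all updates have reached all replicas, every replica holds the same value. This is one simple convergence strategy, not the only one.

```python
# Last-write-wins replication: an update is applied only if its timestamp
# is newer than the replica's current one, so delivery order does not matter.

class Replica:
    def __init__(self):
        self.value = None
        self.timestamp = -1

    def apply(self, value, timestamp):
        if timestamp > self.timestamp:    # last write wins
            self.value, self.timestamp = value, timestamp

a, b = Replica(), Replica()
updates = [("v1", 1), ("v2", 2)]
for value, ts in updates:                 # delivered in order to a
    a.apply(value, ts)
for value, ts in reversed(updates):       # delivered out of order to b
    b.apply(value, ts)
assert a.value == b.value == "v2"         # replicas have converged
```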
A framework is a structure of assumptions, values, principles, and rules that holds together the ideas comprising a broad concept. In some cases, a framework may be supported by implementation to facilitate adherence to these values, principles and rules; however, the enduring element of the framework are the ideas.
A graph database, also called a graph-oriented database, is a type of NoSQL database that uses graph theory to store, map and query relationships.
HYPOTHESIS DRIVEN DEVELOPMENT
Hypothesis-Driven Development (HDD) is thinking about the development of new ideas, products and services as a series of experiments to determine whether an expected outcome will be achieved. This practice is iterated upon until a desirable outcome is obtained or the idea is determined to not be viable.
Master Data is a single source of basic business data used across multiple systems, applications, and/or processes. Data mastering is the process by which an unmastered data source record is linked or merged with another master data record.
MINIMUM VIABLE PRODUCT
A minimum viable product (MVP) is the most pared down version of a product that can still be released. An MVP has three key characteristics:
- It has enough value that people are willing to use it or buy it initially
- It demonstrates enough future benefit to retain early adopters
- It provides a feedback loop to guide future development
For Hypothesis Driven Development, it is the minimum product development to prove (or disprove) a hypothesis
A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
Operational data is data that is tied to the specific problem being addressed in the Platform or solution. Its schema, lifecycle, format, and footprint match and evolve with the needs of the platform.
Reference Data is the set of permissible values to be used by other (master or transaction) data fields. Reference data normally changes slowly, reflecting changes in the modes of operation of the business, rather than changing in the normal course of business.
Schema on read refers to an innovative data analysis strategy in new data-handling tools like Hadoop and other more involved database technologies. In schema on read, data is applied to a plan or schema as it is pulled out of a stored location, rather than as it goes in.
Schema on write is a traditional technique for database storage that has, in some ways, given way to newer ideas applied to more sophisticated systems. Schema on write is often contrasted with schema on read, which is a newer data handling method that gives businesses and other parties more flexibility in using big data and analytics systems.
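The contrast between the two approaches can be sketched as follows: raw, heterogeneous records are stored as-is, and a schema is applied only when the data is read. The record fields and the metres/feet normalization are illustrative.

```python
import json

# Schema-on-read sketch: heterogeneous raw records are stored untouched;
# shape and units are imposed as the data is pulled out, not at ingest.

raw_lines = ['{"depth": "1200", "unit": "m"}',
             '{"depth": "3937", "unit": "ft"}']     # stored as-is

def read_with_schema(line: str) -> dict:
    """Apply the consumption schema at read time: parse and normalize to metres."""
    rec = json.loads(line)
    depth_m = float(rec["depth"])
    if rec["unit"] == "ft":
        depth_m *= 0.3048                           # feet -> metres
    return {"depth_m": depth_m}

depths = [read_with_schema(line)["depth_m"] for line in raw_lines]
assert depths[0] == 1200.0
assert abs(depths[1] - 1200.0) < 1.0                # 3937 ft is about 1200 m
```

Under schema-on-write, the normalization above would instead happen once at ingest, and the store would only ever contain records already conforming to the target schema.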
Stream processing is the in-memory analysis of data in motion to extract actionable intelligence to react to operational exceptions through real-time alerts and automated actions in order to correct or avert a problem.
SYSTEM OF RECORD
A system of record is an information storage system that is the authoritative data source for a given data element or piece of information
SYSTEM OF REFERENCE
A System of Reference may or may not be the System of Record for a transaction, but it is an Operational Data Store or Data Warehouse where other systems are able to get access to that transaction sometime after it has been created.
Tiered storage is the assignment of different categories of data to different types of storage media in order to reduce total storage cost. Categories may be based on levels of protection needed, performance requirements, frequency of use, and other considerations