SINGLE SOURCE OF TRUTH
OSDU IS THE SYSTEM OF RECORD
To provide enterprise-wide visibility into and management of data, and to break dependencies on source systems, OSDU will become the System of Record for Upstream Data.
WE VALUE ALL DATA
MINIMIZE FRICTION ON DATA LOADING AND INGESTION
A fundamental principle of the data ecosystem is to reduce the friction of transferring data into OSDU. Governance at load time therefore focuses only on understanding what type of data we are dealing with and what rights we have to use it in the future.
DON'T FORGET WHAT YOU ALREADY KNOW
While the principle above is intended to reduce friction when bringing data into OSDU, all contextual information known about the data should be captured at load time as appropriate general and type-specific metadata, because this information is difficult to rediscover later.
ALL RAW DATA IS PRESERVED
All raw data must be preserved. No specific content schema should be enforced at load time. Because loading and retention aim to minimize data loss, content schema enforcement is deferred to the point of consumption, either through on-the-fly translation or through a materialized, schema-based transformation. This applies only to the content of transactional data. Master data and reference data conform to type-appropriate data schemas throughout their life cycles, and the metadata for transactional data (work-product, work-product-component, file) also follows type-appropriate data schemas.
DATA ARE IMMUTABLE
As a rule, data and metadata are immutable, and immutability should be achieved through versioning. Among other benefits, immutability ensures data integrity and robustness and can even help ensure business continuity. Soft deletes are allowed: they remove data from the consumer's view without destroying it. Immutability of master data, reference data, and transactional metadata starts when the loading and ingestion steps complete. Immutability of transactional data content starts at the beginning of loading. Master data and reference data are versioned by loading incremental next versions. Work-product, work-product-component, and file metadata are versioned by reprocessing requests driven by changes to the loading, ingestion, and indexing behaviors. Transactional data content is versioned by loading new instances that refer to their predecessor instances through a lineage assertion metadata property value.
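To make the versioning rule concrete, here is a minimal sketch of an append-only store in which every update and every soft delete appends a new version instead of overwriting the old one. The `RecordVersion` and `VersionedStore` names and their fields are hypothetical illustrations, not an OSDU API.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Dict, List, Mapping, Optional


@dataclass(frozen=True)
class RecordVersion:
    """One immutable version of a record (hypothetical model)."""
    record_id: str
    version: int
    data: Mapping[str, Any]      # read-only view of the content
    soft_deleted: bool = False   # hides the record from consumers only


class VersionedStore:
    """Append-only store: updates and deletes add versions, never overwrite."""

    def __init__(self) -> None:
        self._history: Dict[str, List[RecordVersion]] = {}

    def put(self, record_id: str, data: Mapping[str, Any]) -> RecordVersion:
        """Load the next version of a record; earlier versions stay intact."""
        versions = self._history.setdefault(record_id, [])
        rec = RecordVersion(record_id, len(versions) + 1, MappingProxyType(dict(data)))
        versions.append(rec)
        return rec

    def soft_delete(self, record_id: str) -> RecordVersion:
        """Append a tombstone version instead of erasing anything."""
        versions = self._history[record_id]
        rec = RecordVersion(record_id, len(versions) + 1, versions[-1].data, soft_deleted=True)
        versions.append(rec)
        return rec

    def latest(self, record_id: str) -> Optional[RecordVersion]:
        """Consumer view: the newest version, or None if it is soft-deleted."""
        rec = self._history[record_id][-1]
        return None if rec.soft_deleted else rec

    def get_version(self, record_id: str, version: int) -> RecordVersion:
        """Any historical version remains retrievable by number."""
        return self._history[record_id][version - 1]
```

A soft delete only changes what `latest` returns; every earlier version is still retrievable, which is what lets immutability support integrity and business continuity.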
WE USE DATA SECURELY AND RESPONSIBLY
DATA ARE ACCESS CONTROLLED
Access to data is restricted and only authorized identities (users and services) can perform actions on data.
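The following sketch shows one way to express "only authorized identities can perform actions" as a deny-by-default check. It assumes a hypothetical ACL with an `owners` group set (read and write) and a `viewers` group set (read only); the group names, action names, and types are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Acl:
    owners: FrozenSet[str]   # groups allowed to read and modify (hypothetical)
    viewers: FrozenSet[str]  # groups allowed to read only (hypothetical)


@dataclass(frozen=True)
class Identity:
    name: str                # a user or a service account
    groups: FrozenSet[str]   # group memberships of that identity


def is_authorized(identity: Identity, acl: Acl, action: str) -> bool:
    """Deny by default; grant only if a group membership covers the action."""
    if action == "read":
        return bool(identity.groups & (acl.owners | acl.viewers))
    if action == "write":
        return bool(identity.groups & acl.owners)
    return False  # unknown actions are always denied
```

Users and services are treated identically: both are identities whose group memberships are checked against the data's ACL.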
DATA ARE GOVERNED FOR RIGHT OF USE
Data compliance is enforced at all entry and exit points in our systems. Derived data preserves the compliance context of its parents [subject to the relevant data entitlement contract terms]. If data is derived from multiple parents with non-uniform policies, its compliance context must be established explicitly.
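The propagation rule for derived data can be sketched as a small function. Here a compliance context is modeled, as an assumption, as a frozen set of hypothetical policy tags; uniform parents pass their context on unchanged, and non-uniform parents require an explicitly supplied context.

```python
from typing import FrozenSet, Optional, Sequence


def derive_compliance(
    parent_contexts: Sequence[FrozenSet[str]],
    explicit: Optional[FrozenSet[str]] = None,
) -> FrozenSet[str]:
    """Return the compliance context (hypothetical policy tags) for derived data."""
    if not parent_contexts:
        raise ValueError("derived data must have at least one parent")
    distinct = set(parent_contexts)
    if len(distinct) == 1:
        # Uniform parents: the derived data inherits their context unchanged.
        return next(iter(distinct))
    if explicit is None:
        # Non-uniform parents: refuse to guess; a context must be set explicitly.
        raise ValueError("parents have non-uniform policies; explicit context required")
    return explicit
```

Raising an error rather than silently merging tags reflects the principle that a mixed-policy derivation needs a deliberate decision.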
DATA ARE OWNED AND MANAGED BY THE PRODUCING ORGANIZATION
Data persistence and management are decentralized. All data are owned and managed by the producing organization, and so are the data persistence implementations (domain data management stores). [COMMENT: Please explain. Ownership of a data store is not clear.] They are governed by the Data Ecosystem.
WE EMBRACE MINIMAL VIABLE GOVERNANCE
DATA ARE GLOBALLY IDENTIFIABLE
Data entities must be globally identifiable. This does not rule out aggregate entities: where identifying the items individually has no business value and grouping them loses no information, multiple data items can be grouped under one entity. [COMMENT: Clarify whether this covers the case where a composite (maybe better term than aggregation) will never have the requirement to expose any of the constituent parts.]
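As an illustration only, the sketch below builds globally unique identifiers from a namespace, an entity type, and a random UUID, and shows an aggregate entity that carries one identifier for several items. The `<namespace>:<entity-type>:<uuid>` format and the `AggregateEntity` type are hypothetical, not a prescribed OSDU identifier scheme.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List


def make_global_id(namespace: str, entity_type: str) -> str:
    """Build a hypothetical '<namespace>:<entity-type>:<uuid>' identifier."""
    return f"{namespace}:{entity_type}:{uuid.uuid4()}"


@dataclass
class AggregateEntity:
    """Several data items grouped under one globally identifiable entity.

    Only the aggregate gets an ID; the items are not individually addressable,
    which is acceptable only when that loses no information or business value.
    """
    global_id: str
    items: List[Dict[str, Any]] = field(default_factory=list)
```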
DATA ARE DISCOVERABLE
All data and associated metadata are indexed for search. The Data Ecosystem ensures that all data and metadata are discoverable; in particular, all metadata is generally accessible, so that users can be aware of data even when the data itself is not generally accessible. [COMMENT: Clarify the meaning and extent of 'data'. Clarify the meaning of indexing. This could include: searchable as criteria, implemented for fast search ("indexed"), and/or available in the data returned from search.]
DATA ARE CONSUMABLE
All persisted data must be consumable. Providers of new data types are responsible for ensuring that this data is exposed in a way that others can consume. [COMMENT: Clarify the responsibilities and actions of the parties involved here. The idea is good. The implications are not clear.]
Review with respect to OSDU SoR goals, .....
WE ESTABLISH, MONITOR AND CONTINUOUSLY IMPROVE THE QUALITY OF DATA
OPTIMIZE SCHEMA AND QUALITY ON CONSUMPTION
Because the load process attempts to minimize data loss, schema is applied at the point of consumption, either through on-the-fly translation or through the creation of new data with a specific schema, representation, and quality.
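A minimal sketch of on-the-fly translation: the raw record is kept exactly as loaded, and a consumer-side function projects it into a target schema when it is read. The vendor field names (`WELL_NM`, `TD_FT`, `VENDOR_NOTE`) and the consumer schema are hypothetical examples.

```python
from types import MappingProxyType
from typing import Any, Dict, Mapping

# Hypothetical raw record, stored exactly as loaded (vendor field names, string values).
RAW_RECORD: Mapping[str, Any] = MappingProxyType(
    {"WELL_NM": "A-1", "TD_FT": "10500", "VENDOR_NOTE": "legacy export"}
)

FEET_TO_METRES = 0.3048


def to_consumer_schema(raw: Mapping[str, Any]) -> Dict[str, Any]:
    """Translate on the fly into a hypothetical consumer schema.

    The raw record is only read; fields the consumer schema does not need
    (here VENDOR_NOTE) are simply not projected, so nothing is lost at rest.
    """
    return {
        "name": raw["WELL_NM"],
        "total_depth_m": round(float(raw["TD_FT"]) * FEET_TO_METRES, 1),
    }
```

The materialized alternative would store the output of such a function as new data with its own schema, leaving the raw record untouched in the same way.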
IMPROVED DATA ARE NEW DATA
Enrichment is a workflow, and every improvement to data results in new data rather than a modification of existing data.
DATA LINEAGE IS TRACKED
All data transformations, including all workflows, must provide lineage information. The Data Ecosystem provides a standardized framework for capturing data lineage.
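The sketch below combines the last two principles: each transformation emits a new record that names its input records and the workflow that produced it, and ancestry is recovered by walking those links. The `Record` type, the `run_transform` and `ancestors` helpers, and the workflow names are hypothetical and do not represent the Data Ecosystem's actual lineage framework.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping, Set, Tuple


@dataclass(frozen=True)
class Record:
    record_id: str
    data: Mapping[str, Any]
    parents: Tuple[str, ...] = ()   # lineage: IDs of the input records
    workflow: str = "load"          # lineage: the step that produced this record


def run_transform(
    workflow: str,
    inputs: Tuple[Record, ...],
    fn: Callable[..., Dict[str, Any]],
) -> Record:
    """Apply fn to the inputs' data and emit a NEW record carrying lineage."""
    output = fn(*(r.data for r in inputs))
    return Record(str(uuid.uuid4()), output, tuple(r.record_id for r in inputs), workflow)


def ancestors(record_id: str, index: Dict[str, Record]) -> Set[str]:
    """Collect all upstream record IDs by walking the lineage links."""
    seen: Set[str] = set()
    stack = list(index[record_id].parents)
    while stack:
        rid = stack.pop()
        if rid not in seen:
            seen.add(rid)
            stack.extend(index[rid].parents)
    return seen
```

For example, a clean-up step followed by an averaging step yields two new records, and `ancestors` on the average returns both the cleaned record and the original raw record, which itself is never modified.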