ADR Master and Reference Schema versioning; SRN format
Change Type:
- Feature
- Bugfix
- Refactoring
Context and Scope
1. Reference and Master Data Schema version format
Different aspects of OSDU Reference Schemas (RS), Reference OSDU Resources populated with specific Reference Value Lists, and other OSDU schemas can change over time. The Data Definitions team and Reference Data Ingestion meetings identified requirements to track these different categories of change and versioning. Many of the identified categories are listed below, together with other versioning categories and clarifications that we have added.
1.1 For any OSDU schema, capture:
- Schema version - Describes the version of the schema structure. A new schema structure version is usually delivered together with a new OSDU release, but minor schema versions may also be released (e.g. a schema change that only adds a property, which is a non-breaking change).
Question: Is governance established that the schema version will be tracked by the schema name, or was this a temporary solution by Thomas Gehrmann? Is this documented somewhere? If not, then OSDU needs to establish the proper governance on this and document it.
Question: Are we capturing minor and major schema changes? If yes, how is each defined?
- Resource version - Data change within the same schema version. The schema/structure itself didn't change, but a new version of the Resource was added to OSDU (e.g. because one or more property values needed to be updated). For most schemas, it is understood that a data change simply creates a new Resource, with an incremented version, holding the different data values. However, this concept deserves special attention for Reference Data Values/Lists, since changing some Reference Data Values can have massive and breaking data management consequences (e.g. Reference Lists classified by the DD&M subcommittee as “fixed” are defined by OSDU, and the exact list is critical either to system functionality or to industry interoperability).
This version number must be incremented regardless of what the reason was for any change to the contents of the data, including the categories below in the Reference Values section.
- Source - Uniquely describes the system and/or organization from which this data object comes. Many different source-versions can attempt to identify the same real-world object (such as Wellbore) or activity (such as Production Volume reporting). (For a Wellbore, for example, this would be similar to PPDM’s WELL_VERSION.)
Ideally, we could track:
- Source to my organization (value would capture an outside organization): “data.DataSourceOrganisationID” property?
- Source system/application/database: “source” property?
Note: This identifies a version of the data that attempts to define a real-world object or measurement, not a version of a data object that would need to be numerically incremented like the other version categories here.
1.2 For Reference Values:
- Reference Value data changes - In addition to the general “version” resource property, the following properties are needed to better govern Reference Value lists:
- OSDU-governed: You might create a new version because of an OSDU-governed change to a reference list. The OSDU Reference List version must be captured, and incremented whenever an updated OSDU-governed list is published and subsequently used in a Reference Data resource. This applies to the OSDU-governed reference values in an “open” list, and to “fixed” governance categories of reference value lists, as determined by the OSDU Reference Values team. A way to capture this does not exist yet.
- Locally governed: You might create a new version because a governed Reference List for a particular implementation, such as at an operator, was updated (i.e. “open” and “local” reference list categories). The locally governed Reference List version must be captured, and incremented whenever the local data governance group publishes an updated list that is subsequently used in a Reference Data resource. This applies to the reference values in an “open” list, and to “local” governance categories of reference value lists, as determined by the OSDU Reference Values team. A way to capture this does not exist yet.
- Attribution Authority: For any reference value or reference list, those values and descriptions may have been created by OSDU or by an outside organization (such as PPDM or Energistics). Both OSDU and outside standards may change over time, so it is critical to capture both the source organization and the publication version of those outside standards used. This is already accommodated by the Attribution Authority, Publication, and Revision properties, which are standard Reference Resource properties.
Note: This is different from the “OSDU-governed” versioning category mentioned above. The OSDU-governed versioning category refers to a complete list of Reference Values for a particular reference object. The attribution authority is captured for each value individually. In other words, an OSDU-governed reference list could include some values created under the OSDU attribution authority and some from an outside attribution authority, but the list as a whole will be considered “OSDU-governed”.
Summary: OSDU should establish clear governance to appropriately and consistently track these categories of versioning:
For any resource:
- Schema Version (might exist in schema name format; needs confirmation)
- Resource Version (exists)
Additional to Reference List resources:
- OSDU-governed list version (does not exist)
- Locally-governed list version (does not exist)
- Attribution Authority + Publication + Revision (exists)
The best solution would be to create appropriate properties for the version categories that do not yet exist.
In addition, OSDU should also capture the OSDU governance category of Reference Value Lists within the reference schema and resource itself: “Fixed”, “Open”, or “Local”. A way to capture this does not exist yet.
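To make the summary concrete, the following is a minimal sketch of how the versioning categories above could sit side by side on one Reference Value resource. It is not an OSDU schema: the class name, every field name, and the `missing_list_versions` helper are hypothetical, and the sample values in the usage note are made up. The only rule it encodes is the one stated above, namely that the OSDU-governed list version applies to “fixed” and “open” lists and the locally governed list version applies to “open” and “local” lists.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class GovernanceCategory(Enum):
    """Governance category of a Reference Value List, as named in this ADR."""
    FIXED = "Fixed"
    OPEN = "Open"
    LOCAL = "Local"


@dataclass(frozen=True)
class ReferenceValueVersioning:
    """Hypothetical container for the versioning categories in the summary."""
    schema_version: str                       # exists today only in the schema name (assumed "x.y.z")
    resource_version: int                     # exists
    governance: GovernanceCategory            # does not exist yet
    attribution_authority: str                # exists
    attribution_publication: str              # exists
    attribution_revision: str                 # exists
    osdu_list_version: Optional[str] = None   # does not exist yet
    local_list_version: Optional[str] = None  # does not exist yet

    def missing_list_versions(self) -> List[str]:
        """Return the list-version fields the governance category calls for but that are unset."""
        missing = []
        needs_osdu = self.governance in (GovernanceCategory.FIXED, GovernanceCategory.OPEN)
        needs_local = self.governance in (GovernanceCategory.OPEN, GovernanceCategory.LOCAL)
        if needs_osdu and self.osdu_list_version is None:
            missing.append("osdu_list_version")
        if needs_local and self.local_list_version is None:
            missing.append("local_list_version")
        return missing
```

For example, a resource built with `governance=GovernanceCategory.OPEN` and neither list version set would report both `osdu_list_version` and `local_list_version` as missing, while a “fixed” list only needs the OSDU-governed list version.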
2. SRN format
A decision also has to be taken regarding the SRN format: it must be decided whether or not the SRN should contain the corresponding schema version. Currently the SRN doesn't contain a version (e.g. "srn::reference-data/VerticalCRS:MSL:").
Note: Tentatively, we think that capturing Schema Version + Resource Version in the schema name would uniquely identify the referenced resource (like a foreign key).
For reference lists, you want to be able to identify the specific version of the reference list that a referencing resource (e.g. a WPC) points to.
However, for a WPC (e.g. Marker) to reference a parent Master object (e.g. Wellbore), it doesn’t need to reference a specific point-in-time version of it; it should reference the most recent version.
If this is true, SRNs for Reference Data would need to include Schema + Resource Version, but SRNs for all other group types could remain more generic.
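The two SRN shapes under discussion can be illustrated with a small parser. This is a sketch only: the segment layout (`srn:<namespace>:<group type>/<type>[.<schema version>]:<id>:<resource version>`) is inferred from the two example SRNs in this ADR, treating the trailing segment as a resource version is an assumption, and `Srn`, `parse_srn`, and `requires_pinned_version` are hypothetical names rather than an existing OSDU API.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the examples in this ADR:
#   srn:<namespace>:<group type>/<type>[.<x.y.z>]:<id>:<resource version>
_SRN_PATTERN = re.compile(
    r"^srn:(?P<namespace>[^:]*):"
    r"(?P<group_type>[^/:]+)/"
    r"(?P<entity_type>[A-Za-z][A-Za-z0-9_-]*)"
    r"(?:\.(?P<schema_version>\d+\.\d+\.\d+))?"
    r":(?P<resource_id>[^:]+):"
    r"(?P<resource_version>[^:]*)$"
)


class Srn(NamedTuple):
    namespace: str
    group_type: str
    entity_type: str
    schema_version: Optional[str]    # None when the SRN carries no schema version
    resource_id: str
    resource_version: Optional[str]  # None when the trailing segment is empty


def parse_srn(srn: str) -> Srn:
    """Split an SRN into its parts; the schema version is optional."""
    match = _SRN_PATTERN.match(srn)
    if match is None:
        raise ValueError(f"not a recognised SRN: {srn!r}")
    parts = match.groupdict()
    return Srn(
        namespace=parts["namespace"],
        group_type=parts["group_type"],
        entity_type=parts["entity_type"],
        schema_version=parts["schema_version"],
        resource_id=parts["resource_id"],
        resource_version=parts["resource_version"] or None,
    )


def requires_pinned_version(srn: Srn) -> bool:
    """Tentative rule from this section: only Reference Data links pin a version."""
    return srn.group_type == "reference-data"
```

With this sketch, `parse_srn("srn::reference-data/VerticalCRS:MSL:")` yields `schema_version=None`, while `parse_srn("srn::reference-data/VerticalCRS.1.0.0:MSL:")` yields `schema_version="1.0.0"`; both report `requires_pinned_version(...)` as true because they belong to the reference-data group.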
Problem: SRN identity is uncertain.
A. Is SRN intended to uniquely define the physical real-world object in the case of Master Data (like a Wellbore)? If yes, then SRN should not contain version for Master Data references.
B. Or is SRN intended to uniquely define a data record with its version (like a GUID)? If yes, then Master Data Version should be included in the SRN.
It should not be used for both, but both must be accommodated by OSDU.
Some additional considerations:
A. Version is NOT included in an SRN.
Pros:
- It simplifies end-user aggregation of data to a single parent record. WPCs created at different times will reference the same Master data record, not an older point-in-time version of that Master record. Existing WPCs are always in the "current" state, and users do not have to enrich and create a new version of a WPC each time the corresponding RS or Master Data Schema changes.
Cons:
- It leaves the question open as to how you could have different Wellbore Versions (similar to WELL_VERSION in PPDM). It seems that this is not currently supported by the OSDU canonical schemas, but is a real use case – similar to the way you can have different versions of Trajectories in WPC.
- You can lose aspects of historical parent-child relationships/data lineage. For example, a Trajectory might have TVD calculated based on the “active” elevation of a particular Wellbore resource version. Then that Wellbore gets updated, and the newest version of that Wellbore record has a different active elevation type or active elevation value. The Trajectory is now out of sync in this regard with the point-in-time version of its parent Wellbore that it was calculated from.
B. Version is included in an SRN.
Pros:
- It potentially allows you to have different Wellbore Versions (using UWI and Source, for example, as the natural key)
- Traceability and lineage of the data
Cons:
- Raises the question of how to uniquely identify the one physical wellbore, or the “gold” Wellbore record (similar to WELL in PPDM)
- Complexities with updating existing WPCs that have links to older versions of MDS. End-user aggregations can be misaligned if there are WPCs in the system that are linked to different schema versions.
- Another consideration is related to possible future search complexity. If the SRN value changes, some WPCs would be found using the "new" SRN value and others only using the "old" SRN value.
Users will have to implement additional enrichment workflows to solve the issues related to SRN version discrepancies (and probably develop functions that detect all "outdated" SRN links). That leads to high usage of computing resources (e.g. to change all WPC SRNs to point to the new version).
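As an illustration of the kind of detection function mentioned above, here is a minimal sketch that scans WPC links for SRNs pinned to a schema version other than the current one. It assumes the versioned SRN form proposed in this ADR (`<Type>.<x.y.z>` before the resource ID); `find_outdated_links`, the `(wpc_id, srn)` link tuples, and the `current_versions` mapping are all hypothetical and do not correspond to an existing OSDU service.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed versioned form: ...<group type>/<Type>.<major>.<minor>.<patch>:<id>:
_VERSIONED_TYPE = re.compile(
    r"/(?P<entity_type>[A-Za-z][A-Za-z0-9_-]*)\.(?P<version>\d+\.\d+\.\d+):"
)


def find_outdated_links(
    links: Iterable[Tuple[str, str]],
    current_versions: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (wpc_id, srn, current_version) for links pinned to a non-current schema version."""
    outdated = []
    for wpc_id, srn in links:
        match = _VERSIONED_TYPE.search(srn)
        if match is None:
            continue  # version-less SRN: there is no pinned version to go stale
        current = current_versions.get(match["entity_type"])
        if current is not None and match["version"] != current:
            outdated.append((wpc_id, srn, current))
    return outdated


# Illustrative data only; the "1.1.0" version and the WPC ids are made up.
links = [
    ("wpc-1", "srn::reference-data/VerticalCRS.1.0.0:MSL:"),
    ("wpc-2", "srn::reference-data/VerticalCRS.1.1.0:MSL:"),
    ("wpc-3", "srn::reference-data/VerticalCRS:MSL:"),
]
stale = find_outdated_links(links, {"VerticalCRS": "1.1.0"})
```

Here `stale` contains only the `wpc-1` link. Note that the scan touches every link of every WPC, which is the computing cost this section is concerned about; the version-less link on `wpc-3` never needs rewriting.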
Decision
- There is a requirement to track schema versioning. A decision has to be taken on the schema version format (especially for Reference Schemas).
- A decision has to be taken on the SRN format: will it contain the schema version or not ("srn::reference-data/VerticalCRS:MSL:" vs. "srn::reference-data/VerticalCRS.1.0.0:MSL:")?
Rationale
Consequence
- No consequences for CSPs.
- Consequences for the majority of the OSDU services. A change in the schema definition will lead to changes in the Manifest creation process as well as in the Enrichment and Delivery APIs.