This page details the rationale behind retaining GroupType as a Mandatory Instance Level Property. The issue of merging Kind with ResourceTypeID deserves enough attention to warrant a dedicated page.
In summary, we propose to retain the OSDU GroupType in the blended system and as a Mandatory Property in the instance. The GroupType will have the same enumeration as it has today, such as master-data, work-product, work-product-component, reference etc. [Alan (24 Nov): There seem to be two parts to this: (1) retaining GroupType and (2) expressing GroupType as a mandatory property. It appears that the proposal for (1) is 'yes'. This is good, as GroupType is essential for guiding the data platform behavior across distinctly different behaviors for distinctly different types of data. For (2), I strongly disagree. For one thing, the GroupType of a resource/record instance is a permanent characteristic that may never change. Proposing the use of a separate property is incompatible with this requirement.]
The following sections explain the rationale.
A Note on Nomenclature Around Kind, Id and Property
An attribute that discriminates a whole class of things from another class is associated with a kind – for example, wells are distinctly different from seismic lines, and therefore type is associated with kind. [Alan (24 Nov): The 'type' that is associated with a kind is inherently the combination of GroupType and IndividualType. The natural and workable direction to take going forward is to expand 'type' in 'kind' to be 'GroupType/IndividualType'. For example, a Well master data overall type is master-data/Well. A seismic line [trace data] overall type is work-product-component/SeismicTraceData. A seismic acquisition method reference data overall type is reference-data/SeismicAcquisitionType. Slightly oversimplified, in current DES, 'type' in 'kind' has an implicit GroupType of work-product-component -- noting that the logical separation between this GroupType and the 'file' GroupType into two resource/record instances is not present. The reason for this split is to isolate the properties used for entitlement checking and search criteria/returned values from those associated with the actual data content [files], including their location path, format, etc.]
An attribute that uniquely identifies a specific instance within a given class or kind is its identity – for example, the UBHI of a wellbore uniquely identifies it against other instances of boreholes. [Alan (24 Nov): True, however, the current OSDU ResourceID contains not only the GroupType/IndividualType, but also the UniqueKey. For master-data and reference-data this must be provided by the client calling the LOAD API (StartLoadingWorkflow) based on a business key. The example of UBHI for Wellbore master data is correct. For transactional GroupTypes (work-product, work-product-component, and (logical) file), there is a generated UniqueKey each time the LOAD API is called for a work product and all of its constituent parts. For convenience, we have been including separately named properties for business keys of master and reference data resources/records. Clearly, these values must be the same as the UniqueKey in the ResourceID and may never be changed.]
An attribute whose change creates a new version of an entity rather than a new independent entity – this by definition is a property and therefore not part of the identity. [Alan (24 Nov): Agreed. How is this relevant to the question being considered? GroupType cannot be changed, as to do so would definitely create a new independent entity.]
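The three definitions above can be sketched in a few lines of Python. This is a minimal illustration, not OSDU or OpenDES code: the `Instance` class, the integer `version` counter, the `with_property` helper, and the sample identifier and property values are all assumptions made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instance:
    """Hypothetical record shape used only for this illustration."""
    kind: str         # discriminates the class of thing
    id: str           # identifies one instance within that kind
    version: int      # assumed integer counter, bumped on each change
    properties: dict  # attributes whose change creates a new version


def with_property(instance: Instance, name: str, value) -> Instance:
    """Return a new version of the same entity with one property changed."""
    new_properties = {**instance.properties, name: value}
    return replace(instance, version=instance.version + 1, properties=new_properties)


# Sample values are invented; only the kind string is taken from this page.
v1 = Instance(kind="osdu:ihs:Wellbore:1.0.0", id="UBHI-0001", version=1,
              properties={"status": "drilling"})
v2 = with_property(v1, "status", "completed")

# Same kind and id, so v2 is a new version rather than a new entity.
print(v2.kind == v1.kind, v2.id == v1.id, v2.version)
```

Changing `kind` or `id` in this sketch would, by the definitions above, describe a different entity rather than a later version of the same one.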
A Note on "mastering" vs. "master-data"
We recognize that the terms mastering and master-data are sometimes used interchangeably, but the concepts are distinct:
Mastering == the act of enrichment and of creating derivatives; a new Kind can therefore be assigned each time a new type is derived. [Alan (24 Nov): Indeed, the term 'mastering' is used for the creation of new instances of data -- whether this is inside a data platform (what you call enrichment) or elsewhere.]
Master-data == A segregation or group of data that relates to business entities. In other words, it represents how the business systems or an enterprise are modeled, as opposed to digital archives and contents (work-product & work-product-components). The latter supply context and data for the business entities in general. However, in the OpenDES context there isn't a clear segregation; every kind is treated as an entity with a reference to physical/logical data sets (for example, file location, URN etc.) as needed. [Alan (24 Nov): With the introduction of GroupType, the current distinctly different behavior for master and reference data GroupTypes is fully enabled and is expected to be achieved.]
- Enable data curation life cycle and creation of derivative types: the system must allow a business entity (say a master-data entity such as Well) to be defined as a Kind, say osdu:ihs:well:1.0.0. [Alan (24 Nov): Note well that current OSDU behavior only deals with 'types' of 'kind' OR GroupType/IndividualTypes that are the equivalent of WKS/WKEs. The current OSDU behavior must be expanded to account for the 'non-WKS/WKE' situation. This example is 'non-WKS/WKE'. This is accomplished in current DES by the first two elements of 'kind', which in the example are osdu:ihs. [Note further that there is an issue to rectify the names and meanings of the first two elements of 'kind' to achieve the overall intent of identifying the overall content format and the ability to read and understand the content (known as the schema owner).] The proposed extension to current OSDU behavior is to expand GroupType/IndividualType with additional elements. For this discussion, let's call these elements 'content-general-form' and 'content-read-form' (or CGF and CRF). Summarizing, in OSDU going forward the full ResourceType and ResourceID would be expanded to use GroupType/IndividualType/CGF/CRF. Now, we can match the example with a ResourceType of srn:Shell:type:master-data/Well/ihs/osdu:1, where the namespace implementation instance and data partition is "Shell", the CGF is 'ihs' and the CRF is 'OSDU'. In words, this uses the Well data model from IHS as interpreted/adapted by OSDU Forum.] [Alex (25 Nov): Thanks Alan, the two new concepts content-read-form and content-general-form are interesting and present new ways of relating namespace & source. This must be discussed further for a better understanding.]
At the time of ingestion, the system must make it easy to ingest data described as-is. By enriching the data after it is ingested, a new kind osdu:osdu:well:1.0.0 can be produced. [Alan (24 Nov): This point is not clear. What is meant by produced? I think you are trying to say that a WKS/WKE instance can be created from the IHS data, which you represent with the first two elements of kind set to osdu. I don't believe that using osdu in these elements is a good idea. Better to illustrate examples in the context of a real company.] [Alex (25 Nov): You are correct. By produced, I meant that a new type and data are derived from the original form.]
Technically both are master-data in the OSDU context, but the latter is enriched based on business rules, processes and even data sets sourced from other vendors. The system must allow for such flexibility in modelling schemas for a variety of such sources & business rules. [Alan (24 Nov): No. As stated already, there is no provision in current OSDU behavior for receiving data in anything other than WKS/WKE forms.]
Provide for easier path for transition & migration, meaning it reduces the burden of migrating data whenever the schema changes. [Alan (24 Nov): I don't understand this. Please explain.] [Alex (25 Nov): The data & services code in OpenDES needs to be migrated/enhanced to accommodate any changes proposed to the Kind syntax.]
Provide ways to segregate/group data, meaning retain the GroupType concept in OSDU
Verbosity of Kind Name, aiming to keep it short [Alan (24 Nov): This is not relevant. Support for behavior is the primary consideration.] [Agreed, it is stated here for completeness; it may or may not have the same priority or relevance. Having a Type qualified as work-product-component/WellLogWorkProductComponent introduces redundant, repeated information, and presents an opportunity to make the definitions more precise.]
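The two shapes of Kind discussed in the principles above (a bare type such as `osdu:ihs:well:1.0.0`, and the reviewer's `GroupType/IndividualType` form such as `osdu:osdu:master-data/Well:1.0.0`) can be compared with a small parser. This is a sketch under stated assumptions: the field names `authority` and `source` for the first two elements are placeholders (the page notes their names and meanings are still under discussion), and the set of GroupType values is illustrative because the page ends its enumeration with "etc.".

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative, incomplete set assembled from values mentioned on this page.
KNOWN_GROUP_TYPES = frozenset({
    "master-data", "reference-data", "work-product",
    "work-product-component", "file",
})


@dataclass(frozen=True)
class Kind:
    authority: str             # first element of kind (name is an assumption)
    source: str                # second element of kind (name is an assumption)
    group_type: Optional[str]  # None when the kind carries no GroupType prefix
    individual_type: str
    version: str


def parse_kind(kind: str) -> Kind:
    """Split a colon-separated kind into its four elements."""
    parts = kind.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated elements: {kind!r}")
    authority, source, type_part, version = parts
    group_type, slash, individual_type = type_part.partition("/")
    if not slash:
        # Bare type: GroupType would live in a separate instance property.
        return Kind(authority, source, None, type_part, version)
    if group_type not in KNOWN_GROUP_TYPES:
        raise ValueError(f"unknown GroupType {group_type!r} in {kind!r}")
    return Kind(authority, source, group_type, individual_type, version)


bare = parse_kind("osdu:ihs:well:1.0.0")
prefixed = parse_kind("osdu:osdu:master-data/Well:1.0.0")
print(bare.group_type, prefixed.group_type)
```

With the bare form the GroupType has to come from somewhere else (the mandatory instance property proposed here); with the prefixed form it is recoverable from the kind alone.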
How the Principles Compare
- [Alan (24 Nov): I disagree with the proposal for GroupType as a mandatory instance property for the reasons given above. Regarding curation, I have explained what is necessary for OSDU to support non-WKS/WKE data. Note that the immutable classification must include the first 3 elements of 'kind' -- not only the 'type' of 'kind'. Therefore the immutable id must refer to all of these elements/parts. As to the governance of GroupType, IndividualType, etc., this is provided for by requiring what are called 'Type Instances'. A Type Instance is an instance of an OSDU Resource that establishes the validity of an OSDU Type, that is, a GroupType or a GroupType/IndividualType combination. With the expansion of the full classification to the equivalent of the first 3 elements of 'kind', leading to the expansion of the GroupType/IndividualType to include CGF and CRF, Type Instances will be required to set up valid combinations. The Type Instances also serve as the official home of all of the JSON schemas used for resources/records.]
|Principle||GroupType as part of ResourceTypeID (aka Kind) and ID||GroupType as a mandatory instance property|
|Enable data curation life cycle and creation of derivative types||Mastering happens outside the OSDU system||Scoped. Provides for a Kind to be identified to accept as-is data (osdu:ihs:well:1.0.0), and the derived data can be assigned a new Kind (osdu:osdu:Well:1.0.0)|
|Provide for easier path for transition & migration||OpenDES types & instances have to be migrated. Furthermore, the GroupType will have to be synthesized and allocated based on the right type||Provides a way to add attributions & extensions, since it is added as a property. GroupType will be an enumerated list and a mandatory field outside the "Data" block. Going further, possible Types & GroupTypes can be listed and governed by OSDU through an external service/register that may provide guidance & notes on how to extend. This is in line with how OSDU provides external validators for reference data|
|Provide ways to segregate/group data||Data is segregated and grouped. It is possible to tell a group type by looking at the kind||Groups are inferred by looking through the data & introspecting the GroupType property. This additional burden is worth the benefits & flexibility such a design provides. Furthermore, this is consistent with the way queries are formed today – Kind or ResourceTypeID is an indexed property anyway|
|Verbosity of Kind Name, aiming to keep it short||Longer Kind name, longer id. Furthermore, a WorkProductComponent suffix is also usually attached to the Type, making the GroupType prefix redundant. Examples: work-product/WellLogWorkProductComponent, work-product/WellLogWorkProduct etc.||Precise & shorter name|
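To make the right-hand column concrete, the following sketch checks that a record carries GroupType as a mandatory, enumerated field alongside (not inside) its "Data" block. The record layout, the field names `Kind`, `GroupType` and `Data`, the `validate_record` helper, and the enumeration values are assumptions for illustration only; the page does not define an actual schema here.

```python
# Illustrative enumeration; the page lists these values and ends with "etc.".
GROUP_TYPES = frozenset({
    "master-data", "reference-data", "work-product", "work-product-component",
})


def validate_record(record: dict) -> None:
    """Raise ValueError unless GroupType is present at the top level and enumerated."""
    if "GroupType" not in record:
        raise ValueError("GroupType is a mandatory instance property")
    if record["GroupType"] not in GROUP_TYPES:
        raise ValueError(f"GroupType {record['GroupType']!r} is not in the enumeration")


# Hypothetical record: GroupType sits next to Kind, outside the Data block.
record = {
    "Kind": "osdu:ihs:well:1.0.0",
    "GroupType": "master-data",
    "Data": {"WellName": "Example-1"},
}
validate_record(record)  # passes silently
```

A governed register of valid Types and GroupTypes, as suggested in the table, would replace the hard-coded `GROUP_TYPES` set in a real system.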
OSDU R1 Schema
OSDU R2/R3 (Proposed Schema)
How an Instance is Versioned - master-data Example
How an Instance is Versioned - work-product-component Example
How the Data is Mastered
How to Query Data
|To query the "mastered" wells||Kind = "osdu:osdu:Well:1.0.0"|
|[Alan (24 Nov): Taking 'osdu' as the values used for WKS/WKE, "osdu:osdu:master-data/Well:1.0.0"]|
|To query the wells supplied by IHS||Kind = "osdu:ihs:Wellbore:1.0.0"|
|[Alan (24 Nov): Similarly, "osdu:ihs:master-data/Well:1.0.0"]|
|To query all the "master-data" objects||Kind = "*:*:*:*", GroupType="master-data"|
|[Alan (24 Nov): Kind = "*:*:\master-data/**"]|
|To query the reference data for log channel codes||Kind = "osdu:osdu:LogChannelCodes:1.0.0"|
|[Alan (24 Nov): "osdu:osdu:reference-data/LogChannelCodes:1.0.0". The OSDU modelling type would call this LogChannelType.]|
|To query the reference data for well status codes||Kind = "osdu:osdu:WellStatusCodes:1.0.0"|
|[Alan (24 Nov): "osdu:osdu:reference-data/WellStatusCodes:1.0.0". Similarly, use WellStatusType.]|
|To query the logs||Kind = "osdu:osdu:WellLog:1.0.0"|
|[Alan (24 Nov): This is for WKS/WKE well logs only. Kind = "osdu:osdu:work-product-component/WellLog:1.0.0"]|
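The queries in the table can be mimicked over an in-memory list to show how the two approaches differ in practice. This is only a sketch: `kind_matches` and `query` are hypothetical helpers, not an OSDU or OpenDES search API, the records are invented, and the wildcard pattern `*:*:master-data/*:*` is an assumed spelling of the reviewer's alternative. Matching is done element by element with Python's `fnmatch.fnmatchcase`.

```python
from fnmatch import fnmatchcase
from typing import Optional


def kind_matches(kind: str, pattern: str) -> bool:
    """Match a kind against a pattern, one colon-separated element at a time."""
    kind_parts = kind.split(":")
    pattern_parts = pattern.split(":")
    return len(kind_parts) == len(pattern_parts) and all(
        fnmatchcase(k, p) for k, p in zip(kind_parts, pattern_parts)
    )


def query(records: list, kind_pattern: str = "*:*:*:*",
          group_type: Optional[str] = None) -> list:
    """Filter records by Kind pattern and, optionally, by the GroupType property."""
    return [
        r for r in records
        if kind_matches(r["Kind"], kind_pattern)
        and (group_type is None or r.get("GroupType") == group_type)
    ]


# Invented records using the kinds from the table above.
records = [
    {"Kind": "osdu:osdu:Well:1.0.0", "GroupType": "master-data"},
    {"Kind": "osdu:ihs:Wellbore:1.0.0", "GroupType": "master-data"},
    {"Kind": "osdu:osdu:WellLog:1.0.0", "GroupType": "work-product-component"},
    {"Kind": "osdu:osdu:LogChannelCodes:1.0.0", "GroupType": "reference-data"},
]

mastered_wells = query(records, "osdu:osdu:Well:1.0.0")
all_master_data = query(records, group_type="master-data")
print(len(mastered_wells), len(all_master_data))
```

Under the reviewer's alternative, the GroupType filter disappears and the same selection is expressed in the pattern itself, for example `kind_matches("osdu:osdu:master-data/Well:1.0.0", "*:*:master-data/*:*")`.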