Added another question and fixed an incomplete validation rule. Authored by Alan Henson
@@ -57,6 +57,7 @@ NOTE: Where "Manifest Ingestion Workflow" is referenced, we are referring to the
### Open Question ###
1. How does a re-process manifest ingest request change the behavior of the ingestion? Theoretically, the validation might be different in that objects with an `id` value specified must already exist within the platform. In an initial processing request, specified `id` values assume that an external process has determined the `id` of the entity vs. letting the Storage service make that determination.
2. How should the `IsExtendedLoad` and `IsDiscoverable` flags within the `AbstractAnyRecordWorkProduct` and `AbstractAnyRecordWorkProductComponent` schema definitions affect the ingestion process?
3. What is the best construct for identifying orphaned data as it is written? An `orphan` construct perhaps?
### 1. Initiating Ingestion ###
@@ -111,8 +112,8 @@ _Requirements_
| --------------- | ----------- | --------- |
| Manifest Syntax check | Fetch the schema definition from the Schema Service for the `kind` property of the manifest. Validate that the entire manifest is correct according to its schema validation. If the `kind` does not map to a Schema Definition, throw/log an error and terminate the workflow. | Yes |
| Content Syntax Validation | Traverse the manifest for `kind` properties. For each `kind` property found, retrieve the schema definition from the Schema Service. Validate the entire object containing the `kind` property against the schema returned by the Schema Service for that `kind`. If the Schema Service is unable to find a Schema Definition for a given `kind` throw/log an error and terminate the workflow. | Yes |
| Unknown attributes | The validation should ensure
| Valid Hierarchy | You might expect this validation rule to exist in the pre-pass, but the schema definitions allow us to get this rule for free as part of the property format validation. Take the [WellLog.1.0.0.json](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/Generated/work-product-component/WellLog.1.0.0.json) schema definition as an example. A Well Log has an optional `resourceHomeRegionID` property. If the property is specified, then it must follow the format `^[\\w\\-\\.]+:reference-data\\/OSDURegion:.+:[0-9]*$`, which means any value referencing a type other than an OSDURegion will fail. We can therefore perform the validation check without requiring knowledge of the data's domain. | N |
| Unknown attributes | The validation should ensure that only attributes defined within the schema definition are present. A schema may have a `data.ExtensionProperties` property, which is where undefined attributes should go. By nature, schema validation passes this section even when it contains unknown attributes, because the schema definition allows for them. Any unknown attributes outside of this section must generate a validation error and terminate the workflow. | Yes |
| Valid Hierarchy | You might expect this validation rule to exist in the pre-pass, but the schema definitions allow us to get this rule for free as part of the property format validation. Take the [WellLog.1.0.0.json](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/Generated/work-product-component/WellLog.1.0.0.json) schema definition as an example. A Well Log has an optional `resourceHomeRegionID` property. If the property is specified, then it must follow the format `^[\\w\\-\\.]+:reference-data\\/OSDURegion:.+:[0-9]*$`, which means any value referencing a type other than an OSDURegion will fail. We can therefore perform the validation check without requiring knowledge of the data's domain. | Yes |
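The Unknown attributes and Valid Hierarchy rules above can be sketched roughly as follows. This is a minimal illustration, not the workflow's implementation: the schema and record are hypothetical and heavily simplified, the function names are invented for the example, and a real check would run against the definitions returned by the Schema Service.

```python
import re

# Pattern quoted in the Valid Hierarchy rule, un-escaped from its JSON form.
REGION_ID_PATTERN = re.compile(r"^[\w\-\.]+:reference-data\/OSDURegion:.+:[0-9]*$")


def unknown_attributes(obj, schema, path=""):
    """Return dotted paths of attributes missing from the schema's `properties`.

    `data.ExtensionProperties` is skipped, since that is where undefined
    attributes are allowed to live.
    """
    found = []
    properties = schema.get("properties", {})
    for name, value in obj.items():
        here = f"{path}.{name}" if path else name
        if here == "data.ExtensionProperties":
            continue
        if name not in properties:
            found.append(here)
        elif isinstance(value, dict):
            found.extend(unknown_attributes(value, properties[name], here))
    return found


def region_reference_is_valid(value):
    """Check a `resourceHomeRegionID` value against the quoted format."""
    return REGION_ID_PATTERN.match(value) is not None


# Hypothetical schema fragment and record (all names and values are assumptions).
schema = {
    "properties": {
        "kind": {"type": "string"},
        "resourceHomeRegionID": {"type": "string"},
        "data": {"properties": {"Name": {"type": "string"}}},
    }
}
record = {
    "kind": "example-kind:1.0.0",
    "resourceHomeRegionID": "osdu:reference-data/OSDURegion:NorthSea:1",
    "data": {"Name": "Log A", "ExtensionProperties": {"vendor": "x"}, "Colour": "red"},
    "owner": "someone",
}

problems = unknown_attributes(record, schema)
```

Here `problems` lists `data.Colour` and `owner`, while the contents of `data.ExtensionProperties` are ignored. A reference such as `osdu:reference-data/UnitOfMeasure:m:1` fails `region_reference_is_valid` without the code needing any knowledge of what a region is.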
### 5. Pre-Pass ###
@@ -139,9 +140,11 @@ Some of the rules below rely on the use of the OSDU schema definition extension
| Validation Rule | Description | Required? |
| --------------- | ----------- | --------- |
| Surrogate Keys | Ensure the use of `surrogate-keys` is consistent and accurate. Ensure all `surrogate-key` references to a parent entity are resolved within the manifest (i.e., no orphaned `surrogate-keys`). The validation process requires identifying within a schema definition the use of the `x-osdu-relationship` extension property and then checking the manifest's value for that property to see if it has the `surrogate-key` pattern (e.g., `^(surrogate-key:.+|[\\w\\-\\.]+:`). If it does, then an entity must exist within the manifest payload that has an `id` property with a matching `surrogate-key` value. If not, then an invalid reference exists. Throw/log an error and terminate the workflow. | N |
| Duplication | This requires more research. I believe the intent is to validate that MasterData and ReferenceData provided with a pre-set `id` property do not already have an entry in OSDU with the same `id`. If it does, then the process is trying to load duplicate data and it should be rejected. Error is thrown/logged and workflow terminated. | Y |
| Cited Data Exists | If a property is found within a `kind`'s schema definition to contain an `x-osdu-relationship` definition, and the value of the property within the manifest payload does not have a `surrogate-key` pattern, then fetch the value and leverage the Storage API to determine if the referenced data exists. If it does exist, the validation passes. If not, the validation fails, an error is thrown/logged, and the workflow is terminated. This rule is applicable to references to Reference Data and Master Data. | Y |
| Duplication | This requires more research. I believe the intent is to validate that MasterData and ReferenceData provided with a pre-set `id` property do not already have an entry in OSDU with the same `id`. If it does, then the process is trying to load duplicate data and it should be rejected. Error is thrown/logged and workflow terminated. | Yes |
| Surrogate Keys | Ensure the use of `surrogate-keys` is consistent and accurate. Ensure all `surrogate-key` references to a parent entity are resolved within the manifest (i.e., no orphaned `surrogate-keys`). The validation process requires identifying within a schema definition the use of the `x-osdu-relationship` extension property and then checking the manifest's value for that property to see if it has the `surrogate-key` pattern (e.g., `^(surrogate-key:.+|[\\w\\-\\.]+:`). If it does, then an entity must exist within the manifest payload that has an `id` property with a matching `surrogate-key` value. If not, then an invalid reference exists. Throw/log an error and terminate the workflow. | No |
| Cited Data Exists | If a property is found within a `kind`'s schema definition to contain an `x-osdu-relationship` definition, and the value of the property within the manifest payload does not have a `surrogate-key` pattern, then fetch the value and leverage the Storage API to determine if the referenced data exists. If it does exist, the validation passes. If not, the validation fails, an error is thrown/logged, and the workflow is terminated. This rule is applicable to references to Reference Data and Master Data. | No |
Note: We need to determine how best to handle data that we can identify as orphaned.
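As a rough sketch of how the Surrogate Keys and Cited Data Exists rules could fit together, assuming the `x-osdu-relationship` property names have already been collected from the schema definitions. The `check_references` function, the flat entity layout, the property names, and the `record_exists` callable (a stand-in for a Storage API lookup) are all hypothetical. A simple `surrogate-key:` prefix test replaces the full pattern here.

```python
SURROGATE_PREFIX = "surrogate-key:"


def check_references(entities, relationship_props, record_exists):
    """Return (unresolved_surrogates, missing_citations).

    entities: manifest objects as dicts, each optionally carrying an `id`.
    relationship_props: property names flagged with `x-osdu-relationship`
        (assumed to be known already).
    record_exists: hypothetical callable standing in for a Storage API lookup.
    """
    known_ids = {e["id"] for e in entities if "id" in e}
    unresolved, missing = [], []
    for entity in entities:
        for prop in relationship_props:
            value = entity.get(prop)
            if not isinstance(value, str):
                continue
            if value.startswith(SURROGATE_PREFIX):
                # Surrogate Keys rule: the target must live in this manifest.
                if value not in known_ids:
                    unresolved.append(value)
            elif not record_exists(value):
                # Cited Data Exists rule: the target must already be stored.
                missing.append(value)
    return unresolved, missing


# Hypothetical manifest fragment; ids and property names are illustrative only.
entities = [
    {"id": "surrogate-key:wpc-1", "WellboreID": "osdu:master-data/Wellbore:123:"},
    {"id": "surrogate-key:wp-1", "ComponentID": "surrogate-key:wpc-1"},
    {"id": "surrogate-key:wp-2", "ComponentID": "surrogate-key:wpc-9"},
]
stored = {"osdu:master-data/Wellbore:999:"}

unresolved, missing = check_references(
    entities, ["WellboreID", "ComponentID"], lambda record_id: record_id in stored
)
```

In this example `surrogate-key:wpc-9` is reported as unresolved because no entity in the manifest carries that `id`, and the Wellbore reference is reported as missing because the stand-in lookup does not find it. Either result would lead to throwing/logging an error and terminating the workflow.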
### 6. Process ###