Update OLD R3 MVE Manifest Ingestion Definition of Done Syntax Schema Validation, authored by Alan Henson
![R3_Manifest_Ingestion_Workflows](https://community.opengroup.org/groups/osdu/platform/data-flow/ingestion/-/wikis/uploads/585ec5ea46b1dab06a18ce8be4be5d41/R3_Manifest_Ingestion_Workflows.png)
**WARNING: THIS PAGE IS A WORK IN PROGRESS. THE CONTENTS ARE IN DRAFT FORM AND ARE NOT APPROVED FOR IMPLEMENTATION.**
## Overview ##
Below is the definition of done for the R3 Minimum Viable Expectation (MVE) Manifest Ingestion Syntax/Schema Validation process. As depicted above, this stage of the MVE Manifest Ingestion workflow occurs at the point an R3 Manifest is presented to the [Ingestion Service API](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-service). This stage of the workflow is responsible for ensuring the presented Manifest is structurally correct (i.e., it is syntactically correct and abides by the schema definition). If the presented Manifest passes syntax/schema validation, the Ingestion Service will initiate the workflow that manages the [Pre-Pass](https://community.opengroup.org/groups/osdu/platform/data-flow/ingestion/-/wikis/Manifest-Ingestion/R3-MVE-Manifest-Ingestion/R3-MVE-Manifest-Ingestion-Definition-of-Done---Pre-Pass) and Process stages as depicted in the high-level conceptual diagram above.
The Validation stage is defined in the [Core-Concept-Input_MVE-with-Ingestion-UseCases_Rev-02.pdf](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/data-prep/docs/-/blob/master/Design%20Documents/Ingestion/Core-Concept-Input_MVE-with-Ingestion-UseCases_Rev-02.pdf) document under the Pre-Pass section.
*NOTE: At the time of writing, the Ingestion Service has two [API endpoints](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-service/-/blob/master/ingest-core/src/main/java/org/opengroup/osdu/ingest/api/SubmitApi.java) that both operate under the assumption that a File is associated with the presented Manifest. It is understood that design discussions are underway to explore a [Dataset](https://community.opengroup.org/osdu/platform/system/home/-/issues/65#register-pane) concept. If/when those proposed changes are adopted, and the Ingestion Service API endpoints are updated, the references to File below will be updated accordingly.*
## High-Level Understanding ##
Greater detail is provided below, but the following are the high-level steps of this stage (Validation) of the MVE Manifest Ingestion workflow.
- A process submits a Manifest to the designated Ingestion Service API endpoint
- A Manifest in this sense represents one of the three following types
- [Master Data](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/tree/master/Generated/master-data)
- [Reference Data](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/tree/master/Generated/reference-data)
- [Work Product](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/tree/master/Generated/work-product) w/ [Work Product Components](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/tree/master/Generated/work-product-component)
- The expectation is that the MVE Manifest Ingestion process will process one and only one of the above. The caveat is that Work Product will have a collection of Work Product Components
- The Ingestion Service API endpoint must ensure the process has been authenticated and is authorized to invoke the [`submitWithManifest`](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-service/-/blob/master/ingest-core/src/main/java/org/opengroup/osdu/ingest/api/SubmitApi.java) endpoint.
- If the process is not authenticated or not authorized, then the endpoint should reject the request by throwing the appropriate security exception
- If the process is both authenticated and authorized, then the API may continue processing the request
- At the time of writing, the current [`submitWithManifest`](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-service/-/blob/master/ingest-core/src/main/java/org/opengroup/osdu/ingest/api/SubmitApi.java) endpoint does not accept Master Data or Reference Data schemas.
- It instead accepts a [`WorkProductLoadManifest`](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-service/-/blob/master/ingest-core/src/main/java/org/opengroup/osdu/ingest/model/WorkProductLoadManifest.java) object which contains containers for the Work Product, Work Product Components, and source Files
- To support Master Data and Reference Data, either a new endpoint or an adaptation to the existing [`submitWithManifest`](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-service/-/blob/master/ingest-core/src/main/java/org/opengroup/osdu/ingest/api/SubmitApi.java) endpoint is required
- Inspect the Manifest for its `kind` - this is provided as a property
- Today's [`ILoadManifestValidationService`](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-service/-/blob/master/ingest-core/src/main/java/org/opengroup/osdu/ingest/validation/schema/LoadManifestValidationServiceImpl.java) implementation leverages a hardcoded Schema title
- We will need to update this logic to leverage the provided [`kind`](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/issues/3)
- Presently, the [`SchemaRepository`](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-service/-/blob/master/ingest-core/src/main/java/org/opengroup/osdu/ingest/provider/interfaces/ISchemaRepository.java) is leveraged. This might need to change to the [`Schema Service`](https://community.opengroup.org/osdu/platform/system/schema-service/-/blob/master/docs/api/schema.yaml)
- NOTE: Taking this approach enables custom schemas, which is beyond the scope of the MVE. But that's okay. This MVE approach is meant to produce learnings.
- Query the OSDU Schema Service using the Core Services [Schema API](https://community.opengroup.org/osdu/platform/system/schema-service) `getSchema` endpoint
- If the specified [`kind`](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/issues/3) is not associated with a registered schema, then an exception is thrown, which will terminate the request
- Design Decision: should we continue validating the included schemas if the primary structure is invalid? Likely not.
- On obtaining the Manifest's schema from the Schema Service, the presented Manifest must be validated against the Schema returned from the Schema Service based on the provided [`kind`](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/issues/3)
- If the schema validation checks pass, then the request is allowed to proceed
- Future iterations of the Manifest Ingestion workflow should support initiating ingestion workflows specific to the specified [`kind`](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/issues/3), but the MVE will leverage a default workflow, which is the expected outcome when a [`kind`](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/issues/3) does not resolve to a workflow
- The [`submitWithManifest`](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-service/-/blob/master/ingest-core/src/main/java/org/opengroup/osdu/ingest/api/SubmitApi.java) endpoint will submit the Manifest to the Ingestion Workflow service for processing using the default Ingestion Manifest workflow
- The workflow to run should be resolvable by the Workflow Type (`WorkflowType.INGEST`) and the provided [`kind`](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/issues/3)
- The expectation is that the default workflow captured in the Pre-Pass and Process stages will ship as a default workflow with OSDU implementations and will by default be registered in the workflow registry
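
The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the Ingestion Service implementation: the in-memory `SCHEMA_REGISTRY` stands in for the Schema Service `getSchema` call, `WORKFLOW_REGISTRY` and `DEFAULT_WORKFLOW` stand in for the workflow registry, the `kind` value and all function names are hypothetical, and `check_against_schema` only checks required properties and basic types where a real service would use a full JSON Schema validator. Authentication and authorization are omitted.

```python
import json

# Hypothetical stand-in for the Schema Service `getSchema` lookup, keyed by kind.
SCHEMA_REGISTRY = {
    "example:wks:master-data--Well:1.0.0": {
        "required": ["kind", "data"],
        "properties": {"kind": {"type": "string"}, "data": {"type": "object"}},
    },
}

# Hypothetical workflow registry keyed by (workflow type, kind). It is empty
# here, so every kind falls back to the default workflow, as in the MVE.
WORKFLOW_REGISTRY = {}
DEFAULT_WORKFLOW = "default-manifest-ingestion"

JSON_TYPES = {"object": dict, "string": str, "array": list}


class ManifestValidationError(Exception):
    """Raised when a Manifest fails syntax or schema validation."""


def get_schema(kind):
    """Return the schema registered for `kind`, or terminate the request."""
    if kind not in SCHEMA_REGISTRY:
        raise ManifestValidationError(f"no schema registered for kind {kind!r}")
    return SCHEMA_REGISTRY[kind]


def check_against_schema(manifest, schema):
    """Simplified check: required properties and basic types only."""
    errors = []
    for name in schema.get("required", []):
        if name not in manifest:
            errors.append(f"missing required property {name!r}")
    for name, rule in schema.get("properties", {}).items():
        expected = JSON_TYPES.get(rule.get("type"))
        if name in manifest and expected and not isinstance(manifest[name], expected):
            errors.append(f"property {name!r} is not of type {rule['type']}")
    return errors


def resolve_workflow(workflow_type, kind):
    """Resolve by workflow type and kind, falling back to the default workflow."""
    return WORKFLOW_REGISTRY.get((workflow_type, kind), DEFAULT_WORKFLOW)


def submit_with_manifest(raw_manifest):
    """Validate syntax, then schema, then return the workflow to run."""
    try:
        manifest = json.loads(raw_manifest)  # syntax check
    except json.JSONDecodeError as exc:
        raise ManifestValidationError(f"manifest is not valid JSON: {exc}") from exc
    if not isinstance(manifest, dict) or not isinstance(manifest.get("kind"), str):
        raise ManifestValidationError("manifest must be an object with a string 'kind'")
    schema = get_schema(manifest["kind"])
    errors = check_against_schema(manifest, schema)
    if errors:
        raise ManifestValidationError("; ".join(errors))
    return resolve_workflow("INGEST", manifest["kind"])
```

In this sketch a Manifest whose `kind` has no registered schema fails before any property checks run, which matches the step that terminates the request when the `kind` is unknown.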
## Questions for Consideration ##
- Should the `submitWithManifest` endpoint of the Ingestion Service API take an object or a string? If we enforce an object, then we're in essence attempting to encapsulate all supported Manifests in object form for this endpoint. This approach might also interfere with different versions of the Manifests within the same deployed OSDU instance.
- Thomas Gehrmann [SLB]: it depends on the API. The API can demand the manifest schema in a specific version (should there be multiple versions). Having a `kind` in the payload (defined by the Manifest schema) can offer greater flexibility, but the consuming code will have to cope with this flexibility. If you want such a property in the payload, please add an issue.
- Should an error occurring during the validation phase trigger an error workflow to allow for custom handling of the error?
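
To make the first question more concrete, here is a hedged sketch of the string-payload option, in which the endpoint accepts raw JSON and the consuming code copes with multiple Manifest versions by dispatching on `kind`. Everything in it is an assumption for illustration: the four-part `authority:source:entity:version` layout of `kind`, the handler functions, and the `HANDLERS` table are hypothetical and do not describe the existing endpoint.

```python
import json


def parse_kind(kind):
    """Split an assumed 'authority:source:entity:version' kind into parts."""
    parts = kind.split(":")
    if len(parts) != 4:
        raise ValueError(f"unexpected kind format: {kind!r}")
    authority, source, entity, version = parts
    major = int(version.split(".")[0])  # raises ValueError if not numeric
    return authority, source, entity, major


def handle_v1(manifest):
    # Hypothetical handler for major version 1 of a Manifest schema.
    return "handled as v1"


def handle_v2(manifest):
    # Hypothetical handler for major version 2 of a Manifest schema.
    return "handled as v2"


# One handler per supported major version (hypothetical).
HANDLERS = {1: handle_v1, 2: handle_v2}


def submit(raw_manifest):
    """Accept a raw JSON string and dispatch on the major version in `kind`."""
    manifest = json.loads(raw_manifest)
    _, _, _, major = parse_kind(manifest["kind"])
    handler = HANDLERS.get(major)
    if handler is None:
        raise ValueError(f"unsupported Manifest major version: {major}")
    return handler(manifest)
```

The trade-off is visible in `submit`: a string payload lets several Manifest versions coexist in one deployment, but the version handling that a typed object would enforce at the API boundary moves into the consuming code.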