This page captures the Scope, Definition of Done, Horizons, and Milestones for the R3 Manifest Ingestion workflow. Everything on this page is a work in progress; nothing is committed or guaranteed. The diagrams below do not yet incorporate the File or Dataset services, but will in the near future. R3 Manifest Ingestion will deliver data loading capabilities designed to meet the initial needs of loading data into OSDU while providing a framework for implementing more robust ingestion processes.
The approach for R3 centers on the following concepts:
* Pre-ingestion work helps ensure well-formed data enters OSDU
* The latest Data Definition Schemas ([v1.0.0](https://community.opengroup.org/osdu/data/data-definitions/-/tree/1bdc6e43858d7f0202316135ee4b9a943a26e297)) provide robust data modeling and relationship modeling capabilities that enable programmatic enforcement
* Loading by Manifest ensures the metadata describing the underlying source data adheres to the [Well Known Structure](https://community.opengroup.org/osdu/documentation/-/wikis/OSDU-(C)/Design-and-Implementation//Entity-and-Schemas/Demystifying-Well-Known-Schemas,-Well-Known-Entities,-Enrichment-Pipelines) concept, a requirement for interoperability and a [promise](https://osduforum.org/about-us/who-we-are/osdu-mission-vision/) of OSDU.
* The first priority is getting the basic metadata into OSDU, while enabling more complex workflows that build more robust datasets from the source data and the capabilities of the platform. This approach preserves the source data while also creating and presenting consumption-ready data products.
## R3 Manifest Ingestion Scope ##
The scope for R3 Manifest Ingestion is documented via the Ingestion Use Cases found [here](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/data-prep/docs/-/blob/master/Design%20Documents/Ingestion/Core-Concept-Input_MVE-with-Ingestion-UseCases_Rev-02.pdf). For more details, see the _In Scope_ and _Out-of-Scope_ sections below.
The picture above depicts the conceptual architecture for the R3 Manifest Ingestion scope. Much of the complexity has been abstracted away for the sake of simplicity, but the picture should still illustrate the intent. We will define scope through the Definition of Done. In short, the following items are considered _In-Scope_ for R3:
- Validate (Syntax and Content) and Process the contents of a [Manifest](https://community.opengroup.org/osdu/data/data-definitions/-/blob/1bdc6e43858d7f0202316135ee4b9a943a26e297/Generated/manifest/Manifest.1.0.0.json)
- Process the contents of a CSV file into OSDU
- Process the contents of Energistics standardized content (e.g., WITSML)
### Definition of Done ###
This is a high-level definition of done for the R3 Manifest Ingestion workflow.
- A process must present a well-formed and correct Manifest to the Ingestion Service endpoint for processing
- The Ingestion Service must support multiple, simultaneous calls and scaling as required to meet demand
  - Scaling limits TBD
- The Ingestion Service may perform the following activities at this stage:
  - Confirm the calling process is authenticated and authorized to invoke the `submitWithManifest` endpoint
  - Verify the Manifest Schema exists within the OSDU instance's Schema Service
  - Fetch the Manifest Schema via the OSDU instance Schema Service (the process must be authenticated and authorized to perform this query and fetch)
  - Validate the provided Manifest is syntactically correct per the indicated Manifest Schema `kind`
    - This process is completed for each Reference Data, Master Data, Work Product, Work Product Component, and File element included
    - Where determinable, elements provided with a valid `id` will be checked for existence in OSDU using the Storage Service. If Reference Data or Master Data already exists, an error is generated indicating data duplication
    - The validation will also include searching for any supported annotation extensions, such as `x-osdu-relationship`, and programmatically validating correctness where possible
    - Should validation errors occur, the Ingestion Service will terminate and return those errors to the calling process
  - Invoke the `Storage API` for each record
    - This process does not support rollback. Errors that occur during this process may require manual resolution (alternatively, cleanup workflows could be established to handle these situations if the errors are pushed to the Notification Service)
    - A failure of one record does not constitute the failure of all contents in the Manifest
    - A failure of a parent record will prevent the Ingestion Service from processing the child records (this only applies to Work Product and Work Product Component, as the Manifest does not support hierarchical relationships with Reference Data, Master Data, and Files)
    - As part of the write process, `surrogate-key`s, where specified, are resolved to the system-assigned `id` created on a successful `createOrUpdateRecords` call
- Once the Manifest file is fully processed, the results of the process are returned to the calling process
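
The write-phase rules above (no rollback, per-record failure isolation, and skipping children of a failed parent) can be illustrated with a minimal sketch. This is not the Ingestion Service implementation: `write_records`, `Result`, `StorageError`, the `children_of` mapping, and the record ids are all hypothetical names introduced for illustration, and `fake_store` stands in for a real Storage Service call.

```python
from dataclasses import dataclass, field


class StorageError(Exception):
    """Raised by the stand-in storage callable when a write fails (hypothetical)."""


@dataclass
class Result:
    written: list = field(default_factory=list)  # ids stored successfully
    failed: dict = field(default_factory=dict)   # id -> error message
    skipped: list = field(default_factory=list)  # ids skipped after a parent failure


def write_records(records, children_of, store):
    """Write each record independently; skip children of a failed parent.

    records:     list of dicts with an "id" key, parents listed before children
    children_of: dict mapping a parent id to the ids of its child records
    store:       callable that persists one record or raises StorageError
    """
    result = Result()
    blocked = set()  # ids that must not be written because a parent failed
    for record in records:
        rid = record["id"]
        if rid in blocked:
            result.skipped.append(rid)
            # A skipped record blocks its own children as well.
            blocked.update(children_of.get(rid, []))
            continue
        try:
            store(record)
            result.written.append(rid)
        except StorageError as exc:
            # No rollback: earlier writes stay in place, the error is only reported.
            result.failed[rid] = str(exc)
            blocked.update(children_of.get(rid, []))
    return result


def fake_store(record):
    """Stand-in for a storage write; fails when the record is flagged."""
    if record.get("fail"):
        raise StorageError("simulated failure")


records = [
    {"id": "ref-1"},
    {"id": "parent-1", "fail": True},
    {"id": "child-1"},
    {"id": "master-1"},
]
result = write_records(records, {"parent-1": ["child-1"]}, fake_store)
print(result.written)  # ['ref-1', 'master-1']
print(result.failed)   # {'parent-1': 'simulated failure'}
print(result.skipped)  # ['child-1']
```

The returned `Result` corresponds to the last bullet: the caller receives the outcome for every record rather than a single pass/fail flag.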
### Out-of-Scope ###
- The design and implementation of the Manifest Ingestion process may require updates to other OSDU services and components within the platform. Where those changes are required, we will submit the required ADRs and work through the required processes to have those items approved and implemented
- Ingestion Workflow capabilities supporting Enrichment, Extraction, Reclassification, parsers [CSV, Energistics], re-processing, etc. The Ingestion Framework supports the implementation of these pipelines, but the Manifest Ingestion team is not responsible for delivering these pipelines
- Bulk loading: another critical component, but we are starting simple. The Manifest does support some concepts of bulk loading; for R3, however, we may artificially limit bulk loading via the Manifest file
- Any activity involving the positioning of files or datasets into the OSDU platform (i.e., loading Files or Datasets into the platform). The MVE expects this step to be complete before a Manifest is presented to the Manifest Ingestion Service
- Any activity involving the creation of the Manifest is outside the scope of R3 Manifest Ingestion
| Day 0 of R3 Manifest Ingestion. Able to submit a manifest that is prepopulated with required data, including `id`s, and successfully write the data via the Storage Service. Basic schema validation occurs. Basic `exists` checks occur for cited data. | Able to process `surrogate-key`s. Integration with the new [Schema Service](https://community.opengroup.org/osdu/platform/system/schema-service). Provide support for Dataset Registry (if available). Provide a _hook_ for initiating Ingestion Workflows via published messages from the Storage Service. Integrated testing. | Pre-Release activities. Operational readiness. Solution hardening. |
## Milestone 3 ##
(WIP)
This is our target for _Day 0 of R3 Manifest Ingestion_ (that is, the most basic functionality qualifying as Manifest Ingestion): the ability to submit a pre-populated manifest with `id`s specified (vs. `surrogate-key`s), using the 1.0.0 version of the [Manifest Schema](https://community.opengroup.org/osdu/data/data-definitions/-/blob/1bdc6e43858d7f0202316135ee4b9a943a26e297/Generated/manifest/Manifest.1.0.0.json), to the Ingestion Service API endpoint.
- Schema validation for R3 schemas (Master Data, Reference Data, Work Product, Work Product Components, and File)
- Additional content validation capabilities, which include verifying that cited data exists and data relationships are correct
- Load one and only one manifest at a time (bulk loading is managed externally to the ingestion process)
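
As a rough illustration of the "cited data exists" check, the sketch below finds schema properties carrying the `x-osdu-relationship` annotation and reports any cited `id` that cannot be found. It assumes a heavily simplified schema fragment and only inspects top-level properties. The function names, the field names `WellID` and `Name`, the placeholder ids, and the `exists` callable (a stand-in for a Storage Service lookup) are hypothetical.

```python
def relationship_fields(schema):
    """Return names of top-level properties annotated with x-osdu-relationship."""
    props = schema.get("properties", {})
    return [name for name, spec in props.items() if "x-osdu-relationship" in spec]


def missing_references(record_data, schema, exists):
    """List (field, id) pairs whose cited id is not found by `exists`."""
    missing = []
    for name in relationship_fields(schema):
        value = record_data.get(name)
        if value is None:
            continue  # optional relationship field not provided
        cited = value if isinstance(value, list) else [value]
        missing.extend((name, rid) for rid in cited if not exists(rid))
    return missing


# Simplified, hypothetical schema fragment; the annotation content is omitted.
schema = {
    "properties": {
        "WellID": {"type": "string", "x-osdu-relationship": []},
        "Name": {"type": "string"},
    }
}
known_ids = {"well-123"}  # stand-in for records already stored in OSDU

ok = missing_references({"WellID": "well-123", "Name": "A"}, schema, known_ids.__contains__)
bad = missing_references({"WellID": "well-999", "Name": "B"}, schema, known_ids.__contains__)
print(ok)   # []
print(bad)  # [('WellID', 'well-999')]
```

A real implementation would also need to walk nested objects and arrays in the schema; that is left out here to keep the idea visible.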
## Milestone 4 ##
(WIP)
Able to submit a pre-populated manifest to the Ingestion Service with support for `surrogate-key`s to enable on-write resolution of `id`s and construction of Work Product, Work Product Component, File and Dataset relationships.
- Additional content validation capabilities, which include verifying data relationships are correct per schema definitions
- Able to coordinate writes to the storage service and properly update `surrogate-key`s specified in the Manifest file to preserve relationships when `id`s are not available at manifest generation time
- Integrated with the new [Schema Service](https://community.opengroup.org/osdu/platform/system/schema-service) to fetch schemas for validation
- Ingestion Workflow "integration" with the Notification Service to trigger Ingestion Workflows
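
One way to picture on-write `surrogate-key` resolution is sketched below, assuming records are written in dependency order so that a referenced record always receives its `id` before anything that cites it. The helper names, the surrogate key strings, the `Files` field, and `fake_create` (a stand-in for a storage call that returns a system-assigned `id`) are illustrative assumptions, not the actual service behavior.

```python
import itertools


def resolve_surrogate_keys(value, key_to_id):
    """Return a copy of `value` with every known surrogate key replaced by its id.

    Only whole-string matches are replaced; keys embedded in longer strings are left alone.
    """
    if isinstance(value, dict):
        return {k: resolve_surrogate_keys(v, key_to_id) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_surrogate_keys(v, key_to_id) for v in value]
    if isinstance(value, str):
        return key_to_id.get(value, value)
    return value


def write_with_surrogates(records, create_record):
    """Write records in order, resolving surrogate keys as ids become available."""
    key_to_id = {}
    stored = []
    for record in records:
        surrogate = record["id"]
        body = {k: v for k, v in record.items() if k != "id"}
        body = resolve_surrogate_keys(body, key_to_id)
        real_id = create_record(body)  # system-assigned id from the write
        key_to_id[surrogate] = real_id
        stored.append({"id": real_id, **body})
    return stored, key_to_id


_counter = itertools.count(1)


def fake_create(body):
    """Stand-in for a storage write that returns a generated id."""
    return f"generated-{next(_counter)}"


records = [
    {"id": "surrogate-key:file-1", "data": {"Name": "log.las"}},
    {"id": "surrogate-key:wpc-1", "data": {"Files": ["surrogate-key:file-1"]}},
]
stored, mapping = write_with_surrogates(records, fake_create)
print(stored[1]["data"]["Files"])  # ['generated-1']
print(mapping["surrogate-key:wpc-1"])  # generated-2
```

The relationship between the two records survives the write even though neither `id` was known when the manifest was generated.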
## Milestone 5 ##
(WIP)
This is a stretch goal. The hope is to converge the four ingestion workflow streams into some common components to unify how data is ultimately written. The four ingestion workflows are Energistics (e.g., WITSML), EDS (External Data Sources), CSV, and Manifest. Each workstream composes DAGs to execute within the Ingestion Framework, and each has DAG Operators that perform common functionality. Ideally, the DAG operators are composable and reusable.
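
The idea of composable, reusable steps can be sketched in plain Python, independent of any DAG framework. This is only an illustration of the shape such convergence might take: the `pipeline` helper, the step functions, and the context keys are hypothetical and are not DAG Operators from any workstream.

```python
import csv
import io
from functools import reduce


def pipeline(*steps):
    """Compose steps left to right; each step takes and returns a context dict."""
    return lambda context: reduce(lambda ctx, step: step(ctx), steps, context)


def parse_csv(ctx):
    """Source-specific step: turn raw CSV text into records."""
    reader = csv.DictReader(io.StringIO(ctx["raw"]))
    return {**ctx, "records": [dict(row) for row in reader]}


def load_manifest(ctx):
    """Source-specific step: take records straight from a manifest-like list."""
    return {**ctx, "records": list(ctx["manifest"])}


def validate(ctx):
    """Shared step: keep records that have a non-empty id, report the rest."""
    good = [r for r in ctx["records"] if r.get("id")]
    bad = [r for r in ctx["records"] if not r.get("id")]
    return {**ctx, "records": good, "errors": bad}


def write(ctx):
    """Shared step: stand-in for the common write path."""
    return {**ctx, "written": [r["id"] for r in ctx["records"]]}


# Two workflows that differ only in their first step.
csv_flow = pipeline(parse_csv, validate, write)
manifest_flow = pipeline(load_manifest, validate, write)

csv_out = csv_flow({"raw": "id,name\na1,Alpha\n,Missing\n"})
manifest_out = manifest_flow({"manifest": [{"id": "m1"}]})
print(csv_out["written"])       # ['a1']
print(len(csv_out["errors"]))   # 1
print(manifest_out["written"])  # ['m1']
```

Only the first step differs between the two flows; the validation and write steps are shared, which is the kind of reuse the stretch goal describes.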