Manifests by reference
Rather than pass the full manifest payload to the Ingestion Workflow, pass a pointer (via dataset id, or some other construct) to the manifest to be processed.
This approach handles large manifests better than passing the full payload inline.
Details
- In this approach, a user or process creates one or more manifests (this could be done using tooling)
- The user or process then uploads the manifests to OSDU using the Dataset service (e.g., get storage instructions, upload file, store metadata record, get the record id)
- The user or process invokes the Workflow service, passing in one or more record ids that point to the uploaded manifest(s)
- Create a DAG operator capable of fetching the manifest(s) from storage using the Dataset service (get retrieval instructions)
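The fetch step of such an operator could be sketched as follows. This is a minimal illustration, not the actual implementation: `fetch_manifests`, the two injected callables, and the `download_url` key are all hypothetical stand-ins for the Dataset service "get retrieval instructions" call and the subsequent file download, and the manifests are assumed to be stored as JSON.

```python
import json
from typing import Callable, Dict, Iterable, List

# Hypothetical callables standing in for the Dataset service and file storage.
RetrievalFn = Callable[[str], Dict[str, str]]  # record id -> retrieval instructions
DownloadFn = Callable[[str], bytes]            # download location -> raw file bytes


def fetch_manifests(
    record_ids: Iterable[str],
    get_retrieval_instructions: RetrievalFn,
    download: DownloadFn,
) -> List[dict]:
    """Resolve each record id to its uploaded manifest and parse it as JSON."""
    manifests = []
    for record_id in record_ids:
        # Ask the (hypothetical) Dataset service client how to retrieve the file.
        instructions = get_retrieval_instructions(record_id)
        # "download_url" is an assumed key, not the real response shape.
        raw = download(instructions["download_url"])
        manifests.append(json.loads(raw))
    return manifests
```

In a DAG operator, the body of `execute` could call a function like this with the record ids taken from the workflow run configuration. Injecting the two callables keeps the sketch independent of any particular client library and makes it easy to test with in-memory stubs.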
Definition of Done
- A new DAG operator exists that can be added to ingestion workflows and that reads manifests from the Dataset service
Question for consideration
What happens once the manifest is read and other DAG operators downstream in the workflow need to access the contents? Is it passed via XCom? Does it need to be passed?
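One possible answer, offered only as an illustration and not as a decision: because XCom is intended for small values, the operators could hand off just the record ids and let each downstream operator re-fetch the manifest when it needs the contents. The helper names and the `manifest_record_ids` key below are hypothetical.

```python
import json
from typing import List


def build_handoff_value(record_ids: List[str]) -> str:
    """Serialize only the manifest record ids, never the manifest contents."""
    return json.dumps({"manifest_record_ids": list(record_ids)})


def read_handoff_value(value: str) -> List[str]:
    """Recover the record ids so a downstream task can re-fetch the manifests."""
    return json.loads(value)["manifest_record_ids"]
```

The trade-off is an extra retrieval per downstream operator in exchange for keeping the value exchanged between tasks small regardless of manifest size.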
@Kateryna_Kurach may be able to offer more details from a design and architecture perspective.