Open Subsurface Data Universe Software / Platform / Data Flow / Data Ingestion, issue #42
Issue created Sep 29, 2020 by Stephen Whitley (Invited Expert) @stephenwhitley · 1 of 6 checklist items completed

Upgrade the role of the Manifest

Status

  • Initiated
  • Proposed
  • Trialing
  • Under review
  • Approved
  • Retired

Context & Scope

There is a lot of debate about the necessity and role of the ingestion manifest. Most example manifests seen in R1 and R2 carry information that can be extracted from the data files being ingested. This has led to the perception that manifests are largely redundant and, in many cases, could be replaced by parsing the data during ingestion.

However, while these examples seem superfluous, there is a real need to automate ingestion as much as possible. To do this, we should upgrade the purpose of the manifest so that it becomes a well-formed description of an ingestion job, providing:

  • A complete list of all elements involved in the ingestion workflow
  • Semantic relationships among these elements that
    • inform the ingestion workflow itself; and/or
    • are captured in the metadata to support referential integrity
  • Additional metadata to supplement the content that can be extracted from these elements
    • business properties such as company, contract, location, etc.
    • system properties such as entitlements
  • Guidance on dealing with exceptions
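
To make the list above concrete, here is a minimal Python sketch of a manifest as a single data structure. Every name in it (`Element`, `Relationship`, `Manifest`, `on_error`, the `parent_of` relationship kind, and the field layout) is a hypothetical illustration, not the OSDU manifest schema. It only shows how elements, relationships, business and system metadata, and exception guidance could live in one well-formed document, and how referential integrity could be checked before ingestion starts.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class Element:
    id: str       # identifier that relationships refer to (hypothetical)
    kind: str     # e.g. "well", "wellbore" (illustrative only)
    source: str   # location of the data file for this element

@dataclass
class Relationship:
    source_id: str
    target_id: str
    kind: str     # e.g. "parent_of" (hypothetical vocabulary)

@dataclass
class Manifest:
    elements: list[Element]
    relationships: list[Relationship] = field(default_factory=list)
    business: dict[str, str] = field(default_factory=dict)     # company, contract, location, ...
    system: dict[str, list[str]] = field(default_factory=dict)  # e.g. entitlements
    on_error: str = "abort"  # exception guidance: "abort" or "skip" (hypothetical)

    def integrity_errors(self) -> list[str]:
        """List relationships that point at elements not declared in this manifest."""
        ids = {e.id for e in self.elements}
        errors = []
        for r in self.relationships:
            for ref in (r.source_id, r.target_id):
                if ref not in ids:
                    errors.append(f"{r.kind}: unknown element {ref!r}")
        return errors

    def to_json(self) -> str:
        # asdict() recursively converts the nested dataclasses into plain dicts
        return json.dumps(asdict(self), indent=2)

m = Manifest(
    elements=[Element("w1", "well", "wells.csv"), Element("wb1", "wellbore", "wellbores.csv")],
    relationships=[Relationship("w1", "wb1", "parent_of"), Relationship("w1", "log9", "parent_of")],
    business={"company": "ExampleCo"},
    system={"viewers": ["data.viewers@example.com"]},
)
print(m.integrity_errors())  # ["parent_of: unknown element 'log9'"]
```

Catching the dangling `log9` reference at manifest-validation time keeps that check out of the ingestion services themselves.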

By doing so, we can remove a good deal of the ambiguity and complexity from the ingestion process itself.

Decision

Invest in the Manifest definition and support in the ingestion framework to achieve the goals defined above.

Rationale

Providing structured information to the ingestion framework reduces complexity in the services and framework itself. The ingestion process no longer has to infer intention. This allows us to allocate data loading requirements to pre-ingestion (completing the manifest) and ingestion (interpreting and honoring the manifest).
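
One way to read "interpreting and honoring the manifest" is sketched below in Python. It assumes a hypothetical manifest shaped as a plain dict whose relationships point from parent (`source_id`) to child (`target_id`), with an `on_error` field that is either `"abort"` or `"skip"`. The ingestion side derives a load order from the declared relationships with a topological sort (`graphlib`, Python 3.9+) instead of inferring it from the data, and then applies the exception guidance. `load_one` is a hypothetical stand-in for whatever actually stores a single element.

```python
from graphlib import TopologicalSorter
from typing import Callable

def load_order(manifest: dict) -> list[str]:
    """Order elements so that every parent is ingested before its children."""
    ts = TopologicalSorter()
    for element in manifest["elements"]:
        ts.add(element["id"])
    for rel in manifest.get("relationships", []):
        # the child (target) depends on the parent (source)
        ts.add(rel["target_id"], rel["source_id"])
    return list(ts.static_order())  # raises graphlib.CycleError on circular relationships

def ingest(manifest: dict, load_one: Callable[[str], None]) -> tuple[list[str], list[str]]:
    """Load elements in dependency order, following the manifest's exception guidance."""
    parents: dict[str, set[str]] = {}
    for rel in manifest.get("relationships", []):
        parents.setdefault(rel["target_id"], set()).add(rel["source_id"])

    loaded: list[str] = []
    not_loaded: list[str] = []
    for element_id in load_order(manifest):
        if parents.get(element_id, set()) & set(not_loaded):
            not_loaded.append(element_id)  # a parent failed; skip to avoid dangling references
            continue
        try:
            load_one(element_id)
            loaded.append(element_id)
        except Exception:
            if manifest.get("on_error", "abort") == "abort":
                raise
            not_loaded.append(element_id)  # "skip": record the failure and carry on
    return loaded, not_loaded
```

With `on_error` set to `"skip"`, an element whose parent failed is also left out, so a partial load never contains dangling references; with `"abort"`, the first failure propagates to the caller.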

The Manifest itself does not have to exist as a file; it can be generated and passed directly to the Ingestion Service as well-formed JSON.
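
Below is a minimal sketch, using only Python's standard library, of generating a manifest in memory and wrapping it as a JSON request body without ever writing a file. The endpoint URL, the `build_manifest` helper, and the record fields are hypothetical placeholders; the real Ingestion Service route and payload schema are not defined in this issue. The request is only constructed here; sending it would be a call to `urllib.request.urlopen(req)` against a real service.

```python
import json
import urllib.request

# Hypothetical endpoint: the actual Ingestion Service URL and route are not specified here.
INGESTION_URL = "https://ingestion.example.invalid/manifest"

def build_manifest(records: list[dict], company: str) -> dict:
    """Generate a manifest in memory from records produced by a pre-ingestion step."""
    return {
        "elements": [{"id": r["id"], "kind": r["kind"]} for r in records],
        "business": {"company": company},
    }

def manifest_request(manifest: dict) -> urllib.request.Request:
    """Wrap the manifest as a JSON POST body; nothing is written to disk."""
    body = json.dumps(manifest).encode("utf-8")
    return urllib.request.Request(
        INGESTION_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = manifest_request(build_manifest([{"id": "w1", "kind": "well"}], "ExampleCo"))
```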

Consequences

We need to be cautious about loading too many requirements into the Manifest, since its purpose is to ease ingestion rather than complicate it.

When to revisit

Edited Sep 29, 2020 by Stephen Whitley (Invited Expert)