[Ingestion] Declarative Workflows, reusable operations
Status
- Initiated
- Proposed
- Trialing
- Under review
- Approved
- Retired
Context & Scope
Bringing data into the OSDU Data Platform and enriching it for future use is a core value of the platform.
There are two broad ingestion patterns: ETL (extract, transform, load) and ELT (extract, load, transform).
ETL
Benefit
- Favors the use of (multiple) external tools and minimizes dependence on the data platform for transformation services, since the transformation burden remains external to the system
Consequence
- Difficult to reuse operations since these are typically bound to specific tooling
- The knowledge that goes into the transforms remains outside the system
ELT
Benefit
- Keeps the knowledge of how the transforms are made within the system
- Allows data platform users to share workflows and operations
- Because the data platform encourages continuous improvement of data already in the platform (enrichment), the same operations can be reused post-ingestion to enrich that data
Consequence
- Requires significant development effort
Decision
The OSDU Data Platform will provide an ingestion framework that lets a data manager define and execute workflows for bringing data of various types, from various sources, into the data platform. The framework and its supporting operations will be built up over time, allowing the current ETL requirement to shift towards the ELT(T) model.
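To make the idea of declarative workflows built from reusable operations more concrete, the following is a minimal sketch in Python. It is illustrative only and is not the OSDU ingestion framework or its API: the registry `OPERATIONS`, the `operation` decorator, `run_workflow`, the operation names, and the workflow layout (`name`, `steps`, `operation`, `params`) are all hypothetical names chosen for this example. It assumes records are plain dictionaries and that a workflow is plain data naming the operations to apply in order.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]
Operation = Callable[..., Record]

# Hypothetical registry of reusable operations, keyed by name.
OPERATIONS: Dict[str, Operation] = {}


def operation(name: str) -> Callable[[Operation], Operation]:
    """Register a function as a named, reusable operation."""
    def register(func: Operation) -> Operation:
        OPERATIONS[name] = func
        return func
    return register


@operation("normalize_units")
def normalize_units(record: Record, field: str, factor: float) -> Record:
    """Return a copy of the record with one numeric field scaled."""
    return {**record, field: record[field] * factor}


@operation("tag_source")
def tag_source(record: Record, source: str) -> Record:
    """Return a copy of the record annotated with its origin."""
    return {**record, "source": source}


def run_workflow(workflow: Dict[str, Any], record: Record) -> Record:
    """Apply each declared step, in order, to the record."""
    for step in workflow["steps"]:
        func = OPERATIONS[step["operation"]]
        record = func(record, **step.get("params", {}))
    return record


# A workflow is plain data, so it can be stored and shared (assumed layout).
INGEST_WELL_LOG = {
    "name": "ingest_well_log",
    "steps": [
        {"operation": "tag_source", "params": {"source": "vendor_a"}},
        {"operation": "normalize_units",
         "params": {"field": "depth", "factor": 0.3048}},
    ],
}

# The same operation reused post-ingestion to enrich a stored record.
ENRICH_DEPTH = {
    "name": "enrich_depth",
    "steps": [
        {"operation": "normalize_units",
         "params": {"field": "depth", "factor": 0.3048}},
    ],
}
```

Calling `run_workflow(INGEST_WELL_LOG, {"depth": 100.0})` applies both steps to an incoming record, while `run_workflow(ENRICH_DEPTH, stored_record)` reuses `normalize_units` on data that is already loaded. Keeping workflows as data and operations as registered functions is one way the knowledge behind a transform could stay inside the platform rather than in external tooling.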
Rationale
Consequences
When to revisit
Edited by Stephen Whitley (Invited Expert)