# Ingestion Workflow using CSV DAG

[[_TOC_]]

## Introduction

This document is for users who want to ingest data from CSV files using the pre-registered CSV Directed Acyclic Graph (DAG). It covers the existing CSV DAG capability, explains how to trigger the DAG, and describes how to push the input CSV files for ingestion.

## Ingestion Framework

The OSDU Data Platform R3 ingestion framework ingests various types of data files, such as CSV, LAS, and DLIS, using Apache Airflow, an open-source platform to author, schedule, and monitor workflows. A workflow in Airflow is a collection of tasks with directional dependencies, which Airflow represents as a DAG. At a high level, a DAG can be thought of as a container that holds tasks and their dependencies and sets the context for when and how those tasks are executed.

The framework has several components that enable uploading a file, creating the schemas required for ingestion, and invoking the ingestors. Each is explained in detail in the sections that follow. The framework also comes with pre-registered DAGs that are available out of the box and can be used by end users to execute specific workflows, including DAGs for the CSV, LAS, and Document ingestion workflows.
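
To make the idea of a DAG as "tasks plus directional dependencies" concrete, the following is a minimal sketch that uses only the Python standard library (`graphlib`, Python 3.9+). It is not Airflow code and not the pre-registered CSV DAG. The task names (`upload_file`, `create_schema`, `invoke_ingestor`) and the dependencies between them are assumptions chosen for illustration, loosely based on the components mentioned above.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical tasks for illustration only. Each key maps to the set of
# tasks that must finish before that task is allowed to start.
csv_workflow = {
    "upload_file": set(),
    "create_schema": set(),
    "invoke_ingestor": {"upload_file", "create_schema"},
}


def execution_order(dag):
    """Return one valid task order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


def is_acyclic(dag):
    """Return True if the dependency graph contains no cycle."""
    try:
        TopologicalSorter(dag).prepare()
    except CycleError:
        return False
    return True


if __name__ == "__main__":
    print(execution_order(csv_workflow))
    print(is_acyclic(csv_workflow))
```

The "acyclic" part matters because a cycle (task A waiting on B while B waits on A) would leave no valid starting point. A scheduler can order and run the tasks only when the graph has no cycles.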