Home · authored Jul 08, 2022 by Chad Leong
# Ingestion Workflow using CSV DAG
[[_TOC_]]
## Introduction
This document guides users who want to leverage the pre-registered CSV Directed Acyclic Graph (DAG)
to have their data ingested from CSV files. It covers the existing CSV DAG capability, explains how to
trigger the DAG, and describes how to push the input CSV files for ingestion.
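As a preview of the triggering step, the sketch below assembles a request for a workflow run. The base URL, workflow name (`csv_parser`), and payload keys are illustrative assumptions, not the definitive API of any particular deployment; consult your platform's Workflow service documentation for the exact contract.

```python
# Sketch: assembling a workflow-run trigger request for a pre-registered CSV DAG.
# Endpoint path, workflow name, and payload keys are ASSUMPTIONS for illustration.

def build_trigger_request(base_url, workflow_name, file_id, data_partition):
    """Assemble the URL and JSON body for a hypothetical workflowRun POST."""
    url = f"{base_url}/api/workflow/v1/workflow/{workflow_name}/workflowRun"
    body = {
        "executionContext": {
            "dataPartitionId": data_partition,
            "id": file_id,  # ID of the uploaded CSV file record (hypothetical key)
        }
    }
    return url, body

url, body = build_trigger_request(
    "https://osdu.example.com", "csv_parser", "file-id-123", "opendes")
print(url)
```

The body would then be POSTed with the usual authorization and `data-partition-id` headers; later sections describe the actual trigger mechanism in detail.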
## Ingestion Framework
OSDU Data Platform R3 ingestion framework is built to ingest various types of data files like
CSV/LAS/DLIS etc. with Apache Airflow, an open-source platform to Author, Schedule and Monitor
workflows. Workflows in Airflow are collections of tasks that have directional dependencies. Specifically,
Airflow uses a DAG to represent a workflow. At a high level, a DAG can be thought of as a container that
holds tasks and their dependencies, and sets the context for when and how those tasks should be
executed. There are various components in the framework which enable uploading a file, creating the
required schemas for ingestion or invoking the ingestors. Each is explained in detail in sections that
follow. The ingestion framework comes with some pre-registered DAGs. These are available out of the
box and can be used by end users to execute specific workflows. Some of these pre-registered DAGs are
for CSV, LAS and Document ingestion workflows.
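To make the "container of tasks and their dependencies" idea concrete, here is a minimal plain-Python illustration (not Airflow itself) of how directional dependencies determine execution order. The task names are hypothetical and do not describe the real CSV DAG's steps.

```python
# Plain-Python sketch of a DAG as tasks plus directional dependencies:
# a task may run only after all of its upstream tasks have completed.
from graphlib import TopologicalSorter

# Hypothetical CSV-ingestion steps, each mapped to the tasks it depends on.
dag = {
    "fetch_file": set(),
    "validate_schema": {"fetch_file"},
    "parse_csv": {"validate_schema"},
    "store_records": {"parse_csv"},
}

# A valid execution order: upstream tasks always precede their dependents.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Airflow schedules real DAGs in essentially this fashion, additionally handling retries, scheduling intervals, and monitoring for each task.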