Issue #22 · Closed · Created Sep 28, 2020 by Jacob Rougeau (@jrougeau) · 6 of 7 tasks completed

Develop initial Fetch orchestration DAG - Horizon 1

As a Solution Architect, I need automated orchestration that can fetch master and work product component data from a registered OSDU-compliant source (described by a source document from the External Source Registry functionality), using scheduling and connection metadata (a scheduled-job document from the External Source Registry functionality), so that external data can be prepared for ingestion.

Notes

  • Apache Airflow may be used. There is a spike story associated with this.

Tasks

Learning / set up

  • Set up a local Python environment
  • Set up a local Java development environment
  • Install Airflow locally (Linux server, Linux VM, etc.)
  • Study Airflow DAGs; work through the tutorials

Fetch DAG

  • Create the Airflow DAG (see the sketch after this list)
  • Create an Airflow operation that retrieves or leverages scheduled job information => replaced by the Scheduler DAG
  • Create an Airflow operation that retrieves data from the external source.
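A minimal sketch of how the Fetch DAG could be wired up, assuming Airflow 2.x and the classic PythonOperator; the DAG id, task ids, and the job-document lookup are illustrative placeholders rather than the agreed design (and, per the task above, the scheduled-job step is ultimately replaced by the Scheduler DAG):

```python
# Illustrative sketch only -- DAG/task ids and the registry lookup are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def get_scheduled_job(**context):
    # Placeholder: read the scheduled-job document (connection and
    # scheduling metadata) from the External Source Registry.
    # In the final design this step is replaced by the Scheduler DAG.
    return {"source_url": "https://example.invalid/osdu-source"}  # hypothetical


def fetch_external_data(ti=None, **context):
    # Placeholder: fetch master / work product component data from the
    # registered OSDU-compliant source described by the job document.
    job = ti.xcom_pull(task_ids="get_scheduled_job")
    print(f"Fetching from {job['source_url']}")


with DAG(
    dag_id="edf_fetch",                    # illustrative name
    start_date=datetime(2020, 9, 28),
    schedule_interval="@daily",            # real cadence comes from the job document
    catchup=False,
) as dag:
    get_job = PythonOperator(task_id="get_scheduled_job",
                             python_callable=get_scheduled_job)
    fetch = PythonOperator(task_id="fetch_external_data",
                           python_callable=fetch_external_data)
    get_job >> fetch
```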

Acceptance Criteria

  • Leverages OSDU-native services where possible (no cloud-specific provider information)
  • The job should only fetch changes since the last time the scheduled job ran (see the sketch below)
  • Others TBD?
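On the "only fetch changes" criterion: Airflow exposes the timestamp of the previous successful run in the task context, which could drive an incremental query. A hedged sketch, assuming the external source accepts some modified-since filter (the parameter name is hypothetical, not a confirmed OSDU interface):

```python
# Sketch only: 'modified_since' is an assumed filter on the external
# source's API, not a confirmed interface.
def fetch_changes(prev_start_date_success=None, **context):
    # Airflow injects prev_start_date_success into the task context;
    # it is None on the very first run, so fall back to a full fetch.
    if prev_start_date_success is None:
        params = {}                                        # initial full load
    else:
        params = {"modified_since": prev_start_date_success.isoformat()}
    print(f"Fetch params: {params}")  # the actual source call would go here
```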
Edited Jan 16, 2021 by Jacob Rougeau