Develop initial Fetch orchestration DAG - Horizon 1
As a Solution Architect, I need automated orchestration that can fetch master data and work product component data from a registered OSDU-compliant source (the source document from the External Source Registry functionality), using scheduling and connection metadata (the scheduled job document from the External Source Registry functionality), so that external data can be prepared for ingestion.
Notes
- Apache Airflow may be used. There is a spike story associated with this.
Tasks
Learning / set up
- Set up local Python environment
- Set up local Java development environment
- Install Airflow locally (Linux server, Linux VM, etc.)
- Study Airflow DAGs and go through tutorials
Fetch DAG
- Create Airflow DAG
- Create Airflow operation that retrieves or leverages scheduled job information => replaced by Scheduler DAG
- Create Airflow operation that retrieves data from the external source
Acceptance Criteria
- Leverages OSDU native services where possible (no cloud-provider-specific information)
- Job should only fetch changes since the last time the scheduled job ran
- others tbd?
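The criterion that a job should only fetch changes since its last run could be met with a simple watermark. The sketch below is one possible approach, not an agreed design. It assumes each record carries a `modifyTime` timestamp and that the previous watermark is stored somewhere between runs (for example, alongside the scheduled job). Both function names are hypothetical. In practice the time filter would likely be pushed into the query sent to the external source; it is applied client-side here only to show the logic.

```python
from datetime import datetime, timezone


def select_changes(records, last_run):
    """Return records modified after last_run, or all records on a first run."""
    if last_run is None:
        return list(records)
    return [r for r in records if r["modifyTime"] > last_run]


def next_watermark(records, last_run):
    """Advance the watermark to the newest modifyTime seen.

    Falls back to the previous watermark when nothing was fetched, so an
    empty run never moves the watermark backwards.
    """
    times = [r["modifyTime"] for r in records]
    return max(times, default=last_run)


# Usage: two records, with the previous run's watermark at the first timestamp.
t1 = datetime(2021, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2021, 1, 2, tzinfo=timezone.utc)
all_records = [{"id": "a", "modifyTime": t1}, {"id": "b", "modifyTime": t2}]

changed = select_changes(all_records, last_run=t1)   # only record "b"
watermark = next_watermark(changed, last_run=t1)     # advances to t2
```

Taking the watermark from the newest record seen, rather than from the wall-clock time of the run, avoids skipping records that are modified while a fetch is in progress.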