Develop initial Fetch orchestration DAG - Horizon 1
As a Solution Architect, I need automated orchestration that fetches master and work product component data from a registered OSDU-compliant source (the source document from the External Source Registry functionality), using scheduling and connection metadata (the scheduled job document from the External Source Registry functionality), so that external data can be prepared for ingestion.
- Apache Airflow may be used. There is a spike story associated with this.
Learning / set up
Set up local Python environment
Set up local Java development environment
Install Airflow locally (Linux server, Linux VM, etc.)
Study Airflow DAGs, go through tutorials
Create Airflow DAG
Create Airflow operation that retrieves or leverages scheduled job information => replaced by Scheduler DAG
Create Airflow operation that retrieves data from external source.
- Leverages OSDU native services where possible (no cloud-provider-specific information)
- Job should only fetch changes since the last time the scheduled job ran
- Others TBD?
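
The "only fetch changes since the last time the scheduled job ran" requirement could be sketched as a watermark-based fetch. This is a minimal illustration, not the agreed design: `Record`, `FetchResult`, `fetch_changes`, and `make_in_memory_source` are hypothetical names, the in-memory source stands in for the real external OSDU-compliant source (whose query API is not specified here), and where the watermark is persisted between runs (for example, on the scheduled job document) is left open. The function has no Airflow dependency, so it could later be called from an Airflow task if Airflow is chosen.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a master or work product component record."""
    record_id: str
    kind: str
    modified_at: datetime


@dataclass
class FetchResult:
    """Records fetched in this run plus the watermark to store for the next run."""
    records: List[Record]
    new_watermark: Optional[datetime]


def fetch_changes(
    query_source: Callable[[Optional[datetime]], Iterable[Record]],
    last_run: Optional[datetime],
) -> FetchResult:
    """Fetch records modified after `last_run` (all records if `last_run` is None).

    `query_source` is an assumed callable wrapping the external source; the
    filter below is repeated defensively in case the source ignores `since`.
    """
    records = [
        r for r in query_source(last_run)
        if last_run is None or r.modified_at > last_run
    ]
    # Advance the watermark only when something new was seen.
    if records:
        new_watermark = max(r.modified_at for r in records)
    else:
        new_watermark = last_run
    return FetchResult(records=records, new_watermark=new_watermark)


def make_in_memory_source(
    records: List[Record],
) -> Callable[[Optional[datetime]], List[Record]]:
    """Build a fake source for local experiments (assumption: not a real OSDU client)."""
    def query(since: Optional[datetime]) -> List[Record]:
        return [r for r in records if since is None or r.modified_at > since]
    return query


if __name__ == "__main__":
    source = make_in_memory_source([
        Record("well-1", "master-data", datetime(2021, 1, 1, tzinfo=timezone.utc)),
        Record("log-7", "work-product-component", datetime(2021, 1, 2, tzinfo=timezone.utc)),
    ])
    first = fetch_changes(source, last_run=None)
    second = fetch_changes(source, last_run=first.new_watermark)
    print(len(first.records), len(second.records))
```

Using the newest `modified_at` seen as the watermark, rather than the wall-clock time of the run, is one possible choice; it avoids skipping records that change while a run is in progress, but it assumes the source exposes a reliable modification timestamp.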