Manifest Ingestion DAG issues
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues

Issue #46: Need common DAG deployment framework
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/46
2021-06-14T16:32:28Z · Alan Henson

At present, we lack a standard framework for deploying DAGs. There are multiple perspectives to consider:
- The deployment of code, which must also consider dependency conflicts
- The registering of DAGs. Presently, this is facilitated via the latest Workflow APIs
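For reference, registration through the Workflow API looks roughly like the sketch below. The endpoint path, payload fields (`workflowName`, `registrationInstructions`), and the example names are assumptions based on the v1 Workflow service and should be checked against the current API spec.

```python
import json

# Hypothetical helper that builds an OSDU Workflow v1 DAG-registration request.
# The endpoint path and payload shape are assumptions -- verify against the
# deployed Workflow service's OpenAPI spec before use.
def build_dag_registration(base_url: str, workflow_name: str, dag_name: str):
    url = f"{base_url}/api/workflow/v1/workflow"
    payload = {
        "workflowName": workflow_name,
        "description": f"Registers DAG '{dag_name}' with the Workflow service",
        "registrationInstructions": {"dagName": dag_name},
    }
    return url, payload

if __name__ == "__main__":
    url, payload = build_dag_registration(
        "https://osdu.example.com", "manifest_ingestion", "Osdu_ingest"
    )
    # The actual call would be something like:
    #   requests.post(url, json=payload,
    #                 headers={"Authorization": f"Bearer {token}",
    #                          "data-partition-id": partition})
    print(url)
    print(json.dumps(payload, sort_keys=True))
```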
Note some approaches in use today:
- CI/CD pipelines used to deploy code for security purposes
- IBM uses a Git Sync process for deploying DAGs
Considerations:
- Standardizing DAG Operator deployment with things like the Kubernetes Pod Operator

Issue #2: Please explore Apache Airflow limitations to finalize design for R3
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/2
2021-03-02T12:48:31Z · Raj Kannan

There have been a few concerns raised about Airflow as a generic DAG/orchestration solution for data flow in OSDU. It would be good to capture these issues here and to respond with observations/solutions so the decision can be properly captured.
1. Airflow is cloud-native only in GCP, which can make it cumbersome to host in other CSPs, where management of the infrastructure becomes a platform/operator responsibility, unlike PaaS solutions.
1. With Airflow, it is quite hard to isolate workflows, since they run within the same execution environment. As OSDU approaches "OSDU SaaS" and OSDU for smaller operators, where it may be hosted by an SI or CSP, this can make multi-tenant deployments challenging.
1. Airflow DAGs are Python-only, while some parsers and libraries may be Java or C++. By comparison, something like Argo, which is Kubernetes-based, allows worksteps in different languages/environments, since each workstep becomes a separate container instance rather than a Python script.
1. Airflow apparently has an execution delay between tasks. It is unclear whether this is a framework limitation or specific to a particular setup, but it is worth capturing and analyzing.
1. Similarly, there are concerns about temporary state/data and an intermediary persistence layer to hold it across DAG worksteps. Beyond what can be held in memory, does Airflow provide a persistable temporary cache for such state?
1. Is the Airflow DSL cumbersome to author for ingestion/enrichment workflow providers (ISVs, SIs, operators)? Compared to YAML or other alternatives, is it a good choice?
Once the elaboration work is complete, kindly capture this as a LADR for the Data flow project. Thanks for the advice on these issues.

Milestone: M1 - Release 0.1 · Assignee: Ferris Argyle

Issue #1: Missing clear test strategy and test framework on Airflow DAG development
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/1
2021-06-14T16:07:20Z · Wei Sun

We need clarity on the DAG test framework (unit tests and integration tests) to ensure code quality and that no DAGs are broken.

Participants: Ferris Argyle, Dania Kodeih (Microsoft), Wladmir Frazao, Joe, Dmitriy Rudko, Alan Braz, Kateryna Kurach (EPAM)