... | ... | @@ -22,7 +22,11 @@ Airflow Scheduler processes all DAG files at each heartbeat and executes all the |
|
|
**Lightweight XCom**

Each push to XCom writes the message to the database, with serialization on write and deserialization on read, which causes extra overhead.

Using XComs to pass paths to data held in external storage is preferable to passing large data between tasks.
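The following is a minimal sketch of this pattern in plain Python, not code from this project. The function names, the `records.json` file name, and the use of a local temporary directory as "storage" are assumptions made for illustration; in a real DAG the storage would typically be a bucket or shared volume, and the returned string is what would be pushed to XCom.

```python
import json
import tempfile
from pathlib import Path


def produce_records(storage_dir: str) -> str:
    """Write a large payload to storage and return only its path."""
    records = [{"id": i, "value": i * i} for i in range(10_000)]
    path = Path(storage_dir) / "records.json"  # hypothetical file name
    path.write_text(json.dumps(records))
    # Only this short string would travel through XCom, not the records.
    return str(path)


def consume_records(path: str) -> int:
    """Load the payload from the path handed over by the upstream task."""
    records = json.loads(Path(path).read_text())
    return len(records)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as storage_dir:
        xcom_value = produce_records(storage_dir)
        print(consume_records(xcom_value))
```

The value exchanged between the two functions stays a short string no matter how large the payload grows, so the database only ever stores the path.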
|
|
|
|
|
|
**"Thin" operators**

Keep operators as small as possible and move your business logic into installable (“setupable”) libraries.
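As a rough illustration of the split, the sketch below keeps the logic in a plain function that could live in an installable package, while the task callable only delegates to it. The names `validate_manifest` and `validate_task` and the required fields `kind` and `data` are hypothetical and chosen only for this example.

```python
from typing import List

# --- Would live in an installable library (hypothetical module) ---

REQUIRED_FIELDS = ("kind", "data")  # assumed field names, for illustration


def validate_manifest(manifest: dict) -> List[str]:
    """Business logic: return a list of validation errors (empty if valid)."""
    return [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if field not in manifest
    ]


# --- Would live in the DAG file: a thin wrapper with no logic of its own ---

def validate_task(manifest: dict) -> None:
    """Task callable: delegate to the library and fail the task on errors."""
    errors = validate_manifest(manifest)
    if errors:
        raise ValueError("; ".join(errors))
```

With this layout the library can be unit-tested and versioned without Airflow, and the DAG file only wires the callable into an operator (for example as the `python_callable` of a `PythonOperator`).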
|
|
|
|
|
|
**Kubernetes Pod Operators**
|
|
|
|
... | ... | @@ -40,6 +44,10 @@ Ingestion is ultimately about storing data into the OSDU<sup>TM</sup> data platf |
|
|
|
|
|
At the time of writing, there was no unified, platform-wide approach to logging and reporting for asynchronous processes. In most cases, the provider has an underlying logging framework that supports aggregation and dashboarding. Airflow also has a console UI that provides access to the logs generated during the DAG run. However, neither of these solutions is user-friendly. In Airflow, each DAG operator, or task, has its own logging area, so for a DAG with multiple operators a user has to look into the logs of each step to understand what happened. To overcome this limitation, the short-term solution is to write a summary of the logging outcome to XCom, enabling users to view the results in one place regardless of where the activity occurred in the DAG.
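One possible shape for such a summary is sketched below in plain Python. The field names (`step`, `saved`, `skipped`, `skipped_ids`) and both helper functions are assumptions for illustration, not the format used by the platform. In a real task the per-step dictionary would be pushed to XCom, for example with `ti.xcom_push(key=..., value=...)`, and a final task would pull and combine them.

```python
from typing import Dict, List


def summarize_step(step: str, saved: List[str], skipped: List[str]) -> Dict:
    """Build a small, JSON-friendly summary for one task."""
    return {
        "step": step,
        "saved": len(saved),
        "skipped": len(skipped),
        # Keep the summary lightweight: list only the first few ids.
        "skipped_ids": skipped[:10],
    }


def combine_summaries(summaries: List[Dict]) -> Dict:
    """Merge per-task summaries into one report for the whole DAG run."""
    return {
        "total_saved": sum(s["saved"] for s in summaries),
        "total_skipped": sum(s["skipped"] for s in summaries),
        "steps": summaries,
    }


if __name__ == "__main__":
    report = combine_summaries([
        summarize_step("validate", saved=["a", "b"], skipped=["c"]),
        summarize_step("store", saved=["a", "b"], skipped=[]),
    ])
    print(report["total_saved"], report["total_skipped"])
```

Keeping each summary to counts and a capped list of ids is deliberate, since the summary itself travels through XCom and should stay small.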
|
|
|
|
|
|
**Web Server DAG serialization**

If DAG serialization is turned on, the Web Server doesn't parse the DAG files. Instead, it reads the serialized DAGs as JSON, deserializes them, builds the DagBag from them, and shows it in the UI. This means the Web Server no longer executes the code inside the DAG files (e.g. opening config files or instantiating objects).
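The effect can be pictured with a toy model in plain Python. This is not Airflow's implementation: `parse_dag_file`, `serialize`, and `load_for_ui` are made-up stand-ins, and the counter only makes visible how often the "DAG file" code runs.

```python
import json

PARSE_COUNT = 0  # how many times the "DAG file" code has been executed


def parse_dag_file() -> dict:
    """Stand-in for executing a DAG file, including its side effects."""
    global PARSE_COUNT
    PARSE_COUNT += 1  # e.g. opening config files, instantiating objects
    return {"dag_id": "example_ingest", "tasks": ["validate", "store"]}


def serialize(dag: dict) -> str:
    """Done once by the parsing side: store the DAG structure as JSON."""
    return json.dumps(dag, sort_keys=True)


def load_for_ui(serialized: str) -> dict:
    """Done by the UI side: rebuild the structure without running DAG code."""
    return json.loads(serialized)


if __name__ == "__main__":
    stored = serialize(parse_dag_file())
    for _ in range(3):
        dag = load_for_ui(stored)  # PARSE_COUNT stays unchanged here
    print(dag["dag_id"], PARSE_COUNT)
```

However often the UI side loads the stored JSON, the parsing function is not called again, which is the property the paragraph above describes.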
|
|
|
|
|
|
**Caching**

Cache responses from the Schema and Search services to avoid requesting the same data repeatedly.
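A minimal in-process sketch using the standard library's `functools.lru_cache` is shown below. The `_request_schema` function is a placeholder for the real HTTP call to the Schema service, and the `kind` value used in the example is hypothetical; the call counter exists only to show that repeated lookups do not reach the service.

```python
from functools import lru_cache

CALLS = {"count": 0}  # number of simulated requests to the service


def _request_schema(kind: str) -> dict:
    """Placeholder for an HTTP request to the Schema service."""
    CALLS["count"] += 1
    return {"kind": kind, "properties": {}}


@lru_cache(maxsize=256)
def get_schema(kind: str) -> dict:
    """Return the schema for `kind`, requesting it at most once per process."""
    # The cached dict is shared between callers, so treat it as read-only.
    return _request_schema(kind)


if __name__ == "__main__":
    get_schema("example:kind:1.0.0")  # hypothetical kind, hits the service
    get_schema("example:kind:1.0.0")  # served from the cache
    print(CALLS["count"])
```

An in-process cache like this only helps within a single task process. Sharing cached responses across tasks or workers would need an external cache, which is outside the scope of this sketch.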
|
... | ... | |