Improve Airflow logs
Airflow logs are currently hard to read, and it is hard to tell which records were stored in the Storage service and which were not. Several comments here:
- DAG execution status may be green even though some of the records were not stored. This is somewhat expected behavior, which is why the DAG is displayed as green in Airflow: we expect that some entities may fail validation, and when they do, we skip them and process the remaining entities. Still, this may be confusing to the user.
- The osdu_ingest DAG now has many tasks, and validation happens at different stages, so logs are spread across different tasks. As a result, it is hard for a user to know which log to check.
- Sometimes Airflow logs don't even display skipped ids. This is a critical issue that has to be fixed.
Ideally, a report would be produced at the end of DAG execution. The report should list the processed ids and the unprocessed ids together with their errors.
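To make the idea of the final report concrete, here is a minimal sketch in plain Python of how such a summary could be assembled. Everything in it is an assumption for illustration only: the function name `build_ingestion_report`, the input shapes, the task name `validate_schema`, and the record ids are hypothetical and are not part of the current osdu_ingest DAG. How each task would hand over its stored and skipped ids to a final reporting task is deliberately left out.

```python
from typing import Dict, List


def build_ingestion_report(
    stored_ids: List[str],
    skipped_by_task: Dict[str, Dict[str, str]],
) -> str:
    """Build a plain-text summary of one DAG run.

    stored_ids: ids of the records that were saved (hypothetical input).
    skipped_by_task: task name -> {record id: error message} for every
        record that the task skipped (hypothetical input).
    """
    lines = [f"Stored records: {len(stored_ids)}"]
    lines += [f"  {record_id}" for record_id in stored_ids]

    # Flatten the per-task mapping so each skipped record keeps the name
    # of the task that rejected it and the reason.
    skipped_lines = [
        f"  {record_id} [{task_name}]: {error}"
        for task_name, errors in skipped_by_task.items()
        for record_id, error in errors.items()
    ]
    lines.append(f"Skipped records: {len(skipped_lines)}")
    lines += skipped_lines
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative data only; the ids and the task name are made up.
    print(
        build_ingestion_report(
            stored_ids=["record-1", "record-2"],
            skipped_by_task={
                "validate_schema": {"record-3": "missing required field"},
            },
        )
    )
```

Logging this summary once, from a single final task, would give the user one place to check instead of the logs of every task, and would always list the skipped ids together with the task that skipped them.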