EDS bulk ingestion does not honor OnIngestionLegalTags
Problem
The eds_ingest DAG triggers the Osdu_ingest_by_reference DAG when the manifest size is greater than 12 MB, so Osdu_ingest_by_reference is the DAG used for bulk ingestion. This workflow does not honor the OnIngestionLegalTags in the provided CSDJ and instead passes on "{data_partition_id}-demo-legaltag" as the legal tag. As a result, the storage service returns 400 because "{data_partition_id}-demo-legaltag" does not exist in the given environment, and the workflow cannot complete.
This issue has been reproduced in two separate environments.
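The expected behavior can be illustrated with a minimal Python sketch. It is not the actual DAG code: the function names, the way manifest size is measured, and the assumed CSDJ layout (`data.OnIngestionLegalTags` holding a `legaltags` list) are hypothetical and used only for illustration.

```python
import copy
import json

# Threshold from this report: manifests larger than 12 MB go to the by-reference DAG.
# Assumption: 1 MB = 1024 * 1024 bytes; the real DAG may measure size differently.
BULK_THRESHOLD_BYTES = 12 * 1024 * 1024


def select_dag(manifest: dict) -> str:
    """Return the DAG name eds_ingest would be expected to trigger (hypothetical helper)."""
    size = len(json.dumps(manifest).encode("utf-8"))
    return "Osdu_ingest_by_reference" if size > BULK_THRESHOLD_BYTES else "Osdu_ingest"


def apply_on_ingestion_legal_tags(records: list, csdj: dict) -> list:
    """Return copies of the records whose legal block comes from the CSDJ.

    Assumption: the CSDJ stores the tags under data.OnIngestionLegalTags.
    If that block is missing, the records are returned unchanged.
    """
    legal = csdj.get("data", {}).get("OnIngestionLegalTags")
    if not legal:
        return records
    updated_records = []
    for record in records:
        updated = dict(record)
        updated["legal"] = copy.deepcopy(legal)  # replace any default legal tag
        updated_records.append(updated)
    return updated_records
```

Under these assumptions, both DAGs would apply the same legal-tag step, so a record would carry the CSDJ's tags regardless of which path the manifest size selects.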
Reproduction Steps
- Create a CSRE
- Create a CSDJ -- make sure the query in the CSDJ fetches more than 12 MB of data
- Trigger eds_ingest with the CSDJ from step 2
- Check the eds_ingest log -- it triggers the Osdu_ingest_by_reference DAG instead of the Osdu_ingest DAG
- Check the log of the Osdu_ingest_by_reference DAG run triggered in step 4 -- it will contain an entry stating:
  400 Client Error: Bad Request for url: {dataseturl}/api/dataset/v1/registerDataset
- Validate that the PUT /registerDataset API internally calls the storage service, and that the storage service internally calls the legal service. The legal service returns a 404 Not Found exception for {data_partition_id}-demo-legaltag. The storage service converts it to a 400 Bad Request, and the dataset service surfaces this 400 in the Osdu_ingest_by_reference DAG log.
Note: "{data_partition_id}-demo-legaltag" is used as an example here. The actual data partition ID is confidential and cannot be shared.
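While reproducing the issue, it can help to confirm which legal tags actually reach the records before registerDataset is called. The sketch below is a hypothetical diagnostic helper, not part of any OSDU library. It assumes each record carries its tags under `legal.legaltags`, and that the expected tags are the ones configured in the CSDJ.

```python
def find_unexpected_legal_tags(records: list, expected_tags: list) -> dict:
    """Map record id -> legal tags that are not in the expected set (hypothetical helper)."""
    expected = set(expected_tags)
    unexpected = {}
    for record in records:
        tags = record.get("legal", {}).get("legaltags", [])
        bad = [tag for tag in tags if tag not in expected]
        if bad:
            # Records without an id are grouped under a placeholder key.
            unexpected[record.get("id", "<no id>")] = bad
    return unexpected


# Example with placeholder values only; real partition IDs are not shown.
records = [
    {"id": "rec-1", "legal": {"legaltags": ["example-partition-demo-legaltag"]}},
    {"id": "rec-2", "legal": {"legaltags": ["configured-legaltag"]}},
]
print(find_unexpected_legal_tags(records, ["configured-legaltag"]))
```

A non-empty result for a bulk run would be consistent with the behavior described in this report, where the default demo tag replaces the tags from the CSDJ.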