Ingestion By Reference in RI (baremetal/anthos) implementation is not ingesting the data.
Ingestion By Reference (`Osdu_ingest_by_reference`) in the RI (baremetal/Anthos) implementation is not ingesting the data. I am able to upload and download the JSON file used for ingestion. Airflow returns the status "finished" (green), but the records are not ingested. The Airflow logs contain the following messages.
Airflow log

```
osdu-ingest-by-reference-update-status-finished-task-a6jlna1m
*** Found logs in s3: ***
* s3://airflow-log/logs/dag_id=Osdu_ingest_by_reference/run_id=f7882867-d918-41f1-b047-f4c21be6c00c/task_id=update_status_finished_task/attempt=1.log
[2023-11-13, 19:03:06 UTC] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=non-requeueable deps ti=
[2023-11-13, 19:03:07 UTC] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=requeueable deps ti=
[2023-11-13, 19:03:07 UTC] {taskinstance.py:1308} INFO - Starting attempt 1 of 1
[2023-11-13, 19:03:07 UTC] {taskinstance.py:1327} INFO - Executing on 2023-11-13 19:02:09.292958+00:00
[2023-11-13, 19:03:07 UTC] {standard_task_runner.py:57} INFO - Started process 17 to run task
[2023-11-13, 19:03:07 UTC] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'Osdu_ingest_by_reference', 'update_status_finished_task', 'f7882867-d918-41f1-b047-f4c21be6c00c', '--job-id', '6044', '--raw', '--subdir', 'DAGS_FOLDER/external/osdu-ingest-r3-by-reference.py', '--cfg-path', '/tmp/tmpop0igj0q']
[2023-11-13, 19:03:07 UTC] {standard_task_runner.py:85} INFO - Job 6044: Subtask update_status_finished_task
[2023-11-13, 19:03:07 UTC] {task_command.py:410} INFO - Running on host osdu-ingest-by-reference-update-status-finished-task-a6jlna1m
[2023-11-13, 19:03:07 UTC] {pod_generator.py:529} WARNING - Model file does not exist
[2023-11-13, 19:03:07 UTC] {taskinstance.py:1545} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='Osdu_ingest_by_reference' AIRFLOW_CTX_TASK_ID='update_status_finished_task' AIRFLOW_CTX_EXECUTION_DATE='2023-11-13T19:02:09.292958+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='f7882867-d918-41f1-b047-f4c21be6c00c'
[2023-11-13, 19:03:07 UTC] {update_status_by_reference.py:75} INFO - There are successed tasks before this one. So it has status SUCCESSED
[2023-11-13, 19:03:07 UTC] {logging_mixin.py:149} INFO - user_id in Context Initialization is None
[2023-11-13, 19:03:08 UTC] {logging_mixin.py:149} WARNING - /opt/bitnami/airflow/venv/lib/python3.9/site-packages/urllib3/connectionpool.py:1045 InsecureRequestWarning: Unverified HTTPS request is being made to host 's3.bm21.gcp.gnrg-osdu.projects.epam.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
[2023-11-13, 19:03:08 UTC] {update_status_by_reference.py:201} ERROR - #SKIPPED_IDS: Some ids in the manifest were skipped. You can find the report in the datasetService with this record id : osdu:dataset--File.Generic:ea9c9e40f8474a57952fd8df4870ad64
[2023-11-13, 19:03:08 UTC] {taskinstance.py:1345} INFO - Marking task as SUCCESS. dag_id=Osdu_ingest_by_reference, task_id=update_status_finished_task, execution_date=20231113T190209, start_date=20231113T190306, end_date=20231113T190308
[2023-11-13, 19:03:08 UTC] {local_task_job_runner.py:225} INFO - Task exited with return code 0
[2023-11-13, 19:03:08 UTC] {taskinstance.py:2651} INFO - 0 downstream tasks scheduled from follow-on schedule check
```

The key line is the **`#SKIPPED_IDS` error**: the manifest ids were skipped, and the log points at a report stored in the Dataset service under record id `osdu:dataset--File.Generic:ea9c9e40f8474a57952fd8df4870ad64`.

Upload the file
```shell
curl --location --request PUT 'https://s3.bm21.gcp.gnrg-osdu.projects.epam.com/refi-osdu-staging-area/d89cb375-6ce1-48d6-8b2c-681ee8b2c776/3d578febbd01444e94e208b09dbc3722?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=fileUser%2F20231113%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20231113T190106Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=7fedf88d44eec464d3b9348c62ae07300be7136d70501cca021f199416861757' \
  --header 'x-ms-blob-type: BlockBlob' \
  --header 'data-partition-id: osdu' \
  --header 'Content-Type: application/json' \
  --data '@anthos_IngestByRefTest_2Master_records.json'
```

Response: `200 OK`
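The same upload can also be scripted. A minimal sketch with the Python standard library (the helper name `build_upload_request` is ours, not an OSDU API) that builds the PUT request against the signed staging-area URL returned by the File service:

```python
import urllib.request

def build_upload_request(signed_url: str, manifest_path: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that uploads the manifest
    JSON to the signed staging-area URL. Send it separately with
    urllib.request.urlopen(req)."""
    with open(manifest_path, "rb") as f:
        body = f.read()
    # Same headers as the curl call above; the signed URL already carries
    # the AWS SigV4 query parameters, so no Authorization header is needed.
    return urllib.request.Request(
        signed_url,
        data=body,
        method="PUT",
        headers={
            "x-ms-blob-type": "BlockBlob",
            "data-partition-id": "osdu",
            "Content-Type": "application/json",
        },
    )
```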
Request
```shell
curl --location 'https://osdu.bm21.gcp.gnrg-osdu.projects.epam.com/api/workflow/v1/workflow/Osdu_ingest_by_reference/workflowRun' \
  --header 'data-partition-id: osdu' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer eyJhbGciOi...Truncated...iw2woo0P53Q' \
  --data '{
    "executionContext": {
      "Payload": {
        "AppKey": "test-app",
        "data-partition-id": "osdu"
      },
      "manifest": "osdu:dataset--File.Generic:dbd47f02fa1a4ab3b48ede6777406840"
    }
  }'
```

Response: `200 OK`

```json
{
  "workflowId": "09b47b8a-b0e1-4c08-8742-c3eba971d203",
  "runId": "f7882867-d918-41f1-b047-f4c21be6c00c",
  "startTimeStamp": 1699902128741,
  "status": "submitted",
  "submittedBy": "osdu-tester@service.local"
}
```
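For reference, the request body above can be built programmatically. A sketch (the helper name is ours), assuming the `manifest` field carries the dataset record id of the registered manifest file rather than the manifest content itself:

```python
import json

def workflow_run_payload(manifest_record_id: str, partition: str = "osdu") -> str:
    """Serialize the workflowRun request body shown above.

    manifest_record_id is the dataset record id under which the manifest
    JSON was registered, e.g. 'osdu:dataset--File.Generic:<uuid>'.
    """
    body = {
        "executionContext": {
            "Payload": {"AppKey": "test-app", "data-partition-id": partition},
            "manifest": manifest_record_id,
        }
    }
    return json.dumps(body)
```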
Check the ingestion status
```shell
curl --location 'https://osdu.bm21.gcp.gnrg-osdu.projects.epam.com/api/workflow/v1/workflow/Osdu_ingest_by_reference/workflowRun/f7882867-d918-41f1-b047-f4c21be6c00c' \
  --header 'Data-Partition-Id: osdu' \
  --header 'Authorization: Bearer eyJhbGciOi...Truncated...iw2woo0P53Q' \
  --data ''
```

Response: `200 OK`

```json
{
  "workflowId": "09b47b8a-b0e1-4c08-8742-c3eba971d203",
  "runId": "f7882867-d918-41f1-b047-f4c21be6c00c",
  "startTimeStamp": 1699902128741,
  "endTimeStamp": 1699902187583,
  "status": "finished",
  "submittedBy": "osdu-tester@service.local"
}
```
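Note that, as this run shows, the Workflow service can report `finished` even when ids were skipped, so polling the run status alone is not a sufficient success check. A hedged sketch of a polling helper (the names are ours; the task logs or skip report must still be inspected separately):

```python
import time

TERMINAL_STATUSES = {"finished", "failed"}

def wait_for_run(fetch_status, timeout_s=300, poll_s=5, sleep=time.sleep):
    """Poll fetch_status() (a callable returning the workflowRun status
    string) until the run reaches a terminal state or the timeout expires.

    Caution: 'finished' only means the DAG completed; records can still
    have been skipped (see the #SKIPPED_IDS error in the task log).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still '{status}' after {timeout_s}s")
        sleep(poll_s)
```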
Search the record
```shell
curl --location 'https://osdu.bm21.gcp.gnrg-osdu.projects.epam.com/api/storage/v2/records/osdu:master-data--Well:0' \
  --header 'Content-Type: application/json' \
  --header 'data-partition-id: osdu' \
  --header 'Authorization: Bearer eyJhbGciOi...Truncated...iw2woo0P53Q' \
  --data ''
```

Response: `404 Not Found`

```json
{
  "code": 404,
  "reason": "Record not found",
  "message": "The record 'osdu:master-data--Well:0' was not found"
}
```
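Since the DAG marked itself SUCCESS despite the skipped ids, the skip report referenced in the task log is the place to look next. A small sketch (the helper name and regex are ours) for pulling the report's dataset record id out of the Airflow log text, so the report can then be downloaded via the Dataset service:

```python
import re
from typing import Optional

# Matches the update_status task's error line, e.g.
# "ERROR - #SKIPPED_IDS: ... with this record id : osdu:dataset--File.Generic:<uuid>"
_SKIPPED_IDS = re.compile(r"#SKIPPED_IDS:.*record id\s*:\s*(?P<rid>\S+)")

def skip_report_id(log_text: str) -> Optional[str]:
    """Return the dataset record id of the skip report, if the log has one."""
    m = _SKIPPED_IDS.search(log_text)
    return m.group("rid") if m else None
```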