AWS M15: 500 error when calling manifest ingestion by reference
Summary:
After uploading the manifest to the dataset service, triggering the ingestion-by-reference workflow fails in Airflow with a 500 error, as shown in the log below. The log appears to show the DAG calling the old/deprecated endpoint getRetrievalInstructions instead of retrievalInstructions. See https://community.opengroup.org/osdu/platform/system/dataset/-/blob/master/dataset-core/src/main/java/org/opengroup/osdu/dataset/dms/DmsRestService.java#L71
Steps:
- Use the collection from here (with AWS variables)
- Get storage instructions
POST https://r3m15.preshiptesting.osdu.aws/api/dataset/v1/storageInstructions?kindSubType=dataset--File.Generic
- Upload manifest Records_AWS_1000.json
- Trigger workflow
POST https://r3m15.preshiptesting.osdu.aws/api/workflow/v1/workflow/Osdu_ingest_by_reference/workflowRun
Expected Behavior:
Ingestion completes successfully.
Observed Behavior:
Error in log:
500 Server Error: Internal Server Error for url: http://os-dataset.osdu-services:8080/api/dataset/v1/getRetrievalInstructions?id=osdu%3Adataset--File.Generic%3A1bf48994af174b2091238a6675be30de
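To reproduce the failing call outside Airflow, the request URL can be rebuilt from the record id. The following is a minimal sketch, not the client's actual code: the helper name retrieval_url is hypothetical, and whether the newer retrievalInstructions endpoint accepts the same id query parameter is an assumption that should be checked against the dataset service API.

```python
from urllib.parse import urlencode

# Base URL and record id copied from the error line above.
DATASET_BASE = "http://os-dataset.osdu-services:8080/api/dataset/v1"
RECORD_ID = "osdu:dataset--File.Generic:1bf48994af174b2091238a6675be30de"


def retrieval_url(endpoint, record_id=RECORD_ID, base=DATASET_BASE):
    """Hypothetical helper: build the GET URL for a retrieval-instructions call."""
    # urlencode percent-encodes the colons in the record id (":" becomes "%3A").
    return f"{base}/{endpoint}?{urlencode({'id': record_id})}"


# URL as it appears in the Airflow log (deprecated endpoint name).
failing_url = retrieval_url("getRetrievalInstructions")

# Same call against the newer endpoint name.
# Assumption: the newer endpoint takes the same "id" query parameter.
candidate_url = retrieval_url("retrievalInstructions")

print(failing_url)
print(candidate_url)
```

Sending both URLs with the same bearer token (for example with curl) should show whether the 500 is specific to the deprecated endpoint.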
Full log:
*** Reading remote log from s3://r3m15-561735291427-us-west-2-airflow/logs/Osdu_ingest_by_reference/validate_manifest_schema_task/2022-12-21T15:55:26.029451+00:00/1.log.
[2022-12-21, 15:55:41 UTC] {taskinstance.py:1032} INFO - Dependencies all met for <TaskInstance: Osdu_ingest_by_reference.validate_manifest_schema_task 9df7950d-8c80-4446-9a4b-75ab5fc9bcb1 [queued]>
[2022-12-21, 15:55:41 UTC] {taskinstance.py:1032} INFO - Dependencies all met for <TaskInstance: Osdu_ingest_by_reference.validate_manifest_schema_task 9df7950d-8c80-4446-9a4b-75ab5fc9bcb1 [queued]>
[2022-12-21, 15:55:41 UTC] {taskinstance.py:1238} INFO -
--------------------------------------------------------------------------------
[2022-12-21, 15:55:41 UTC] {taskinstance.py:1239} INFO - Starting attempt 1 of 1
[2022-12-21, 15:55:41 UTC] {taskinstance.py:1240} INFO -
--------------------------------------------------------------------------------
[2022-12-21, 15:55:41 UTC] {taskinstance.py:1259} INFO - Executing <Task(ValidateManifestSchemaOperatorByReference): validate_manifest_schema_task> on 2022-12-21 15:55:26.029451+00:00
[2022-12-21, 15:55:41 UTC] {standard_task_runner.py:52} INFO - Started process 1596 to run task
[2022-12-21, 15:55:41 UTC] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'Osdu_ingest_by_reference', 'validate_manifest_schema_task', '9df7950d-8c80-4446-9a4b-75ab5fc9bcb1', '--job-id', '3780', '--raw', '--subdir', 'DAGS_FOLDER/osdu-ingest-r3-by-reference.py', '--cfg-path', '/tmp/tmped34y4y_', '--error-file', '/tmp/tmp21d7z4h5']
[2022-12-21, 15:55:41 UTC] {standard_task_runner.py:77} INFO - Job 3780: Subtask validate_manifest_schema_task
[2022-12-21, 15:55:41 UTC] {logging_mixin.py:109} INFO - Running <TaskInstance: Osdu_ingest_by_reference.validate_manifest_schema_task 9df7950d-8c80-4446-9a4b-75ab5fc9bcb1 [running]> on host airflow-worker-96f8cfff5-w2pfv
[2022-12-21, 15:55:42 UTC] {taskinstance.py:1426} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=Osdu_ingest_by_reference
AIRFLOW_CTX_TASK_ID=validate_manifest_schema_task
AIRFLOW_CTX_EXECUTION_DATE=2022-12-21T15:55:26.029451+00:00
AIRFLOW_CTX_DAG_RUN_ID=9df7950d-8c80-4446-9a4b-75ab5fc9bcb1
[2022-12-21, 15:55:52 UTC] {variable.py:246} WARNING - The variable core__auth__access_token is defined in the EnvironmentVariablesBackend secrets backend, which takes precedence over reading from the database. The value in the database will be updated, but to read it you have to delete the conflicting variable from EnvironmentVariablesBackend
[2022-12-21, 15:55:52 UTC] {authorization.py:137} ERROR - {"code":500,"reason":"Internal Server Error","message":"Unrecognized field \"status\" (class org.opengroup.osdu.core.common.dms.model.RetrievalInstructionsResponse), not marked as ignorable (2 known properties: \"datasets\", \"providerKey\"])_ at [Source: (String)\"{\"status\":\"BAD_REQUEST\",\"message\":\"UnrecognizedPropertyException: Unrecognized field \\\"PreLoadFilePath\\\" (class org.opengroup.osdu.file.model.filemetadata.filedetails.FileSourceInfo), not marked as ignorable (12 known properties: \\\"Checksum\\\", \\\"FileSource\\\", \\\"preloadFilePath\\\", \\\"Name\\\", \\\"PreloadFileCreateUser\\\", \\\"PreloadFileModifyDate\\\", \\\"PreloadFileCreateDate\\\", \\\"PreloadFilePath\\\", \\\"FileSize\\\", \\\"EncodingFormatTypeID\\\", \\\"PreloadFileModifyUser\\\", \\\"ChecksumAlgorithm\\\"])\\n at [Source: UN\"[truncated 243 chars]; line: 1, column: 12] (through reference chain: org.opengroup.osdu.core.common.dms.model.RetrievalInstructionsResponse[\"status\"])"}
[2022-12-21, 15:55:52 UTC] {taskinstance.py:1700} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1329, in _run_raw_task
self._execute_task_with_callbacks(context)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1455, in _execute_task_with_callbacks
result = self._execute_task(context, self.task)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1511, in _execute_task
result = execute_callable(context=context)
File "/usr/local/lib/python3.7/site-packages/osdu_airflow/operators/validate_manifest_schema_by_reference.py", line 114, in execute
logger=logger)
File "/usr/local/lib/python3.7/site-packages/osdu_airflow/operators/mixins/ReceivingContextMixin.py", line 103, in _get_manifest_data_by_reference
retrieval = dataset_dms_client.get_retrieval_instructions(record_id=record_id)
File "/usr/local/lib/python3.7/site-packages/osdu_api/clients/dataset/dataset_dms_client.py", line 56, in get_retrieval_instructions
return self._get_instructions('/getRetrievalInstructions', {'id': record_id}, bearer_token)
File "/usr/local/lib/python3.7/site-packages/osdu_api/clients/dataset/dataset_dms_client.py", line 42, in _get_instructions
params=params, bearer_token=bearer_token)
File "/usr/local/lib/python3.7/site-packages/osdu_api/clients/base_client.py", line 217, in make_request
response = self._send_request_with_token_refresher(headers, method, url, data, params)
File "/usr/local/lib/python3.7/site-packages/osdu_api/auth/authorization.py", line 164, in _wrapper
**kwargs)
File "/usr/local/lib/python3.7/site-packages/osdu_api/auth/authorization.py", line 138, in send_request_with_auth_header
raise e
File "/usr/local/lib/python3.7/site-packages/osdu_api/auth/authorization.py", line 135, in send_request_with_auth_header
response.raise_for_status()
File "/home/airflow/.local/lib/python3.7/site-packages/requests/models.py", line 953, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://os-dataset.osdu-services:8080/api/dataset/v1/getRetrievalInstructions?id=osdu%3Adataset--File.Generic%3A1bf48994af174b2091238a6675be30de
[2022-12-21, 15:55:52 UTC] {taskinstance.py:1277} INFO - Marking task as FAILED. dag_id=Osdu_ingest_by_reference, task_id=validate_manifest_schema_task, execution_date=20221221T155526, start_date=20221221T155541, end_date=20221221T155552
[2022-12-21, 15:55:52 UTC] {standard_task_runner.py:92} ERROR - Failed to execute job 3780 for task validate_manifest_schema_task
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/task/task_runner/standard_task_runner.py", line 85, in _start_by_fork
args.func(args, dag=self.dag)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 298, in task_run
_run_task_by_selected_method(args, dag, ti)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 107, in _run_task_by_selected_method
_run_raw_task(args, ti)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 184, in _run_raw_task
error_file=args.error_file,
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/session.py", line 70, in wrapper
return func(*args, session=session, **kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1329, in _run_raw_task
self._execute_task_with_callbacks(context)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1455, in _execute_task_with_callbacks
result = self._execute_task(context, self.task)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1511, in _execute_task
result = execute_callable(context=context)
File "/usr/local/lib/python3.7/site-packages/osdu_airflow/operators/validate_manifest_schema_by_reference.py", line 114, in execute
logger=logger)
File "/usr/local/lib/python3.7/site-packages/osdu_airflow/operators/mixins/ReceivingContextMixin.py", line 103, in _get_manifest_data_by_reference
retrieval = dataset_dms_client.get_retrieval_instructions(record_id=record_id)
File "/usr/local/lib/python3.7/site-packages/osdu_api/clients/dataset/dataset_dms_client.py", line 56, in get_retrieval_instructions
return self._get_instructions('/getRetrievalInstructions', {'id': record_id}, bearer_token)
File "/usr/local/lib/python3.7/site-packages/osdu_api/clients/dataset/dataset_dms_client.py", line 42, in _get_instructions
params=params, bearer_token=bearer_token)
File "/usr/local/lib/python3.7/site-packages/osdu_api/clients/base_client.py", line 217, in make_request
response = self._send_request_with_token_refresher(headers, method, url, data, params)
File "/usr/local/lib/python3.7/site-packages/osdu_api/auth/authorization.py", line 164, in _wrapper
**kwargs)
File "/usr/local/lib/python3.7/site-packages/osdu_api/auth/authorization.py", line 138, in send_request_with_auth_header
raise e
File "/usr/local/lib/python3.7/site-packages/osdu_api/auth/authorization.py", line 135, in send_request_with_auth_header
response.raise_for_status()
File "/home/airflow/.local/lib/python3.7/site-packages/requests/models.py", line 953, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://os-dataset.osdu-services:8080/api/dataset/v1/getRetrievalInstructions?id=osdu%3Adataset--File.Generic%3A1bf48994af174b2091238a6675be30de
[2022-12-21, 15:55:52 UTC] {local_task_job.py:154} INFO - Task exited with return code 1
[2022-12-21, 15:55:52 UTC] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
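The authorization.py ERROR line in the log wraps a second, nested error: a BAD_REQUEST that rejects the field "PreLoadFilePath". The sketch below is only a diagnostic aid, not part of any OSDU library. It takes that inner message (unescaped by hand here) and lists the known properties that differ from the rejected field only by letter case. The function name find_case_mismatches is hypothetical. A case-only mismatch may point to the manifest record rather than the endpoint name, but that is not confirmed by the log alone.

```python
import re

# Inner error message from the log above, with the JSON escaping removed by hand.
INNER_MESSAGE = (
    'UnrecognizedPropertyException: Unrecognized field "PreLoadFilePath" '
    '(class org.opengroup.osdu.file.model.filemetadata.filedetails.FileSourceInfo), '
    'not marked as ignorable (12 known properties: "Checksum", "FileSource", '
    '"preloadFilePath", "Name", "PreloadFileCreateUser", "PreloadFileModifyDate", '
    '"PreloadFileCreateDate", "PreloadFilePath", "FileSize", "EncodingFormatTypeID", '
    '"PreloadFileModifyUser", "ChecksumAlgorithm"])'
)


def find_case_mismatches(message):
    """Return the rejected field and any known properties equal to it ignoring case."""
    field = re.search(r'Unrecognized field "([^"]+)"', message)
    known = re.search(r'known properties: (.*?)\]\)', message)
    if not field or not known:
        return None, []
    rejected = field.group(1)
    names = re.findall(r'"([^"]+)"', known.group(1))
    return rejected, [n for n in names if n.lower() == rejected.lower()]


rejected, near_matches = find_case_mismatches(INNER_MESSAGE)
print(rejected, near_matches)
```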