[M20] EDS Naturalization - fails for multiple WPC records
The DAG was triggered manually via the Airflow web UI with the following input:
{
    "execution_context": {
        "id": [
            "osdu:work-product-component--WellLog:MBtestEDS-63e2657a282e4d26b9e4817765287d06",
            "osdu:work-product-component--WellLog:MBtestEDS-93d0660cbff145c3b0df8940a2c36518",
            "osdu:work-product-component--WellLog:MBtestEDS-0541b241070c417685184355bbca26f5",
            "osdu:work-product-component--WellLog:MBtestEDS-19d1c04abbea4236af3509cd55af118a",
            "osdu:work-product-component--WellLog:MBtestEDS-7e6a1fb41cf84ecb903192e46bd30977"
        ]
    },
    "runId": "1234"
}
After investigating the logs in the Airflow web UI, the following error was found in the update_status_finished_task step:
[2023-11-03, 12:06:43 UTC] {{update_wpc_records.py:34}} INFO - In Update WPC Record --Start
[2023-11-03, 12:06:44 UTC] {{update_wpc_records.py:44}} INFO - b'{"code":400,"reason":"Bad request","message":"Cannot update the same record multiple times in the same request. Id: osdu:work-product-component--WellLog:MBtestEDS-63e2657a282e4d26b9e4817765287d06"}'
[2023-11-03, 12:06:44 UTC] {{update_wpc_records.py:45}} INFO - 400
[2023-11-03, 12:06:44 UTC] {{python.py:177}} INFO - Done. Returned value was: None
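The 400 response indicates the Storage API rejects any update request that contains the same record id more than once. A defensive workaround, as a minimal sketch (the function name and call site are hypothetical, not from the actual DAG code), is to deduplicate the id list while preserving order before issuing the batch update:

```python
def dedupe_record_ids(record_ids):
    """Drop duplicate ids while preserving order, so a single
    Storage API update request never names the same record twice."""
    seen = set()
    unique = []
    for rid in record_ids:
        if rid not in seen:
            seen.add(rid)
            unique.append(rid)
    return unique

# Example: a batch that accidentally contains a repeated id
batch = [
    "osdu:work-product-component--WellLog:MBtestEDS-63e2657a282e4d26b9e4817765287d06",
    "osdu:work-product-component--WellLog:MBtestEDS-93d0660cbff145c3b0df8940a2c36518",
    "osdu:work-product-component--WellLog:MBtestEDS-63e2657a282e4d26b9e4817765287d06",
]
print(dedupe_record_ids(batch))  # the repeated id appears only once
```

This only masks the symptom; the underlying batching logic would still need fixing.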
At the same time, the output from the generate_batches_task step does not reveal any duplicates:
[['osdu:work-product-component--WellLog:MBtestEDS-63e2657a282e4d26b9e4817765287d06', 'osdu:work-product-component--WellLog:MBtestEDS-93d0660cbff145c3b0df8940a2c36518'], ['osdu:work-product-component--WellLog:MBtestEDS-0541b241070c417685184355bbca26f5'], ['osdu:work-product-component--WellLog:MBtestEDS-19d1c04abbea4236af3509cd55af118a'], ['osdu:work-product-component--WellLog:MBtestEDS-7e6a1fb41cf84ecb903192e46bd30977']]
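To confirm whether any record id really does appear in more than one batch at runtime, a small check can be added (a diagnostic sketch, not part of the existing DAG): flatten the batches and count occurrences of each id.

```python
from collections import Counter

def find_cross_batch_duplicates(batches):
    """Flatten all batches and return every record id that
    appears more than once across the full set of batches."""
    counts = Counter(rid for batch in batches for rid in batch)
    return [rid for rid, n in counts.items() if n > 1]

# Example with shortened ids for readability
batches = [["rec-1", "rec-2"], ["rec-3"], ["rec-1"]]
print(find_cross_batch_duplicates(batches))  # ['rec-1']
```

Logging this result inside generate_batches_task (and again just before the update call) would show whether the duplication happens at batch creation or later, when batches are merged back for the final status update.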
A call to the OSDU Storage service shows that the datasets were not naturalized:
{
    "id": "osdu:work-product-component--WellLog:MBtestEDS-7e6a1fb41cf84ecb903192e46bd30977",
    "data": {
        "Datasets": [
            "osdu:dataset--ConnectedSource.Generic:EDS-test-7e6a1fb41cf84ecb903192e46bd30977:"
        ],
        ...
After our internal investigation and debugging, the issue appears to be related to how records are batched into parallel runs: one record seems to be included in two batches.
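One common way such an overlap arises is an off-by-one error in the slicing that splits records into batches. The sketch below is purely illustrative (we have not confirmed this is the actual bug in the DAG): the buggy variant extends each slice one element too far, so the first record of each batch is also the last record of the previous one.

```python
def buggy_batches(ids, size):
    # Hypothetical bug: the slice end is i + size + 1 instead of
    # i + size, so each batch re-includes the next batch's first id.
    return [ids[i:i + size + 1] for i in range(0, len(ids), size)]

def fixed_batches(ids, size):
    # Correct slicing: batches partition the list with no overlap.
    return [ids[i:i + size] for i in range(0, len(ids), size)]

ids = ["r1", "r2", "r3", "r4", "r5"]
print(buggy_batches(ids, 2))  # [['r1', 'r2', 'r3'], ['r3', 'r4', 'r5'], ['r5']]
print(fixed_batches(ids, 2))  # [['r1', 'r2'], ['r3', 'r4'], ['r5']]
```

Whatever the exact cause turns out to be, the fix should guarantee the batches form a strict partition of the input ids, and a uniqueness assertion after batching would catch any regression.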