WellLog WPC not coming over during EDS Connected Source Job but shows as Successful
I'm running into an issue while testing the EDS capability for well logs with an attached LAS file. The eds_ingest DAG completed successfully, but my WPC never shows up on the target (Azure) system. The Airflow log shows that it kicked off an Osdu_ingest DAG. After tracing that DAG, I narrowed the problem down to this snippet from the Osdu_ingest flow:
[2023-06-12, 22:11:20 UTC] {validate_schema.py:318} ERROR - Schema validation error. Data field.
[2023-06-12, 22:11:20 UTC] {validate_schema.py:319} ERROR - Manifest kind: osdu:wks:work-product-component--WellLog:1.0.0
[2023-06-12, 22:11:20 UTC] {validate_schema.py:320} ERROR - Error: 'opendes:dataset--File.Generic:8db972c2c2924d8094599ed30143ecde' does not match '^(surrogate-key:.+|[\w\-\.]+:dataset\-\-[\w\-\.]+:[\w\-\.\:\%]+:[0-9]*)$'
Failed validating 'pattern' in schema['properties']['data']['allOf'][1]['properties']['Datasets']['items']: {'pattern': '^(surrogate-key:.+|[\w\-\.]+:dataset\-\-[\w\-\.]+:[\w\-\.\:\%]+:[0-9]*)$', 'type': 'string', 'x-osdu-relationship': [{'GroupType': 'dataset'}]}
On instance['data']['Datasets'][0]: 'opendes:dataset--File.Generic:8db972c2c2924d8094599ed30143ecde'
[2023-06-12, 22:11:20 UTC] {taskinstance.py:1272} INFO - Marking task as SUCCESS. dag_id=Osdu_ingest, task_id=validate_manifest_schema_task, execution_date=20230612T221108, start_date=20230612T221118, end_date=20230612T221120
[2023-06-12, 22:11:20 UTC] {local_task_job.py:154} INFO - Task exited with return code 0
[2023-06-12, 22:11:21 UTC] {local_task_job.py:264} INFO - 1 downstream tasks scheduled from follow-on schedule check
Here is a direct link to the Airflow log containing the snippet: https://osdu-ship.msft-osdu-test.org/airflow2/log?dag_id=Osdu_ingest&task_id=validate_manifest_schema_task&execution_date=2023-06-12T22%3A11%3A08.415942%2B00%3A00
The initial eds_ingest DAG Airflow log that kicked off everything is located here: https://osdu-ship.msft-osdu-test.org/airflow2/log?dag_id=eds_ingest&task_id=fetch_client&execution_date=2023-06-12T22%3A11%3A00.001000%2B00%3A00
The id of my ConnectedSourceDataJob is: opendes:master-data--ConnectedSourceDataJob:mosley_cvxtest_welllogs
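The error indicates that a value in the Datasets array of my WellLog WPC record doesn't match the regex pattern, although I believe the value does follow the pattern defined in the WellLog WPC schema. The same log also shows validate_manifest_schema_task being marked SUCCESS despite the validation error, which may be why the job reports success overall even though the WPC never arrives.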
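To check the pattern independently of the DAG, the rejected id can be tested against the regex copied from the log. This is only a minimal sketch: the function and constant names are mine, and the variant with a trailing colon (an empty version suffix) is an assumption about what the pattern's final `:[0-9]*` segment expects, not something confirmed by the schema documentation.

```python
import re

# Pattern copied verbatim from the validate_schema.py error message in the log.
DATASET_ID_PATTERN = re.compile(
    r"^(surrogate-key:.+|[\w\-\.]+:dataset\-\-[\w\-\.]+:[\w\-\.\:\%]+:[0-9]*)$"
)

# The id reported by the validator at instance['data']['Datasets'][0].
REJECTED_ID = "opendes:dataset--File.Generic:8db972c2c2924d8094599ed30143ecde"

# Hypothetical variant (assumption): same id with a trailing colon, i.e. an
# empty version suffix after the record id.
ID_WITH_TRAILING_COLON = REJECTED_ID + ":"


def matches_dataset_pattern(record_id: str) -> bool:
    """Return True if record_id satisfies the Datasets item pattern."""
    return DATASET_ID_PATTERN.match(record_id) is not None


if __name__ == "__main__":
    for candidate in (REJECTED_ID, ID_WITH_TRAILING_COLON):
        print(candidate, "->", matches_dataset_pattern(candidate))
```

Run locally, this shows the id fails as logged, while the same id with a trailing colon passes. If so, the source record may be referencing the dataset without the version separator the pattern requires.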