WPC surrogate keys not working in M8
Loading the sample WPC data (e.g. TNO\output_tno_document\load_document_prov33_pdf.json) fails in M8.
The first failure is a schema validation error:
[2021-09-29 09:14:29,765] {validate_schema.py:290} ERROR - Manifest kind: osdu:wks:work-product-component--Document:1.0.0
[2021-09-29 09:14:29,766] {validate_schema.py:291} ERROR - Error: 'surrogate-key:file-1' does not match '^[\w\-\.]+:dataset\-\-[\w\-\.]+:[\w\-\.\:\%]+:[0-9]*$'
Failed validating 'pattern' in schema['properties']['data']['allOf'][1]['properties']['Datasets']['items']:
{'description': 'The SRN which identifies this OSDU File resource.',
'pattern': '^[\\w\\-\\.]+:dataset\\-\\-[\\w\\-\\.]+:[\\w\\-\\.\\:\\%]+:[0-9]*$',
'type': 'string',
'x-osdu-relationship': [{'GroupType': 'dataset'}]}
On instance['data']['Datasets'][0]:
'surrogate-key:file-1'
Changing the ID to a form such as “surrogate-key:dataset--1:0:0” lets the manifest pass schema validation, but the reference then seems to be stripped away in the provide_manifest_integrity_task step:
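To check candidate surrogate keys without running the DAG, the pattern from the error message can be tested directly. This is a minimal sketch using only the Python standard library; the helper name matches_dataset_pattern is made up for illustration, and it checks only this one 'pattern' constraint, not the full schema.

```python
import re

# Pattern copied verbatim from the validation error above
# (schema['properties']['data']['allOf'][1]['properties']['Datasets']['items']).
DATASET_ID_PATTERN = re.compile(
    r"^[\w\-\.]+:dataset\-\-[\w\-\.]+:[\w\-\.\:\%]+:[0-9]*$"
)


def matches_dataset_pattern(record_id: str) -> bool:
    """Hypothetical helper: True if record_id satisfies the 'pattern' constraint."""
    return DATASET_ID_PATTERN.match(record_id) is not None


# The ID from the sample manifest, and the reworked ID that passed validation.
for candidate in ("surrogate-key:file-1", "surrogate-key:dataset--1:0:0"):
    print(candidate, matches_dataset_pattern(candidate))
```

The first ID is rejected because it has no ":dataset--" segment; the second is accepted.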
[2021-09-29 09:05:53,883] {ensure_manifest_integrity.py:70} DEBUG - Manifest data: {'ReferenceData': [], 'MasterData': [], 'kind': 'osdu:wks:Manifest:1.0.0', 'Data': {'WorkProduct': {'data': {'Components': ['surrogate-key:wpc-1'], 'Description': 'Document', 'ResourceSecurityClassification': 'opendes:reference-data--ResourceSecurityClassification:RESTRICTED:', 'Name': 'prov33'}, 'kind': 'osdu:wks:work-product--WorkProduct:1.0.0', 'legal': {'legaltags': ['opendes-public-usa-dataset-7643990'], 'otherRelevantDataCountries': ['US']}, 'acl': {'viewers': ['data.default.viewers@opendes.contoso.com'], 'owners': ['data.default.owners@opendes.contoso.com']}}, 'Datasets': [{'data': {'DatasetProperties': {'FileSourceInfo': {'FileSource': '/osdu-user/1632906292949-2021-09-29-09-04-52-949/1de301fc52664f49b90b3a9bee81ae48', 'PreloadFilePath': 's3://osdu-seismic-test-data/r1/data/provided/USGS_docs/prov33.pdf', 'Name': 'prov33.pdf'}}, 'ResourceSecurityClassification': 'opendes:reference-data--ResourceSecurityClassification:RESTRICTED:'}, 'kind': 'osdu:wks:dataset--File.Generic:1.0.0', 'legal': {'legaltags': ['opendes-public-usa-dataset-7643990'], 'otherRelevantDataCountries': ['US']}, 'id': 'surrogate-key:dataset--1:0:0', 'acl': {'viewers': ['data.default.viewers@opendes.contoso.com'], 'owners': ['data.default.owners@opendes.contoso.com']}}], 'WorkProductComponents': [{'data': {'Datasets': ['surrogate-key:dataset--1:0:0'], 'Description': 'Document', 'ResourceSecurityClassification': 'opendes:reference-data--ResourceSecurityClassification:RESTRICTED:', 'Name': 'prov33'}, 'kind': 'osdu:wks:work-product-component--Document:1.0.0', 'meta': [], 'legal': {'legaltags': ['opendes-public-usa-dataset-7643990'], 'otherRelevantDataCountries': ['US']}, 'id': 'surrogate-key:wpc-1', 'acl': {'viewers': ['data.default.viewers@opendes.contoso.com'], 'owners': ['data.default.owners@opendes.contoso.com']}}]}}
[2021-09-29 09:05:53,884] {validate_referential_integrity.py:124} DEBUG - WPC: surrogate-key:wpc-1 doesn't have Artefacts field. Mark it as valid.
[2021-09-29 09:05:53,884] {search_record_ids.py:78} DEBUG - Search query "opendes:reference-data--ResourceSecurityClassification:RESTRICTED"
[2021-09-29 09:05:54,264] {connectionpool.py:230} DEBUG - Starting new HTTP connection (1): search.osdu-azure.svc.cluster.local:80
[2021-09-29 09:05:54,655] {connectionpool.py:442} DEBUG - http://search.osdu-azure.svc.cluster.local:80 "POST /api/search/v2/query HTTP/1.1" 200 None
[2021-09-29 09:05:54,656] {search_record_ids.py:183} DEBUG - {"results":[],"aggregations":[],"totalCount":0}
[2021-09-29 09:05:54,656] {search_record_ids.py:188} DEBUG - Got total count 0
[2021-09-29 09:05:54,656] {search_record_ids.py:169} DEBUG - response ids: []
[2021-09-29 09:05:54,657] {ensure_manifest_integrity.py:76} DEBUG - Valid manifest data: {'ReferenceData': [], 'MasterData': [], 'kind': 'osdu:wks:Manifest:1.0.0', 'Data': {'WorkProduct': {'data': {'Components': ['surrogate-key:wpc-1'], 'Description': 'Document', 'ResourceSecurityClassification': 'opendes:reference-data--ResourceSecurityClassification:RESTRICTED:', 'Name': 'prov33'}, 'kind': 'osdu:wks:work-product--WorkProduct:1.0.0', 'legal': {'legaltags': ['opendes-public-usa-dataset-7643990'], 'otherRelevantDataCountries': ['US']}, 'acl': {'viewers': ['data.default.viewers@opendes.contoso.com'], 'owners': ['data.default.owners@opendes.contoso.com']}}, 'Datasets': [{'data': {'DatasetProperties': {'FileSourceInfo': {'FileSource': '/osdu-user/1632906292949-2021-09-29-09-04-52-949/1de301fc52664f49b90b3a9bee81ae48', 'PreloadFilePath': 's3://osdu-seismic-test-data/r1/data/provided/USGS_docs/prov33.pdf', 'Name': 'prov33.pdf'}}, 'ResourceSecurityClassification': 'opendes:reference-data--ResourceSecurityClassification:RESTRICTED:'}, 'kind': 'osdu:wks:dataset--File.Generic:1.0.0', 'legal': {'legaltags': ['opendes-public-usa-dataset-7643990'], 'otherRelevantDataCountries': ['US']}, 'id': 'surrogate-key:dataset--1:0:0', 'acl': {'viewers': ['data.default.viewers@opendes.contoso.com'], 'owners': ['data.default.owners@opendes.contoso.com']}}], 'WorkProductComponents': [{'data': {'Datasets': ['surrogate-key:dataset--1:0:0'], 'Description': 'Document', 'ResourceSecurityClassification': 'opendes:reference-data--ResourceSecurityClassification:RESTRICTED:', 'Name': 'prov33'}, 'kind': 'osdu:wks:work-product-component--Document:1.0.0', 'meta': [], 'legal': {'legaltags': ['opendes-public-usa-dataset-7643990'], 'otherRelevantDataCountries': ['US']}, 'id': 'surrogate-key:wpc-1', 'acl': {'viewers': ['data.default.viewers@opendes.contoso.com'], 'owners': ['data.default.owners@opendes.contoso.com']}}]}}
[2021-09-29 09:05:55,170] {__init__.py:62} DEBUG - Backend: None, Lineage called with inlets: [], outlets: []
[2021-09-29 09:05:55,409] {taskinstance.py:1070} INFO - Marking task as SUCCESS.dag_id=Osdu_ingest, task_id=provide_manifest_integrity_task, execution_date=20210929T090455, start_date=20210929T090548, end_date=20210929T090555
[2021-09-29 09:05:57,884] {base_job.py:197} DEBUG - [heartbeat]
[2021-09-29 09:05:57,884] {local_task_job.py:102} INFO - Task exited with return code 0
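The "Manifest data" and "Valid manifest data" payloads are long enough that comparing them by eye is error-prone. The sketch below is one way to list which WPC dataset references, if any, were dropped between two manifests. The function names are made up for illustration, and the two small dicts are trimmed, hypothetical stand-ins rather than the logged payloads; since the logged dicts are Python reprs, they could be loaded with ast.literal_eval in the same way.

```python
import ast


def wpc_dataset_refs(manifest: dict) -> dict:
    """Map each WorkProductComponent id to its data.Datasets list."""
    wpcs = manifest.get("Data", {}).get("WorkProductComponents", [])
    return {
        wpc.get("id"): list(wpc.get("data", {}).get("Datasets", []))
        for wpc in wpcs
    }


def dropped_dataset_refs(before: dict, after: dict) -> dict:
    """Per WPC id, list dataset references present in `before` but not in `after`."""
    after_refs = wpc_dataset_refs(after)
    dropped = {}
    for wpc_id, refs in wpc_dataset_refs(before).items():
        missing = [ref for ref in refs if ref not in after_refs.get(wpc_id, [])]
        if missing:
            dropped[wpc_id] = missing
    return dropped


# Trimmed stand-in with the same shape as the logged manifest dict.
before = ast.literal_eval(
    "{'Data': {'WorkProductComponents': [{'id': 'surrogate-key:wpc-1', "
    "'data': {'Datasets': ['surrogate-key:dataset--1:0:0']}}]}}"
)
# Hypothetical "after" manifest in which the reference has been stripped.
after_hypothetical = ast.literal_eval(
    "{'Data': {'WorkProductComponents': [{'id': 'surrogate-key:wpc-1', "
    "'data': {'Datasets': []}}]}}"
)

print(dropped_dataset_refs(before, after_hypothetical))
```

Running the same comparison on the output of each DAG task in turn would narrow down the step at which a reference disappears.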