Existing dataset ID validation prevents external dataset IDs from being ingested through EDS
Problem: The DATASET_ID_PATTERN variable in validate_referential_integrity.py is used during ingestion to check that the record referenced by an external source ID exists on the OSDU platform. If the referenced record does not exist, ingestion fails.
Example: the dataset ID received from the supplier Katalyst is "katalyst:dataset--File.Generic:19330323".
As part of the EDS ingestion process, a Dataset record is created on the consumer side, with the supplier dataset ID stored as the SourceRecordID value. The proxy service later uses the information in this Dataset record to retrieve the actual dataset file, as in the example below.
{
    'data': {
        'DatasetProperties': {
            'ConnectedSourceRegistryEntryID': 'osdu:master-data--ConnectedSourceRegistryEntry:a9d70013-e645-4d7a-a721-89f88c32807f:',
            'ConnectedSourceDataJobID': 'osdu:master-data--ConnectedSourceDataJob:a9d70013-e645-4d7a-a721-89f88c32807f:',
            'SourceDataPartitionID': 'katalyst',
            'SourceRecordID': 'katalyst:dataset--File.Generic:19330323'
        }
    },
    'kind': 'osdu:wks:dataset--ConnectedSource.Generic:0.2.0',
    'legal': {
        'legaltags': ['osdu-demo-legaltag'],
        'otherRelevantDataCountries': ['US']
    },
    'acl': {
        'owners': ['data.default.owners@osdu.example.com'],
        'viewers': ['data.default.viewers@osdu.example.com']
    },
    'id': 'osdu:dataset--ConnectedSource.Generic:Katalyst-katalyst-7334715'
}
Ingestion fails because DATASET_ID_PATTERN matches the SourceRecordID 'katalyst:dataset--File.Generic:19330323', so the validator checks whether that record exists on the consumer platform. The record exists only on the supplier platform, so the check fails.
As a workaround, replacing '--' with '-' in the SourceRecordID (for example, 'katalyst:dataset-File.Generic:19330323') lets the record be processed successfully.
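A minimal Python sketch of why the '--' form is picked up and the '-' form is not. It assumes the validator scans the string values under 'data' and treats anything matching DATASET_ID_PATTERN as a dataset reference that must exist locally. The regular expression and the helpers iter_strings and find_dataset_refs are hypothetical stand-ins, not the actual code in validate_referential_integrity.py.

```python
import re
from typing import Any, Iterator

# Hypothetical stand-in for DATASET_ID_PATTERN; the real expression in
# validate_referential_integrity.py may differ. It matches IDs of the form
# "<partition>:dataset--<EntityType>:<id>" (note the double hyphen).
DATASET_ID_PATTERN = re.compile(r"^[\w\-.]+:dataset--[\w\-.]+:[\w\-.:%]+$")


def iter_strings(value: Any, path: str) -> Iterator[tuple[str, str]]:
    """Yield (path, text) for every string nested inside dicts and lists."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from iter_strings(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from iter_strings(child, f"{path}[{index}]")
    elif isinstance(value, str):
        yield path, value


def find_dataset_refs(record: dict) -> list[tuple[str, str]]:
    """Return (path, id) pairs under record['data'] that look like dataset IDs."""
    return [
        (path, text)
        for path, text in iter_strings(record.get("data", {}), "data")
        if DATASET_ID_PATTERN.match(text)
    ]


record = {
    "kind": "osdu:wks:dataset--ConnectedSource.Generic:0.2.0",
    "data": {
        "DatasetProperties": {
            "ConnectedSourceRegistryEntryID": "osdu:master-data--ConnectedSourceRegistryEntry:a9d70013-e645-4d7a-a721-89f88c32807f:",
            "SourceDataPartitionID": "katalyst",
            "SourceRecordID": "katalyst:dataset--File.Generic:19330323",
        }
    },
}

# The supplier-side ID is collected as a reference that must exist locally.
print(find_dataset_refs(record))
# [('data.DatasetProperties.SourceRecordID', 'katalyst:dataset--File.Generic:19330323')]

# The single-hyphen workaround no longer matches, so it is never checked.
print(DATASET_ID_PATTERN.match("katalyst:dataset-File.Generic:19330323"))
# None
```

Under this assumption the workaround succeeds only because the altered ID falls outside the pattern, not because the reference becomes resolvable.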
Impact
Manifests generated from EDS workflows fail.
Resolution
A code change is required so that SourceRecordID is not validated when the record comes from EDS.
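One possible shape for the fix, sketched in Python under stated assumptions: EDS-created records are recognized by their dataset--ConnectedSource.Generic kind, and data.DatasetProperties.SourceRecordID is dropped from the references to validate for those records only. The names CONNECTED_SOURCE_KIND, EXTERNAL_ID_PATHS and filter_external_refs are hypothetical, and the input is assumed to be the (path, id) pairs the validator has already collected.

```python
import re

# Hypothetical: EDS creates Dataset records of this kind (any authority, any version).
CONNECTED_SOURCE_KIND = re.compile(r"^[\w\-.]+:wks:dataset--ConnectedSource\.Generic:")

# Hypothetical: fields that hold supplier-side IDs, which exist only on the
# source platform and therefore cannot be resolved on the consumer platform.
EXTERNAL_ID_PATHS = frozenset({"data.DatasetProperties.SourceRecordID"})


def filter_external_refs(kind: str, refs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop supplier-side IDs from the references to validate for EDS records.

    Records of any other kind are returned unchanged, so their dataset
    references are still validated as before.
    """
    if not CONNECTED_SOURCE_KIND.match(kind):
        return refs
    return [(path, ref) for path, ref in refs if path not in EXTERNAL_ID_PATHS]


refs = [
    ("data.DatasetProperties.SourceRecordID", "katalyst:dataset--File.Generic:19330323"),
]
print(filter_external_refs("osdu:wks:dataset--ConnectedSource.Generic:0.2.0", refs))
# []
print(filter_external_refs("osdu:wks:dataset--File.Generic:1.0.0", refs))
# [('data.DatasetProperties.SourceRecordID', 'katalyst:dataset--File.Generic:19330323')]
```

Keying the exemption on both the kind and the field path keeps the change narrow: other dataset references in EDS records, and all references in other kinds, would still go through referential-integrity validation.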