Bug in utils.py method "split_id" prevents WP manifest ingestion
There is a bug in the utils.py method split_id: it assumes that any number at the end of a record id is a version and removes it, even when that number is part of the actual record id rather than a version. As a result, every WP manifest that references any master data can't be ingested.
Details:
WP manifests reference master data like this:
{{data-partition-id}}:wks:master-data--Well:1000:
When this gets to the referential integrity step, the method split_id in utils.py takes this external reference and returns:
{{data-partition-id}}:wks:master-data--Well
This returned value is then passed to the search service to check whether the record exists. Search returns nothing because the value it is searching for isn't a valid record id. The DAG then logs a warning and never ingests the manifest because it "failed" the referential check.
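The effect described above can be reproduced with a small stand-in. This is only a sketch of the reported behavior, not the actual utils.py code: the function name strip_trailing_number is hypothetical, and it only mimics the net result of treating a trailing numeric segment as a version.

```python
def strip_trailing_number(reference: str) -> str:
    """Hypothetical stand-in that mimics the reported behavior.

    It drops the last colon-separated segment whenever that segment is
    all digits, without checking whether it is really a version.
    """
    # Ignore the trailing colon, then split into colon-separated segments.
    parts = reference.rstrip(":").split(":")
    # Any all-digit final segment is treated as a version and discarded.
    if parts[-1].isdigit():
        parts = parts[:-1]
    return ":".join(parts)


reference = "{{data-partition-id}}:wks:master-data--Well:1000:"
print(strip_trailing_number(reference))
# prints: {{data-partition-id}}:wks:master-data--Well
```

The "1000" segment is part of the record id here, so the value handed to search no longer identifies any record.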
The split_id method should return {{data-partition-id}}:wks:master-data--Well:1000, as it does for reference data records. The problem is found on these lines:
The code assumes that any digits at the end of a record id must be a version number, ignoring the position of those digits. This line of code needs to change to allow digits at the end of record ids.
The first if condition you see above should catch this problem, since the record id we're passing in has a trailing colon. However, this trailing colon is removed earlier in the process, in the method _extract_external_references.
If you try to bypass this by removing the trailing colon in the manifest itself, the validation step throws an error and keeps you from ingesting the manifest. The only way past this for now is to comment out the lines of code I've circled in the image above.
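One possible direction for a fix, sketched under assumptions: if the reference reaches the splitting logic with its trailing colon intact (that is, _extract_external_references does not strip it), the position of the last colon is enough to separate the id from an optional version. The function name split_reference and its return shape are hypothetical and are not taken from utils.py.

```python
from typing import Optional, Tuple


def split_reference(reference: str) -> Tuple[str, Optional[str]]:
    """Hypothetical position-based split of '<record id>:<version>'.

    Assumes the final colon-separated slot is always the version slot
    and that it may be empty, as in '...:master-data--Well:1000:'.
    """
    record_id, sep, version = reference.rpartition(":")
    if not sep:
        # No colon at all: nothing to split off.
        return reference, None
    # An empty version slot means "no specific version requested".
    return record_id, (version or None)


no_version = "{{data-partition-id}}:wks:master-data--Well:1000:"
with_version = "{{data-partition-id}}:wks:master-data--Well:1000:1234567"

print(split_reference(no_version))
# prints: ('{{data-partition-id}}:wks:master-data--Well:1000', None)
print(split_reference(with_version))
# prints: ('{{data-partition-id}}:wks:master-data--Well:1000', '1234567')
```

This only works if the trailing colon survives until the split. Once it has been removed, "1000" as an id segment and "1000" as a version are indistinguishable by position alone.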