[Manifest-based] Deferred integrity check (ELT) and/or Current implementation (ETL)
A Manifest may contain entities that hold links to other pieces of data. These links can point either to entities within the same Manifest or to records already stored in OSDU. In the current implementation (ETL), entity integrity is checked during Manifest-based ingestion, and checking every reference of every entity can take a lot of time. There is also a problem when an ingested entity has no unique id or has only a surrogate key: this makes it hard to identify which entities were skipped because of inconsistency.

A possible solution (ELT) is to store entities as they are, obtain unique OSDU ids for them, replace surrogate keys with the real ids, and then start a background DAG that checks the data consistency of each record. This also requires designing a mechanism for setting the current status of each record (consistent, not consistent, not verified).
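The deferred (ELT) flow described above could be sketched in Python as two steps: an ingestion step that stores entities without checking them and replaces surrogate keys with real ids, and a background step that later marks each pending record as consistent or not consistent. This is a minimal illustration under stated assumptions, not the OSDU implementation: the entity fields (`id`, `surrogate_key`, `references`), the in-memory `storage` dict standing in for OSDU storage, the id format, and the names `IntegrityStatus`, `ingest_without_check` and `verify_integrity` are all hypothetical.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class IntegrityStatus(Enum):
    # Hypothetical status values mirroring the three states named in the text.
    NOT_VERIFIED = "not verified"
    CONSISTENT = "consistent"
    NOT_CONSISTENT = "not consistent"


@dataclass
class StoredRecord:
    id: str
    references: List[str]
    status: IntegrityStatus = IntegrityStatus.NOT_VERIFIED


def default_id() -> str:
    # Placeholder id scheme (assumption); real ids would come from OSDU itself.
    return "osdu:record:" + uuid.uuid4().hex


def ingest_without_check(
    entities: List[dict],
    storage: Dict[str, StoredRecord],
    new_id: Callable[[], str] = default_id,
) -> Dict[str, str]:
    """Ingestion step: store every entity as-is, with no integrity check.

    Each entity is a dict with either an "id" (already unique) or a
    "surrogate_key" (hypothetical field), plus a "references" list whose
    items are unique ids or surrogate keys of entities in the same Manifest.
    Returns the surrogate-key -> real-id mapping.
    """
    key_map: Dict[str, str] = {}
    # Pass 1: assign a unique id to every entity that only has a surrogate key.
    for entity in entities:
        if "id" not in entity:
            key_map[entity["surrogate_key"]] = new_id()
    # Pass 2: replace surrogate keys in references, then store the records
    # with status NOT_VERIFIED (the dataclass default).
    for entity in entities:
        record_id = entity["id"] if "id" in entity else key_map[entity["surrogate_key"]]
        refs = [key_map.get(ref, ref) for ref in entity.get("references", [])]
        storage[record_id] = StoredRecord(record_id, refs)
    return key_map


def verify_integrity(storage: Dict[str, StoredRecord]) -> None:
    """Background step: resolve the status of every NOT_VERIFIED record."""
    for record in storage.values():
        if record.status is not IntegrityStatus.NOT_VERIFIED:
            continue
        # A reference is dangling if no stored record has that id.
        dangling = [ref for ref in record.references if ref not in storage]
        record.status = (
            IntegrityStatus.NOT_CONSISTENT if dangling else IntegrityStatus.CONSISTENT
        )
```

Because ingestion never waits on reference lookups, it stays fast, and a record that fails the check is still identifiable afterwards through its real id and its `NOT_CONSISTENT` status. In an Airflow-based setup, `verify_integrity` could, for example, be the body of a task in the separate background DAG.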
This solution needs to be discussed in more detail.