Question: Algorithm for processing surrogate-keys
Version 1.0.0 of the Manifest Schema supports surrogate-keys within the Work Product, Work Product Components, and Files AbstractAny* constructs. Work Product Components can have complex relationships (e.g., Well Log; see the ERD diagram at the bottom). Our understanding is that the surrogate-key construct could appear at any level within the Work Product, Work Product Components, and Files. Resolving these surrogate-keys could therefore be quite complex.
Does the Data Definitions team have a prescribed algorithm for efficiently determining the correct order in which to write the data presented in the manifest where surrogate-keys are used? We must write data in the correct order to ensure that the ids are created by the Storage API and then properly updated in the data where surrogate-key references exist.
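To make the "properly updated" step concrete, here is a minimal sketch of substituting real ids for surrogate-keys once the referenced records have been written. It rests on assumptions that are not confirmed by the schema text above: that a surrogate-key reference is a plain string value, that a mapping from surrogate-key to the created id is available after each write, and that the field names and the id `real-id-0001` are purely illustrative.

```python
from typing import Any, Dict


def replace_surrogate_keys(value: Any, resolved: Dict[str, str]) -> Any:
    """Return a copy of `value` with every string that matches a key in
    `resolved` replaced by the corresponding real id.

    `resolved` maps surrogate-key -> id returned after the write
    (hypothetical mapping, maintained by the caller).
    """
    if isinstance(value, dict):
        return {k: replace_surrogate_keys(v, resolved) for k, v in value.items()}
    if isinstance(value, list):
        return [replace_surrogate_keys(v, resolved) for v in value]
    if isinstance(value, str) and value in resolved:
        return resolved[value]
    return value


# Illustrative record; the structure is an assumption for this sketch.
component = {
    "id": "surrogate-key:wpc-1",
    "data": {"Datasets": ["surrogate-key:file-1"]},
}
# Made-up id standing in for whatever the Storage API would create.
resolved = {"surrogate-key:file-1": "real-id-0001"}
updated = replace_surrogate_keys(component, resolved)
```

The function returns a new structure rather than mutating the input, so the original manifest stays available for later passes.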
There is a brute-force approach in which the Work Product, Work Product Components, and Files in the manifest are recursively searched to determine where surrogate-keys are used. However, is there an algorithm for determining the correct order in which to store the data? We could treat the data like a tree and write it from the root to the leaves, but where relationships exist across multiple levels, this becomes difficult to resolve (think reference data).
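For illustration, the ordering problem can be framed as a topological sort over a dependency graph rather than a tree walk, which handles references that cross multiple levels. The sketch below is not a prescribed algorithm from the Data Definitions team. It assumes that each record's own `id` is a surrogate-key, that references are string values starting with the prefix `surrogate-key:`, and that the field names in the sample records are illustrative only. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Any, Dict, List, Set

SURROGATE_PREFIX = "surrogate-key:"  # assumed prefix, not taken from the text above


def find_surrogate_refs(value: Any) -> Set[str]:
    """Recursively collect every string that looks like a surrogate-key."""
    if isinstance(value, dict):
        return set().union(*(find_surrogate_refs(v) for v in value.values()))
    if isinstance(value, list):
        return set().union(*(find_surrogate_refs(v) for v in value))
    if isinstance(value, str) and value.startswith(SURROGATE_PREFIX):
        return {value}
    return set()


def write_order(records: List[Dict[str, Any]]) -> List[str]:
    """Return record ids so that each record comes after the records it references.

    Raises graphlib.CycleError if the references form a cycle.
    """
    own_ids = {r["id"] for r in records}
    sorter: TopologicalSorter = TopologicalSorter()
    for record in records:
        # Dependencies: surrogate-keys found in the record, excluding its own id
        # and anything not defined in this manifest.
        deps = (find_surrogate_refs(record) - {record["id"]}) & own_ids
        sorter.add(record["id"], *deps)
    return list(sorter.static_order())


# Illustrative manifest fragments; the structure is an assumption for this sketch.
records = [
    {"id": "surrogate-key:wp-1", "data": {"Components": ["surrogate-key:wpc-1"]}},
    {"id": "surrogate-key:wpc-1", "data": {"Datasets": ["surrogate-key:file-1"]}},
    {"id": "surrogate-key:file-1", "data": {}},
]
order = write_order(records)
# order == ["surrogate-key:file-1", "surrogate-key:wpc-1", "surrogate-key:wp-1"]
```

With this framing, the recursive search is still needed to discover the references, but it runs once per record, and the sort itself is linear in the number of records plus references. A cycle between records would surface as an error instead of an incorrect write order.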