Question: Algorithm for processing surrogate-keys

Version 1.0.0 of the Manifest Schema supports surrogate-keys within the Work Product, Work Product Component, and File AbstractAny* constructs. Work Product Components can have complex relationships (e.g., Well Log; see the ERD at the bottom). Our understanding is that a surrogate-key can appear at any of the Work Product, Work Product Component, and File levels, so resolving these surrogate-keys could be quite complex.

Does the Data Definitions team have a prescribed algorithm for efficiently determining the correct order in which to write the data presented in the manifest when surrogate-keys are used? We must write the data in the correct order so that the Storage API creates the IDs first and every surrogate-key reference in the data can then be updated with the corresponding ID.

There is a brute-force approach in which the Work Product, Work Product Components, and Files in the manifest are recursively searched to find where surrogate-keys are used. However, is there an algorithm for determining the correct order in which to store the data? We could treat the data like a tree and write it from the root to the leaves, but where relationships exist across multiple levels, this becomes difficult to resolve (think reference data).
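To make the ordering problem concrete, here is a rough sketch of one way it might be framed: treat the records as a dependency graph rather than a tree and apply a topological sort (Kahn's algorithm). This is not a prescribed algorithm from the Data Definitions team. The `surrogate-key:` prefix, the `id` field, the record shapes, and the function names `find_refs` and `write_order` are all assumptions made for illustration.

```python
from collections import deque

# Assumption: surrogate-keys are strings that start with this prefix.
PREFIX = "surrogate-key:"


def find_refs(value, found):
    """Recursively collect surrogate-key strings from nested dicts and lists."""
    if isinstance(value, str):
        if value.startswith(PREFIX):
            found.add(value)
    elif isinstance(value, dict):
        for item in value.values():
            find_refs(item, found)
    elif isinstance(value, list):
        for item in value:
            find_refs(item, found)
    return found


def write_order(records):
    """Return record ids so that each record comes after the records it references."""
    # Assumption: every record carries its own surrogate-key in an "id" field.
    by_id = {rec["id"]: rec for rec in records}

    # deps[x] = ids that must be written before x (references outside the manifest are ignored).
    deps = {}
    for rid, rec in by_id.items():
        refs = find_refs(rec, set())
        refs.discard(rid)  # a record's own id is not a dependency
        deps[rid] = {ref for ref in refs if ref in by_id}

    # Reverse edges: dependents[x] = records that reference x.
    dependents = {rid: [] for rid in by_id}
    for rid, needed in deps.items():
        for dep in needed:
            dependents[dep].append(rid)

    # Kahn's algorithm: repeatedly emit records with no unwritten dependencies.
    remaining = {rid: len(needed) for rid, needed in deps.items()}
    ready = deque(rid for rid, count in remaining.items() if count == 0)
    order = []
    while ready:
        rid = ready.popleft()
        order.append(rid)
        for child in dependents[rid]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    if len(order) != len(by_id):
        raise ValueError("cyclic surrogate-key references; no valid write order")
    return order


# Hypothetical manifest fragment, listed from root to leaves.
manifest = [
    {"id": "surrogate-key:wp-1", "data": {"Components": ["surrogate-key:wpc-1"]}},
    {"id": "surrogate-key:wpc-1", "data": {"Datasets": ["surrogate-key:file-1"]}},
    {"id": "surrogate-key:file-1", "data": {}},
]
print(write_order(manifest))
```

In this framing, a record is written only after everything it references has been written, so the ID returned for each record can be substituted into its dependents before they are stored. References that span several levels are handled the same way as direct parent/child links, and a cycle is reported as an error instead of being resolved silently.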

Edited Jan 05, 2021 by Alan Henson