Fetch-and-ingest should take into account the deletion of data at source
Currently, Fetch-and-ingest only picks up “catalog” records that were newly created or modified at the source since the last successful run. It does not consider data deleted at the source. The use case is:
- Provider has file X which has metadata JSON file X’ (both on provider’s instance).
- Now EDS Fetch and Ingest runs and X’ is copied over to the operator’s OSDU instance. Let us call the copy X’’; it has an external Connected Source pointing to X.
- Provider deletes X.
- Fetch and Ingest runs again but does not delete X’’, because a deletion produces no timestamp change: the whole file is gone, so there is no modified date to detect. To keep operator and provider files in sync, a full diff needs to be run.
- Therefore, operator is still able to search X’’.
- Operator uses the EDS DMS API to access the external record. The record returns a signed URL for X, which no longer exists, so the request results in an error.
In this case, it would be great if the metadata on the operator side were also deleted.
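
As a rough illustration of the "full diff" mentioned in the use case, the Python sketch below compares the record IDs currently present at the provider with the source references held by the operator's copies, and reports the operator records whose source is gone. The function name, the ID strings, and the dictionary shape are all hypothetical and are not part of the EDS or OSDU APIs; how the two ID listings would actually be obtained is left open.

```python
from typing import Dict, Iterable, List


def find_orphaned_records(
    source_ids: Iterable[str],
    operator_records: Dict[str, str],
) -> List[str]:
    """Return operator record IDs whose source record no longer exists.

    source_ids: IDs currently present at the provider (hypothetical listing).
    operator_records: maps each operator-side record ID to the provider-side
        ID it points to (hypothetical shape, not an EDS data structure).
    """
    live = set(source_ids)
    # A timestamp filter cannot see deletions, so compare full ID sets instead.
    return sorted(
        op_id for op_id, src_id in operator_records.items() if src_id not in live
    )


# Provider originally had X and Y, then deleted X.
provider_now = ["Y"]
operator_side = {"op-X": "X", "op-Y": "Y"}

# Candidates for deletion on the operator side.
orphans = find_orphaned_records(provider_now, operator_side)
```

Here `orphans` contains only `op-X`, the operator copy that still points to the deleted file. A full comparison like this costs more than the incremental timestamp check, so it might run less often than the regular fetch.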