Dataset of type file should be saved using file service API instead of storage service
When using a manifest with WP, WPC and Dataset, we can use FileSource instead of preloadedFilePath. Below are the steps I followed to create a manifest using FileSource after uploading the file through the file service.
- Generate a signed URL using the file service
- Upload the contents of the file using the signed URL
- Reference the FileSource returned with the signed URL in the manifest
- Trigger manifest ingestion
- Get the metadata of the uploaded file
- Generate a signed URL to download the file content
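
For anyone trying to reproduce this, the steps above can be sketched roughly as follows. This is a minimal sketch, not the exact calls I ran: the base URL, the `/files/uploadURL` and `/files/{id}/downloadURL` paths, the `Location`/`SignedURL`/`FileSource` response fields, and the `dataset--File.Generic` kind are assumptions about a typical file service deployment, and all function names are hypothetical. Adjust them to your environment.

```python
import json
import urllib.request

# Assumption: base URL of the file service; replace with your deployment's URL.
FILE_API = "https://osdu.example.com/api/file/v2"


def _call(url, token, partition, method="GET", body=None):
    """Send an authenticated JSON request and return the decoded response."""
    headers = {
        "Authorization": f"Bearer {token}",
        "data-partition-id": partition,
        "Content-Type": "application/json",
    }
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def get_upload_location(token, partition):
    """Step 1: request a signed upload URL (endpoint and field names assumed)."""
    location = _call(f"{FILE_API}/files/uploadURL", token, partition)["Location"]
    return location["SignedURL"], location["FileSource"]


def upload_bytes(signed_url, content, extra_headers=None):
    """Step 2: PUT the file content to the signed URL.

    Some cloud providers require extra headers on this request;
    pass them in extra_headers if your provider needs them.
    """
    req = urllib.request.Request(
        signed_url, data=content, headers=extra_headers or {}, method="PUT"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def file_dataset(file_source, name, kind="osdu:wks:dataset--File.Generic:1.0.0"):
    """Step 3: build a manifest Dataset entry that points at FileSource.

    The kind and the property layout are assumptions; check them against
    the schema version registered in your partition.
    """
    return {
        "kind": kind,
        "data": {
            "Name": name,
            "DatasetProperties": {
                "FileSourceInfo": {"FileSource": file_source, "Name": name}
            },
        },
    }


def get_download_url(file_id, token, partition):
    """Step 6: request a signed download URL; returns the raw response."""
    return _call(f"{FILE_API}/files/{file_id}/downloadURL", token, partition)
```

The Dataset entry returned by `file_dataset` goes into the manifest next to the WP and WPC records before ingestion is triggered. The last step is the one that currently fails for me.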
I got the URL, but accessing the file fails with a blob not found exception. The reason is that the manifest ingestion DAG creates the metadata through the storage service, which doesn't copy the file from the staging area to the persistent area.
The manifest ingestion DAG must use the file service to save Datasets of type File so that the file gets copied to the persistent area.
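
To illustrate the proposed change, here is a rough sketch of how the DAG could decide which service persists a record. It is not the current DAG code: the function name is hypothetical, and it assumes record kinds follow the `authority:source:entity-type:version` pattern with File datasets using an entity type that starts with `dataset--File.`.

```python
def target_service(record):
    """Return the service that should persist this manifest record.

    Hypothetical helper: Datasets of type File go to the file service
    (which moves the file from staging to the persistent area),
    everything else goes to the storage service.
    """
    parts = record.get("kind", "").split(":")
    # Assumed kind layout: authority:source:entity-type:version
    entity_type = parts[2] if len(parts) == 4 else ""
    return "file" if entity_type.startswith("dataset--File.") else "storage"
```

With this check, file collections still go to the storage service; whether they should be handled as well is part of the open question below.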
We must also explore whether Manifest Ingestion should facilitate source data movement or whether uploading data to the OSDU persistent zone is a pre-load activity. Given the robustness of the Dataset Service, it is not realistic to expect the Manifest Ingestion workflow to handle all dataset types. There is also the question of whether Manifest Ingestion should handle only a subset of Dataset types (i.e., files and file collections).