Enable use of bulk data files (e.g., SegY, DLIS) from a separate cloud location
Bulk-data-in-OSDU-Data-Platform.pptx
Use case - A bulk data file (e.g., SegY, DLIS) is already in cloud storage. A consumer of the OSDU Data Platform would like to use the file "as is" and avoid making a second copy in the OSDU Data Platform, which would mean paying for storage twice.
In the Dataset definition, they could provide the file location in FileSource and leave PreloadFilePath blank.
"kind": "{{data-partition-id}}:wks:dataset--File.Generic:1.0.0",
Details -
"FileSource": "https://storage.cloud.google.com/osdu-gcp-data-test-team-a/Debasis/GCP-test.txt?_ga=2.203312081.-1348921297.1621644289",
"PreloadFilePath": ""
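For anyone who wants to reproduce this, here is a minimal sketch of how such a record could be assembled. The `kind` and the two field names come from the fragment above. The nesting under `data.DatasetProperties.FileSourceInfo`, the helper name `build_file_generic_record`, and the `acl`/`legal` placeholder values are assumptions for illustration and should be checked against the schema in your environment.

```python
import json


def build_file_generic_record(partition_id, file_source, acl, legal):
    """Build a dataset--File.Generic record that points at an existing file.

    Hypothetical helper: the nesting of FileSource/PreloadFilePath under
    data.DatasetProperties.FileSourceInfo is an assumption, not confirmed
    by this issue.
    """
    return {
        "kind": f"{partition_id}:wks:dataset--File.Generic:1.0.0",
        "acl": acl,
        "legal": legal,
        "data": {
            "DatasetProperties": {
                "FileSourceInfo": {
                    # Location of the file that already exists in cloud storage.
                    "FileSource": file_source,
                    # Left blank on purpose: no second copy is staged.
                    "PreloadFilePath": "",
                }
            }
        },
    }


if __name__ == "__main__":
    # Placeholder values only; replace with real ACL groups and legal tags.
    record = build_file_generic_record(
        partition_id="my-partition",
        file_source="https://storage.example.com/my-bucket/my-file.segy",
        acl={"owners": ["owners@example.com"], "viewers": ["viewers@example.com"]},
        legal={"legaltags": ["example-legal-tag"], "otherRelevantDataCountries": ["US"]},
    )
    print(json.dumps(record, indent=2))
```

The resulting dictionary can be placed in the Datasets section of a manifest or sent to whichever registration route your deployment uses.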
I tested this scenario in the GCP Pre-ship environment. Record creation went smoothly (I used manifest-based ingestion). However, due to a current issue with the Dataset service, I could not call getRetrievalInstructions to retrieve the file. cc - @Kateryna_Kurach (for information)
During discussion in the weekly Ingestion meeting, I was told that the Microsoft Azure implementation will not allow this option. cc - @kibattul (for information)
I am unsure about AWS and IBM.
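Once the Dataset service issue is resolved, the retrieval step could be checked with something like the sketch below. It only builds the request and does not send it. The path `/api/dataset/v1/getRetrievalInstructions`, the `id` query parameter, and the helper name `build_retrieval_request` are assumptions on my side, so please verify them against the Dataset service API of your deployment.

```python
import urllib.parse
import urllib.request


def build_retrieval_request(base_url, dataset_id, partition_id, token):
    """Build (but do not send) a GET request for retrieval instructions.

    Hypothetical helper: the endpoint path and the 'id' query parameter
    are assumptions, not confirmed by this issue.
    """
    query = urllib.parse.urlencode({"id": dataset_id})
    url = f"{base_url.rstrip('/')}/api/dataset/v1/getRetrievalInstructions?{query}"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "data-partition-id": partition_id,
        },
        method="GET",
    )


if __name__ == "__main__":
    # Placeholder host, record id, partition, and token.
    req = build_retrieval_request(
        "https://osdu.example.com",
        "my-partition:dataset--File.Generic:example-id",
        "my-partition",
        "example-token",
    )
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request. If the option works as hoped, the response should point back at the original FileSource location rather than at a copy.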