ADR: File Service design
Decision Title
Status
- Proposed
- Trialing
- Under review
- Approved
- Retired
Context & Scope
Files are one of the sources of data for ingestion. In OSDU R2, there is a file service (not tagged as part of the OSDU R2 release) that provides signed locations for uploading files, and a separate delivery service that provides signed URLs for downloading them.
What is missing today is holistic management of a file as an entity. Managing a file as an entity could include:
- Managing metadata related to files, meaning all the CRUD operations needed on a file's metadata.
- A type-safe way of managing file metadata.
- Discoverability of files based on their metadata.
- Enabling file-based downstream ingestion and enrichment workflows.
- Secure, access-controlled download of a file identified by its metadata record ID.
- A single, consolidated service that handles all the functionality (upload, download, discovery) for file-type data.
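One way to read the "type-safe" requirement is a metadata type that refuses to construct invalid records. The Python sketch below is illustrative only: the class name `FileMetadata`, its fields, and the four-part `kind` check are assumptions loosely modeled on OSDU record conventions (kind, ACL groups, legal tags), not the File Service's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMetadata:
    """Hypothetical, simplified file metadata record (not the real OSDU schema)."""

    kind: str                    # assumed format "<authority>:<source>:<entity>:<version>"
    file_source: str             # where the file was staged in the landing zone
    viewers: tuple[str, ...]     # groups allowed to read the file
    owners: tuple[str, ...]      # groups allowed to modify the record
    legal_tags: tuple[str, ...]  # compliance tags attached to the data
    name: str = ""

    def __post_init__(self) -> None:
        # Reject unusable records at construction time instead of storing them.
        if self.kind.count(":") != 3:
            raise ValueError(f"kind must have four ':'-separated parts: {self.kind!r}")
        if not self.file_source:
            raise ValueError("file_source is required")
        if not self.viewers or not self.owners:
            raise ValueError("at least one viewer group and one owner group are required")
        if not self.legal_tags:
            raise ValueError("at least one legal tag is required")


# Usage with hypothetical partition and group names.
meta = FileMetadata(
    kind="opendes:osdu:file:1.0.0",
    file_source="landing/well-001.las",
    viewers=("data.default.viewers@opendes",),
    owners=("data.default.owners@opendes",),
    legal_tags=("opendes-public",),
    name="well-001.las",
)
```

Because validation runs when the object is created, a malformed record fails before it reaches storage rather than later in a downstream workflow.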
In Scope:
Retaining the existing upload functionality in the file service. Introducing new APIs for posting metadata for a single file and for retrieving that metadata by its metadata record ID. Rationalizing the file service by moving the download capabilities of the delivery service into it.
Out of Scope:
Other upload and download use cases, such as folders or batches of files, are out of scope for this ADR. These are all valid use cases, and the file service can be enhanced incrementally to support them.
Decision
The decision is to consolidate the file management functionality currently spread across several OSDU R2 services into a single service, the File Service, and, along with that, to enable multi-partition support, access control, and compliance for file data.
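Multi-partition support and access control can be illustrated together: a record is resolved only within the caller's partition, and a download is allowed only if the caller belongs to at least one of the record's viewer groups. This is a hypothetical in-memory sketch; `PartitionedFileIndex`, `StoredFile`, and the group names are illustrative assumptions, and a production service would presumably rely on the platform's own entitlement checks rather than a local dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredFile:
    """Hypothetical minimal view of a stored file record."""

    record_id: str
    partition: str
    viewers: frozenset[str]


class PartitionedFileIndex:
    """Hypothetical in-memory index keyed by (partition, record_id)."""

    def __init__(self) -> None:
        self._files: dict[tuple[str, str], StoredFile] = {}

    def add(self, f: StoredFile) -> None:
        self._files[(f.partition, f.record_id)] = f

    def can_download(self, partition: str, record_id: str, caller_groups: set[str]) -> bool:
        # A record is visible only inside the partition it was created in...
        f = self._files.get((partition, record_id))
        if f is None:
            return False
        # ...and only to callers who belong to at least one of its viewer groups.
        return not f.viewers.isdisjoint(caller_groups)


index = PartitionedFileIndex()
index.add(StoredFile("opendes:file:1", "opendes", frozenset({"viewers@opendes"})))
```

Keying by partition as well as record ID means the same lookup enforces partition isolation; no separate partition check is needed.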
Rationale
The File Service would be more complete in managing a file as an entity. It would be built by leveraging the capabilities of the existing services, which lets us reuse what already exists and extend it.
Consequences
A single service will be responsible for managing all aspects of a file, providing capabilities to upload, download, and manage metadata for files. The following modifications to existing functionality are proposed:
Functionality | API | Status | Capability |
---|---|---|---|
Download | /files/{id}/downloadURL | New | Get signed URL for download based on metadata Id |
Upload | /files/uploadURL | New | Get signed upload location. Partition aware. |
Metadata Management | /files/metadata | New | Post metadata for the file |
Metadata Management | /files/{id}/metadata | New | Get metadata record by ID |
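
The endpoints in the table can be exercised with plain HTTP requests. The sketch below only builds `urllib.request.Request` objects (it sends nothing) using the paths from the table. The base URL, the bearer-token `Authorization` header, and the `data-partition-id` header are assumptions about the deployment, following the usual OSDU convention of scoping each call to one partition; the API spec is the authoritative contract.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical base URL; the real host and any version prefix depend on the deployment.
BASE_URL = "https://osdu.example.com/api/file"


def build_request(method: str, path: str, partition: str, token: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {
        "Authorization": f"Bearer {token}",
        "data-partition-id": partition,  # assumed header naming the target partition
    }
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def _id_segment(record_id: str) -> str:
    # Percent-encode the record ID so it always stays a single path segment.
    return urllib.parse.quote(record_id, safe="")


def upload_url_request(partition: str, token: str) -> urllib.request.Request:
    return build_request("GET", "/files/uploadURL", partition, token)


def post_metadata_request(partition: str, token: str, metadata: dict) -> urllib.request.Request:
    return build_request("POST", "/files/metadata", partition, token, body=metadata)


def get_metadata_request(partition: str, token: str, record_id: str) -> urllib.request.Request:
    return build_request("GET", f"/files/{_id_segment(record_id)}/metadata", partition, token)


def download_url_request(partition: str, token: str, record_id: str) -> urllib.request.Request:
    return build_request("GET", f"/files/{_id_segment(record_id)}/downloadURL", partition, token)
```

Each request can then be sent with `urllib.request.urlopen(req)` against a real deployment.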
The workflow for posting metadata ensures that files move from a landing zone to a persistent zone. These zones are partition-specific and can be described as:
- Landing Zone: area that users can use to stage (land) their files.
- Persistent Zone: area under the control of the data platform, enabling secure delivery of discoverable files.
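The landing-to-persistent movement can be modelled in memory to show the order of operations, assuming (as this section suggests) that posting metadata is what triggers the move. Everything here is hypothetical: `FileWorkflow`, `stage`, the `file_source` key, and the record ID format are illustrative names, and real zones would be partition-specific storage locations reached through signed URLs rather than Python dictionaries.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class PartitionZones:
    """Hypothetical per-partition storage: a landing zone and a persistent zone."""

    landing: dict[str, bytes] = field(default_factory=dict)
    persistent: dict[str, bytes] = field(default_factory=dict)


class FileWorkflow:
    def __init__(self) -> None:
        self._zones: dict[str, PartitionZones] = {}
        self._records: dict[str, dict] = {}

    def _zones_for(self, partition: str) -> PartitionZones:
        return self._zones.setdefault(partition, PartitionZones())

    def stage(self, partition: str, name: str, content: bytes) -> str:
        """User lands a file (stands in for uploading via a signed URL)."""
        location = f"landing/{uuid.uuid4()}/{name}"
        self._zones_for(partition).landing[location] = content
        return location

    def post_metadata(self, partition: str, metadata: dict) -> str:
        """Move the staged file into the persistent zone and store its metadata."""
        zones = self._zones_for(partition)
        source = metadata["file_source"]
        if source not in zones.landing:
            raise KeyError(f"no staged file at {source!r} in partition {partition!r}")
        record_id = f"{partition}:file:{uuid.uuid4()}"
        # The platform now owns the file, so it leaves the landing zone.
        zones.persistent[record_id] = zones.landing.pop(source)
        self._records[record_id] = {**metadata, "partition": partition}
        return record_id

    def get_metadata(self, record_id: str) -> dict:
        return self._records[record_id]

    def download(self, partition: str, record_id: str) -> bytes:
        # Downloads are served only from the persistent zone of the given partition.
        return self._zones_for(partition).persistent[record_id]


wf = FileWorkflow()
loc = wf.stage("opendes", "well.las", b"LAS data")
rid = wf.post_metadata("opendes", {"file_source": loc, "name": "well.las"})
```

Because the file is popped from the landing zone, posting the same metadata twice fails instead of creating a second persistent copy.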
API Spec
The API spec for the enhanced file service can be found on the fileservice-api-enhancement branch: https://community.opengroup.org/osdu/platform/system/file/-/blob/fileservice-api-enhancement/docs/file-service_openapi.yaml
Tradeoff Analysis - Input to decision
- Need for a holistic file management service.
- Existing functionality covers only upload and download, and is spread across different services.
- A type-safe way of ingesting file metadata into the system.
- Integration of file ingestion with other downstream workflows.