ADR: Calculate Checksum before saving metadata
Decision Title
Calculate checksum of uploaded file before creating its metadata
Status
- Proposed
- Trialing
- Under review
- Approved
- Retired
Context & Scope
We support creating a dataset--File.Generic entity record in the data platform when a user hits the /metadata endpoint of File Service. This schema has a couple of useful attributes that we do not use as of now: checksum and checksum algorithm. These attributes would be very useful for detecting duplicate file uploads in the data platform.
Mechanism for calculating checksum
I propose adding a new method to the core module (say, generateChecksum()) that every CSP implements in its provider module. The method is called before we call the storage service to save the file's metadata.
Each CSP can implement this method in whatever way, and with whatever algorithm, it chooses. For example, in Azure we do not need to generate the checksum explicitly, because the blob store has already calculated it automatically; the implementation of generateChecksum() only needs to fetch the blob's metadata. Other providers can implement it in a similar way if their storage solution also calculates a checksum when storing a blob.
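To make the proposal concrete, here is a minimal sketch of the contract and two possible provider implementations. It is written in Python for brevity and is not the File Service code: the names ChecksumProvider, StreamingChecksumProvider, StoreReportedChecksumProvider, open_file and fetch_properties are hypothetical, and generate_checksum stands in for the proposed generateChecksum().

```python
import hashlib
from abc import ABC, abstractmethod
from typing import BinaryIO, Callable, Dict, Tuple


class ChecksumProvider(ABC):
    """Hypothetical core-module contract; each CSP provider implements it."""

    @abstractmethod
    def generate_checksum(self, file_id: str) -> Tuple[str, str]:
        """Return (checksum, algorithm) for the uploaded file."""


class StreamingChecksumProvider(ChecksumProvider):
    """For stores that do not compute a checksum: hash the file content."""

    def __init__(self, open_file: Callable[[str], BinaryIO],
                 algorithm: str = "sha256") -> None:
        # open_file is an assumed callable that returns a binary stream.
        self._open_file = open_file
        self._algorithm = algorithm

    def generate_checksum(self, file_id: str) -> Tuple[str, str]:
        digest = hashlib.new(self._algorithm)
        with self._open_file(file_id) as stream:
            # Read in chunks so large files are never fully held in memory.
            for chunk in iter(lambda: stream.read(8192), b""):
                digest.update(chunk)
        return digest.hexdigest(), self._algorithm


class StoreReportedChecksumProvider(ChecksumProvider):
    """For stores that already keep a checksum: only look it up."""

    def __init__(self, fetch_properties: Callable[[str], Dict[str, str]]) -> None:
        # fetch_properties is an assumed callable returning a dict that
        # holds the stored "checksum" and its "algorithm".
        self._fetch_properties = fetch_properties

    def generate_checksum(self, file_id: str) -> Tuple[str, str]:
        props = self._fetch_properties(file_id)
        return props["checksum"], props["algorithm"]
```

The second class corresponds to the Azure-style case described above, where the provider only reads what the store has already calculated. Returning the algorithm together with the value lets each provider pick its own algorithm while still filling both schema attributes.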
Decision
We should generate the checksum of a single file before creating its metadata in the data platform, so that we can provide the checksum value in the metadata record (an instance of the dataset--File.Generic entity).
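The intended order of operations could look roughly like the sketch below, again in Python and purely illustrative. The function names, the flat record layout and the "Checksum" / "ChecksumAlgorithm" keys are assumptions made for the example; the actual placement of these attributes inside dataset--File.Generic should be taken from the schema.

```python
from typing import Callable, Iterable, Optional, Tuple


def build_file_metadata(
    file_id: str,
    metadata: dict,
    generate_checksum: Callable[[str], Tuple[str, str]],
) -> dict:
    """Attach the checksum to the record before it is saved."""
    checksum, algorithm = generate_checksum(file_id)
    record = dict(metadata)  # copy so the caller's dict is left untouched
    # Assumed flat keys; the real schema may nest these attributes.
    record["Checksum"] = checksum
    record["ChecksumAlgorithm"] = algorithm
    return record


def find_duplicate(record: dict, existing: Iterable[dict]) -> Optional[dict]:
    """Return the first stored record with the same checksum and algorithm."""
    for other in existing:
        if (other.get("Checksum") == record["Checksum"]
                and other.get("ChecksumAlgorithm") == record["ChecksumAlgorithm"]):
            return other
    return None
```

Comparing the algorithm as well as the value avoids treating checksums produced by different algorithms as a match. What to do when a duplicate is found (reject, warn, or link to the existing record) is outside the scope of this sketch.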