Bypass checksum generation for files larger than 5 GB
When a user calls the POST /metadata API endpoint to register a file on the data platform, the file service generates a checksum of the file provided in the request before saving the record. The checksum supports duplicate detection in downstream workflows.
Because the checksum is computed inside the blocking HTTP call, the calculation takes a long time for large files (roughly 3-5 GB and up), and the POST request hangs and never responds.
In testing, checksum generation plus metadata registration takes about 2 minutes for a 5 GB file.
We first hit the hang in practice when a user tried to register a 14 GB file.
Although files this large are a small fraction of uploads, users still need to be able to register their metadata, so we must bypass checksum generation above a size threshold.
With this change we keep duplicate detection (checksum calculated and saved in the storage record) for the vast majority of uploads, roughly 95% of files, and skip it for the remaining ~5% of requests.
To cover the remaining ~5% of requests as well, we can calculate the checksum asynchronously after registration and update the storage record once it completes.
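One way to sketch that async backfill, again with hypothetical names; a real implementation would likely hand the work to a job queue or worker pool rather than a bare thread, but the flow is the same: register immediately with a null checksum, then patch the record when the hash finishes:

```python
import hashlib
import os
import threading


def compute_checksum(path, chunk_size=8 * 1024 * 1024):
    """Stream the file through SHA-256 in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def register_large_file(path, store):
    """Save the record right away, then backfill the checksum off-thread.

    `store` stands in for the real storage layer (a dict here for the sketch).
    """
    record = {"path": path, "size": os.path.getsize(path), "checksum": None}
    store[path] = record

    def backfill():
        # Update the stored record once the (slow) hash completes.
        store[path]["checksum"] = compute_checksum(path)

    t = threading.Thread(target=backfill, daemon=True)
    t.start()
    return record, t  # the thread is returned only so callers can join in tests
```

Until the backfill lands, downstream duplicate detection has to treat a null checksum as "unknown" rather than "unique", which matches the ticket's point that dedup is temporarily unavailable for these files.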