POST files/metadata Retry Failure due to Staging File Being Deleted Prematurely

An issue was observed with POST files/metadata retries during MSFT's use of this API in M12: retrying a failed POST files/metadata call is likely to return a 400 error no matter how many times the retry is performed. Further investigation shows the root cause: when metadata creation fails, the staging file is also deleted. Subsequent retries with the same source file ID, which maps to the deleted staging file, therefore fail. The staging file should not be deleted prematurely when metadata creation fails.

The current workaround is to perform two extra steps, uploading the file to staging again, and then retry POST files/metadata:

1. Get a signed URL by calling the File location API.
2. Upload the file to blob storage using the signed URL.
3. Create the metadata using the POST Metadata API.
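
The workaround could be wrapped in a client-side retry loop along the following lines. This is only a sketch: `get_location`, `upload`, `post_metadata`, and `MetadataError` are hypothetical stand-ins for the caller's own wrappers around the three APIs, not names from the service.

```python
from typing import Callable, Optional, Tuple


class MetadataError(Exception):
    """Hypothetical error raised when POST files/metadata fails (e.g. a 400)."""


def create_metadata_with_reupload(
    get_location: Callable[[], Tuple[str, str]],
    upload: Callable[[str, bytes], None],
    post_metadata: Callable[[str], str],
    content: bytes,
    max_attempts: int = 3,
) -> str:
    """Repeat all three workaround steps on every attempt.

    get_location  -> (signed_url, file_source) from the File location API
    upload        -> sends the bytes to the signed URL
    post_metadata -> calls POST files/metadata and returns the record id
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error: Optional[MetadataError] = None
    for _ in range(max_attempts):
        signed_url, file_source = get_location()  # step 1: new staging location
        upload(signed_url, content)               # step 2: upload the file again
        try:
            return post_metadata(file_source)     # step 3: create the metadata
        except MetadataError as err:
            # The failed call also removed the staging file, so the next
            # attempt has to start again from step 1.
            last_error = err
    assert last_error is not None
    raise last_error
```

Retrying only step 3 is not enough while the bug is present, which is why the loop restarts from the location request each time.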

Suggested fix:

1. In FileMetadataService::saveMetadata, move the deleteStagingFile step to the very end, right before the successful return, so that the staging file is deleted only when everything else has succeeded.
2. Check that the staging file exists before deleting it, and catch and ignore any exception thrown by the delete. Staging file deletion failures are very rare but can happen under particular concurrency conditions: simultaneous metadata create calls with the same payload can cause one of the deletes to fail because the file was already deleted by the other caller. A failed staging file deletion should not invalidate a successful metadata creation.
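
The ordering proposed above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the service's actual code: `StagingStore`, `save_metadata`, and `create_record` are hypothetical stand-ins for the staging storage, FileMetadataService::saveMetadata, and the metadata creation step.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger(__name__)


class StagingStore:
    """Hypothetical in-memory stand-in for the staging area."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def exists(self, file_id: str) -> bool:
        return file_id in self.files

    def delete(self, file_id: str) -> None:
        del self.files[file_id]  # raises KeyError if the file is already gone


def save_metadata(
    store: StagingStore,
    file_id: str,
    create_record: Callable[[str], str],
) -> str:
    """Create the metadata first; delete the staging file only afterwards."""
    if not store.exists(file_id):
        raise FileNotFoundError("staging file not found: " + file_id)

    # If this raises, the staging file is left in place so that a retry
    # with the same source file ID can still succeed.
    record_id = create_record(file_id)

    # Last step: best-effort cleanup. A concurrent caller may already have
    # deleted the file; that must not undo the successful creation.
    try:
        if store.exists(file_id):
            store.delete(file_id)
    except Exception:
        logger.warning("ignoring staging delete failure for %s", file_id, exc_info=True)

    return record_id
```

The existence check narrows the race window but does not close it, which is why the delete is still wrapped in a try/except.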

Edited Feb 06, 2023 by Lucy Liu