[Intermittent] Record Metadata is available in Cosmos but the Blob store returns a 404.
If the record metadata exists but the actual record does not exist in BlobStore, the FetchBatchRecords API returns a 500 with the following response:
{
"code": 500,
"reason": "Unable to process parallel blob download",
"message": "AppException(error=AppError(code=404, reason=Specified blob was not found, message=Status code 404, \"<?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>BlobNotFound</Code><Message>The specified blob does not exist._RequestId:580b9915-f01e-0009-2c0a-3c65a8000000_Time:2021-04-28T08:45:41.2917696Z</Message></Error>\", errors=null, debuggingInfo=null, originalException=com.azure.storage.blob.models.BlobStorageException: Status code 404, \"<?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>BlobNotFound</Code><Message>The specified blob does not exist._RequestId:580b9915-f01e-0009-2c0a-3c65a8000000_Time:2021-04-28T08:45:41.2917696Z</Message></Error>\"), originalException=com.azure.storage.blob.models.BlobStorageException: Status code 404, \"<?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>BlobNotFound</Code><Message>The specified blob does not exist._RequestId:580b9915-f01e-0009-2c0a-3c65a8000000_Time:2021-04-28T08:45:41.2917696Z</Message></Error>\")"
}
A couple of issues to investigate / fix:
- PersistentServiceImpl ensures that if the blob write fails, the Cosmos DB update does not happen. How did we run into this inconsistency?
- If one blob does not exist, the entire FetchBatchRecords call should not fail with a 500.
- The error message for any 5xx should always be standard, so a 500 in this case should be reported as Internal Server Error.
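
One possible shape for the second point, as a minimal sketch only: collect the records that could be read and report the missing ones separately, instead of failing the whole batch. Every name here (`BatchFetchSketch`, `BatchResult`, `BlobNotFoundException`, `fetchBatch`, the `blobReader` function) is hypothetical and not taken from the service code, and the loop is sequential for brevity, whereas the real call downloads blobs in parallel.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class BatchFetchSketch {

    /** Hypothetical stand-in for a storage-layer "blob not found" (404) error. */
    static final class BlobNotFoundException extends RuntimeException {
        BlobNotFoundException(String recordId) {
            super("Blob not found for record " + recordId);
        }
    }

    /** Hypothetical result type: records that were read, plus IDs whose blob was missing. */
    static final class BatchResult {
        final Map<String, String> records = new LinkedHashMap<>();
        final List<String> notFound = new ArrayList<>();
    }

    /**
     * Reads each record through blobReader (assumed to throw BlobNotFoundException
     * when the blob is missing) and keeps going when a single blob is absent.
     */
    static BatchResult fetchBatch(List<String> recordIds, Function<String, String> blobReader) {
        BatchResult result = new BatchResult();
        for (String id : recordIds) {
            try {
                result.records.put(id, blobReader.apply(id));
            } catch (BlobNotFoundException e) {
                // A missing blob is reported per record, not turned into a batch-wide failure.
                result.notFound.add(id);
            }
            // Any other exception propagates; the API layer would map it to a standard 500.
        }
        return result;
    }
}
```

With this shape, a caller asking for records `a` and `b` where only `a` has a blob would get `a` back in `records` and `b` listed in `notFound`. Whether the API should surface missing IDs this way is a design decision for this issue, not something the sketch settles.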