Storage PUT /records lost update
The issue occurs in the storage service when the same record (same id) is updated by multiple asynchronous requests at the same time. As a result, only one version is saved in the database and the others are lost.
For example, suppose we call the storage PUT API with three asynchronous requests for the same record. Even though the storage returns 201 with a version for each of the requests, calling /records/{id}/{version} with the three created versions results in two 404s and only one 200. All three versions are saved in the blob storage, but the "gcsVersionPaths" array of the record in the database contains only one new version.
Looking at the code, it appears that this is a lost update problem. When updating a record, the storage fetches the record from the database, performs certain manipulations on it, and then saves it back to the database. So when multiple threads run at the same time, they all fetch the same record (with the same "gcsVersionPaths" array), each adds its own new version to the array, and each saves the record to the database. Every save overwrites the version added by the previous thread, so only the version written by the last thread to finish is kept.
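The interleaving described above can be reproduced in a few lines. This is only an illustrative sketch, not the storage service code: the in-memory "db" dict, the fetch/save helpers and the version names "v1".."v4" are all hypothetical, and the three requests are simulated sequentially so the result is deterministic.

```python
import copy

# Hypothetical in-memory stand-in for the record database.
db = {"rec-1": {"id": "rec-1", "gcsVersionPaths": ["v1"]}}


def fetch(record_id):
    # Return a private copy, like a snapshot deserialized by a database client.
    return copy.deepcopy(db[record_id])


def save(record):
    # Unconditional overwrite: nothing checks whether the stored record
    # changed after this snapshot was fetched.
    db[record["id"]] = copy.deepcopy(record)


# Three concurrent PUT requests all read the record before any of them writes.
snapshots = [fetch("rec-1") for _ in range(3)]

# Each request appends its own new version to its snapshot and saves it.
for snapshot, new_version in zip(snapshots, ["v2", "v3", "v4"]):
    snapshot["gcsVersionPaths"].append(new_version)
    save(snapshot)

final_paths = db["rec-1"]["gcsVersionPaths"]
print(final_paths)  # ['v1', 'v4']
```

Only the version from the last save survives, which matches the observed two 404s and one 200.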
Possible solution: Implement optimistic locking for the PUT API. To do this, we can add an additional field to the database record that changes every time the record is updated. We fetch the record along with this field and, when saving, check whether the value of the field has changed since the fetch. If it has, another request has modified the record in the meantime and we abort the write. I'm assuming all provider databases have this functionality built in. For example, in Azure Cosmos DB, every item stored in the database has a system-defined property "_etag", and to enable optimistic locking we can pass the fetched "_etag" value as an If-Match condition in the request options when saving the record; if the item has changed since it was fetched, the write is rejected with HTTP 412 (Precondition Failed).