Storage service: stale in-memory legal tag cache leads to inconsistent behavior across pods.
We recently uncovered a bug in the Storage service caused by its local (per-pod) cache becoming stale. The following steps illustrate the flow:
- Deletion of a legal tag via the Legal service delete API --> the response is 204 No Content after successful deletion.
- Storage service API call to https://*******/api/storage/v2/push-handlers/legaltag-changed?token= --> goes to pod P1 of the Storage service --> P1 updates the compliance of all records associated with the tag deleted in step 1 --> P1 removes the deleted tag from its local cache.
- Storage PUT call to create a record with the deleted legal tag --> goes to pod P2 of the Storage service --> P2's cache still contains that legal tag --> the call returns 201 Created.
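The steps above can be reproduced with a toy model of two pods, each holding its own in-memory legal tag cache. This is a hypothetical Python sketch, not the Storage service's code: `StoragePod`, `on_legaltag_changed`, `put_record` and the tag name are illustrative, a tag is treated as valid only if it is present in the pod's local cache, and 400 is an assumed status for the "Invalid legal tag" response.

```python
from dataclasses import dataclass, field


@dataclass
class StoragePod:
    """Hypothetical model of one Storage pod holding its own in-memory legal tag cache."""

    name: str
    legal_tag_cache: set = field(default_factory=set)

    def on_legaltag_changed(self, deleted_tag: str) -> None:
        # Push-handler path: only the pod that receives the notification evicts the tag.
        self.legal_tag_cache.discard(deleted_tag)

    def put_record(self, legal_tag: str) -> int:
        # Simplification (assumption): a tag is valid only if it is in this pod's local cache.
        if legal_tag in self.legal_tag_cache:
            return 201  # Created
        return 400  # assumed status for "Invalid legal tag"


DELETED_TAG = "hypothetical-legal-tag"

# Both pods cached the tag before it was deleted in step 1.
p1 = StoragePod("P1", {DELETED_TAG})
p2 = StoragePod("P2", {DELETED_TAG})

# Step 2: the legaltag-changed push is handled by P1 only.
p1.on_legaltag_changed(DELETED_TAG)

# Step 3: the outcome depends on which pod serves the PUT.
print(p1.name, p1.put_record(DELETED_TAG))  # P1 400
print(p2.name, p2.put_record(DELETED_TAG))  # P2 201
```

Because the eviction happens only inside the pod that handled the push, every other pod keeps accepting the deleted tag until its cache is refreshed by some other means.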
At step 3, all calls routed to pod P1 return "Invalid legal tag", but calls landing on any other pod successfully create these records. As a result, the service ITs (integration tests) fail intermittently.
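Why the ITs fail only some of the time can be sketched by counting how many PUTs with the deleted tag are wrongly accepted when requests are spread over N pods and only one pod evicted the tag. This Python sketch assumes simple round-robin routing, which is a simplification: the real ingress or Kubernetes load balancing may distribute requests differently. `wrongly_accepted` is a hypothetical helper.

```python
from itertools import cycle


def wrongly_accepted(num_pods: int, attempts: int, invalidated_pods: int = 1) -> int:
    """Count PUTs with a deleted tag that succeed under assumed round-robin routing.

    Pods 0..invalidated_pods-1 received the legaltag-changed push and evicted the tag;
    the remaining pods still hold it in a stale cache and return 201.
    """
    stale = [i >= invalidated_pods for i in range(num_pods)]
    routing = cycle(range(num_pods))  # assumed round-robin load balancing
    return sum(stale[next(routing)] for _ in range(attempts))


for n in (1, 2, 3):
    print(n, wrongly_accepted(n, attempts=30))
# 1 0
# 2 15
# 3 20
```

Under this model, roughly (N-1)/N of the requests land on a stale pod, so with a single replica the bug never shows and with more replicas an IT expecting "Invalid legal tag" fails on most runs, but not all.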
Edited by Nikhil Singh[MicroSoft]