Storage issues: https://community.opengroup.org/osdu/platform/system/storage/-/issues

https://community.opengroup.org/osdu/platform/system/storage/-/issues/102
Log4J Expedient Updates and Patches (David Diederich, 2021-12-17)
This issue associates MRs that were applied to this project quickly to get a patched version ready as soon as possible. The intent is to provide a reference point for later, more thoughtful, analysis.
Assignee: Spencer Sutton

https://community.opengroup.org/osdu/platform/system/storage/-/issues/103
Upgrade to Log4J 2.17 (David Diederich, 2021-12-21)
The Apache Foundation released another Log4j2 update, version 2.17, which addresses a denial-of-service vulnerability.
This issue tracks progress to upgrade this dependency for this project.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/104
Add flexible page size option in Storage getRecordsByKind API (Vibhuti Sharma [Microsoft], 2022-01-17)
The get records by Kind API in the Storage service returns results as multiple pages when there is a large number of records. The page size is constant: it equals the `limit` specified in the optional query parameter, or the default limit set in the config file.
# **Context**
The Reindex API in the Indexer service calls the getRecordsByKind Storage API to fetch multiple records. We need an option to query storage without the constant page size constraint, for performance-related enhancements on the Azure provider.
# **Proposed Solution**
Add an **optional** parameter to the Storage service API which, when set to true, will make the API return results with page size <= the configured limit. The default behavior remains returning results with page size == the configured limit.
Milestone: M10 - Release 0.13; Assignee: Vibhuti Sharma [Microsoft]

https://community.opengroup.org/osdu/platform/system/storage/-/issues/106
Add flexible page size option in Storage getRecordsByKind API (Vibhuti Sharma [Microsoft], 2022-03-07)
The get records by Kind API in the Storage service returns results as multiple pages when there is a large number of records. The page size is constant: it equals the `limit` specified in the optional query parameter, or the default limit set in the config file.
# **Context**
The Reindex API in the Indexer service calls the getRecordsByKind Storage API to fetch multiple records. We need an option to query storage without the constant page size constraint, for performance-related enhancements on the Azure provider.
# **Proposed Solution**
Add an **optional** parameter to the Storage service API which, when set to true, will make the API return results with page size <= the configured limit. The default behavior remains returning results with page size == the configured limit.
Milestone: M10 - Release 0.13; Assignee: Vibhuti Sharma [Microsoft]; 2022-01-21

https://community.opengroup.org/osdu/platform/system/storage/-/issues/107
Intermittent record not found errors in Storage batch API (An Ngo, 2022-11-21)
An error has been reported on the Storage query/records:batch API where the user sometimes cannot retrieve a few records; the same records can be fetched at a later time. The Storage service responds with a record not found error, impacting 1% of requests.
**Job details to reproduce the error:**
- Number of records: 14K
- Storage batch size: 20
- Number of threads: 10
- Record size: a few KBs (well head information)
Assignee: Neelesh Thakur

https://community.opengroup.org/osdu/platform/system/storage/-/issues/109
storage get record with version api returns 500 (Neelesh Thakur, 2022-11-21)
Here is a curl where the record exists and I get 200:
```
curl --location --request GET 'https://evt.api.enterprisedata.cloud.slb-ds.com/api/storage/v2/records/opendes%3Awork-product-component--RegularHeightField%3A0a16c55a-aec0-4f21-a55b-abd0447d31f6/1637157881569884' \
--header 'accept: application/json' \
--header 'data-partition-id: opendes' \
--header 'Authorization: Bearer ***'
```
Failure case: here I changed the last digit of the version:
```
curl --location --request GET 'https://evt.api.enterprisedata.cloud.slb-ds.com/api/storage/v2/records/opendes%3Awork-product-component--RegularHeightField%3A0a16c55a-aec0-4f21-a55b-abd0447d31f6/1637157881569881' \
--header 'accept: application/json' \
--header 'data-partition-id: opendes' \
--header 'Authorization: Bearer ***'
```
response:
```
{
"code": 500,
"reason": "Version not found",
"message": "The version 1637157881569881 can't be found for record opendes:work-product-component--RegularHeightField:0a16c55a-aec0-4f21-a55b-abd0447d31f6"
}
```
Expected result: return code is 404
Actual result: return code is 500

https://community.opengroup.org/osdu/platform/system/storage/-/issues/110
Issue in Publisher Facade (Abhishek Kumar (SLB), 2022-11-21)
The branch is not in a running state due to a bug in the Azure core library.
Please refer to this issue: https://community.opengroup.org/osdu/platform/system/lib/cloud/azure/os-core-lib-azure/-/issues/17
**Branch:** UsageOfPublishFacade
Assignee: Nikhil Singh [Microsoft]

https://community.opengroup.org/osdu/platform/system/storage/-/issues/111
Potential defect related to Delete methods due to no response body - Storage Schema (Jevon Williams, 2022-11-21)
DELETE API methods have no response body. The call returns either a 204 success code or a 404 failure code, but provides no details or explanation:
1. Example: a 204 code is returned, but it does not say the item was successfully deleted.
1. Example: a 404 code is returned, but it does not say why the error code was returned.
URL Endpoint - base_url/api/storage/v2/schemas/osdu:osdu:fault-system-wp:0.2.0
![deleteAPI_Storage_screenshot](/uploads/1f100652e42805d620d8a6ee55e3dc45/deleteAPI_Storage_screenshot.PNG)

https://community.opengroup.org/osdu/platform/system/storage/-/issues/112
Potential defect related to Delete methods due to no response body - Storage Delete Record (Jevon Williams, 2022-11-21)
DELETE API methods have no response body. The call returns either a 204 success code or a 404 failure code, but provides no details or explanation:
Example: a 204 code is returned, but it does not say the item was successfully deleted.
Example: a 404 code is returned, but it does not say why the error code was returned.
URL Endpoint - base_url/api/storage/v2/records/{{recordIds}}
![deleteAPI_Storage_Record_screenshot](/uploads/396a120b7291f242bf4a0fac9b1b12ed/deleteAPI_Storage_Record_screenshot.PNG)

https://community.opengroup.org/osdu/platform/system/storage/-/issues/113
Storage /records endpoint without having Content-type in the header throws 415 error (An Ngo, 2023-03-01)
Calling the Storage /records endpoint without a Content-Type header returns a 415 error code.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/114
No audit log for successful PATCH updates (Yauheni Lesnikau, 2022-03-24)
When a PATCH update is performed, there is audit logging only for failed record ids. We need to add similar logging for the successful ones as well.
Milestone: M11 - Release 0.14; Assignee: Yauheni Lesnikau

https://community.opengroup.org/osdu/platform/system/storage/-/issues/116
In Azure environment the end point to query the data with limit is not working (Kamlesh Todai, 2022-08-26)
In the Azure environment, the endpoint to query the data with a limit is not working.
e.g. https://osdu-ship.msft-osdu-test.org/api/storage/v2/query/kinds?limit=10
Response: 400 Bad Request
{
"code": 400,
"reason": "Limit not supported",
"message": "The limit is invalid"
}
@debasisc @sehuboy @kumar_vaibav @ChrisZhang
Milestone: M10 Patch - Release 0.13 patch; Assignee: Krishna Nikhil Vedurumudi

https://community.opengroup.org/osdu/platform/system/storage/-/issues/117
Storage fails to delete large number of records upon legal tag expiration (An Ngo, 2024-03-21)
If there is a large number of records associated with a legalTag that expires after the cron job runs, we see availability issues and inconsistent results in terms of record searchability.
**Observations:**
**LegalTag cron job update issue:**
**Scenario**: I have a large number of records (in the 6 digits) that are associated with a legalTag (i.e. the record metadata has a particular legalTag (let's call it lt1) in the legal.legaltags section). The legalTag lt1 is set to expire soon
**Event**: lt1 expires
**Action 1** : Cron job `updateLegalTagStatus` is triggered on a periodic basis, which grabs the legalTags that have changed their state (valid to invalid and invalid to valid) and publishes this information onto SB topic 'legaltags' and EG topic 'legaltagschangedtopic'. The legalTag also changes its state in the CosmosDb
'legaltagschangedtopic' has an event subscription to SB topic 'legaltagschangedtopiceg', which has a subscription 'eg_sb_legaltagssubscription'
**Action 2**: The Storage service pulls messages from 'eg_sb_legaltagssubscription' for LegalTag update events and updates records associated with lt1. Storage updates the recordMetadata with active/inactive record status and publishes the change onto SB and EG for indexer-queue to consume.
**Expected outcome:** All records associated with lt1 are now inactive. They are unsearchable from Storage and Search APIs.
**Actual outcome:** Some records associated with lt1 are now inactive. They are unsearchable from Storage and Search APIs. I can still search other records.
**Issue**: Not all records are getting pulled from Storage service at **Action 2** to be processed. Thus, many records simply don't change their state, although the legalTag is invalid now.
**Observed behavior/possible improvements:**
1. The context of the legalTag change (active to inactive, or inactive to active) is not considered by Storage when fetching records to update. Storage tries to fetch ALL records for that legalTag with the query `SELECT * FROM c WHERE ARRAY_CONTAINS(c.metadata.legal.legaltags, lt1)`. With a large number of records this is a long operation, and we observed throttling on CosmosDb during this process.
2. No way to retry. Because the Legal service updates the legalTag status in CosmosDb, running the `updateLegalTagStatus` job again will not pick up this legal tag. To retry, we must manually change the status of the legalTag and run the cron job again; upon manual retries, we face the issue above where Storage tries to process ALL records again.
3. What happens when the Storage job is interrupted, possibly due to a pod restart (high CPU utilization), a network error, or a CosmosDb error? Retrying the whole job doesn't help much.
Assignee: Chad Leong

https://community.opengroup.org/osdu/platform/system/storage/-/issues/118
[Azure] Deletion records with invalid legaltag without any logging (Yauheni Lesnikau, 2022-04-12)
There is a message handler which processes the `LegalTagChanged` events; if the legal tag becomes invalid, the appropriate records should be marked as `Inactive` (soft deleted). The issue is that there is no logging which explicitly says that deletion is performed there.
Assignee: Yauheni Lesnikau

https://community.opengroup.org/osdu/platform/system/storage/-/issues/120
Inconsistent behavior of storage PUT when skipdupes is passed as true (Mandar Kulkarni, 2022-08-26)
Storage PUT API has an optional query parameter called [skipdupes](https://community.opengroup.org/osdu/platform/system/storage/-/blob/master/docs/tutorial/StorageService.md#using-skipdupes).
Current behavior of storage PUT API to update existing record is:
If skipdupes is passed as true and the data and meta blocks in the input request are the same as the existing record content, the record update is skipped.
This means the update is also skipped when the user passes different legal, acl, or tags block content in the input request, as long as the data and meta blocks match the existing record.
(This happens because, when skipdupes is passed as true, the Storage service compares only the data and meta blocks of the incoming and existing records, not all the blocks in the record.)
Expected behavior is:
If skipdupes is passed as true, both data and meta blocks should be compared. If the data block is the same but the legal, acl, or tags blocks are different, the same record should be updated. To keep the behavior in sync with the PATCH API, the record version should not be updated when only the tags, legal, or acl blocks are changed.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/121
Storage Schema endpoints should be obsoleted (Gary Murphy, 2022-08-24)
Remove code and config related to the storage schemas APIs from OSDU as they are EOL.
The following APIs are to be removed
- GET /Schema
- DELETE /Schema
- POST /schema

https://community.opengroup.org/osdu/platform/system/storage/-/issues/122
Storage Service Records Fetch Error (Samiullah Ghousudeen, 2022-08-24)
**Not able to retrieve records from Storage Service**
If a record id contains HTML-encoded characters (%2F), the Storage service doesn't return the expected result, but a Search query returns it as expected.
This issue is the same across all CSPs - AZURE, GCP, IBM & AWS.
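For reference, the failing ids contain a percent-encoded slash: the `/` inside the last id component (e.g. the unit of measure `V/B`) must stay encoded as `%2F` in the URL path, and a gateway that decodes the path before routing would split the id across two path segments. A minimal sketch using Python's standard library (the record id is illustrative):

```python
from urllib.parse import quote, unquote

# A record id whose last component contains a slash, e.g. the UoM "V/B".
record_id = "osdu:reference-data--UnitOfMeasure:V/B"

# Encoded once for use as a single path segment (':' and '/' both escaped):
encoded = quote(record_id, safe="")
assert encoded == "osdu%3Areference-data--UnitOfMeasure%3AV%2FB"

# If an intermediary decodes the path before routing, the '/' reappears
# and the id is misparsed as two segments -- one way a %2F id can fail:
assert unquote(encoded) == record_id
```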
For example, the query below returns the expected result through the Search service:
{
"kind": "*:wks:reference-data--UnitOfMeasure:1.0.0",
"limit": 10,
"aggregateBy":"kind",
"query":"id:\"osdu:reference-data--UnitOfMeasure:V%2FB\" OR id:\"opendes:reference-data--UnitOfMeasure:v%2Fv\" OR id:\"odesprod:reference-data--UnitOfMeasure:H%2Fm\" OR id:\"opendes:reference-data--UnitOfMeasure:US%2FF\""
}
However, the Storage service returns the response below: HTTP Status 400 – Bad Request.
<!doctype html>
<html lang="en">
<head>
<title>HTTP Status 400 – Bad Request</title>
</head>
<body>
<h1>HTTP Status 400 – Bad Request</h1>
</body>
</html>
Assignee: Marc Burnie [AWS]

https://community.opengroup.org/osdu/platform/system/storage/-/issues/123
Storage GET record returns 404 for records with optional version (Record ID ending with colon) (An Ngo, 2023-06-06)
Storage GET /api/storage/v2/records/{id} returns a 404 error for records whose ID ends with a colon (i.e. the version is empty).
For example, "osdu:master-data--Wellbore:nz-100000391126:"
This is the case where the version component is empty (this is allowed as part of [this change](https://community.opengroup.org/osdu/platform/system/storage/-/issues/26#summary-january-26-2021) in record ID validation).
Expected behavior: the latest version of the record should be returned.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/124
ADR: Supporting data block modification through Storage PATCH API call (Mandar Kulkarni, 2023-07-05)
## Status
- [X] Proposed
- [X] Trialing
- [X] Under review
- [X] Approved
- [ ] Retired
## Context & Scope
Only record tags, legal tags, and ACLs can be updated through the PATCH API in the Storage service.
The PATCH API cannot be used to update the data block in records; the PUT API must be used for that.
## Tradeoff Analysis
Updating an individual attribute inside the data block currently needs two calls: one GET to fetch the record, then a PUT to update the record content with the new attribute value.
Providing a PATCH API that can update attributes in the data block would support doing this operation in a single call to OSDU Storage.
## Decision
We can update PATCH API to support modifications in data blocks. The API will continue to follow the [rfc6902 standard](https://www.rfc-editor.org/rfc/rfc6902.html).
Currently the PATCH API supports modifications in record tags, legal tags and ACLs only. It supports 3 operations namely add, replace and remove.
The same operations would be supported for data block.
- In "add" operation, specified property from the request would be appended with values provided in "value" field.
- In "replace" operation, specified property from the request would be fully replaced by values provided in "value" field.
- In "remove" operation, values provided in "value" field would be removed for specified property from the request.
Users specify the complete path to the property they want to update in the "path" field, e.g. "/acl/viewers" indicates that the values for the metadata acl viewers would be updated.
Similarly "/data/TechnicalAssuranceID" would indicate that TechnicalAssuranceID attribute from the data block would be updated.
"/data/CurrentOperatorID" would indicate that CurrentOperatorID attribute from the data block would be updated.
"/data/EXtensionProperties/Attribute1" would indicate that Attribute1 from the ExtensionProperties inside data block would be updated.
"/data/SpatialLocation/SpatialGeometryTypeID" would indicate that SpatialGeometryTypeID from the SpatialLocation inside data block would be updated.
Version of the record would be incremented in case of data block update through PATCH API to maintain consistent behavior with PUT API.
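As an illustration of the three operations above, here is a minimal sketch of applying add/replace/remove patches to a record's data block. It hand-rolls the path handling and is not the service's implementation; a real implementation would use an RFC 6902 library, where "add" also has richer semantics for arrays.

```python
# Minimal sketch of the three supported JSON Patch (RFC 6902) operations
# applied to a record's data block. Illustrative only -- not the Storage
# service's actual implementation.

def apply_op(record, op):
    keys = op["path"].lstrip("/").split("/")
    parent = record
    for k in keys[:-1]:
        parent = parent[k]
    last = keys[-1]
    if op["op"] in ("add", "replace"):
        parent[last] = op["value"]      # simplified: no array semantics
    elif op["op"] == "remove":
        del parent[last]
    return record

record = {"data": {"SpatialLocation": {"SpatialGeometryTypeID": "Point"},
                   "CurrentOperatorID": "op-1"}}

apply_op(record, {"op": "replace",
                  "path": "/data/SpatialLocation/SpatialGeometryTypeID",
                  "value": "Polygon"})
apply_op(record, {"op": "add",
                  "path": "/data/TechnicalAssuranceID",
                  "value": "ta-42"})
apply_op(record, {"op": "remove", "path": "/data/CurrentOperatorID"})

assert record["data"] == {"SpatialLocation": {"SpatialGeometryTypeID": "Polygon"},
                          "TechnicalAssuranceID": "ta-42"}
```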
## Consequences
- PATCH API behavior will be updated.
- Storage service documentation needs to be updated.
Milestone: M17 - Release 0.20

https://community.opengroup.org/osdu/platform/system/storage/-/issues/125
Very high number of 429s on CosmosDb when there is a usage spike in Storage `query/records:batch` API (Alok Joshi, 2023-07-19)
In one of our client environments, we are consistently seeing a very high number of 429 errors from CosmosDb. This is causing latency spikes for Storage APIs.
From our investigation, this seems to be related to the query/records:batch api performance/optimization issue. We see a direct correlation between `query/records:batch api` spike and CosmosDb 429 error spike within multiple time windows. Please see attached images for reference.
In the first image, we can see a time window when CosmosDb threw a lot of 429 errors. In the second image, we can see the Storage API usage pattern: most of the API calls are made to `query/records:batch`, which also affects latency numbers. The patterns in both images are very similar.
![ComosDb_usage](/uploads/898f423082ae4193bb7636b058905555/ComosDb_usage.PNG)![api_usage](/uploads/ad233ce903b1596a3dc1f6048548088f/api_usage.PNG)
We've tried increasing the RUs on cosmosDb on multiple incidents but that doesn't help.
Further load tests showed that query/records:batch can be a root cause of the 429 errors.
Within the scope of fixing this issue, it would be reasonable to implement some of the suggestions from https://docs.microsoft.com/en-us/azure/cosmos-db/sql/performance-tips-query-sdk?tabs=v3&pivots=programming-language-java

https://community.opengroup.org/osdu/platform/system/storage/-/issues/126
All versions of a record have the same modifyUser and modifyTime (An Ngo, 2023-05-30)
The concept is that one record should have one version of metadata.
However, in regard to modifyUser and modifyTime attributes, they should be different for each version.
Currently the implementation behaves otherwise, and by the above concept that behavior is wrong.
With the current behavior, multiple versions of the same record share the same modifyTime and modifyUser values, which are overwritten across all versions on every modification of the record.
This means a record having only one version looks like this:
|version1|
|:-------|
|createUser|
|createTime|
But when the record is modified and multiple versions are created, the metadata of the latest version is applied to all versions, including the first, and every version carries the modifyUser and modifyTime attributes.
|version1|version2 |version3|
|:-------|:--------|:--------|
|createUser| createUser| createUser|
|createTime| createTime| createTime|
|modifyUser2 |modifyUser2|modifyUser2|
|modifyTime2 |modifyTime2|modifyTime2|
**Expected:**
Version 1 should only have createUser and createTime. modifyUser and modifyTime should not exist in the first version.
Version 2+ should have different modifyUser and modifyTime for each version
|version1|version2 |version3|
|:-------|:--------|:--------|
|createUser| createUser| createUser|
|createTime| createTime| createTime|
| |modifyUser1|modifyUser2|
| |modifyTime1|modifyTime2|

https://community.opengroup.org/osdu/platform/system/storage/-/issues/127
Soft-deleted record was skipped when re-ingested with same data (An Ngo, 2022-08-23)
**Steps to reproduce the current behavior:**
1. Ingest a record
2. Soft-delete the record
3. Fetch the record to confirm it is now "inactive", "not found"
**Case 1:** Works as expected
4. Ingest the same record using the same id and DIFFERENT data, skipdupes=true
> Record was NOT skipped. Deleted record became active again. A new version of the record is created.
> Example response:
```
{
"recordCount": 1,
"recordIds": [
"osdu:document:ee7e8869217541a8b31f4e2ea18f7e3a"
],
"skippedRecordIds": [],
"recordIdVersions": [
"osdu:document:ee7e8869217541a8b31f4e2ea18f7e3a:1654731042152281"
]
}
```
5. Soft-delete the record
6. Fetch the record to confirm it is now "inactive", "not found"
**Case 2:** Skips the record even though it was already deleted
7. Ingest the same record using the same id, SAME data, skipdupes=true
> Record was skipped. So the record remains "inactive", "not found". The PUT call did nothing to the record.
>Example response:
```
{
"recordCount": 1,
"skippedRecordIds": [
"slb-osdu-dev-sis-internal-hq:document:ee7e8869217541a8b31f4e2ea18f7e3a"
]
}
```
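For contrast, the handling the reporter considers correct can be sketched as a small decision function. The names and return values are illustrative, not the service's API; `existing` is None when the record has never been ingested, and `deleted` marks a soft-deleted record.

```python
# Sketch of the skipdupes handling this issue asks for. Illustrative only.

def decide(existing, deleted, same_data, skipdupes):
    if existing is None:
        return "create"
    if deleted:
        if skipdupes and same_data:
            return "reactivate"       # last deleted version becomes latest
        return "new_version"          # record becomes active again
    if skipdupes and same_data:
        return "skip"
    return "new_version"

assert decide(None, False, False, True) == "create"
assert decide("rec", True, True, True) == "reactivate"    # Case 2 today wrongly skips
assert decide("rec", True, False, True) == "new_version"  # Case 1 already works
assert decide("rec", False, True, True) == "skip"
assert decide("rec", False, True, False) == "new_version"
```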
**Expected behavior:**
If skipdupes is true
- if the record doesn't exist at all, then create a new record.
- **if the record was soft-deleted, then make the record active again if the data is the same (last deleted version becomes the latest version), or create a new version if data is different.**
- if the record exists,
- if the data is the same, then skip it.
- if data is different, then create a new version
If skipdupes is false:
- if the record doesn't exist at all, then create a new record.
- **if the record was soft-deleted, then create a new version of the record**
- if the record exists, then a new version of the record will be created, regardless of whether the data is the same or different.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/128
Data store location is not appended to legal tag ORDC of record (An Ngo, 2022-10-26)
Upon creating a record, the data store location/country is expected to be appended to the ORDC (Other Relevant Data Countries) list.
This is not the current behavior.
```
"otherRelevantDataCountries": [
"VN"
]
```
Here, "VN" was provided when creating the record. Upon record creation, the system is supposed to append "US" (US environment partition), "BE" (EU), "NL" (WEU), etc.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/129
Storage returns inconsistent and wrong responses if nested attribute filters which do not exist are specified (An Ngo, 2023-07-07)
Using /api/storage/v2/records/{id}
optional attribute filter:
![image](/uploads/05e101feb2102c16adba4575f31a4aa7/image.png)
Example, given this data:
```
"data": {
"relationships": {
"well": {
"id": "slb-osdu-tryme:well",
"name": "Card Creek 2"
},
"relatedItems": {
"ids": [
"Log1",
"Marker1"
],
"names": [
"Log Name1",
"Marker Name1"
]
}
}
```
A few observations when filtering with a string:
data.something returns 200
data.relationships.something returns 200
data.relationships.well.something returns 200
data.relationships.well.number returns 500 at first. Then after a few tries, it returns 200.
data.relationships.something returns 200
data.relationships.relatedItem.something returns 500 (no s on relatedItems)
data.relationships.relatedItems.something returns 500 at first, then 200 after that.
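One way to make this behavior deterministic is to validate the filter path against the record's data before projecting it, returning 400 for unknown attributes rather than an intermittent 500 or 200. A hypothetical sketch (not the service's implementation):

```python
# Sketch: walk an attribute filter path such as "data.relationships.well.name"
# through the record's data block; an unknown segment yields a deterministic
# 400 instead of an intermittent 500/200. Illustrative only.

def status_for_filter(data, path):
    node = data
    for part in path.split(".")[1:]:   # drop the leading "data" segment
        if not isinstance(node, dict) or part not in node:
            return 400                 # unknown attribute -> client error
        node = node[part]
    return 200

data = {"relationships": {"well": {"id": "slb-osdu-tryme:well",
                                   "name": "Card Creek 2"}}}

assert status_for_filter(data, "data.relationships.well.name") == 200
assert status_for_filter(data, "data.relationships.well.number") == 400
assert status_for_filter(data, "data.relationships.relatedItem.ids") == 400
```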
Expected return code should be 400.
Milestone: M19 - Release 0.22

https://community.opengroup.org/osdu/platform/system/storage/-/issues/130
Storage PUT: setting a non-number value to a number attribute results in an empty 400 response (no error message) (An Ngo, 2022-08-23)
For example, given the payload below, this value was provided:
` "value": Infinity`
```
curl --location --request PUT 'https://domain.com/api/storage/v2/records' \
--header 'accept: application/json' \
--header 'data-partition-id: osdu' \
--header 'Content-Type: application/json' \
--header 'Authorization: <token>' \
--data-raw '[
{
"acl": {
"owners": [
"data.default.owners@domain.com"
],
"viewers": [
"data.default.viewers@domain.com"
]
},
"data": {
"ExtensionProperties": {
"osdu": {
"curvesProperties": [
{
"curveID": "CTEM_GPITF",
"properties": [
{
"name": "MEASURE-POINT-OFFSET",
"value": Infinity
}
]
}
]
}
}
},
"kind": "osdu:wks:work-product-component--WellLog:1.1.0",
"legal": {
"legaltags": [
"osdu-default-legal"
],
"otherRelevantDataCountries": [
"US"
]
}
}
]'
```
Response:
Empty 400
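For context, `Infinity` is not a legal token in RFC 8259 JSON, so a strict parser rejects this payload before any schema logic runs; the service could surface that as the 400 error body. A minimal sketch using Python's stdlib `json` (note Python accepts `Infinity` unless told otherwise):

```python
import json

def strict_loads(text: str):
    """Parse JSON but reject the non-standard constants
    Infinity/-Infinity/NaN, which RFC 8259 does not allow."""
    def reject(name):
        raise ValueError(f"Invalid JSON constant: {name}")
    return json.loads(text, parse_constant=reject)

strict_loads('{"value": 1.5}')               # parses fine
try:
    strict_loads('{"value": Infinity}')      # the payload from this report
except ValueError as exc:
    print(exc)                               # Invalid JSON constant: Infinity
```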
![image](/uploads/18749c6ea879c9c888a3c5c173288b23/image.png)

https://community.opengroup.org/osdu/platform/system/storage/-/issues/131
No update notification sent (2022-08-23, Qiang Fu)
Steps to reproduce:
1) setup notification endpoint
2) subscribe to "recordchange" topic
3) create a record using the api/storage/v2/records endpoint. Verify a "create" notification is received.
4) modify the payload used in step 3 and run api/storage/v2/records again. Another "create" notification is received.
5) modify the payload used in step 4 and run api/storage/v2/records/?skipdupes=true. A "create" notification is received.
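What the reproduction expects can be stated as a tiny rule (hypothetical helper; the real decision lives in the Storage publish path):

```python
def notification_type(record_exists: bool) -> str:
    """A change to an already-existing record should publish 'update';
    only a brand-new record should publish 'create'."""
    return "update" if record_exists else "create"

print(notification_type(False))  # step 3: "create"
print(notification_type(True))   # steps 4-5: "update" expected
```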
In steps 4 and 5, an "update" notification is expected.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/132
CORS blocking query/records:batch endpoint (2022-11-11, Yifan Ye)
CORS does not allow the frame-of-reference header, and the query/records:batch endpoint is blocked by CORS.
M14 - Release 0.17

https://community.opengroup.org/osdu/platform/system/storage/-/issues/133
[BUG] Create error messages based on response from OPA (2022-09-07, Rostislav Vatolin)
Storage has failing tests due to incorrect logic related to integration with OPA. Storage should not be responsible for creating a custom error message in case OPA returns an error. Please use the error message returned from OPA. Please make sure all integration tests pass when OPA is turned on.
Related MR: https://community.opengroup.org/osdu/platform/security-and-compliance/policy/-/merge_requests/122

https://community.opengroup.org/osdu/platform/system/storage/-/issues/134
Storage and PUT - Any way to work around the limit of 500 records? (2022-12-09, Debasis Chatterjee)
I was trying to persist standard reference values for entities such as UnitOfMeasure and hit this limit.
```
{
"code": 400,
"reason": "Validation error.",
"message": "createOrUpdateRecords.records: Up to 500 records can be ingested at a time"
}
```
Is there a workaround (apart from splitting the original JSON load manifest into smaller chunks)?
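Client-side, the usual workaround is exactly that split: batch the record array into chunks of at most 500 before each PUT. A minimal sketch (the 500 limit comes from the error above; the helper name is ours):

```python
MAX_RECORDS_PER_PUT = 500  # the limit reported by createOrUpdateRecords

def chunks(records, size=MAX_RECORDS_PER_PUT):
    """Yield successive batches no larger than the Storage PUT limit."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

manifest = [{"id": f"rec-{n}"} for n in range(1234)]   # illustrative records
print([len(batch) for batch in chunks(manifest)])      # [500, 500, 234]
```

Each batch is then sent as its own PUT /api/storage/v2/records request.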
cc - @krveduru for information

https://community.opengroup.org/osdu/platform/system/storage/-/issues/135
Storage release/0.15 build Failure (2022-08-15, Shrikant Garg)
The storage release/0.15 build started failing because it references core-common 0.15.0-SNAPSHOT, which has been cleaned up. Ideally, SNAPSHOT versions should not be referenced.
So upgrading it to the latest version is recommended.
M12 - Release 0.15

https://community.opengroup.org/osdu/platform/system/storage/-/issues/136
Schema Validation Failed - Storage Service (2022-11-21, Samiullah Ghousudeen)
Data ingestion through the `Storage PUT service` does not validate the schema, kind, or attributes.
As shown in the request below, it is possible to ingest the attribute/value `"TestAttribute": "Test-Sami"`, which is not defined in the ContractorType reference-data WKS schema.
<details><summary> Storage PUT Request </summary>
<pre><code>
curl --location --request PUT 'https://osdu-ship.msft-osdu-test.org/api/storage/v2/records' \
--header 'Content-Type: application/json' \
--header 'data-partition-id: opendes' \
--header 'Authorization: Bearer eyJ0eXAiOiJKV1Qi ' \
--data-raw '[
{
"id": "opendes:reference-data--ContractorType:test-sami01",
"kind": "osdu:wks:reference-data--ContractorType:1.0.0",
"acl": {
"owners": [
"data.default.owners@opendes.contoso.com"
],
"viewers": [
"data.default.viewers@opendes.contoso.com"
]
},
"legal": {
"legaltags": [
"opendes-public-usa-dataset-7643990"
],
"otherRelevantDataCountries": [
"US"
]
},
"data": {
"Name2": "Well",
"ID2": "Well",
"Code2": "Well",
"Source2": "Workbook Published/FacilityTypeType.1.0.0.xlsx; commit SHA 0b4db59a.",
"TestAttribute" : "Test-Sami"
}
}
]'
</code></pre>
</details>
<details><summary> Storage GET Request </summary>
<pre><code>
{
"data": {
"Name2": "Well",
"ID2": "Well",
"Code2": "Well",
"Source2": "Workbook Published/FacilityTypeType.1.0.0.xlsx; commit SHA 0b4db59a.",
"TestAttribute": "Test-Sami"
},
"meta": null,
"id": "opendes:reference-data--ContractorType:test-sami01",
"version": 1658769507968280,
"kind": "osdu:wks:reference-data--ContractorType:1.0.0",
"acl": {
"viewers": [
"data.default.viewers@opendes.contoso.com"
],
"owners": [
"data.default.owners@opendes.contoso.com"
]
},
"legal": {
"legaltags": [
"opendes-public-usa-dataset-7643990"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"createUser": "preshipping@azureglobal1.onmicrosoft.com",
"createTime": "2022-07-05T17:06:37.282Z",
"modifyUser": "preshipping@azureglobal1.onmicrosoft.com",
"modifyTime": "2022-07-25T17:18:28.992Z"
}
</code></pre>
</details>
Also, it is possible to ingest and fetch data through the Storage Service without ever creating the schema `osdu:wks:reference-data--ContractorTypeTestSami:1.0.0` in the OSDU system, as shown below:
<details><summary> Storage PUT Request </summary>
<pre><code>
curl --location --request PUT 'https://osdu-ship.msft-osdu-test.org/api/storage/v2/records' \
--header 'Content-Type: application/json' \
--header 'data-partition-id: opendes' \
--header 'Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJSU ' \
--data-raw '[
{
"id": "opendes:reference-data--ContractorTypeTestSami:test-sami01",
"kind": "osdu:wks:reference-data--ContractorTypeTestSami:1.0.0",
"acl": {
"owners": [
"data.default.owners@opendes.contoso.com"
],
"viewers": [
"data.default.viewers@opendes.contoso.com"
]
},
"legal": {
"legaltags": [
"opendes-public-usa-dataset-7643990"
],
"otherRelevantDataCountries": [
"US"
]
},
"data": {
"Name2": "Well",
"ID2": "Well",
"Code2": "Well",
"Source2": "Workbook Published/FacilityTypeType.1.0.0.xlsx; commit SHA 0b4db59a.",
"TestAttribute" : "Test-Sami"
}
}
]'
</code></pre>
</details>
<details><summary> Storage GET Request </summary>
<pre><code>
{
"data": {
"Name2": "Well",
"ID2": "Well",
"Code2": "Well",
"Source2": "Workbook Published/FacilityTypeType.1.0.0.xlsx; commit SHA 0b4db59a.",
"TestAttribute": "Test-Sami"
},
"meta": null,
"id": "opendes:reference-data--ContractorTypeTestSami:test-sami01",
"version": 1658770548926014,
"kind": "osdu:wks:reference-data--ContractorTypeTestSami:1.0.0",
"acl": {
"viewers": [
"data.default.viewers@opendes.contoso.com"
],
"owners": [
"data.default.owners@opendes.contoso.com"
]
},
"legal": {
"legaltags": [
"opendes-public-usa-dataset-7643990"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"createUser": "preshipping@azureglobal1.onmicrosoft.com",
"createTime": "2022-07-25T17:35:49.251Z"
}
</code></pre>
</details>
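The missing check can be approximated by diffing a record's data keys against the kind's schema properties before calling PUT. A sketch (a real validator would resolve the schema from the Schema service and check types too; the helper name is ours):

```python
def undeclared_attributes(data: dict, schema_properties: dict) -> list:
    """Return data attributes not declared in the schema's properties --
    the validation this report shows Storage PUT is not performing."""
    return sorted(k for k in data if k not in schema_properties)

# Assumed property set for illustration; real ContractorType schemas differ.
schema_props = {"Name": {}, "ID": {}, "Code": {}, "Source": {}}
data = {"Name2": "Well", "Code2": "Well", "TestAttribute": "Test-Sami"}
print(undeclared_attributes(data, schema_props))
# ['Code2', 'Name2', 'TestAttribute']
```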
cc - @chad @debasis

https://community.opengroup.org/osdu/platform/system/storage/-/issues/138
Soft Delete APIs should enforce data owner access check (2023-05-30, Kelly Zhou)
The following endpoints currently only check for data viewer access:
- POST **/api/storage/v2/records/{id}:delete** (soft delete API)
- POST **/api/storage/v2/records/delete** (bulk delete API)
When a user asks to soft delete a record, the storage service should enforce the same level of data access check as the Purge API (DELETE /api/storage/v2/records/{id}), where only a data owner can purge the record.
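The requested check, sketched (hypothetical helper; real enforcement goes through Entitlements groups or OPA policy):

```python
def can_soft_delete(user_groups: set, record_acl: dict) -> bool:
    """Soft delete should require membership in an owners ACL group,
    mirroring the Purge API, rather than viewer access alone."""
    return bool(user_groups & set(record_acl["owners"]))

acl = {"viewers": ["data.default.viewers@dp.example.com"],
       "owners": ["data.default.owners@dp.example.com"]}
print(can_soft_delete({"data.default.viewers@dp.example.com"}, acl))  # False
print(can_soft_delete({"data.default.owners@dp.example.com"}, acl))   # True
```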
When the data access check is updated, we also need to update the integration tests to reflect the change in any related tests.
As Storage starts to integrate with Policy/OPA, we need to update the corresponding data authorization policies to reflect the changes as well.
M14 - Release 0.17

https://community.opengroup.org/osdu/platform/system/storage/-/issues/139
[STORAGE] PUT. Reports 201 success with a 50 records payload but actually fails (2023-02-13, Ernesto Gutierrez)
**Description**
While issuing the following request [50_records_payload.json](/uploads/3d2ddceee544b9741af0a0b54fff9981/50_records_payload.json), the storage service returns a 201 with records and versions [STORAGE_201_put_records.json](/uploads/48a60f0dfa71bb13852b7ca8cc12fd8b/STORAGE_201_put_records.json).
But when trying to fetch the records, they are not created/updated.
Looking at the logs [Storage_LOG_50_records.txt](/uploads/fdf868480d289199eb916f9d5d575b8f/Storage_LOG_50_records.txt), it seems the service is reaching this line https://community.opengroup.org/osdu/platform/system/lib/cloud/azure/os-core-lib-azure/-/blob/1bddde80718274e34a36aee673092bf20526f5aa/src/main/java/org/opengroup/osdu/azure/cosmosdb/CosmosStoreBulkOperations.java#L124
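Until the error is propagated, a defensive client can verify persistence after a PUT by fetching each returned record id. A sketch (`get_record` stands in for a real HTTP GET; the response shape is simplified):

```python
def verify_put(put_response: dict, get_record) -> list:
    """Return ids the service reported as created but that cannot be
    fetched -- exactly the silent-failure case described here."""
    return [rid for rid in put_response.get("recordIds", [])
            if get_record(rid) is None]

store = {"id-1": {"ok": True}}  # pretend only id-1 actually persisted
resp = {"recordCount": 2, "recordIds": ["id-1", "id-2"]}
print(verify_put(resp, store.get))  # ['id-2']
```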
**Expected behavior**
Two behaviors are expected:
1. Payload with 50 records should not fail
2. If for any reason the request fails, the error should be propagated back and an error returned instead of 201.
M13 - Release 0.16

https://community.opengroup.org/osdu/platform/system/storage/-/issues/140
[BUG] Storage error message is different if OPA is enabled (2022-11-07, Marc Burnie [AWS])
Storage QueryServiceImpl throws a forbidden response with a different message if OPA is enabled and the requesting user does not belong to the viewer group: "The user does not have access to the record". With OPA disabled, the following message is returned: "The user is not authorized to perform this action". The latter message is expected by the integration test.
Related to issue: https://community.opengroup.org/osdu/platform/system/storage/-/issues/133
M14 - Release 0.17

https://community.opengroup.org/osdu/platform/system/storage/-/issues/141
[BUG] Test failure when OPA is enabled due to legal response caching (2022-11-07, Marc Burnie [AWS])
The default legal rego policy specifies a caching period of 900 seconds. When OPA is enabled, it is possible to reuse deleted/invalidated legal tags during this period. The grace period specified in the PubSubEndpointTest is 10 seconds, causing the test to fail.
Related MR: https://community.opengroup.org/osdu/platform/system/storage/-/merge_requests/437/diffs
M14 - Release 0.17

https://community.opengroup.org/osdu/platform/system/storage/-/issues/142
[BUG] Incorrect Operation Type Published When Updating Record Kind When OPA is Enabled (2022-11-07, Marc Burnie [AWS])
When OPA is enabled for the Storage service and a record's kind field is updated, the record's previous kind is still observable in the Search service.
For example, creating the following record using PUT {{base_url}}/api/storage/v2/records:
```JSON
[
{
"id":"{{data_partition_id}}:dataset--File.Generic:1000",
"kind": "osdu:wks:dataset--File.Generic:1.0.0",
"data": {
"Endian": "BIG",
"Name": "dummy",
"DatasetProperties.FileSourceInfo.FileSource": "",
"DatasetProperties.FileSourceInfo.PreloadFilePath": ""
},
"namespace": "osdu:wks",
"legal": {
"legaltags": [
"{{data_partition_id}}-public-usa-dataset-1"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"acl": {
"viewers": [
"data.default.viewers@{{data_partition_id}}.{{domain}}"
],
"owners": [
"data.default.owners@{{data_partition_id}}.{{domain}}"
]
},
"type": "dataset--File.Generic",
"version": 1620833190423950
}
]
```
And updating the kind to be:
```JSON
[
{
"id":"{{data_partition_id}}:dataset--File.Generic:1000",
"kind": "osdu:wks:dataset--File.Generic:1.0.1",
...
}
]
```
It is expected that Search service would return the following result when making the request to POST {{base_url}}/api/search/v2/query:
Body:
```JSON
{
"kind": "osdu:wks:dataset--File.Generic:1.0.0"
}
```
Expected Result:
```JSON
{
"results": [],
"aggregations": [],
"totalCount": 0
}
```
However, the un-updated result is returned:
```JSON
{
"results": [
{
"kind": "osdu:wks:dataset--File.Generic:1.0.0",
"source": "wks",
"acl": {
"viewers": [
"data.default.viewers@osdu.example.com"
],
"owners": [
"data.default.owners@osdu.example.com"
]
},
"type": "dataset--File.Generic",
"version": 1663008428769106,
"tags": null,
"modifyUser": "admin@testing.com",
"modifyTime": "2022-09-12T18:47:08.806Z",
"createTime": "2022-09-09T19:12:46.378Z",
"authority": "osdu",
"namespace": "osdu:wks",
"legal": {
"legaltags": [
"osdu-public-usa-dataset-1"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"createUser": "admin@testing.com",
"id": "osdu:dataset--File.Generic:1000"
}
],
"aggregations": null,
"totalCount": 1
}
```
Performing a record query against the Storage service returns the expected response, as does the Search service when searching for the updated kind.
The issue appears to be caused by an incorrect message being published by the Storage service, resulting in the Indexer service only creating the new index for the updated record and not removing the record from the previous kind. Indexer knows to remove a record from an index during an update operation when there is a previousVersionKind field that is not null or empty. Storage publishes a message with this field when the updated record's kind is different from the existing record's kind. However, the existing record's kind is overwritten to match the updated record when validating the record using OPA, so when the kinds are compared by Storage's IngestService, the result always evaluates to a match and, therefore, the previousVersionKind field is never populated.
M14 - Release 0.17

https://community.opengroup.org/osdu/platform/system/storage/-/issues/143
Storage sends an exceeding number of legal tags for validation (2022-11-10, An Ngo)
The Compliance Validate Legal Tags API has a limit of 25.
When an ingestion is done, Storage sends the provided legal tags to Compliance to ensure they are valid before proceeding with the record creation.
If more than 25 legal tags are sent in the ingestion/creation request, Storage needs to split the validation requests into chunks of 25. However, it does not perform this check and sends all of the legal tags from the request in a single call.
M14 - Release 0.17

https://community.opengroup.org/osdu/platform/system/storage/-/issues/144
BUG: Class cast exceptions in the CrsConversionService (2022-09-30, Yauheni Lesnikau)
In some of our environments we observed a `ClassCastException` in `CrsConversionService`.
Example:
```
"type": "java.lang.ClassCastException",
"message": "com.google.gson.JsonPrimitive cannot be cast to com.google.gson.JsonObject",
"p...In some of our envs we observed `ClassCastException ` in `CrsConversionService`.
Example:
```
"type": "java.lang.ClassCastException",
"message": "com.google.gson.JsonPrimitive cannot be cast to com.google.gson.JsonObject",
"parsedStack": [
{
"level": 0,
"method": "com.google.gson.JsonObject.getAsJsonObject",
"fileName": "JsonObject.java",
"line": 192
},
{
"level": 1,
"method": "org.opengroup.osdu.storage.conversion.CrsConversionService.getFeature",
"fileName": "CrsConversionService.java",
"line": 621
},
```

https://community.opengroup.org/osdu/platform/system/storage/-/issues/145
Storage not consuming OSDU record id regex in RecordAncestry (2023-05-30, Kelly Zhou)
Storage does not consume the OSDU record id regex properly when it comes to parent records; the current method of getting the parent record id and version number causes an error for ids such as dp-id:test:parent::1234. The new OSDU record id regex allows a colon in a preceding section, and Storage does not yet respect that rule.
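A tolerant way to split a parent reference is on the last colon only, so record ids that themselves contain colons still parse. A sketch (helper name is ours; the authoritative rule is the OSDU record id regex):

```python
def split_parent(parent: str) -> tuple:
    """Split 'recordId:version' on the LAST colon, so ids that contain
    colons (e.g. 'dp-id:test:parent:') are kept intact."""
    record_id, _, version = parent.rpartition(":")
    return record_id, int(version)

print(split_parent("dp-id:test:parent::1234"))  # ('dp-id:test:parent:', 1234)
```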
Changes need to be made in the core common library to add a validator for RecordAncestry that consumes the OSDU record id regex properly, and Storage needs to update the way it gets the parent record id and version number.
M15 - Release 0.18

https://community.opengroup.org/osdu/platform/system/storage/-/issues/146
POST /query/records:batch with normalization stops converting after 1 conversion failure (2022-10-28, An Ngo)
An attribute was defined as a number in the schema:
```
"depthA": {
"title": "depthA",
"type": "number"
}
```
The specified meta converts the values in depthA from ft to meters.
```
"meta": [
{
"kind": "Unit",
"name": "ft",
"persistableReference": "{\"scaleOffset\":{\"scale\":0.3048,\"offset\":0.0},\"symbol\":\"ft\",\"baseMeasurement\":{\"ancestry\":\"Length\",\"type\":\"UM\"},\"type\":\"USO\"}",
"propertyNames": [
"depthA",
"depthB"
],
```
The record was ingested/created with an empty string assigned to depthA.
```
"data": {
"depthA": "",
"depthB": 123,
"depthC": 456
},
```
Upon record creation, the fetch API was called to normalize the record before indexing.
The conversion failed to convert depthA. An error was logged. The fetch API returned a 200, but with a conversion error.
![image](/uploads/28575874041594004a487f3ee009f1f9/image.png)
After this error, the API skipped conversion for the other attributes.
Indexer saw this error and returned a 400 status. The index trace returns:
```
"statusCode": 400,
"trace": [
"Unit conversion: illegal value for property depthA"
]
```
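Isolating failures per property, rather than aborting the whole conversion, can be sketched as follows (assumed helper; the real conversion applies the persistableReference scale/offset, here 0.3048 from the meta above):

```python
def convert_all(values: dict, scale: float) -> tuple:
    """Convert each property independently, collecting per-property
    errors instead of stopping at the first failure."""
    converted, errors = {}, []
    for name, raw in values.items():
        try:
            converted[name] = float(raw) * scale
        except (TypeError, ValueError):
            errors.append(f"Unit conversion: illegal value for property {name}")
    return converted, errors

out, errs = convert_all({"depthA": "", "depthB": 123, "depthC": 456}, 0.3048)
print(errs)         # ['Unit conversion: illegal value for property depthA']
print(sorted(out))  # ['depthB', 'depthC']
```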
**Action:** The API should continue to convert all specified attributes, and log conversion errors for those that failed.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/147
Current implementation doesn't delete all versions of the record with purging a record (2023-07-19, Alok Joshi)
Repro steps:
- create a record with the PUT API
- create another version of the same record with the PUT API
- hard delete (purge) the record
Expected: All metadata and storage blobs should be purged
Actual: The metadata gets purged, but only the latest data version is purged. This leaves orphaned blobs for the other versions in Blob Storage.
**Note**: The bug was observed for the Azure implementation, but other providers should confirm the behavior and put in a fix if required.
M15 - Release 0.18

https://community.opengroup.org/osdu/platform/system/storage/-/issues/148
ADR: Separate modifyTime and modifyUser for every version of OSDU storage record (2023-07-05, Mandar Kulkarni)
Separate modifyTime and modifyUser for every version of OSDU storage record
## Status
- [X] Proposed
- [ ] Trialing
- [ ] Under review
- [X] Approved
- [ ] Retired
## Context & Scope
The concept is that one record should have 1 version of metadata.
However, in regard to modifyUser and modifyTime attributes, they should be different for each version.
The current implementation does not follow this concept, so the current behavior is wrong.
The original issue that was raised is [here](https://community.opengroup.org/osdu/platform/system/storage/-/issues/126).
So with the current behavior, for multiple versions of the same record the modifyTime and modifyUser values are the same; they are overwritten across all versions on every modification made to the record.
This means that for records having only one version, it looks like this:
|version1|
|:-------|
|createUser|
|createTime|
But when the record is modified and multiple versions are created, the metadata of the record for the latest version is applied to all versions, including the first:
|version1|version2 |version3|
|:-------|:--------|:--------|
|createUser| createUser| createUser|
|createTime| createTime| createTime|
|modifyUser2|modifyUser2|modifyUser2|
|modifyTime2|modifyTime2|modifyTime2|
Due to this behavior, the record modification history is lost and which versions of the record are created by which users cannot be tracked.
## Tradeoff Analysis
The metadata, which contains the modifyUser and modifyTime attributes, will be stored separately for every record version.
This means the metadata stored for storage records will increase.
In return, the record modification history can be tracked and the users who created different versions of the record can be traced, which was not possible before.
## Decision
Version 1 should only have createUser and createTime. modifyUser and modifyTime should not exist in the first version.
Version 2+ should have different modifyUser and modifyTime for each version.
|version1|version2 |version3|
|:-------|:--------|:--------|
|createUser| createUser| createUser|
|createTime| createTime| createTime|
| |modifyUser1|modifyUser2|
| |modifyTime1|modifyTime2|
If the record metadata (i.e. the tags, legal tags, and ACL blocks of the record) is modified using the storage **PATCH** API, the version number is not changed, and only the **latest** value for modifyUser and modifyTime will be maintained against that record version.
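The decided model can be sketched as follows (hypothetical data shape; field names taken from the tables above):

```python
def audit_fields(version: int, created: dict, modifications: dict) -> dict:
    """Version 1 carries only create*; each later version carries its own
    modify* pair (per the decision above), instead of one shared latest pair."""
    fields = dict(created)
    if version > 1:
        fields.update(modifications[version])
    return fields

created = {"createUser": "user-a", "createTime": "t0"}
mods = {2: {"modifyUser": "user-b", "modifyTime": "t1"},
        3: {"modifyUser": "user-c", "modifyTime": "t2"}}
print(audit_fields(1, created, mods))                # create fields only
print(audit_fields(3, created, mods)["modifyUser"])  # user-c
```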
## Consequences
- Storage service behavior will change.
- Storage service documentation needs to be updated.
M17 - Release 0.20

https://community.opengroup.org/osdu/platform/system/storage/-/issues/149
ADR: Namespacing storage records (2024-03-19, ashley kelham)
# Background
The OSDU is agreeing on a new EA level ADR for 'collaborations'. This is a wide ranging and broad problem that is trying to be solved. You can see info at the EA level [here](https://gitlab.opengroup.org/osdu/subcommittees/...# Background
The OSDU is agreeing on a new EA level ADR for 'collaborations'. This is a wide ranging and broad problem that is trying to be solved. You can see info at the EA level [here](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/adr-elaboration/-/issues/48).
At its heart is the idea that data must be separated between the system of record and system of engagement. Today the OSDU only supports the system of record. All data therefore by default resides in the system of record and the APIs we use read, write and delete from the system of record.
In this ADR we are looking at how we can separate data in Storage service into separate namespaces. These namespaces can in the future be linked to a specific collaboration, which will form the system of engagement.
The system of engagement is meant to be interacted with by any application wanting to add/update data into the OSDU. Therefore we should have some understanding of what application is making the requests into the system of engagement.
We are starting with storage service as all other changes needed for the system of engagement data separation will be driven by this change.
![image](/uploads/b269adeef9f11aa773480f96a4b7c7d7/image.png)
As shown, the system of engagement can have many namespaces, one for each collaboration.
A single storage record can reside in any number of namespaces. A namespace can also have 0 or many Records.
A storage record consists of 2 parts, the metadata and the data.
```
{
id: "opendes:mastered-wellbore:12345678",
kind: "osdu:wks:mastered-wellbore:1.0.0",
...
...
data: {
...
...
}
}
```
Everything inside the 'data' json object shown above is classed as the data and everything else is the 'metadata'.
These are stored separately by the storage service in a 1-many relationship. Every time a Record's data is updated, a new version of that data is created that points to a single metadata instance.
The reference is held directly in the metadata. We can think of the referencing of the data blocks to the metadata like this
Diagram 1
![image](/uploads/ecdb68f32ab861835cca78533ed0716f/image.png)
The latest data version referenced is the 'head' and is returned by default when no version is specified when using the Storage APIs.
If I retrieve an older version of the 'data' I am only ever returned the same version of the metadata.
With collaboration there is the possibility that many 'heads' exist at the same time, one per collaboration. There can be many collaborations and each collaboration can hold many entities.
Each collaboration should be treated independently; therefore, any change to a Record in the context of a collaboration should be reflected only in that context and not affect any others.
# Out of scope
For this ADR we are looking only at how we separate data in Storage service between the System of Record (what exists today in OSDU) and System of engagement (collaborations).
We are **not** deciding on
- How DDMS will separate the data
- How Consumption services like search separate the data
- How data will transfer between the system of Record and system of engagement in Storage
- How collaborations will act on this or control this behavior or even what a collaboration entity looks like
- Any other service that might need to act on a collaboration context e.g. ingestion
# Solution
The suggestion is to create a different instance of the Storage metadata specific to the collaboration context. It is stored using a compound key of the record id + the collaboration id.
This collaboration id forms the namespace for a record, and combining the 2 means we have a unique metadata instance per collaboration.
Therefore if a Record is not assigned to a collaboration the namespace is the same as it is today (empty) and the id remains unchanged. This maintains current system behavior for existing data in the system of record.
>Note: The Record ID is never changed between namespaces and should be persisted and returned to the user the same as it is today no matter the context provided. The id of the document/row used in the database should **append** the namespace value so that multiple metadata instances can coexist for the same Record ID. This means the data model of the metadata needs to have a separate record id and row/document id value.
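The note above can be sketched as a compound key (the separator and helper name are illustrative, not fixed by this ADR):

```python
def document_key(record_id: str, collaboration_id: str = "") -> str:
    """Compound database key: record id plus namespace suffix, so multiple
    metadata instances for the same Record ID can coexist. With no
    collaboration the key equals the record id, preserving today's behavior."""
    return f"{record_id}::{collaboration_id}" if collaboration_id else record_id

print(document_key("opendes:mastered-wellbore:12345678"))
print(document_key("opendes:mastered-wellbore:12345678", "collab-a"))
```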
References to the data are held in each metadata allowing the same data to be referenced by multiple namespaces but also to have unique versions of a record Id to exist in individual namespaces. The reference is also quick and cheap to add/remove from different namespaces.
Diagram 2
![image](/uploads/6df9c0249d22cf3cbdd34e3d9b1f096f/image.png)
>Note that multiple collaborations could be active at the same time and the 'data' versions does not have to be linear between them. For example changes from different collaborations could overlap one another. This is because the version is already defined as an epoch timestamp and so is versioned based on when it was created.
Diagram 3
![image](/uploads/d69b9d0fd9ffdfe6af3913c35bdc7b84/image.png)
### Behavior of retrieval APIs
If we take diagram 3 as the current state of a Record we can look at how different API requests to it should be handled with and without a collaboration context.
#### Getting latest in collaboration 1
```
curl -X 'GET' \
'<osdu>/api/storage/v2/records/<id>' \
--header 'x-collaboration: id=collaboration 1,application=<app-name>;'
```
Expected Result: V7 returned
#### Retrieving version 4 when no collaboration provided
```
curl -X 'GET' \
'<osdu>/api/storage/v2/records/<id>/versions/<version4>'
```
Expected Result: Error, version 4 does not exist
#### Retrieving version 4 when collaboration 2 provided
```
curl -X 'GET' \
'<osdu>/api/storage/v2/records/<id>/versions/<version4>' \
--header 'x-collaboration: id=collaboration 2,application=<app-name>;'
```
Expected Result: Error, version 4 does not exist
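The three requests above can be modeled with a small sketch. The per-namespace version lists are assumptions chosen only to reproduce the stated outcomes ('' stands for the system of record):

```python
# Assumed data-version references per namespace (illustrative, per diagram 3).
refs = {
    "": ["v1", "v2", "v3"],
    "collaboration 1": ["v4", "v7"],
    "collaboration 2": ["v5", "v6"],
}

def get_record(namespace, version=None):
    """Resolve a version within one namespace; latest ('head') by default."""
    versions = refs.get(namespace, [])
    if version is None:
        return versions[-1]                      # the namespace's head
    if version not in versions:
        raise LookupError(f"{version} does not exist in this namespace")
    return version

print(get_record("collaboration 1"))   # v7
# get_record("", "v4") and get_record("collaboration 2", "v4") both raise.
```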
## Collaboration context header
The **x-collaboration** is an optional HTTP header that holds directives in requests, instructing the Storage service to handle the request in the context of the provided collaboration instance and not in the context of the system of record. We are designing it using directives so that it is more extensible over time, to incorporate other elements potentially needed by the collaboration feature set.
**NB: In the fullness of time many services will be impacted by the collaboration EA requirements. They could/should re-use this same header to support acting on a specific collaboration context for consistency and usability.**
### Syntax
Collaboration directives follow the validation rules below:
- Directives are case-insensitive but lowercase is recommended
- Multiple directives are comma-separated
### Request Directives
| Request | Description |
| ----------- | ----------- |
| id | Mandatory. The ID of the collaboration to handle the request against. |
| application | Mandatory. The name of the application sending the request. |
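As a sketch of how a service might validate these directives (the class and method names here are illustrative, not the actual OSDU shared-code API):

```java
// Sketch of parsing the x-collaboration directives described above.
// Directive names are case-insensitive; values are preserved as-is.
// A trailing ';' is tolerated (semicolons inside values are not handled here).
import java.util.HashMap;
import java.util.Map;

public class CollaborationHeaderParser {

    /** Parses "id=<id>,application=<name>;" into a directive map. */
    public static Map<String, String> parse(String headerValue) {
        Map<String, String> directives = new HashMap<>();
        // Multiple directives are comma-separated.
        for (String part : headerValue.replace(";", "").split(",")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length == 2) {
                // Lowercase only the directive name, not its value.
                directives.put(kv[0].trim().toLowerCase(), kv[1].trim());
            }
        }
        if (!directives.containsKey("id") || !directives.containsKey("application")) {
            throw new IllegalArgumentException(
                "x-collaboration requires 'id' and 'application' directives");
        }
        return directives;
    }
}
```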
### Examples
#### Retrieve a specific version of a Record that exists in a collaboration
```
curl -X 'GET' \
'<osdu>/api/storage/v2/records/<record-id>/versions/<version>' \
--header 'data-partition-id: opendes' \
--header 'authorization: Bearer <JWT>' \
--header 'Content-Type: application/json' \
--header 'x-collaboration: id=<collaboration-id>,application=<app-name>;'
```
#### Retrieve a specific version of a Record that exists in the system of record
We do not send a collaboration context here because we want to access data from the system of record. This is the same request the user would make today.
```
curl -X 'GET' \
'<osdu>/api/storage/v2/records/<record-id>/versions/<version>' \
--header 'data-partition-id: opendes' \
--header 'authorization: Bearer <JWT>' \
--header 'Content-Type: application/json'
```
Note that the given record id and record version must exist in both the system of record and the collaboration for both API requests to return successfully.
### Record changed on namespace
To guarantee that the current system behavior is not changed, we will create a new record changed topic that is triggered only when a record is edited in some way in the context of a collaboration.
This means the existing record changed topic remains unchanged and is triggered only when changes are made in the system of record, as they are today.
Downstream listeners can then bind to the new record changed on namespace topic over time, as and when they want to support the namespace concept.
The new message will also include the extra context information about the namespace. It will be the same as the current record change message, except it will include the new header:
```
x-collaboration: id=<id>,application=<app-name>;
...
```
On top of this, the new topic should be exposed through the Notification service so external consumers can subscribe to it as needed.
# Consequences
The storage service should support a new 'collaboration' header. Anytime a collaboration id is provided in this header, the storage service should act only in that context. This means all storage APIs need to act on the given collaboration context for creation, update, retrieval and deletion of records.
If no header is provided the Storage service should function the same as it does today and no change in behavior should be observed.
In the shared code we will generate a new 'collaboration context' class that is passed into the CSP-specific data layer. This class will carry the collaboration id and application name. Each CSP should combine the collaboration id with the record id to form the primary key of the metadata data model. In this way the collaboration id forms the namespace of the record id, so multiple metadata entries can exist simultaneously.
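A minimal sketch of that composite-key idea (class and method names are assumptions for illustration, not the actual shared-code classes):

```java
// Illustrative sketch of namespacing a record id by collaboration id,
// as described above. Not the actual OSDU shared-code implementation.
import java.util.Objects;

public class CollaborationContext {
    private final String collaborationId;
    private final String application;

    public CollaborationContext(String collaborationId, String application) {
        this.collaborationId = Objects.requireNonNull(collaborationId);
        this.application = Objects.requireNonNull(application);
    }

    /**
     * Builds the primary key for the metadata data model: with a collaboration
     * context, the collaboration id prefixes (namespaces) the record id;
     * without one, the record id is used as-is, preserving today's behavior.
     */
    public static String primaryKey(CollaborationContext ctx, String recordId) {
        return ctx == null ? recordId : ctx.collaborationId + ":" + recordId;
    }
}
```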
We need a new 'Record changed collaboration' message and to have it exposed through the Notification service.
The hard delete API needs to validate all contexts before deleting the blob, as multiple contexts could be referencing the same blob instance.

---
**[Issue 150](https://community.opengroup.org/osdu/platform/system/storage/-/issues/150): PersistedCollection cannot scale to large values, an upper limit for records is needed** (Gary Murphy, 2022-11-08)

Persisted Collections have been seen lately in various environments that are getting somewhat pathological, meaning they are straining the limits of what the consuming services (mainly Storage and Search) can handle. As the number of items in a Persisted Collection rises, they will increase the size of the Storage record beyond practical limits as well as put a heavy load on Indexing and Search as they are updated.
An exact limit is a bit tricky to specify, but experience with 100K records has shown increased 500 return codes from Storage and Search when counts are in that neighborhood.
Based on the above behavior (and the upcoming introduction of Collaboration Spaces, which provide a scalable solution with transactions and promotion capabilities), it is proposed to introduce a practical limit for sizes of Persisted Collections. A straw man number could be on the order of 50K records mentioned in the Persisted Collection. Counts higher than that would trigger an error on Storage PUT with meaningful response text.
Collaboration Spaces will hopefully be the correct home for controlled collections of massive size (1M records is considered reasonable) since updates can be done via distributed transactions and no single Storage record has to scale to contain the contents of the collection. In the meantime, a limit for Persisted Collections is needed.
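A guard like the following could enforce the proposed cap on Storage PUT. This is only a sketch: the 50K threshold is the straw man number above, and all class and method names are illustrative.

```java
// Hypothetical guard for the proposed Persisted Collection limit.
// Not the actual Storage service implementation.
import java.util.List;

public class PersistedCollectionValidator {
    // Straw man limit from the proposal above.
    static final int MAX_ITEMS = 50_000;

    /**
     * Returns an error message suitable for the PUT response if the
     * collection is too large, or null if the record is acceptable.
     */
    public static String validate(List<String> memberRecordIds) {
        if (memberRecordIds.size() > MAX_ITEMS) {
            return String.format(
                "PersistedCollection holds %d records; the maximum allowed is %d. "
                + "Consider a Collaboration Space for collections of this size.",
                memberRecordIds.size(), MAX_ITEMS);
        }
        return null; // accepted
    }
}
```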
[Collaboration Spaces](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/adr-elaboration/-/issues/48)
[PersistedCollection](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/E-R/work-product-component/PersistedCollection.1.0.0.md)

---
**[Issue 151](https://community.opengroup.org/osdu/platform/system/storage/-/issues/151): Storage service fails due to opa enabled value being true** (Nikhil Singh[MicroSoft], 2022-12-20)

```
2022-11-16 10:51:53.832 ERROR storage-6446654dcd-5m7cm --- [-nio-80-exec-52] o.o.o.a.l.Slf4JLogger correlation-id=fd7c531b-f76c-4467-a502-8860097b79a9 data-partition-id=opendes api-method=PUT operation-name={PUT [/records], consumes [application/json], produces [application/json]} user-id=8b2a56ba-edf5-47ce-94b6-42c336ec8172 app-id=678fadf8-e5a8-46cd-a75d-4d6cc95d9bc9:storage.app error getting data authorization result {correlation-id=fd7c531b-f76c-4467-a502-8860097b79a9, data-partition-id=opendes}
org.opengroup.osdu.core.common.model.http.AppException: error getting data authorization result
    at org.opengroup.osdu.storage.opa.service.OPAServiceImpl.evaluateDataAuthorizationPolicy(OPAServiceImpl.java:125) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    at org.opengroup.osdu.storage.opa.service.OPAServiceImpl.validateUserAccessToRecords(OPAServiceImpl.java:86) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    at org.opengroup.osdu.storage.service.IngestionServiceImpl.validateUserAccessAndCompliancePolicyConstraints(IngestionServiceImpl.java:415) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    at org.opengroup.osdu.storage.service.IngestionServiceImpl.getRecordsForProcessing(IngestionServiceImpl.java:176) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    at org.opengroup.osdu.storage.service.IngestionServiceImpl.createUpdateRecords(IngestionServiceImpl.java:98) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    at org.opengroup.osdu.storage.provider.azure.service.IngestionServiceAzureImpl.createUpdateRecords(IngestionServiceAzureImpl.java:27) ~[classes!/:?]
    at org.opengroup.osdu.storage.api.RecordApi.createOrUpdateRecords(RecordApi.java:80) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    at org.opengroup.osdu.storage.api.RecordApi$$FastClassBySpringCGLIB$$495e8f0c.invoke(<generated>) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    ... suppressed 11 lines
    at org.opengroup.osdu.storage.api.RecordApi$$EnhancerBySpringCGLIB$$a32ffde7.createOrUpdateRecords(<generated>) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    at org.opengroup.osdu.storage.api.RecordApi$$FastClassBySpringCGLIB$$495e8f0c.invoke(<generated>) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    ... suppressed 9 lines
    at org.opengroup.osdu.storage.api.RecordApi$$EnhancerBySpringCGLIB$$1ec1cefc.createOrUpdateRecords(<generated>) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    ... suppressed 2 lines
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_332]
    ... suppressed 18 lines
    at org.opengroup.osdu.storage.util.StorageFilter.doFilter(StorageFilter.java:86) [storage-core-0.15.1-SNAPSHOT.jar!/:?]
    ... suppressed 2 lines
    at org.opengroup.osdu.azure.filters.TransactionLogFilter.doFilter(TransactionLogFilter.java:74) [core-lib-azure-0.17.0-rc14.jar!/:?]
    ... suppressed 34 lines
    at org.opengroup.osdu.azure.filters.Slf4jMDCFilter.doFilter(Slf4jMDCFilter.java:69) [core-lib-azure-0.17.0-rc14.jar!/:?]
    ... suppressed 18 lines
    at com.microsoft.applicationinsights.web.internal.WebRequestTrackingFilter.doFilter(WebRequestTrackingFilter.java:142) [applicationinsights-web-2.6.4.jar!/:?]
    ... suppressed 18 lines
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_332]
```

---
**[Issue 152](https://community.opengroup.org/osdu/platform/system/storage/-/issues/152): Upgrade azure-storage SDK** (Nur Sheikh, 2022-11-28)

In the storage service we are using the azure-storage SDK 8.6.5 from the com.microsoft.azure package, which is too old and no longer well supported. It is advisable to use the latest SDK from the com.azure package.

---
**[Issue 153](https://community.opengroup.org/osdu/platform/system/storage/-/issues/153): Indexer fetch records requests should not be checked via OPA/Policy (Or any other service that sends internal requests)** (Rustam Lotsmanenko (EPAM), 2023-03-06)
**Problem:**
Currently, the Storage service will evaluate policies for service requests of the Indexer service, which doesn't make sense since the indexer should be able to fetch any record ingested to the platform.
Indexer fetch requests go through the common request authentication flow when OPA integration is enabled:
https://community.opengroup.org/osdu/platform/system/storage/-/blob/master/storage-core/src/main/java/org/opengroup/osdu/storage/opa/service/OPAServiceImpl.java#L104
~~~
http://localhost:8181/v1/data/osdu/partition/osdu/dataauthz/records
{
  "input": {
    "operation": "view",
    "token": "indexer-service-token",
    "datapartitionid": "osdu",
    "records": [{
      "id": "osdu:master-data--Well:999907686759",
      "kind": "osdu:wks:master-data--Well:1.0.0",
      "legal": {
        "legaltags": ["osdu-demo-legaltag"],
        "otherRelevantDataCountries": ["US"],
        "status": "compliant"
      },
      "acls": {
        "viewers": ["data.default.viewers@osdu.osdu-gcp.go3-nrg.projects.epam.com"],
        "owners": ["data.default.owners@osdu.osdu-gcp.go3-nrg.projects.epam.com"]
      }
    }]
  }
}
~~~
And it is possible that Indexer will not be authorized to fetch records:
~~~
HttpResponse(headers = {
  null = [HTTP/1.1 200 OK],
  Content-Length = [305],
  Date = [Tue, 29 Nov 2022 10:58:31 GMT],
  Content-Type = [application/json]
}, body = {
  "result": [{
    "errors": [{
      "code": 401,
      "id": "osdu:master-data--Well:999907686759",
      "message": "Legal response 401 {\"code\":401,\"reason\":\"Unauthorized\",\"message\":\"The user is not authorized to perform this action\"}",
      "reason": "Error from compliance service"
    }],
    "id": "osdu:master-data--Well:999907686759"
  }]
}, contentType = application/json, responseCode = 200, exception = null, request = http://localhost:8181/v1/data/osdu/partition/osdu/dataauthz/records, httpMethod = POST, latency = 812)
~~~
And will receive an empty response:
~~~
{
  "records": [],
  "notFound": [
    "osdu:master-data--Well:999907686759"
  ],
  "conversionStatuses": []
}
~~~
This leaves records not indexed and not searchable. Scenarios in which this occurs are quite easy to reach, for example when the record uses ACLs that don't belong to the service token.
**Solution:**
We need to bypass OPA/Policy authentication for internal service requests.

---
**[Issue 154](https://community.opengroup.org/osdu/platform/system/storage/-/issues/154): Storage service stale in-memory cache leads to inconsistency** (Nikhil Singh[MicroSoft], 2023-02-15)
We recently uncovered a bug in the storage service due to the local cache getting stale. The flow can be understood by the following steps.
1. Deletion of a legal tag via the legal service delete API --> response 204 No Content after successful deletion.
2. Storage service API call made at https://**********/api/storage/v2/push-handlers/legaltag-changed?token=*** --> goes to pod P1 of the storage service --> updates the records' compliance for all records associated with the tag deleted in step 1 --> removes the deleted tag from the local cache of pod P1.
3. Storage PUT call to create a record with the deleted legal tag --> goes to pod P2 of storage --> the cache still has that legal tag --> returns 201 Created.
At step 3, all calls going to pod P1 return "Invalid legal tag", but API calls landing on other pods successfully create these records.
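The divergence can be reproduced with a toy model of the per-pod cache (class names here are illustrative; the real remedy would be a shared cache such as Redis, or a TTL on cache entries, rather than independent in-memory sets):

```java
// Toy reproduction of the stale-cache flow above: each "pod" keeps its
// own in-memory set of valid legal tags, so an invalidation delivered
// to one pod leaves the others stale. Not the actual Storage code.
import java.util.HashSet;
import java.util.Set;

public class LegalTagCachePod {
    private final Set<String> validTags = new HashSet<>();

    /** Tag learned as valid, e.g. after a successful legal-service lookup. */
    public void learnValidTag(String tag) { validTags.add(tag); }

    /** Invoked by the legaltag-changed push handler on THIS pod only. */
    public void onLegalTagDeleted(String tag) { validTags.remove(tag); }

    public boolean isValid(String tag) { return validTags.contains(tag); }
}
```

Pod P1 receives the legaltag-changed push and evicts the tag; pod P2 never does, so `p2.isValid(tag)` still returns true and a PUT landing on P2 succeeds with the deleted tag.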
The service ITs are failing in a transient manner due to this issue.

---
**[Issue 155](https://community.opengroup.org/osdu/platform/system/storage/-/issues/155): GCP failing with core-common v0.18.0-rc4** (Mina Otgonbold, 2023-01-02)
osdu-gcp-anthos-test integration tests are consistently failing when the core-common version is upgraded to v0.18.0-rc4.
Currently, gcp consumes the 0.17.0 version of core-common, which contains vulnerable libraries. The storage MR "Update Storage to be Collaboration Context Aware" needs to consume a new version of core-common that exposes collaboration context; this is a blocker for that storage MR to be merged. As a quick fix for the gcp test failure, we created a core-common build that adds collaboration context on top of the 0.17.0 version of core-common. The pipeline passes with this version, which indicates that the gcp test failure comes from the core-common version upgrade from 0.17.0 to 0.18.0-rc4.
References
* [Associated storage MR](https://community.opengroup.org/osdu/platform/system/storage/-/merge_requests/546)
* [Core-common MR](https://community.opengroup.org/osdu/platform/system/lib/core/os-core-common/-/merge_requests/183)
* [ADR for the storage and core-common MRs](https://community.opengroup.org/osdu/platform/system/storage/-/issues/149)

---
**[Issue 156](https://community.opengroup.org/osdu/platform/system/storage/-/issues/156): ADR: Recover a soft deleted record in storage** (Abhishek Nanda, 2023-09-11)
Ability to recover a soft deleted record in storage service
# Decision Title
## Status
- [X] Proposed
- [ ] Trialing
- [ ] Under review
- [ ] Approved
- [ ] Retired
## Context & Scope
The storage service provides 2 ways to delete a record. One way is to logically delete the record, in which case a record with the same id can be revived later because its version history is maintained; the other is to purge the record, in which case the record's version history is deleted too. In both types of deletion, the record cannot be accessed using the storage or search service.
Today there is no easy way to query or recover soft-deleted records. Providing admin-only APIs will help admins search, view and recover the soft-deleted data if required.
# Tradeoff Analysis - Input to decision
Today users have to maintain the soft-deleted record IDs on their own. Below is the workaround available today to attempt recovery of such records:
1. Recreate the record with existing id and random/empty data and meta blocks. This will mark the record as active.
2. Fetch all versions of the record.
3. Fetch the latest version prior to the one just created to get back the actual record data and meta blocks.
4. Recreate the record using the response to create a new version of the record with the appropriate data.
## Decision
Create 3 new APIs as below
1. Fetch deleted records (accessible to _users.datalake.admins_) -> This will fetch a list of records. Since the list can be very long, it should return a maximum of 100 records and support from/to deletion-date filters along with pagination.
![image](/uploads/ca34cf94f3184fba05d2ade6bb502a90/image.png)
2. Recover deleted records by id (accessible to _users.datalake.admins_) -> This will take a list of record ids (max 500) that are to be recovered and return the list of record ids that succeeded as well as failed.
![image](/uploads/ae448c5fb9ed5803101aeba51a4fd7b4/image.png)
3. Recover deleted records by metadata filters (Currently support for only fromDeletedDate and toDeletedDate) (accessible to _users.datalake.admins_) -> This will take filter criteria of records that are to be recovered and return the list of record ids that succeeded as well as failed.
![image](/uploads/2b1d373eed8513e166fba784be4b3250/image.png)
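A sketch of the recover-by-ids request validation (the 500-id cap comes from the proposal above; class and field names are assumptions, and the authoritative contract is the attached OpenAPI spec):

```java
// Illustrative request/response shape for "recover deleted records by id".
// The max-500 cap comes from the ADR text; the names are hypothetical.
import java.util.ArrayList;
import java.util.List;

class RecoverRecordsRequest {
    static final int MAX_IDS = 500;
    final List<String> recordIds;

    RecoverRecordsRequest(List<String> recordIds) {
        if (recordIds == null || recordIds.isEmpty() || recordIds.size() > MAX_IDS) {
            throw new IllegalArgumentException(
                "between 1 and " + MAX_IDS + " record ids are required");
        }
        this.recordIds = new ArrayList<>(recordIds);
    }
}

class RecoverRecordsResponse {
    final List<String> recovered = new ArrayList<>(); // ids that succeeded
    final List<String> failed = new ArrayList<>();    // ids that could not be recovered
}
```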
## Consequences
1. This will help users to bulk recover deleted records in a single go.
2. The APIs will help prevent having garbage record versions that had to be created just to make the record active.
3. This will help users to fetch a list of soft deleted records which was not possible earlier.
Open API spec for the service
[storage-recover-swagger.yaml](/uploads/396cc62881dfe5f075f0e987f0313472/storage-recover-swagger.yaml)

---
**[Issue 157](https://community.opengroup.org/osdu/platform/system/storage/-/issues/157): Storage Improperly local cached ORDC information from Legal service** (Kelly Zhou, 2023-05-30)

Currently Storage caches the first-time result of valid ORDC from the legal service regardless of which data partition the user is trying to ingest the record into, which could be wrong, as we do support whitelisting countries for certain data partitions.
In order to fix that, we need to include the data partition id in the local cache for ORDC information.

---
**[Issue 158](https://community.opengroup.org/osdu/platform/system/storage/-/issues/158): AZURE: on reading version from storage we are checking only viewer permissions** (Yauheni Lesnikau, 2023-01-16)

On reading a version from storage we are checking only viewer permissions. It would be nice to check both: viewer and owner ones.

---
**[Issue 159](https://community.opengroup.org/osdu/platform/system/storage/-/issues/159): Storage adds null meta to record ingested without** (An Ngo, 2023-03-22)
2. Fetch the ingested record. Notice that Storage added "meta": null to the record.
**Checking with Search.**
Search indexed successfully. Status code was 2...1. Record was ingested without specifying "meta" block. PUT api was successful.
2. Fetch the ingested record. Notice that Storage added "meta": null to the record.
**Checking with Search.**
Search indexed successfully. Status code was 200.
Search result does not return the meta.
The current behavior is challenged: the meta block shouldn't have been added, or if added, it should be empty and not null.
So instead of adding:
"meta": null
It should be:
"meta": []
Upon creating or updating a record, providing an empty meta block should also be allowed.

---
**[Issue 160](https://community.opengroup.org/osdu/platform/system/storage/-/issues/160): ADR - Clean OpenAPI 3.0 Documentation using 'Code First Approach'** (Om Prakash Gupta, 2023-07-10)
- [X] Proposed
- [ ] Trialing
- [ ] Under review
- [x] Approved
- [ ] Retired
## Context & Scope
While adopting **OpenAPI 3.0** standards using `springdoc`, we end up adding lot of documentation to native controller of each AP...## Status
- [X] Proposed
- [ ] Trialing
- [ ] Under review
- [x] Approved
- [ ] Retired
## Context & Scope
While adopting **OpenAPI 3.0** standards using `springdoc`, we end up adding a lot of documentation to the native controller of each API:
- API contract is not clearly visible
- reduces the readability of the API
- business logic & documentation at the same place
## Tradeoff Analysis
- To maintain clean API documentation
- API, Controller segregation
- adopt future changes w.r.t. documentation or contract change
## Proposed Solution:
- Introduce API, Controller Layer Segregation
- API will have contract, definitions & OpenAPI documentation
- Controller will implement the API contract with clean code
## References
1. [‘Code First’ API Documentation](https://reflectoring.io/spring-boot-springdoc/)
## Sample Refactor in Storage Patch API
- [Patch API](https://community.opengroup.org/osdu/platform/system/storage/-/blob/az/td-codefirst/storage-core/src/main/java/org/opengroup/osdu/storage/api/PatchApi.java)
- [Patch Controller](https://community.opengroup.org/osdu/platform/system/storage/-/blob/az/td-codefirst/storage-core/src/main/java/org/opengroup/osdu/storage/api/PatchController.java)
## Sample Example code
Let's consider a TODO API with normal CRUD operations.
First, we write the interface and define the necessary annotations.
```
import java.util.List;

import org.springframework.http.HttpStatus;
import org.springframework.web.bind.annotation.*;

import io.swagger.v3.oas.annotations.tags.Tag;

@RequestMapping("/api/todos")
@Tag(name = "Todo API", description = "euismod in pellentesque ...")
interface TodoApi {

    @GetMapping
    @ResponseStatus(code = HttpStatus.OK)
    List<Todo> findAll();

    @GetMapping("/{id}")
    @ResponseStatus(code = HttpStatus.OK)
    Todo findById(@PathVariable String id);

    @PostMapping
    @ResponseStatus(code = HttpStatus.CREATED)
    Todo save(@RequestBody Todo todo);

    @PutMapping("/{id}")
    @ResponseStatus(code = HttpStatus.OK)
    Todo update(@PathVariable String id, @RequestBody Todo todo);

    @DeleteMapping("/{id}")
    @ResponseStatus(code = HttpStatus.NO_CONTENT)
    void delete(@PathVariable String id);
}
```
Then we derive the existing controller from the interface for the controller implementation:
```
@RestController
class TodoController implements TodoApi {
// method implementations
}
```
## Consequences
- Requires changes across services and code refactoring.
- No breaking functional changes.

---
**[Issue 161](https://community.opengroup.org/osdu/platform/system/storage/-/issues/161): Storage should rollback ingestion when publishing event fails** (Thiago Senador, 2023-03-09)

When the storage service succeeds in saving new records but fails in publishing the event, we create an inconsistency in the system, since the data are kept in storage but are not notified to search/indexer. In other words, we need to roll back the write to storage in case of a failed publish. The fix is trivial: move [these blocks](https://community.opengroup.org/osdu/platform/system/storage/-/blob/master/storage-core/src/main/java/org/opengroup/osdu/storage/service/PersistenceServiceImpl.java#L92) to [this block](https://community.opengroup.org/osdu/platform/system/storage/-/blob/master/storage-core/src/main/java/org/opengroup/osdu/storage/service/PersistenceServiceImpl.java#L104).

---
**[Issue 162](https://community.opengroup.org/osdu/platform/system/storage/-/issues/162): Record ACL should be case insensitive** (An Ngo, 2023-03-09)
Storage honors the ACL group name case sensitivity. This creates inconsistency for ACL validation.
**For example:**<br>
User creates a data group cal...Entitlements group creation always lowercases the group name, regardless of the input.
Storage honors the ACL group name case sensitivity. This creates inconsistency for ACL validation.
**For example:**<br>
User creates a data group called: data.SomeGroup.viewers<br>
Upon this request, Entitlements creates a group called: data.somegroup.viewers
Upon creating a record, the user enters data.SomeGroup.viewers as the ACL.<br>
If the user tries to fetch the record, a 403 is returned since Entitlements only sees group data.somegroup.viewers.
**Fix:**<br>
**For existing records (addressing the ghosted records):** Storage fetch record validation should lowercase the ACL group against the list of groups returned from Entitlements.<br>
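A minimal sketch of that case-insensitive comparison (illustrative, not the actual Storage code):

```java
// Illustrative case-insensitive ACL membership check for the
// "existing records" fix above.
import java.util.List;
import java.util.Locale;

public class AclMatcher {
    /** True if any record ACL group matches a user group, ignoring case. */
    public static boolean hasAccess(List<String> recordAclGroups, List<String> userGroups) {
        return recordAclGroups.stream()
                .map(g -> g.toLowerCase(Locale.ROOT))
                .anyMatch(g -> userGroups.stream()
                        .map(u -> u.toLowerCase(Locale.ROOT))
                        .anyMatch(g::equals));
    }
}
```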
**Long term solution:** The fix should be in record creation: the Storage PUT API should lowercase the ACL upon record creation, or we could fail the PUT request if the ACL group has mixed case. Note that there is no ACL group existence validation upon record creation.

---
**[Issue 163](https://community.opengroup.org/osdu/platform/system/storage/-/issues/163): The request to get records of particular kind using the limit is not working** (Kamlesh Todai, 2023-06-20)
The Storage API CI/CD v1.11 (from Platform Validation project) was working on all the platforms and passing with 100% pass rate.
https://community.opengroup.org/osdu/platform/testing/-/blob/master/Postman%20Collection/12_CICD_Setup_StorageAPI/Storage%20API%20CI-CD%20v1.11.postman_collection.json
At present, it is still passing with 100% pass rate in AWS R3 M16 Platform Validation (forum testing environment)
But it is not passing with 100% pass rate in all other Platform Validation CSP environments, nor in all CSP environments in pre-ship.
In the referenced collection Request #8 is failing.
The following Storage API request is in question: "08 - Storage - Get all records for a kind with limit of 10 records".
=====================================================================
Example of passing in Platform Validation R3 M16 (forum testing):
curl --location 'https://r3m16.forumtesting.osdu.aws/api/storage/v2/query/records?limit=10&kind=osdu%3Awks%3AautoTest_955280%3A1.1.0' \
--header 'data-partition-id: osdu' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer eyJraWQiOi...4XnucQETfnB3biA' \
--header 'Cookie: session=eyJfZnJlc2giOmZhbHNlLCJfcGVybWFuZW50Ijp0cnVlfQ.Y_VNrw.SMJbZoZwlkMYCD7E9ge4ICPnqJY'
https://{{STORAGE_HOST}}/query/records?limit=10&kind={{authority}}:{{schemaSource}}:{{entityType}}:{{schemaVerMajor}}.{{schemaVerMinor}}.{{schemaVerPatch}}
The response code: 200 OK
{
"results": [
"osdu:999611481173:999301114394"
]
}
===================================================================
Example of when it is failing
curl --location 'https://r3m16-ue1.preshiptesting.osdu.aws/api/storage/v2/query/records?limit=10&kind=osdu%3Awks%3AautoTest_20923%3A1.1.0' \
--header 'data-partition-id: osdu' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer eyJraWQiOi...tW7kPscDabFJ3sEPeNA'
Response code: 415 Unsupported Media Type
The body of the response is blank.
It is the same message for all the CSPs where the failure is happening.
============================================================================
@chad @debasisc

---
**[Issue 164](https://community.opengroup.org/osdu/platform/system/storage/-/issues/164): For AWS platform query to get all kinds is not returning any records** (Kamlesh Todai, 2023-03-09)
curl --location 'https://r3m16-ue1.preshiptesting.osdu.aws/api/storage/v2/query/kinds' \
--header 'data-partition-id: osdu' \
--header 'Accept: application/json'...The query to retrieve all the kinds is not returning any results (records)
curl --location 'https://r3m16-ue1.preshiptesting.osdu.aws/api/storage/v2/query/kinds' \
--header 'data-partition-id: osdu' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer eyJraWQiOiJ...7kPscDabFJ3sEPeNA'
The response is 200 OK (with results being empty):
{
"results": []
}
The collection used can be found at https://community.opengroup.org/osdu/platform/testing/-/blob/master/Postman%20Collection/12_CICD_Setup_StorageAPI/Storage%20API%20CI-CD%20v1.11.postman_collection.json
The request name is "01 Storage - Get all kinds success scenario"
@chad @debasisc

---
**[Issue 165](https://community.opengroup.org/osdu/platform/system/storage/-/issues/165): Need example of how to use the POST /query/records:batch Fetch multiple records** (Kamlesh Todai, 2023-03-09)
The Storage API documentation mentions
POST /query/records/batch Fetch multiple records. We would like to get a sample of how this feature is expected to be used.
Need clarification on:
Account ID is the active OSDU account (OSDU account or customer's account) which the users choose to use with the Search API.
frame-of-reference: This value indicates whether normalization applies, should be either 'none' or 'units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;'
@chad @debasisc

---
**[Issue 166](https://community.opengroup.org/osdu/platform/system/storage/-/issues/166): Need example of how to use the POST /query/records:batch Fetch multiple records** (Kamlesh Todai, 2023-04-20)
The Storage API documentation mentions
POST /query/records/batch Fetch multiple records. We would like to get a sample of how this feature is expected to be used.
Need clarification on:
Account ID is the active OSDU account (OSDU account or customer's account) which the users choose to use with the Search API.
frame-of-reference: This value indicates whether normalization applies, should be either 'none' or 'units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;'
@chad @debasisc
M17 - Release 0.20

https://community.opengroup.org/osdu/platform/system/storage/-/issues/168
Storage should allow empty data block upon record creation/update
2023-03-22T04:13:47Z
An Ngo

The Storage PUT API should allow an empty data block upon record creation/update if that is compliant with the schema being defined.
Currently, data block is required.
data: {}
This is a breaking change since it changes the behavior of the API.
The Indexer service needs to be checked to ensure an empty data block is handled correctly.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/169
ADR: API to purge a batch of storage records
2023-05-02T12:16:58Z
Mandar Kulkarni

New API in the Storage service to purge a batch of records
## Status
- [X] Proposed
- [ ] Trialing
- [ ] Under review
- [ ] Approved
- [ ] Retired
## Context & Scope
The OSDU Storage service provides 2 ways to delete a record. One way is to logically delete the record, in which case a record with the same id can be revived later because its version history is maintained. The other way is to permanently delete the record (called purging), in which case the record's version history is deleted too. This operation cannot be undone, meaning purged records cannot be revived.
In both types of deletions, the record content cannot be accessed using storage or search service.
The storage service provides separate APIs for logical deletion (`POST /records/{id}:delete`) and purging of records (`DELETE /records/{id}`).
The storage service provides API for logical deletion of batch of records (`POST /records/delete`), but such an API is not available for purging of records.
The proposal is to provide an API on the Storage service to support purging a batch of records, where a maximum batch size of 500 will be supported.
Only the record IDs passed in the request body will be deleted, not including any linked records or files if they exist. Cleaning up all the linked records, such as child records, records in the relationship block, and actual data (files ingested via the workflow service), would not be in the scope of this API; it would be the user's responsibility.
The new bulk API will work on active as well as non-active (soft-deleted) records, similar to the existing purge API.
Purging of records can be performed by the owner of the records, and the owner should be part of the users.datalake.admins group.
The API response would be similar to the response of the logical deletion API, i.e. `POST /records/{id}:delete`.
In case of partial success, the response code would be 207 and the not-deleted-record-IDs would be listed in the response.
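A minimal client-side sketch of how the proposed 207 partial-success response might be interpreted. The response field names (`notDeletedRecordIds`) and the 204 full-success code are illustrative assumptions; the ADR only states that not-deleted record IDs would be listed in the response.

```python
# Hypothetical sketch: splitting a partial-success batch purge response.
# Field names and the 204 full-success status are illustrative assumptions;
# the ADR only says not-deleted record IDs are listed on a 207.
def summarize_batch_purge(requested_ids, response_status, response_body):
    if response_status == 204:  # full success: everything was purged
        return {"purged": list(requested_ids), "failed": []}
    if response_status == 207:  # partial success
        failed = response_body.get("notDeletedRecordIds", [])
        purged = [r for r in requested_ids if r not in failed]
        return {"purged": purged, "failed": failed}
    raise RuntimeError(f"purge failed with status {response_status}")

result = summarize_batch_purge(
    ["opendes:type:r1", "opendes:type:r2"],
    207,
    {"notDeletedRecordIds": ["opendes:type:r2"]},
)
```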
## Tradeoff Analysis
In the absence of an API to purge a batch of records, users would have to call the DELETE API once for every record and it would increase the number of calls to the storage service.
## Decision
Provide an admin-only API to purge a batch of records, with maximum batch size of 500 records.
The OpenAPI spec for the Storage service with the new API is here:
[storage_openapi_batchpurge.yaml](/uploads/1da3f68253419edd693a87d706049565/storage_openapi_batchpurge.yaml)
## Consequences
- New API on Storage service would be available.
- Documentation of the Storage service should be modified with details for the new API.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/170
Invalidate derived data when parent record is deleted
2023-03-31T10:02:02Z
An Ngo

Derived data (records with ancestry/parent) inherit the legal tags from the parent record(s).
So when at least one of the parent records is deleted, the child records are no longer valid. Without this step, records with invalid legal tags (or no legal tag) still exist in the system.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/171
Metadata only updates (via PATCH api) creates a mismatch in modifyUser and modifyTime fields between record metadata and record data
2023-07-05T09:50:37Z
Alok Joshi

[This ADR](https://community.opengroup.org/osdu/platform/system/storage/-/issues/148) introduces separate modifyTime and modifyUser fields for every version of an OSDU Storage record. This creates a mismatch between the modifyTime and modifyUser fields for the metadata and data objects respectively.
Repro steps:
- Create a storage record
- Modify the metadata ACL with PATCH api
- Retrieve the record with Storage records:batch api or getRecord api
- modifyTime and modifyUser fields are not returned.
OR
- Create a storage record
- Update the same record with PUT api
- Modify the metadata ACL with PATCH api
- Retrieve the record
- modifyTime and modifyUser are returned but not correct
Expected: From a user's perspective, when they update a record (metadata, data, or both), they should get back appropriate modifyUser and modifyTime values.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/172
Metadata update API succeeds on remove operation on a `tag` if the tag doesn't exist
2023-05-25T10:36:21Z
Alok Joshi

Steps to reproduce:
- Create a record with some tags
- Try to update the record metadata via [metadata update API](https://community.opengroup.org/osdu/platform/system/storage/-/blob/master/docs/tutorial/StorageService.md#metadata-update-api) by removing a non-existing tag
```
curl --request PATCH \
--url '/api/storage/v2/records' \
--header 'accept: application/json' \
--header 'authorization: Bearer <JWT>' \
--header 'content-type: application/json' \
--header 'Data-Partition-Id: common' \
--data-raw '{
"query": {
"ids": [
"tenant1:type:unique-identifier:version"
]
},
"ops": [
{
"op":"remove",
"path":"/tags",
"value":[
"tagthatdoesntexist"
]
}
]
}'
```
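The expected server-side behavior could look like the following sketch: validate a `remove` op against the record's existing tags before applying it. The function, error shape, and the tags-as-map layout are illustrative assumptions, not the actual service code.

```python
# Illustrative sketch of the expected validation: a remove op on /tags
# should fail with a 4xx when the tag does not exist on the record.
# Tags are modeled as a key/value map, as in Storage record metadata.
def apply_remove_tags(record_tags, tags_to_remove):
    missing = [t for t in tags_to_remove if t not in record_tags]
    if missing:
        # The service should surface this as a 4xx instead of silently succeeding.
        return 400, {"error": f"tags not found: {missing}"}, record_tags
    remaining = {k: v for k, v in record_tags.items() if k not in tags_to_remove}
    return 200, None, remaining

status, err, tags = apply_remove_tags({"env": "test"}, ["tagthatdoesntexist"])
```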
This should return 4xx, but returns 2xx.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/173
Does not detect mismatch of entity name between "kind" and "id"
2023-06-06T00:42:59Z
Debasis Chatterjee

I made this test case to create a record directly by using the Storage service and then the same record by using Manifest-based Ingestion.
```
"kind": "osdu:wks:work-product-component--TubularComponent:1.0.0",
"id": "osdu:work-product-component--TubularAssembly:TUBULARDC31May",
```
As you can see "kind" speaks of **TubularComponent** whereas "id" speaks of **TubularAssembly**.
Storage service seems very forgiving. It creates the record and also Indexer replicates the record in Index store. So, we can also retrieve by using Search service.
Whereas Manifest-based Ingestion rejects this JSON payload with a suitable reason, as expected.
See this document.
https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M17/Test_Plan_Results_M17/Core_Services/M17-AWS-Storage-service-test-sanity.docx

https://community.opengroup.org/osdu/platform/system/storage/-/issues/174
Data authorization issue for Update/Patch operation
2024-01-29T19:22:30Z
Dadong Zhou

When the Storage service sends data authorization requests for an Update/Patch operation to the Policy service, only the new data record header info (ACLs and LegalTags) is sent to the Policy service; the existing data record header info is not included in the request. So a user will be able to update/patch a data record (based on the new ACLs/LegalTags) when the user should have no permission to update/patch (based on the existing record's ACLs/LegalTags).
cc @hmarkovic @chad @hutchins @MonicaJohns
M22 - Release 0.25
Chad Leong

https://community.opengroup.org/osdu/platform/system/storage/-/issues/175
Storage service triggered more than 1 time while ingesting 1 single record
2023-06-09T01:27:02Z
Bruce Jin

Currently, when running manifest ingestion by reference, a single record will trigger more than one `PUT` call to the storage service. This is because the API returns `201 CREATED` when it works, which is not an `OK` response within the file `common-python-sdk/osdu_api/utils/request.py`. We need to include more acceptable status codes to avoid wasting time.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/176
Storage x-collaboration header bug
2023-09-26T14:21:44Z
Shane Hutchins

Found this issue in /api/storage/v2/query/records and /api/storage/v2/query/records:batch.
Received a response with 5xx status code: 500
Run this curl command to reproduce this failure:
curl -X GET -H 'Authorization: Bearer TOKEN' -H 'data-partition-id: osdu' -H 'x-collaboration: ^À' 'https://osdu.r3m18.preshiptesting.osdu.aws/api/storage/v2/query/records?kind='
curl -X POST -H 'Authorization: Bearer TOKEN' -H 'data-partition-id: osdu' -H 'x-collaboration: ^À' -d '[]' https://osdu.r3m18.preshiptesting.osdu.aws/api/storage/v2/records/delete
PUT /api/storage/v2/records
curl -X PUT -H 'Authorization: Bearer TOKEN' -H 'data-partition-id: osdu' -H 'x-collaboration: €' -d '[]' https://osdu.r3m18.preshiptesting.osdu.aws/api/storage/v2/records
Azure PUT /api/storage/v2/records:
curl -X PUT -H 'Authorization: Bearer TOKEN' -H 'data-partition-id: opendes' -H 'x-collaboration: €' -d '[]' https://osdu-ship.msft-osdu-test.org/api/storage/v2/records
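One plausible fix is to pre-validate the header and return a 400 instead of letting parsing fail into a 500. The sketch below assumes the x-collaboration header carries comma-separated key=value directives (e.g. `id=<uuid>,application=<app>`); that format, and the validator itself, are assumptions for illustration, not the actual service code.

```python
import re

# Sketch: reject malformed x-collaboration headers with a 400 instead of
# letting parsing blow up into a 500. The directive format
# (key=value pairs, e.g. "id=<uuid>,application=<app>") is an assumption.
_DIRECTIVE = re.compile(r"^[\w-]+=[\w .-]+$")

def validate_x_collaboration(header_value):
    if not header_value.isascii():
        return 400  # control/extended bytes like '€' are rejected early
    for part in header_value.split(","):
        if not _DIRECTIVE.match(part.strip()):
            return 400
    return 200

status = validate_x_collaboration("id=9e1c4e74,application=pws")
```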
Confirmed this bug in AWS and Azure.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/177
Integration test coverage for users.data.root
2023-07-20T11:05:00Z
Rustam Lotsmanenko (EPAM) rustam_lotsmanenko@epam.com

Changes to data authentication were recently introduced with the merge request: https://community.opengroup.org/osdu/platform/system/storage/-/merge_requests/694. However, we currently lack integration test cases to cover these modifications.
It is essential to ensure that these changes won't disrupt the current flow and that `users.data.root` will consistently have access to ingested data.
To address this, we need to implement integration test cases to cover the new data authentication mechanisms.
M20 - Release 0.23
Rustam Lotsmanenko (EPAM)

https://community.opengroup.org/osdu/platform/system/storage/-/issues/178
ADR: CosmosDb saturation/throttling when records reach too many versions
2024-03-25T06:43:30Z
Alok Joshi

## Status
- [X] Proposed
- [ ] Trialing
- [ ] Under review
- [x] Approved
- [ ] Retired
## Context & Scope
***ISSUE***: Storage service stability issues due to too many versions of records.
***User behavior that causes this issue***: Creating a lot of versions for the same record ID. When multiple applications/teams do this long enough, we end up with too many versions for many records. There are no checks in place to prevent this scenario. We eventually hit infrastructure limits (i.e. the CosmosDb document max size of 2MB) but observe service instability well before that.
***Why is this a problem***: Record versions are stored as part of record metadata. This is part of the `gcsVersionPaths` array. Each version is a string that represents the full path to the version's blob location. Record metadata is stored in CosmosDb. While CosmosDb has a hard size limit (2MB) for each document, this size is already too big when RU usage is considered. If we have hundreds or thousands of such records being updated, the total RU consumed is very high, incurring huge costs. This scenario poorly impacts service latency and availability. While not ideal, it is quite possible for applications to create versions of the same record for their workflows.
![image](/uploads/3f53fa471e7566a04d69ea539712db76/image.png)
For reference, here are some preliminary observations on the number of versions, size of the document, and RU consumed to perform an UPSERT on a ***single*** document (note that the number of versions is not an ***absolute*** indicator of how much RU will be consumed in performing an UPSERT, because it's the size of the document that matters, and each version string can be of different length. One can fit a lot more versions if each version's length is small. However, as we stand today, it is the only metadata property that is causing documents to be big).
~1500 versions, ~300 RU consumed, ~243kb file size
~1500 versions, ~370 RU consumed, ~300kb file size
~3800 versions, ~1250 RU consumed, ~750kb file size
~5300 versions, ~1253 RU consumed, ~880kb file size
~9850 versions, ~2502 RU consumed, ~1.3mb file size
It is quite easy to have a few hundred or thousand records cripple the system once the records reach certain number of versions.
***CLARIFICATION***: The issue we observed is more specific to the Azure use case. Infrastructure limitations (i.e. cost to access a large document, hard limit on the size of the document) may vary per CSP (i.e. 2MB for CosmosDb, 1MB for GCP datastore). Other CSPs may see this issue once the number of versions reaches a certain number.
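To make the observations above concrete, here is a rough back-of-envelope sketch. The metadata layout and the ~160-byte version-path length are assumptions for illustration; only the `gcsVersionPaths` array name comes from the text above.

```python
import json

# Rough sketch: estimate metadata document size as versions accumulate.
# Document layout and the per-version path length are assumptions.
def estimate_doc_size_bytes(version_count, path_len=160):
    # Each entry in gcsVersionPaths is a full blob path string.
    doc = {
        "id": "opendes:type:record",
        "gcsVersionPaths": ["x" * path_len] * version_count,
    }
    return len(json.dumps(doc))

# ~1500 versions at ~160-byte paths lands in the same ballpark as the
# ~243-300 KB observations quoted above.
size_1500 = estimate_doc_size_bytes(1500)
```

The point of the sketch: document size grows linearly with version count, so a record accumulating versions indefinitely will march toward the 2MB Cosmos DB limit (and high RU charges) long before any application-level check stops it.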
## Tradeoff Analysis
It is clear we want to limit the number of record versions. We see 2 ways to achieve this.
1. ***Set a hard limit*** on the number of versions on each record (say 1000) (preferred approach).
- Pros: Easy to implement, no behind-the-scenes magic.
- Cons: Breaking change for the existing workflows, when their records already have more than 1000 versions. Needs advance notice of breaking change and time for teams to update the workflows.
We can roll this out by first introducing a `deleteVersion` API in Storage, which would give users time to delete older versions by themselves before the breaking change is introduced, so their workflows don't break immediately.
2. ***Only keep 1000 recent versions***. For new records, this would mean actively start deleting the oldest version once we reach 1000 versions. For existing records with more than 1000 versions, this would mean cleaning up all older versions.
- Pros: Older versions are cleaned up for users automatically.
- Cons: Still a breaking change, as older versions would get deleted automatically. Involves behind-the-scenes cleanup of older versions. For records that currently have more than 1000 versions, this includes all remaining older versions. There can be failure scenarios with cleanup, and performance implications.
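Option 2 above can be sketched as a simple retention helper. The 1000 limit comes from the proposal; the helper itself, and the oldest-first ordering assumption for `gcsVersionPaths`, are illustrative.

```python
# Sketch of option 2: keep only the most recent `limit` versions.
# Version paths are assumed to be stored oldest-first in gcsVersionPaths.
def trim_versions(version_paths, limit=1000):
    """Return (kept, to_delete): newest `limit` kept, oldest marked for delete."""
    if len(version_paths) <= limit:
        return list(version_paths), []
    excess = len(version_paths) - limit
    return list(version_paths[excess:]), list(version_paths[:excess])

kept, to_delete = trim_versions([f"path/v{i}" for i in range(1200)], limit=1000)
```

In a real cleanup the `to_delete` list would drive blob deletions plus a metadata update, which is where the failure scenarios mentioned in the cons would have to be handled.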
## Consequences
Storage will introduce a limit on the number of versions a record can have. Depending on the solution we choose, the API will either fail after n versions (hard limit) or older versions will get deleted automatically.
M23 - Release 0.26
Alok Joshi, Chad Leong, Thulasi Dass Subramanian, Om Prakash Gupta

https://community.opengroup.org/osdu/platform/system/storage/-/issues/179
Storage batch API returns 404 for unauthorized records
2024-03-07T13:08:37Z
An Ngo

**Use-case:** Reindex Kind API is called.
Noted in the logs there were 404s returned.
Record Fetch on some of the impacted records, 403s were returned.
Investigation shows Batch Record fetch returned 404s instead.
Issue identified from this workflow:
- Storage batch API responds unauthorized records (403) as not found (404)
### ADR: Storage batch API responds unauthorized records (403) as not found (404)
#### Status
- [x] Proposed
- [ ] Trialing
- [ ] Under review
- [ ] Approved
- [ ] Retired
#### Context & Scope
The current behavior of Storage batch API: if a record is not authorized, it is put in the _notFound_ field of the response body along with other not found records. The response body in this case looks like this:
```
{
"records": [],
"notFound": [
"opendes:facet:unauthorizedrecord1",
"opendes:facet:unauthorizedrecord2",
//other not found records...
],
"conversionStatuses": []
}
```
#### Solution
To fix this behavior of the Storage batch API we can introduce a new field to the response body. The proposed solution is to add a new field (_unauthorized_) to the response body, so we can distinguish between unauthorized and actual not found records. Sample response body:
```
{
"records": [],
"notFound": [
//not found records...
],
"unauthorized": [
"opendes:facet:unauthorizedrecord1",
"opendes:facet:unauthorizedrecord2"
],
"conversionStatuses": []
}
```
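On the client side, the proposed contract would let consumers distinguish the two buckets instead of treating everything in `notFound` as missing. A sketch (the `unauthorized` field is the proposal above, not part of the current API):

```python
# Sketch: interpreting the proposed batch response. The "unauthorized"
# field is the proposed addition, not the current API contract.
def classify_batch_response(body):
    return {
        "found": [r["id"] for r in body.get("records", []) if "id" in r],
        "missing": list(body.get("notFound", [])),
        "forbidden": list(body.get("unauthorized", [])),
    }

result = classify_batch_response({
    "records": [],
    "notFound": ["opendes:facet:gone1"],
    "unauthorized": ["opendes:facet:unauthorizedrecord1"],
    "conversionStatuses": [],
})
```

A consumer like the Indexer could then retry or skip `forbidden` records differently from genuinely `missing` ones.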
#### Consequence
This solution is a breaking change as it implies changing the API contract. It will include a change in the core library, a change in Storage, and then a change in the Indexer service to handle the batch API response.
Chad Leong

https://community.opengroup.org/osdu/platform/system/storage/-/issues/180
Unable to nullify a non-system attribute from DateTime value to null or empty value using Storage service
2023-08-22T10:20:49Z
Shubhankar Srivastava

To support a business use case, a user needs to update an existing attribute (with data type date-time) residing under the data { } section of a **work-product-component** schema from a valid DateTime value (e.g. 2023-08-10T00:00:00+0000) to "null" or an empty string (""). But when this transaction is attempted and executed via the Storage service, the value of the attribute remains unchanged even after a successful execution (HTTP status code 200). The Storage service should allow users to register an empty/null value for a DateTime attribute.
Please note that the attribute "DateSubmitted" does not belong to the list of System Properties like "createTime" or "modifyTime" and might not be used for auditing purposes.
1. "kind": "shell:wks:work-product-component--LQCWebSheet:1.0.0"
2. Example record:
{
"data": {
"ApprovalStatusTypeID": "osdu:reference-data--LQCApprovalStatusType:Submitted:",
"Source": "shell",
"Name": null,
"IsBonus": false,
"LoggingInterpreter": null,
"FinalDeliveryDuration": 1.0,
"WebSheetName": "Test_LWD_Websheet_Edit_Approver_Request_v2",
"LastUpdatedPPEmail": null,
"ApproverEmail": "NewApprover1.Nayak@shell.com",
"WellboreID": "osdu:master-data--Wellbore:BDLQCGOM2_1_WB2:", "
"OperationalComment": "Test_LWD_Websheet_Edit_Approver_v2_Operational_Comments",
"ApproverComment": null,
"SourceApplication": "Created in LQC WebSheets",
"SubmitterName": "Sujith.Submitter@shell.com",
"IsApprovalStatusReset": true,
"DateSubmitted": "2023-06-05T07:56:19.914485+0000"
},
"kind": "shell:wks:work-product-component--LQCWebSheet:1.0.0",
"source": "wks",
"acl": {
"viewers": [
"data.default.viewers@osdu.shell.com"
],
"owners": [
"data.default.owners@osdu.shell.com"
]
},
"type": "work-product-component--LQCWebSheet",
"version": 1686283555925808,
"tags": {
"normalizedKind": "shell:wks:work-product-component--LQCWebSheet:1"
},
"modifyUser": "Monalisa.Mohapatra@shell.com",
"modifyTime": "2023-06-09T04:05:56.083Z",
"createTime": "2022-12-15T11:26:58.940Z",
"authority": "shell",
"namespace": "shell:wks",
"legal": {
"legaltags": [
"osdu-shell-lqc-dataset-testing"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"createUser": "Labanyendu.Nayak@shell.com",
"id": "osdu:work-product-component--LQCWebSheet:62008"
}
3. Target attribute - "data.DateSubmitted"

https://community.opengroup.org/osdu/platform/system/storage/-/issues/181
GET: /records/{recordID}/{version} - ERROR 500
2024-01-01T08:47:32Z
Siarhei Khaletski (EPAM)

**Context**
GET: /records/{recordID}/{version} fails with error 500 if an invalid version is provided (see the attachment)
We noticed an odd behavior of the service:
List of existing versions of the following record: `opendes:work-product-component--SamplesAnalysis:e9f02f48f43149a8b69606ff7597f391`
![image](/uploads/3d75fd80a57f5558c7d0eb00a4d795eb/image.png)
If requesting the non-existent version `1` - status error 500
![image](/uploads/d3dc228f70263bd24ff7d09975baa63c/image.png)
Meanwhile, if requesting the non-existent version `1234` - status 404
![image](/uploads/e82da89c3673b643aaa26845f0eb0c81/image.png)
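Whatever the root cause, the expected contract can be expressed as a small lookup sketch: any version not present in the record's version list yields a 404, regardless of the requested value's length. This is illustrative, not the service implementation.

```python
# Sketch of the expected contract: any non-existent version -> 404,
# whether the requested value is short ("1") or long ("1234...").
def get_record_version(versions, requested):
    try:
        requested = int(requested)
    except (TypeError, ValueError):
        return 400, None  # malformed version string
    if requested not in versions:
        return 404, None
    return 200, f"blob/for/version/{requested}"

status, _ = get_record_version([1695708299000, 1695708301000], "1")
```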
**Azure GLab Logs**
![image](/uploads/8d54b1addcbc1835b4ea3c90135072b6/image.png)
**Expected Behavior**
404 status code
M22 - Release 0.25
Siarhei Khaletski (EPAM), Chad Leong

https://community.opengroup.org/osdu/platform/system/storage/-/issues/182
Issues observed with logging
2023-12-01T06:47:32Z
Larissa Pereira

**Issue 1: Duplicate operation IDs**
We observed multiple dependency logs for disparate operations (based on record ids) with identical operation Id's for the POST QueryApi/getRecords API. Duplicate entries were observed when reading from BlobStore for operation READ_FROM_STORAGE_CONTAINER although these logs belonged to separate operations.
![image](/uploads/afc539574de597bba300b5d6b2a18b8a/image.png)
**Issue 2: Multiple dependency logs and missing Read log**
We observed multiple dependency logs with identical operation Id's for the POST QueryApi/fetchRecords. These entries were observed when querying CosmosStore, however the READ_FROM_STORAGE_CONTAINER dependency log is missing.
![image](/uploads/ce377f8bf6ee95646ca1ab5d910df167/image.png)
M22 - Release 0.25
VidyaDharani Lokam

https://community.opengroup.org/osdu/platform/system/storage/-/issues/183
Add a note on deleted records to /versions
2023-09-06T13:20:14Z
Marton Nagy

The **GET /records/versions/{id}** "Get all record versions" endpoint in [Storage Service](https://p4d.developer.delfi.cloud.slb-ds.com/workspace/apiCatalog/OSDU-Storage-Service) seems to retrieve record versions regardless of whether the record itself has been **(soft) deleted** or not. Meanwhile, neither **GET /records/{id}** "Get record" nor **GET /records/{id}/{version}** "Get record version" retrieves the record when it is (soft) deleted, which is the correct behavior.
Please add a note to the **GET /records/versions/{id}** endpoint description to highlight the difference.
cc @nthakur, @gehrmann

https://community.opengroup.org/osdu/platform/system/storage/-/issues/184
Storage Record query does not include record audit info
2023-12-06T15:31:26Z
An Ngo

The Storage query/records API returns records without audit information such as createUser, createTime, modifyUser, and modifyTime.
This behavior is inconsistent with other Storage record queries, such as the batch fetch and record fetch APIs.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/185
ADR: API to retrieve past events of storage records
2023-10-11T16:28:52Z
Yifan Ye

New API in the Storage service to rehydrate past creation and last-modified events for a given kind within the given time range.
* [x] Proposed
* [ ] Trialing
* [ ] Under review
* [ ] Approved
* [ ] Retired
## Context & Scope
The OSDU Storage service does not provide a way to retrieve past events of records being created/modified. Many OSDU applications would be interested in retrieving past events of records that happened before the application subscribed to the notification service. The new API proposed in this ADR will provide the concerned applications with a way to backtrack these events.
The proposal is to provide an API on the Storage service to support retrieving past events for records of a kind that happened in a given time range, where the events will be returned in a paginated format and in ascending chronological order based on the timestamp.
The new API will retrieve the first and the last events of the record, filter the events by the start date and end date provided by the user, and then return the filtered events.
## Tradeoff Analysis
The new API does not represent a breaking change of any other API, and consequently neither for the consuming applications. Only concerned-consuming applications would benefit from this new feature, while it remains entirely transparent for others.
## Decision
Provide an API to query past events of records of the given kind and return the events in paginated ascending chronological order.
{
"id": \<RECORD_ID\>,
"kind": \<KIND\>,
"op": \<CREATE|UPDATE|DELETE, etc.\>,
"version": \<VERSION\>,
"timestamp": \<TIMESTAMP\>
}
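The time-range filtering, ascending ordering, and pagination described above can be sketched as follows. The event fields follow the shape proposed in this ADR; the offset-style cursor and page size are illustrative choices, not part of the proposal.

```python
# Sketch: filter past events by time range and return them in ascending
# chronological order, one page at a time. The offset cursor is an
# illustrative pagination choice, not part of the ADR.
def query_events(events, start_ts, end_ts, cursor=0, page_size=2):
    in_range = sorted(
        (e for e in events if start_ts <= e["timestamp"] <= end_ts),
        key=lambda e: e["timestamp"],
    )
    page = in_range[cursor:cursor + page_size]
    next_cursor = cursor + page_size if cursor + page_size < len(in_range) else None
    return page, next_cursor

events = [
    {"id": "r1", "kind": "k", "op": "CREATE", "version": 1, "timestamp": 10},
    {"id": "r1", "kind": "k", "op": "UPDATE", "version": 2, "timestamp": 30},
    {"id": "r2", "kind": "k", "op": "CREATE", "version": 1, "timestamp": 20},
]
page, next_cursor = query_events(events, 0, 100)
```

Callers would keep requesting pages until `next_cursor` is exhausted, replaying events in the order they originally happened.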
## Consequences
* A new API on the Storage service would be available.
* Documentation of the Storage service should be modified with details for the new API.
Yifan Ye

https://community.opengroup.org/osdu/platform/system/storage/-/issues/186
ADR: Replay
2024-03-05T10:59:17Z
Akshat Joshi

<a name="ppadhi"></a>OSDU - Replay and Replay API
# Table of Contents
[Context ](#_toc119676063)
[Problems with Current Reindex All Solution ](#_toc119676075)
[Replay ](#_toc119676076)
[Requirements to address ](#_toc119676077)
[Architectural Options ](#_toc119676078)
[Decision ](#_toc119676079)
[Replay API](#_toc119676080)
## Status
* [x] Proposed
* [ ] Trialing
* [ ] Under review
* [ ] Approved
* [ ] Retired
## <a name="_toc119676063"></a>Context
This ADR is centered around the design of the new replay flow within OSDU's storage service. The purpose of this Replay flow is to publish messages that indicate changes to records, which are subsequently received and processed by consumers. It's important to note that the handling of these messages follows an idempotent process.
The Replay flow will address the following:
1. In case of disaster, this replay flow will help us rebuild the indexes to the RPO. [Out of scope of this ADR]
2. Reindexing the records by publishing the record change messages to the consumer Indexer service.
3. Correction of indices after changes to the structure of the storage records of a particular kind.
**Replay rate** - the rate at which Storage publishes the record change messages to the service bus.
## <a name="_toc119676075"></a>Problems with Current Reindex All Solution
|**Problem**|**Details**|**What is Required?**|
| :- | :- | :- |
|Reliability |<p>**Operation is Synchronous.**</p><p>- Very long HTTP call is never reliable</p><p></p><p></p><p>The Reindex is a synchronous operation, making the operation Unreliable and not resilient to failures. If there is any interruption to the connection, all the status and progress could be lost.</p><p></p>|The operation must be reliable. If the operation is triggered, it must either succeed or it must fail and in both the cases, the user must be diligently informed with the right reasons for success/failures. The system should not be in a state where the user has no clue what’s happening.|
|Resiliency|Abrupt disturbance of the reindex-process leaves the system in an inconsistent state. For example, if there is any exception or if the process crashes, then the system is left entirely in an inconsistent state.|The system must be resilient to failure and must always succeed. If the operation fails, then the system must be left in the previous state.|
|Scale|Due to the synchronous and non-resilient nature of the current implementation, the scale is very limited. It cannot ingest more than a couple of million records reliably.|The reindex operation must scale to any number of records|
|Speed|The speed is very slow. It’s known to take close to an hour for 1 million records.|Faster rate of reindexing is required. For example, 100 million records should not take more than a few hours. |
|Tracking/Observability|There is no way for the user to know about the progress.||
|Pausing/Resuming reindex|Today, there is no capability to pause and resume reindex. Given that this will be a long running operation, having pause and resume will be good to have.||
|No Delta Reindexing|For some Disaster Recovery Scenarios, there may be partial backups available. So reindexing only a subset of records of a kind can prove to be useful. This functionality is not available today.||
|Parallelization|Currently, the reindex is a procedural process. This has impact on both scale as well as speed.||
## <a name="_toc119676077"></a>Requirements to address
To be able to address these issues, we need to re-design the way reindex works, addressing various functional and non-functional aspects like speed, scale, reliability, observability, etc. The below table outlines what is expected out of the new Reindex design.
|**Requirement**|**Details**|**Technical Implications** | **Scope** |
| :- | :- | :- | :- |
|1. Scalability|<p>The Replay operation must be scalable; it should be able to handle arbitrarily large numbers of records.</p><p><br>A realistic target is 100 million records in 4-5 hours.</p>|<p>Need to ensure Elasticsearch storage can be scaled up.</p><p>To achieve a higher scale, the following must be done:</p><p>- The whole operation must be **asynchronous** in nature.</p><p>- It must be resilient to failures from pod crashes and 429s due to high Database/Service Bus/Elasticsearch load.</p><p>- We can leverage the message broker to divide and conquer.</p><p>- We can also look at job schedulers like Quartz to achieve a reliable reindex.</p><p>- Need to evaluate which is the best service to perform this reindexing.</p><p>- Can also try to leverage **Airflow**.</p>| In Scope of ADR |
|2. Reliable Responses|<p>When the operation is triggered, the response must be reliable. </p><p></p><p>There could be some pre-validation done to check whether the reindex process can be completed either successfully or not.</p><p>The result of whether the operation is success or fail, should be communicated via response to the user properly.</p>|Today, we don’t return anything apart from 200 OK in the response even if things fail. <br><br>The entire response should be revamped and reworked on how the status can be conveyed to the user in a useful way.| In Scope of ADR |
|3. Observability and Monitoring|<p>Given the fact that reindex is a long running operation, the User triggering the reindex must have insights into what is going on, using a track status API.</p><p></p><p>Some of the details should include:</p><p>- **Status:** Validating, Stopping-Ingestion, In-progress, Finalizing, Complete, Error, etc.</p><p>- **Progress:** Overall percentage, per index progress, remaining records count, ETA</p><p></p>|We could store the progress in a Redis cache or elsewhere that can be used to report back to the user on the progress.| In Scope of ADR |
|4. Reliable System State – Consistency before/after operation in case of failure|<p>Guarantee to reindex valid storage records – **must have**</p><p>**(depends on message broker reliability)**</p><p>**Rollbacks** – nice to have</p>|<p>If there are unrecoverable errors while reindexing a particular kind, the system is left in an inconsistent state. It would be good to "**rollback**" the operation to restore the system to the state before the operation was triggered for that kind.</p><p>There should also be **no concurrent "reindexAll" operations** running. There can, however, be concurrent reindexes of different kinds at the same time.</p><p>Whether a rollback should be done on unrecoverable failures due to internal system errors can be a configurable parameter.<br><br>This can be achieved by indexing all reindexed records for a kind into a new "secondary index", and only if that succeeds completely is the index renamed to replace the primary index.<br><br>Elasticsearch's clone index feature can be utilized to achieve this.</p><p>- Reindex failed record IDs</p>|Out of Scope of ADR |
|5. Stop Ingestion/Search during Reindex|<p>During **Reindex**, the normal ingestion should stop. This is because:</p><p>- There are some edge cases which could end up the system in an inconsistent state. Edge Cases: **<TODO>**</p><p>- Load on Elasticsearch</p><p></p>| | Out of Scope of ADR|
|6. Speed|<p>The operation is quite slow today. It takes almost an hour to reindex a million records, which means it would take days to reindex 100 million records – not practical.</p><p>Two issues:</p><p>1. Finding unique kinds</p><p>2. Reindexing – database load</p>|<p>This is **directly dependent on the scalability of the underlying infra, like the Database** and Elasticsearch.</p><p>The Database can be scaled up/out on demand, either through the UI by the customer (i.e., via a control plane) or by some other means.</p><p>Auto scale-out of Elasticsearch is currently not possible, so speed may be limited by Elasticsearch. We can, however, scale up Elasticsearch, which can help achieve higher speed.</p><p>Whether this scale-up is triggered automatically or manually is something we need to evaluate and POC.</p><p>Storage Service's queries can also be revisited – another service has a more efficient implementation of paginated queries: [Performance improvement on paginated query for CosmosDB (!244)](https://community.opengroup.org/osdu/platform/system/lib/cloud/azure/os-core-lib-azure/-/merge_requests/244/diffs)</p>| |Out of Scope of ADR |
|7. **Delta Reindex** and **Consistency Checker/Enforcer**|<p>Doing a delta reindex can be useful if backups are restored during disaster recovery, resulting in faster recovery times.</p><p>Delta Reindex = reindex only those records that are not present in the backup.<br><br>When we talk about delta reindex, we need to ensure there is consistency across all 3 components – storage blob, storage records, and Elasticsearch.</p>|<p>Need to explore feasibility. The operation can be something like "reindex all records whose create/update time > X".</p><p>A consistency enforcer should be built to ensure that the 3 entities are in a consistent state.</p>| Out of Scope of ADR |
|8. Snapshot Backup/Cluster replication|<p>Back up Elasticsearch snapshots frequently, and in case of disaster, restore the snapshot and then perform the delta reindex.<br><br>This will make recovery times much faster.</p>| |Out of Scope of ADR |
|9. Source of trigger|During a recovery process, who makes the call to reindex – the user or the internal system? |We will need to design for and account for this in the reindex design.| Out of Scope of ADR |
|10. Pause/Resume Reindex|Since reindex is a long-running operation, the ability to pause and resume the reindex operation would be nice to have.|<p>We need to ensure system consistency when the operation is paused and resumed.</p><p>Also, any new records ingested after the pause must be included in the reindex process when it is resumed.</p>| Out of Scope of ADR |
## <a name="_toc119676078"></a> Architectural Options:
<br>
|**Options**|**Pro**|**Cons**|**Work Required**|
| :- | :- | :- | :- |
|1. Using **Airflow** + Message Broker + StorageService + Workflow Service|<p>- Proven workflow engine</p><p>- Fewer new implementations in the storage service, so less work required by other CSPs.</p>|<p>- The process becomes slower and less efficient.</p><p>- Lots of HTTP calls between Airflow and AKS</p><p>- Airflow would require access to internal infrastructure to operate in the most efficient manner.</p><p>- Some required features are not yet available in ADF Airflow</p><p>- Parallelization may spawn thousands of tasks waiting to be scheduled. **Scalability can be an issue.**</p><p>- Concurrency and safety guarantees are tricky – allowing no more than one reindex per kind</p>|<p>**Airflow**</p><p>- DAG using TaskGroups, Dynamic Task Mapping, concurrency handling.</p><p>- Build pipelines to integrate the new DAG.</p><p>**Storage Service**</p><p>- Implement new APIs to publish messages to the message broker.</p><p>**Indexer Service**</p><p>**Workflow Service**</p><p>- New APIs to support observability</p><p>- Design for checkpointing</p>|
|2. Using **StorageService** + **Message Broker**|<p>- Simple, Lesser moving parts</p><p>- Fast & Efficient</p>|- Parallelization may require state management.|<p>**Storage Service**</p><p>- New APIs for exposing Replay functionality (ReplayAll, ReplayKind, GetReplayStatus)</p><p>- New Modules for replay message processing</p><p></p><p>**Indexer Service**</p><p>- Delete ALL kinds API</p>|
## <a name="_toc119676079"></a> Decision:
We chose design option 2 (Storage service + message broker): it persists the replay status, allows a replay to be re-triggered with its status returned, and is the simpler implementation.
- **[Decision]** What led us to select the Storage service for the Replay API? <br>
* The source of truth for the storage records is the Storage service. It is the storage service that publishes the record change messages, which are then consumed by the consumers, and the processing of those messages is idempotent. So, to trigger reindexing, we must invoke some procedure in the storage service that makes it emit record change messages onto the message broker.<br>
* The Indexer is just one consumer of the record change messages, and there could be other consumers who require this replay functionality as well. Instead of letting each consumer build its own replay logic, having it in one common place benefits all consumers. <br>
* This way, one consumer doesn't have to depend on the Indexer, which is itself just another consumer.<br>
* Reindex is just one use case that uses this new Replay functionality. Other consumers can have their own use cases for consuming the replayed messages.
<br>
**Design Approach for option 2:**
![Aspose.Words.71972436-70f7-48df-8f1c-d2035f55ce34.004](/uploads/5a573b82493315f91adeee547fd97fee/Aspose.Words.71972436-70f7-48df-8f1c-d2035f55ce34.004.png)
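The flow in the diagram can be sketched as follows. This is an illustrative Python sketch, not the actual service code: `RecordStore`, `Broker`, and `ReplayService` are hypothetical stand-ins, the broker is an in-memory list rather than a Service Bus topic, and real code would checkpoint progress durably rather than in a dict.

```python
import uuid

class RecordStore:
    """Stand-in for the storage database, queried page by page."""
    def __init__(self, records):
        self._records = records  # {record_id: kind}

    def ids_for_kind(self, kind, page_size=2):
        ids = [rid for rid, k in self._records.items() if k == kind]
        for i in range(0, len(ids), page_size):
            yield ids[i:i + page_size]

class Broker:
    """Stand-in for the message broker; collects published messages."""
    def __init__(self):
        self.published = []
    def publish(self, topic, message):
        self.published.append((topic, message))

class ReplayService:
    """Sketch of option 2: paginate record IDs, publish batches, track status."""
    def __init__(self, store, broker):
        self.store, self.broker = store, broker
        self.status = {}  # replayId -> progress record

    def replay_kind(self, kind, topic="recordstopic"):
        replay_id = str(uuid.uuid4())
        self.status[replay_id] = {"kind": kind, "state": "IN_PROGRESS", "processed": 0}
        for batch in self.store.ids_for_kind(kind):
            # One record-change message per batch; consumers process idempotently.
            self.broker.publish(topic, {"op": "replay", "recordIds": batch})
            self.status[replay_id]["processed"] += len(batch)
        self.status[replay_id]["state"] = "COMPLETED"
        return replay_id
```

A real implementation must also survive pod crashes (checkpointing the page cursor) and back-pressure (429s from the database, broker, or Elasticsearch), which this in-memory sketch ignores.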
**Note**
This ADR also helps to address the following issues: <br>
- **[Issue]** https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/91 <br>
* The Replay flow will include a Service Bus topic for every event. If we need to introduce new events in the future that necessitate message publishing, we can easily do so by introducing a new topic and the associated logic. This approach prevents the unintended consequences that could arise from triggering other listeners on the same topic. <br>
- **[Issue]** https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/66
* Utilizing the service bus and tracking its progress assists us in achieving a reliable design, including the built-in reliability of message queuing. <br>
- **[Issue]** https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/80
* With the flexibility to introduce new topics in the Reindex …

**Issue:** [ADR: Replay API](https://community.opengroup.org/osdu/platform/system/storage/-/issues/187) (Akshat Joshi, 2024-02-29)

Two new APIs will be introduced in the Storage service as part of the Replay flow.
* [ ] Proposed
* [ ] Trialing
* [ ] Under review
* [x] Approved
* [ ] Retired
## Context & Scope
This ADR is centered around the design of the new replay API within OSDU's storage service, introduced as part of the [Replay ADR](https://community.opengroup.org/osdu/platform/system/storage/-/issues/186). The purpose of this Replay API is to publish messages that indicate changes to records, which are subsequently received and processed by consumers. It's important to note that the handling of these messages follows an idempotent process.
## Terminology
<table>
<tr>
<td><strong> Name</strong>
</td>
<td><strong> Explanation</strong>
</td>
</tr>
<tr>
<td><strong> Record</strong>
</td>
<td>A record is stored in the OSDU Data Platform in two parts: a document database entry, which contains basic data (id, kind, legal information, and access permissions), and file storage in JavaScript Object Notation (JSON) format, which contains the other relevant information of the record. We are interested in the document database part.
</td>
</tr>
</table>
## Tradeoff Analysis
The new APIs do not represent a breaking change to any other API, and consequently none for the consuming applications. Only the concerned consuming applications benefit from this new feature, while it remains entirely transparent to others.
## Additional Requirement
The newly introduced APIs must facilitate [Collaboration workflows](https://community.opengroup.org/osdu/platform/system/storage/-/issues/149) through the utilization of the x-collaboration header. Additionally, the replay mechanism should ensure the accurate publication of collaboration context information in the corresponding event.
## Decision
The proposal is to provide POST and GET Replay APIs.
<table>
<tr>
<td><strong> API fields </strong>
</td>
<td><strong>Explanation</strong>
</td>
</tr>
<tr>
<td><strong>kind</strong>
</td>
<td>It specifies what Kind the records belong to. [optional]
</td>
</tr>
<tr>
<td><strong>replayId</strong>
</td>
<td>It represents the replay status id. [required]
</td>
</tr>
<tr>
<td><strong>operation</strong>
</td>
<td> Defines the replay operation to be carried out. [required]
</td>
</tr>
<tr>
<td><strong>filter</strong>
</td>
<td> Defines the field based on which records are selected. [optional]
</td>
</tr>
</table>
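To make the field table concrete, a hypothetical request-body validator is sketched below. The required/optional split mirrors the table, treating `replayId` as returned in the response rather than supplied by the caller (an interpretation); the exact payload is defined by the attached swagger spec, not this sketch, and the example kind value is made up.

```python
# Hypothetical shape of a replay request body, based on the field table above.
REQUIRED = {"operation"}          # per the table: operation is required
OPTIONAL = {"kind", "filter"}     # kind and filter are optional

def validate_replay_request(body: dict) -> list:
    """Return a list of validation errors for a replay request body."""
    errors = [f"missing required field: {f}" for f in REQUIRED - body.keys()]
    errors += [f"unknown field: {f}" for f in body.keys() - REQUIRED - OPTIONAL]
    return errors

# Illustrative request: replay a single kind (array restricted to size one).
example = {"operation": "reindex", "kind": ["osdu:wks:master-data--Wellbore:1.0.0"]}
```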
<strong>Allowed roles for API access</strong> : users.datalake.ops
<br>
<table>
<tr>
<td>
<strong>Method</strong>
</td>
<td>
<strong> API Endpoint</strong>
</td>
<td>
<strong>Design</strong>
</td>
</tr>
<tr>
<td> POST
</td>
<td>v1/replay
</td>
<td>
<strong>Request Example - </strong>
<p>
<strong> </strong>
<p>
1. <strong>Description</strong> - This API request will reindex all the storage records.
<p>
In this phase, we will pass an empty body to reindex all records:
<p>
{
<p>
}
<p>
In next phase -
<p>
![operationrepaly](/uploads/d7679bf7d4d6d9745e0d9c579905fc74/operationrepaly.png)
<p>
2. <strong>Description</strong> - This API request will reindex specific kinds of storage records. The operationName is optional; by default, it will reindex the specified kinds together with the filter field. Currently we will replay a single kind only, so the array of kinds will be restricted to size one.
<p>
![operationrepaly](/uploads/f06805a167d15986688ba23ac85ee897/operationrepaly.png)
<p>
<p>
<strong>Response example – </strong>
![responsepostreplay](/uploads/c557910f6369deda3971866bd2130864/responsepostreplay.png)
<p>
</td>
</tr>
<tr>
<td> GET
</td>
<td>
replay/status/{id}
<p>
</td>
<td>
<strong>Request:</strong>
<p>
<p>
<p>
1. <strong>Response Replay in Progress:</strong> <br>
<p>
a) <b>Scenario</b> - In Replay All <br><br>
![replaystatusAllKind](/uploads/12f155b5d491010f3ea37c2576e56e19/replaystatusAllKind.png) <br>
b) <b>Scenario</b> - In Replay single kind <br><br> ![replaystatusforsinglekind](/uploads/2043d80e2d350faa2f3fdb41d4601e0f/replaystatusforsinglekind.png)
<br>
<p>
<p>
2. <strong>Response Replay in Failed:</strong> <br>
<p>
a) <b>Scenario</b> - In Replay All <br><br>
![replayFailedForAllKind](/uploads/3d9a64803b229d3b46d4e283047d285e/replayFailedForAllKind.png)
<br>
b) <b>Scenario</b> - In Replay single kind <br><br>
![replayfailedforsinglekind](/uploads/407b53b19ddfa4545f52e9e88d34fb11/replayfailedforsinglekind.png)
<p>
<p>
</td>
</tr>
</table>
<br>
API spec swagger yaml - [ReplayAPISpecs.yaml](/uploads/f9e8ddd4958bf04f9bc99994ebdc4e41/ReplayAPISpecs.yaml)

**Issue:** [Normalizer: meta\[\].unitOfMeasureID should be preferred unit declaration](https://community.opengroup.org/osdu/platform/system/storage/-/issues/188) (Thomas Gehrmann [slb], 2024-01-18)

Reported by Marcus Ridgway:
The UoM Meta[] schema supports association of a Unit of Measure to one or more attributes in a JSON record. The core of the UoM schema is the _unitOfMeasureID_ attribute which associates attributes defined in _propertyNames_ to the ID of the UOM in the Unit of Measure Reference list e.g. for a Wellbore record
```json
{
"kind": "Unit",
"name": "ft",
"persistableReference": "",
"propertyNames": [
"FacilitySpecifications[0].FacilitySpecificationQuantity",
"VerticalMeasurements[0].VerticalMeasurement"
],
"unitOfMeasureID": "osdu:reference-data--UnitOfMeasure:ft:"
}
```
The persistableReference attribute in meta[] is there to support storage of the full UoM Definition when unitOfMeasureID is not populated. E.g. for metres:
```json
"persistableReference": "{\"abcd\":{\"a\":0.0,\"b\":1.0,\"c\":1.0,\"d\":0.0},\"symbol\":\"m\",\"baseMeasurement\":{\"ancestry\":\"L\",\"type\":\"UM\"},\"type\":\"UAD\"}",
```
Populating persistableReference is no longer required if the UnitOfMeasure Reference List is fully populated, i.e., IDs exist for all used UoMs. Regardless, populating persistableReference is extremely onerous for a number of reasons:
- it does not adhere to one version of the truth - a UoM need only be defined in the UoM Reference List; storing the UoM definition in persistableReference in every record is the most extreme opposite
- all ETLs would be required to populate all the meta[] UoM definitions for all record types - the UoM definition is maintained in every ETL
- all OSDU records are unnecessarily bloated by carrying this redundant, duplicate persistableReference metadata within meta[] in each and every record when it is centrally stored in the Reference List. This impacts storage requirements for OSDU records.
Problem: The Normalizer for the Search API does not support SI search on numeric values when JSON records do not have persistableReference populated. The only data that needs to be populated is unitOfMeasureID, but this is ignored by the Normalizer, which instead requires persistableReference to be populated.
Required: The Normalizer should be extended to support the unitOfMeasureID populated in meta[]. When it is populated, any persistableReference content, including blank content, is ignored; the Normalizer instead retrieves the persistableReference content from the UnitOfMeasure Reference List (the source of truth for UoM definitions).
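The requested lookup order can be sketched as below. This is illustrative Python, not the Normalizer's actual code: `resolve_unit` and the `reference_list` dict are hypothetical stand-ins for the reference-data store, and the "definition" strings are placeholders.

```python
# Sketch of the requested normalizer behavior: when unitOfMeasureID is
# populated, resolve the unit definition from the UnitOfMeasure reference
# list (source of truth) and ignore any inline persistableReference; fall
# back to the inline persistableReference only when no ID is given.

def resolve_unit(meta_item: dict, reference_list: dict) -> str:
    uom_id = meta_item.get("unitOfMeasureID", "")
    if uom_id:
        # Preferred path: the central definition wins, even if the inline
        # persistableReference is blank or stale.
        return reference_list[uom_id]["persistableReference"]
    return meta_item["persistableReference"]

reference_list = {
    "osdu:reference-data--UnitOfMeasure:ft:": {"persistableReference": "<ft definition>"}
}
meta = {"unitOfMeasureID": "osdu:reference-data--UnitOfMeasure:ft:", "persistableReference": ""}
```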
---
Comment from @gehrmann - means the normalizer needs to be enhanced. From the schema side of things we have said that if `unitOfMeasureID` is populated it should supersede the `persistableReference` which is the future goal. The [AbstractMetaItem](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/Generated/abstract/AbstractMetaItem.1.0.0.json?ref_type=heads#L58) schema is historical and requires the `persistableReference` to be set. It should however be sufficient to set `"persistableReference": ""` when populating `unitOfMeasureID`.
Originally reported as [schema issue 624](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/issues/624). (Milestone: M22 - Release 0.25)

**Issue:** [\[SAST\] Vue_DOM_XSS in file index.html](https://community.opengroup.org/osdu/platform/system/storage/-/issues/189) (Yauhen Shaliou [EPAM/GCP], 2023-11-15)

**Description**
The method m-1"\> embeds untrusted data in generated output with href, at line 36 of \\storage\\provider\\storage-azure\\src\\main\\resources\\static\\index.html. This untrusted data is embedded into the output without proper sanitization or encoding, enabling an attacker to inject malicious code into the generated web-page.
# **Location:**
<table>
<tr>
<th> </th>
<th>Source</th>
<th>Destination</th>
</tr>
<tr>
<th>File</th>
<td>storage/provider/storage-azure/src/main/resources/static/index.html</td>
<td>storage/provider/storage-azure/src/main/resources/static/index.html</td>
</tr>
<tr>
<th>Line number</th>
<td>92</td>
<td>36</td>
</tr>
<tr>
<th>Object</th>
<td>pathname</td>
<td>href</td>
</tr>
<tr>
<th>Code line</th>
<td>return location.protocol + '//' + location.host + location.pathname</td>
<td>
\<a :href="signInUrl" class="btn btn-primary" v-if="!token" class="col-2"\>Login\</a\>
</td>
</tr>
</table>

(Milestone: M21 - Release 0.24)

**Issue:** [ADR Consumer Topic Identification \[ Replay Design \]](https://community.opengroup.org/osdu/platform/system/storage/-/issues/190) (Akshat Joshi, 2024-02-21)

<h2>ADR Consumer Topic Identification</h2>
## Status
* [x] Proposed
* [ ] Trialing
* [ ] Under review
* [ ] Approved
* [ ] Retired
<h3>Problem Context</h3>
Today, the storage service publishes **RecordChange messages to “recordstopic”.**
When the Storage Service publishes a **RecordChange** message to the "**recordstopic**" topic of the service bus, all the consumers get notified (e.g., the Indexer service and the Notification service). During scenarios like replaying for **reindex**, notifying all the consumers may not be required. Hence, we need a way to instruct the storage service to publish **RecordChange** messages to a custom topic depending on the use case. For example, if the replay is being done for re-indexing, we can instruct the storage service to publish the **RecordChange** messages to a "reindex" topic that only the indexer listens to, instead of publishing them to **recordstopic**, which has many consumers. This ensures that only the indexer service gets notified of the events.
Therefore, it is of utmost importance that the producer's design allows for the appropriate routing of operations to their intended topics. This raises the question of how the Storage service can accurately determine the topic to which each message should be directed based on its specific functionality/operation. In response to this challenge, we have explored the following design options, which will serve as the foundation for the development of our Replay API.
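The operation-to-topic routing of the preferred option below can be sketched as a simple lookup. The topic names other than `recordstopic` are assumptions; each CSP would supply its own mapping via configuration:

```python
# Sketch of option 1: the replay API receives an operation name and the
# storage service maps it to a topic internally, so callers never see
# CSP-specific topic names. Mapping values here are illustrative.

OPERATION_TO_TOPIC = {
    "replay": "recordstopic",   # full fan-out: all consumers notified
    "reindex": "reindextopic",  # only the Indexer listens here (assumed name)
}

def topic_for(operation: str) -> str:
    """Resolve a replay operation name to its internal topic."""
    try:
        return OPERATION_TO_TOPIC[operation]
    except KeyError:
        raise ValueError(f"unsupported replay operation: {operation}") from None
```

Keeping this mapping inside the storage service is what gives option 1 its abstraction benefit: consumers and callers depend only on operation names, never on broker topology.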
<table>
<tr>
<td><strong> Design Option </strong>
</td>
<td><strong> Detailed Approach </strong>
</td>
<td><strong> Pros/Cons</strong>
</td>
</tr>
<tr>
<td>
1. <strong>Create different Topic for Each Operation and provide operation name (i.e., reindex) as input to the replay API</strong>.
<p><span style="color: green;"> [Preferred Approach] </span></p>
</td>
<td>There will be a separate topic for each operation.
<p>
For example, indexer service can listen to a topic called “reindex” and notification service can listen to the topic “notify” in addition to “recordstopic”.
<p>
The replay API will take the input as operation name i.e. reindex, based on that, it will decide which topic the replay API has to publish the recordChange message. This will ensure only the indexer gets notified.
<p>
![Picture1](/uploads/37e29ff02caff81f2f62ff3e98cabb74/Picture1.png)
<p>
<strong>Note</strong> – One operation maps to one topic in the Service Bus (1:1), while a single topic can have multiple consumers.
</td>
<td><strong> Pros: </strong>
<ul>
<li>Abstraction and statelessness as users need not know about internal topics.
<li>Consistency, as different CSPs can decide on a common operation name irrespective of internal implementation details.
<li>Decoupling of the internal implementation from Replay operation.
<p>
<strong>Cons: </strong>
<ul>
<li>Management of the mapper that maps a functionality (i.e., reindex) to a topic name.
<li>Implementation will take time.
<li>Producers should know about the consumer topic mapping. [ <strong>Remark</strong> – every producer knows topic names either through a registry, an in-memory store, or an environment-variable mapper; currently we pass them as hardcoded values from the deployment yaml to application properties]
</li>
</ul>
</li>
</ul>
</td>
</tr>
<tr>
<td>
2. <strong>Create different Topic for Each Operation and provide Topic Name/ID as input to the replay API</strong>.
</td>
<td>There will be a separate topic for each operation.
<p>
For example, indexer service can listen to a topic called “reindex” and notification service can listen to the topic “notify” in addition to “recordstopic”.
<p>
If Replay is required for reindex scenario, then replay API can be called with the parameter topicId’s value as “reindex”.
<p>
This will cause storage services to publish recordChange messages to reindex topic instead of recordchangetopic. This will ensure only the indexer gets notified.
</td>
<td><strong>Pros - </strong>
<ul>
<li> No need to maintain the internal mapper if we use topic name.
<li>Implementation will be easy if we use the topic name directly.
<p>
<strong>Cons –</strong>
<ul>
<li> We must maintain the mapper that associates topic IDs with their corresponding topic names when the topic ID is used as input to the replay API.
<li> Users should have access to the internal topic details in case we use topic name.
<li> Since these APIs will be introduced at the community level, customizing them for specific topics, which may have different names or implementations by different CSPs, could impact uniformity.
</li>
</ul>
</li>
</ul>
</td>
</tr>
<tr>
<td>
3. <strong>Create a different Topic for Each Consumer and let the specific consumer, like the indexer, call the replay API with the topicId</strong>
</td>
<td>In this option, a new Reindex API in the indexer calls the replay API, providing the topic name in the request body.
</td>
<td><strong>Pros – </strong>
<ul>
<li>Internal details like topic name need not be known to the user.
<li>The consumer can perform pre-requisite operations like deleting indices before calling the replay API.
<p>
<strong>Cons – </strong>
<ul>
<li> The user has to use different APIs, which is a bad experience.
<li> If the flow is triggered via a reindex API, the response payload must change to incorporate the status ID, which requires a community ADR and a code merge.
</li>
</ul>
</li>
</ul>
</td>
</tr>
</table>
**Conclusion** - We are going with approach 1, taking into consideration the pros above.

**Issue:** [Add /liveness_check](https://community.opengroup.org/osdu/platform/system/storage/-/issues/191) (Riabokon Stanislav (EPAM)[GCP], 2024-01-08; Milestone: M23 - Release 0.26)

Need to add the endpoint '/liveness_check' to verify the operational status of the Storage Service.

**Issue:** [RAFSDDMS Unit conversion issue](https://community.opengroup.org/osdu/platform/system/storage/-/issues/192) (Rustam Lotsmanenko (EPAM), 2023-12-05)

It was observed that the record from the collection https://community.opengroup.org/osdu/qa/-/tree/main/Dev/48_CICD_Setup_RAFSDDMSAPI?ref_type=heads
Requested with conversion headers:
```plaintext
curl --location 'https://community.gcp.gnrg-osdu.projects.epam.com/api/storage/v2/query/records:batch' \
--header 'Content-Type: application/json' \
--header 'data-partition-id: osdu' \
--header 'accept: application/json' \
--header 'frame-of-reference: units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;' \
--header 'Authorization: Bearer ' \
--data '{
"records": [
"osdu:work-product-component--RockSampleAnalysis:Test"
]
}'
```
Causing internal server error:
```plaintext
Caused by: java.lang.NullPointerException: Cannot invoke "com.google.gson.JsonArray.size()" because "elementArray" is null
at org.opengroup.osdu.core.common.util.JsonUtils.overrideNestedNumberPropertyOfJsonObject(JsonUtils.java:219)
at org.opengroup.osdu.core.common.util.JsonUtils.overrideNumberPropertyOfJsonObject(JsonUtils.java:146)
at org.opengroup.osdu.core.common.crs.UnitConversionImpl.convertRecordToSIUnits(UnitConversionImpl.java:166)
at org.opengroup.osdu.core.common.crs.UnitConversionImpl.convertUnitsToSI(UnitConversionImpl.java:56)
at org.opengroup.osdu.storage.conversion.DpsConversionService.doConversion(DpsConversionService.java:80)
at org.opengroup.osdu.storage.service.BatchServiceImpl.fetchMultipleRecords(BatchServiceImpl.java:228)
at org.opengroup.osdu.storage.api.QueryApi.fetchRecords(QueryApi.java:135)
```
Further investigation is required to fix it.

**Issue:** [How to troubleshoot? Field missed from Search response although we see it from Storage response.](https://community.opengroup.org/osdu/platform/system/storage/-/issues/193) (Debasis Chatterjee, 2023-12-16)

Companion issue in Preship site is here -
https://community.opengroup.org/osdu/platform/pre-shipping/-/issues/649
This is not a case of anything linked to conversion (using the Meta block) or a typo in the field name.
The real question is - how to troubleshoot this kind of problem?
cc @nthakur and @gehrmann

**Issue:** [Unhandled Exceptions for missing required attributes while creating record](https://community.opengroup.org/osdu/platform/system/storage/-/issues/195) (Anubhav Bajaj, 2024-03-15)

Currently, the storage PUT endpoint lacks proper error messages when required attributes are missing from the payload. It shows a generic message that is not informative enough for the user to act on: "HV000028: Unexpected exception during isValid call"
Ideally the error message should clearly list out the missing attributes such as 'kind', 'acl', or 'legal'.
Below is a sample example where acl is null; the response gives a generic message.
![image](/uploads/738760ea17a8bce24fc4615d5d26920b/image.png)
Suggestions
• Add handling for the cases where these required attributes are null, with relevant error messages like:
| Missing Attributes | Suggested Error Messages |
|--------------------|--------------------------|
| Kind | Mandatory fields missing- kind / kind cannot be empty |
| Acl | Mandatory fields missing- acl / acl cannot be empty |
| Legal | Mandatory fields missing- legal / legal cannot be empty |
| Acl and Legal | Mandatory fields missing- acl, Mandatory fields missing- legal / acl cannot be empty, legal cannot be empty |
| Kind, Acl and Legal | Mandatory fields missing- kind, Mandatory fields missing- acl, Mandatory fields missing- legal / kind cannot be empty, acl cannot be empty, legal cannot be empty |https://community.opengroup.org/osdu/platform/system/storage/-/issues/211Different behavior on delete endpoint2024-03-15T14:13:21ZAdam ChengDifferent behavior on delete endpointThere are currently two endpoints for deleting an object:
1. Deleting multiple objects: /records/delete
2. Deleting single object /records/<Object_id>:delete
When attempting to delete an object that has already been deleted:
Endpoint 1 will return 204 while endpoint 2 will return 404.
It would be desirable for endpoint 1 to return some sort of error if one or more of the objects have already been deleted or do not exist.https://community.opengroup.org/osdu/platform/system/storage/-/issues/212GeoJson validation2024-03-15T14:11:20ZAdam ChengGeoJson validationThis is a linked issue between the Storage API and Search API.
When I ingest a new object with an invalid GeoJSON geometry (e.g. a polygon that is not closed), it passes the Storage API, which mainly checks types, but it silently fails indexing and never shows up in the Search API.
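For illustration, the kind of ring-closure check the Storage service could run at ingestion time can be sketched as follows. This is a hypothetical sketch, not actual OSDU validation code; class and method names are invented.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a ring-closure check the Storage service could run
// at ingestion time instead of letting indexing fail silently. Names are
// illustrative, not the actual OSDU validation code.
public class PolygonRingCheck {

    /**
     * Per the GeoJSON spec (RFC 7946), a linear ring must have at least
     * four positions, with the first and last positions identical.
     */
    public static boolean isClosedRing(List<double[]> ring) {
        if (ring.size() < 4) {
            return false;
        }
        return Arrays.equals(ring.get(0), ring.get(ring.size() - 1));
    }

    public static void main(String[] args) {
        List<double[]> open = List.of(
                new double[]{0, 0}, new double[]{0, 1},
                new double[]{1, 1}, new double[]{1, 0});   // first != last
        List<double[]> closed = List.of(
                new double[]{0, 0}, new double[]{0, 1},
                new double[]{1, 1}, new double[]{1, 0},
                new double[]{0, 0});
        System.out.println(isClosedRing(open));   // false
        System.out.println(isClosedRing(closed)); // true
    }
}
```

A check like this would let the PUT `/records` call fail fast with a 400 instead of accepting a record that can never be indexed.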
A related issue: currently it takes up to 30 seconds before a newly ingested object shows up in the Search API, which makes near real-time applications somewhat challenging.
Possible solution:
An additional query param on the PUT `/records` endpoint. If the param is set, the operation will only succeed once indexing has finished.
It would be ideal for ingestion and indexing/discovery operations to be atomic.https://community.opengroup.org/osdu/platform/system/storage/-/issues/213Discrepancy in Storage API for create/update record operation2024-01-31T23:00:31ZNeha KhandelwalDiscrepancy in Storage API for create/update record operationIn the Storage create/update record API, if a record ID ends in a dot (.), the data block for the record is not properly uploaded to the Microsoft storage account. In cases where the records in a create/update multiple record operation have similar IDs differentiated only by a trailing dot (e.g. M and M.), the data block will be the same for both records. The issue is that Microsoft storage accounts do not support directory names ending with a dot (.), a forward slash (/), or a backslash (\\), nor path segments ending with a dot (see [https://learn.microsoft.com/en-us/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata](https://learn.microsoft.com/en-us/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata)). When uploading the block blob with the record data to the storage container, BlobStore.class tries to use a path with the record ID as a folder, such as
\<kind\>/\<partition\>:reference-data--ExternalUnitOfMeasure:LIS-LAS::**M.**/1704916580557751
but \<partition\>:reference-data--ExternalUnitOfMeasure:LIS-LAS::**M.** is not a valid directory name, so the trailing dot is ignored and the block blob is uploaded to \<partition\>:reference-data--ExternalUnitOfMeasure:LIS-LAS::**M** instead. This was also manually confirmed by trying to upload a blob to a folder with a name ending in a dot.
It is a corner case, but this issue has impacted RDD values for M and M. on all partitions:
* For example, on "prod-weu-des-prod-testing-eu", the records prod-weu-des-prod-testing-eu:reference-data--ExternalUnitOfMeasure:LIS-LAS::M. and prod-weu-des-prod-testing-eu:reference-data--ExternalUnitOfMeasure:LIS-LAS::M have the same "M." values in the ID, Code and Symbol fields. These were created using the RDD script/pipeline.
* Impact is on all partitions for \<partition\>:reference-data--ExternalUnitOfMeasure:LIS-LAS::M. and \<partition\>:reference-data--ExternalUnitOfMeasure:LIS-LAS::M values
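A trailing-character check of the kind implied by the Azure naming restriction above could be sketched as follows. The class and method names are hypothetical, not the actual Storage validation code.

```java
// Illustrative sketch of the proposed validation: reject record IDs whose
// last character cannot be stored as an Azure blob directory name.
// Class and method names are hypothetical, not the actual Storage code.
public class RecordIdSuffixCheck {

    /** True when the ID ends with '.', '/' or '\', i.e. should get a 400. */
    public static boolean hasUnsupportedTrailingChar(String recordId) {
        if (recordId == null || recordId.isEmpty()) {
            return true;
        }
        char last = recordId.charAt(recordId.length() - 1);
        return last == '.' || last == '/' || last == '\\';
    }

    public static void main(String[] args) {
        System.out.println(hasUnsupportedTrailingChar(
                "partition:reference-data--ExternalUnitOfMeasure:LIS-LAS::M."));  // true
        System.out.println(hasUnsupportedTrailingChar(
                "partition:reference-data--ExternalUnitOfMeasure:LIS-LAS::M"));   // false
    }
}
```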
The proposed solution is to reject record IDs that end in these unsupported characters (i.e. return 400 Bad Request when such record IDs are used).https://community.opengroup.org/osdu/platform/system/storage/-/issues/214CRS Conversion not working if persistableReferenceCrs left off --> Move Storage to crs-conversion v32024-03-08T00:30:17ZBryan DawsonCRS Conversion not working if persistableReferenceCrs left off --> Move Storage to crs-conversion v3According to the documentation, we are supposed to be able to leave off the `persistableReferenceCrs` now (although, because it is required in the schema, the best we can do is make it an empty string) and the CRS conversions will use the `CoordinateReferenceSystemID` to look up the value of the persistable reference. However, this does not appear to work.
Take the following simple Well for example:
```json
{
"acl": {
"owners": [
"data.default.owners@dp.myosdu.com"
],
"viewers": [
"data.default.viewers@dp.myosdu.com"
]
},
"data": {
"FacilityName": "Dummy 1 - Do Not Use Me",
"SpatialLocation": {
"AsIngestedCoordinates": {
"CoordinateReferenceSystemID": "dp:reference-data--CoordinateReferenceSystem:Geographic2D:EPSG::4326:",
"features": [
{
"geometry": {
"coordinates": [
-45.944904,
18.12565
],
"type": "AnyCrsPoint"
},
"properties": {},
"type": "AnyCrsFeature"
}
],
"persistableReferenceCrs": "",
"type": "AnyCrsFeatureCollection"
},
"SpatialGeometryTypeID": "dp:reference-data--SpatialGeometryType:Point:"
}
},
"id": "dp:master-data--Well:TEST_CRS_METHODS",
"kind": "osdu:wks:master-data--Well:1.3.0",
"legal": {
"legaltags": [
"dp-default-legal"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"meta": [ ]
}
```
This should populate the Wgs84Coordinates when the record is sent to the indexer, but you do not see it in the indexer record:
```json
{
"acl": {
"owners": [
"data.default.owners@dp.myosdu.com"
],
"viewers": [
"data.default.viewers@dp.myosdu.com"
]
},
"authority": "osdu",
"createTime": "2024-01-23T19:16:46.001Z",
"createUser": "bryan.j.dawson@exxonmobil.com",
"data": {
"FacilityName": "Dummy 1 - Do Not Use Me",
"SpatialLocation.SpatialGeometryTypeID": "dp:reference-data--SpatialGeometryType:Point:",
"VirtualProperties.DefaultLocation.SpatialGeometryTypeID": "dp:reference-data--SpatialGeometryType:Point:",
"VirtualProperties.DefaultName": "Dummy 1 - Do Not Use Me"
},
"id": "dp:master-data--Well:TEST_CRS_METHODS",
"kind": "osdu:wks:master-data--Well:1.3.0",
"legal": {
"legaltags": [
"dp-default-legal"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"namespace": "osdu:wks",
"source": "wks",
"tags": {
"normalizedKind": "osdu:wks:master-data--Well:1"
},
"type": "master-data--Well",
"version": 1706037405365334
}
```
However, if we fill in the `persistableReferenceCrs` and the meta block, it works as expected.
```json
{
"acl": {
"owners": [
"data.default.owners@dp.myosdu.com"
],
"viewers": [
"data.default.viewers@dp.myosdu.com"
]
},
"data": {
"FacilityName": "Dummy 1 - Do Not Use Me",
"SpatialLocation": {
"AsIngestedCoordinates": {
"CoordinateReferenceSystemID": "dp:reference-data--CoordinateReferenceSystem:Geographic2D:EPSG::4326:",
"features": [
{
"geometry": {
"coordinates": [
-45.944904,
18.12565
],
"type": "AnyCrsPoint"
},
"properties": {},
"type": "AnyCrsFeature"
}
],
"persistableReferenceCrs": "{\"authCode\":{\"auth\":\"EPSG\",\"code\":\"4326\"},\"name\":\"GCS_WGS_1984\",\"type\":\"LBC\",\"ver\":\"PE_10_9_1\",\"wkt\":\"GEOGCS[\\\"GCS_WGS_1984\\\",DATUM[\\\"D_WGS_1984\\\",SPHEROID[\\\"WGS_1984\\\",6378137.0,298.257223563]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433],AUTHORITY[\\\"EPSG\\\",4326]]\"}",
"type": "AnyCrsFeatureCollection"
},
"SpatialGeometryTypeID": "dp:reference-data--SpatialGeometryType:Point:"
}
},
"id": "dp:master-data--Well:TEST_CRS_METHODS",
"kind": "osdu:wks:master-data--Well:1.3.0",
"legal": {
"legaltags": [
"dp-default-legal"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"meta": [
{
"coordinateReferenceSystemID": "dp:reference-data--CoordinateReferenceSystem:Geographic2D:EPSG::4326:",
"kind": "CRS",
"name": "WGS 84",
"persistableReference": "{\"authCode\":{\"auth\":\"EPSG\",\"code\":\"4326\"},\"name\":\"GCS_WGS_1984\",\"type\":\"LBC\",\"ver\":\"PE_10_9_1\",\"wkt\":\"GEOGCS[\\\"GCS_WGS_1984\\\",DATUM[\\\"D_WGS_1984\\\",SPHEROID[\\\"WGS_1984\\\",6378137.0,298.257223563]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433],AUTHORITY[\\\"EPSG\\\",4326]]\"}",
"propertyNames": [
"SpatialLocation.AsIngestedCoordinates"
]
}
]
}
```
and the indexer record looks as expected:
```json
{
"acl": {
"owners": [
"data.default.owners@dp.myosdu.com"
],
"viewers": [
"data.default.viewers@dp.myosdu.com"
]
},
"authority": "osdu",
"createTime": "2024-01-23T19:16:46.001Z",
"createUser": "bryan.j.dawson@exxonmobil.com",
"data": {
"FacilityName": "Dummy 1 - Do Not Use Me",
"SpatialLocation.SpatialGeometryTypeID": "dp:reference-data--SpatialGeometryType:Point:",
"SpatialLocation.Wgs84Coordinates": {
"geometries": [
{
"coordinates": [
-45.944904,
18.12565
],
"type": "point"
}
],
"type": "geometrycollection"
},
"VirtualProperties.DefaultLocation.IsDecimated": false,
"VirtualProperties.DefaultLocation.SpatialGeometryTypeID": "dp:reference-data--SpatialGeometryType:Point:",
"VirtualProperties.DefaultLocation.Wgs84Coordinates": {
"geometries": [
{
"coordinates": [
-45.944904,
18.12565
],
"type": "point"
}
],
"type": "geometrycollection"
},
"VirtualProperties.DefaultName": "Dummy 1 - Do Not Use Me"
},
"id": "dp:master-data--Well:TEST_CRS_METHODS",
"kind": "osdu:wks:master-data--Well:1.3.0",
"legal": {
"legaltags": [
"dp-default-legal"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"modifyTime": "2024-01-23T19:33:06.905Z",
"modifyUser": "bryan.j.dawson@exxonmobil.com",
"namespace": "osdu:wks",
"source": "wks",
"tags": {
"normalizedKind": "osdu:wks:master-data--Well:1"
},
"type": "master-data--Well",
"version": 1706038386690196
}
```
:pushpin: **Update from @nthakur :**
> `CoordinateReferenceSystemID` dynamic lookup feature was implemented on crs-conversion `v3` endpoint.
> Storage service conversion endpoint (used by indexer-service) calls crs-conversion service to get converted records. As of current release (M22), Storage service is using `v2` endpoints from crs-conversion. This is the reason we see this behavior.
So solving this will require moving to the v3 endpoints of the crs-conversion service.https://community.opengroup.org/osdu/platform/system/storage/-/issues/215Increase timeout for storage service requests2024-02-01T12:46:24ZSudesh TagadpallewarIncrease timeout for storage service requestsWhen registering a dataset using `/registerDataset`, some users are getting a 400 error. Per the logs, the request is timing out (with the error **Unexpected error sending to URL http://storage/api/storage/v2/records METHOD PUT error java.net.SocketTimeoutException: Read timed out**) when it tries to upsertRecord in Storage.
We have found that when the dataset service calls the storage service, the call can take more than 5 seconds, which results in a SocketTimeoutException.
When creating a `StorageService` instance using `StorageFactory`, a new `HttpClient()` instance is used, which has a default timeout of 5 seconds. Instead of a new `HttpClient` instance, an `HttpClientHandler` instance should have been used, which has a 60 second timeout. This code is present in the core-common library. See the attached image for reference. ![storage](/uploads/5d81a52c9a968975ad40a538088a57dc/storage.JPG)https://community.opengroup.org/osdu/platform/system/storage/-/issues/216Storage PUT /records lost update2024-02-28T08:02:10ZMykyta SavchukStorage PUT /records lost updateThe issue occurs in the storage service when trying to update the same record (with the same id) using multiple asynchronous requests at the same time. As a result, only one version is saved in the database and the others are lost.
For example, suppose we call the storage PUT API with three asynchronous requests for the same record. Even though storage returns 201 with a version for each of the requests, calling /records/{id}/{version} with the three created versions results in two 404s and only one 200. All three versions are saved in blob storage, but the "gcsVersionPaths" array of the record in the database has only one new version.
Looking at the code, it appears this is a lost-update problem. When updating a record, the storage service fetches the record from the database, performs certain manipulations on it, and then saves it back. So when multiple threads run at the same time, they simultaneously fetch the same record (with the same "gcsVersionPaths" array), add a new version to the array, and save the record in the database. Each thread overwrites the version newly added by the previous thread, so only the last thread's version survives.
Possible solution: implement optimistic locking for the PUT API. To implement optimistic locking, we can add an additional field to the database record that is updated together with the record. We fetch the record along with this field and, when saving, check whether the field's value has changed; if so, we abort the update.
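The compare-and-swap idea behind this proposal can be sketched with an in-memory stand-in for the database (illustrative only; a real implementation would use the provider database's precondition support, and all names here are invented):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative in-memory stand-in for the record metadata store. A real
// implementation would rely on the database's own precondition mechanism
// (e.g. an etag with an If-Match condition) rather than a map.
public class OptimisticStore {

    public static final class Stored {
        public final List<Long> versionPaths;  // mirrors "gcsVersionPaths"
        public Stored(List<Long> versionPaths) {
            this.versionPaths = versionPaths;
        }
    }

    private final ConcurrentHashMap<String, Stored> db = new ConcurrentHashMap<>();

    public void put(String id, Stored s) { db.put(id, s); }

    public Stored get(String id) { return db.get(id); }

    /**
     * Append a new version only if the record is still the one we read.
     * Returns false when another writer got there first; the caller must
     * re-read the record and retry, so no version is silently lost.
     */
    public boolean appendVersion(String id, Stored read, long newVersion) {
        List<Long> updated = new ArrayList<>(read.versionPaths);
        updated.add(newVersion);
        // Compare-and-swap: succeeds only if the current value is still `read`
        return db.replace(id, read, new Stored(updated));
    }

    public static void main(String[] args) {
        OptimisticStore store = new OptimisticStore();
        store.put("rec", new Stored(List.of(1L)));
        Stored read = store.get("rec");
        System.out.println(store.appendVersion("rec", read, 2L)); // true: first writer wins
        System.out.println(store.appendVersion("rec", read, 3L)); // false: stale read, must retry
    }
}
```

The second `appendVersion` call fails instead of silently overwriting, which is exactly the behavior the lost-update scenario above is missing.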
I'm assuming all provider databases have this functionality built in. For example, in Azure CosmosDB, every item stored in the database has a system-defined property "_etag", and to enable optimistic locking we can pass parameters when saving the record.https://community.opengroup.org/osdu/platform/system/storage/-/issues/217ADR: Delete record versions2024-03-26T09:49:14ZNeelesh ThakurADR: Delete record versions<a name="TOC"></a>
[[_TOC_]]
# Status
- [x] Proposed
- [ ] Trialing
- [ ] Under review
- [ ] Approved
- [ ] Retired
# Problem Statement
Storage service allows creation of record versions without any restrictions. Once the number of record versions goes beyond a certain limit (e.g. 1K for Azure), it becomes very costly to fetch the record. Please refer to [ADR](https://community.opengroup.org/osdu/platform/system/storage/-/issues/178) for more details.
The [ADR](https://community.opengroup.org/osdu/platform/system/storage/-/issues/178) implementation restricts the maximum number of versions to avoid accidental cost spikes.
This ADR proposes a solution to delete record versions.
[Back to TOC](#TOC)
# Proposed solution
Storage API should provide a new endpoint to delete record versions. It will permanently delete record versions and the operation cannot be undone. Users must be members of the 'users.datalake.admins' role and an OWNER of the record.
<details>
<summary>API specification</summary>
```yaml
"/records/{id}/versions":
delete:
tags:
- Records
summary: Purge record versions
description: "The API performs the permanent physical deletion of the given record versions excluding latest version and any linked records or files if there are any.
If 'from' query parameter is used then it will delete all versions before current one (exclusive). If 'limit' query parameter is used then it will delete oldest versions defined by 'limit'.
If both 'from' and 'limit' are used then API will delete 'limit' number of versions starting 'from' version. Instead of using 'limit' or 'from', list of versions can be provided on 'versionIds' query parameter.
API will delete all versions defined by 'versionIds' query parameter. Maximum 50 record versions can be deleted per request.
This operation cannot be undone. Required roles: 'users.datalake.admins' who is the OWNER of the record."
operationId: Purge record versions
parameters:
- name: id
in: path
description: Valid record id following "^[\\w\\-\\.]+:[\\w-\\.]+:[\\w\\-\\.\\:\\%]+$" pattern
required: true
schema:
type: string
- name: from
in: query
description: Record version id from which all record versions aside from the current one are deleted
required: false
schema:
type: integer
format: int64
- name: limit
in: query
description: Number of oldest record versions to be deleted. Value must not exceed number of record versions (excluding latest version)
required: false
schema:
type: integer
- name: versionIds
in: query
description: Comma separated version list (excluding latest version) to be deleted. Maximum 50 versions can be deleted per request.
required: false
schema:
type: string
- $ref: "#/components/parameters/data-partition-id"
responses:
"204":
description: Record deleted successfully.
"400":
description: Validation error.
content:
application/json:
schema:
$ref: "#/components/schemas/AppError"
"403":
description: User not authorized to perform the action.
content:
application/json:
schema:
$ref: "#/components/schemas/AppError"
"404":
description: Record not found.
content:
application/json:
schema:
$ref: "#/components/schemas/AppError"
"500":
description: Unknown Error.
content:
application/json:
schema:
$ref: "#/components/schemas/AppError"
security:
- bearer: []
```
</details>
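The 'from'/'limit' selection rules in the description above are subtle. One possible reading can be sketched as follows; this is for discussion only, not the proposed implementation, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// One possible reading of the proposed 'from'/'limit' semantics. Assumes
// `versions` is sorted ascending with the latest version last; the latest
// version is never eligible for deletion.
public class VersionSelection {

    /** 'from' alone: all versions strictly older than `from`, latest kept. */
    public static List<Long> selectBefore(List<Long> versions, long from) {
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < versions.size() - 1; i++) {   // skip latest
            if (versions.get(i) < from) {
                out.add(versions.get(i));
            }
        }
        return out;
    }

    /** 'limit' alone: the `limit` oldest versions, latest kept. */
    public static List<Long> selectOldest(List<Long> versions, int limit) {
        int deletable = versions.size() - 1;              // exclude latest
        return new ArrayList<>(versions.subList(0, Math.min(limit, deletable)));
    }

    public static void main(String[] args) {
        List<Long> versions = List.of(10L, 20L, 30L, 40L); // 40 is latest
        System.out.println(selectBefore(versions, 30L));   // [10, 20]
        System.out.println(selectOldest(versions, 5));     // [10, 20, 30]
    }
}
```

Note that `selectOldest` silently clamps `limit` to the number of deletable versions here, whereas the API description says the value must not exceed it; whichever behavior is chosen should be stated explicitly in the spec.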
[Back to TOC](#TOC)
# Consequences
- New API added to Storage service
- New endpoint is available on Storage service's swagger page
- Tutorial is updated with new endpoint
[Back to TOC](#TOC)M23 - Release 0.26Neelesh ThakurNeelesh Thakurhttps://community.opengroup.org/osdu/platform/system/storage/-/issues/218ADR: Option to retain source systems audit info and override audit fields dur...2024-03-26T15:01:14ZRasheed Nagoor GaniADR: Option to retain source systems audit info and override audit fields during migration[[_TOC_]]
# Status
- [x] Proposed
- [ ] Trialing
- [ ] Under review
- [ ] Approved
- [ ] Retired
# Background
When a record is created, the 'createUser'/'modifyUser' field automatically captures the username from the token and sets the 'createTime'/'modifyTime' to the current timestamp. These fields play a crucial role in providing audit information to identify who created or modified and when. While this mechanism works seamlessly for new records created through the OSDU APIs, it may lead to confusion when dealing with migrated data.
The source systems maintain their own set of audit fields, which should be preserved in their original state during migration. Preserving this audit trail is vital to upholding data integrity and regulatory compliance.
Refer Aha ticket [IDEA-I-130](https://osdu-community.ideas.aha.io/ideas/IDEA-I-130)
# Context & Scope
The audit information captured in 'createUser', 'createTime', 'modifyUser' and 'modifyTime' fields can be stored in extendedProperties. However, the limitations of extendedProperties, such as the inability to index values, hinder the efficient filtering and retrieval of records.
To address this issue, either the source system's audit information (createUser, createTime, modifyUser and modifyTime) should be set in a new attribute, or the storage service should allow overriding the existing attribute values.
# Proposed solution
Option 1: Introduce an 'Audit' object attribute into the common schema, integrating it as a standard attribute of all data type schemas. This approach ensures consistent and comprehensive auditing capabilities across different data types.
Option 2: Implement a new user or role with specialized permissions to override audit attributes, including the createUser, createTime, modifyUser and modifyTime fields. This user or role is specifically designated for managing data migration processes. For instance, when the ingestion API is initiated by this designated user, the Platform verifies its migration status. In such instances, the user's email and creation time will be sourced from Manifest values rather than the token or current timestamp.
# Consequences
Option 1: The implementation of Option 1 may entail a time-consuming process and could potentially have a significant impact on existing records. Integrating the 'Audit' object attribute into the common schema may require thorough planning and careful consideration to mitigate disruptions to the system.
Option 2: While Option 2 eliminates the need for introducing new attributes, it necessitates modifications to the Storage Service logic. Adapting the system to accommodate a new user or role with override permissions may require adjustments to the existing logic and workflows within the Storage Service.https://community.opengroup.org/osdu/platform/system/storage/-/issues/219Records created with special characters are not discoverable2024-03-15T13:22:20ZAbhishek Kumar (SLB)Records created with special characters are not discoverableThe Storage service allows a user to create a record with an encoded special character.
However, if we try to get the created record, the storage service returns 404.
**Actual ID:** winter-aker-bp-super-sprint-5:reference-data--UnitOfMeasure:m/h
<br>
**Encoded ID**: winter-aker-bp-super-sprint-5:reference-data--UnitOfMeasure:m%2fh
The Storage POST endpoint allows users to create storage records with encoded ids:
![image](/uploads/6a923c3582dcb993eaf8d84e2ff32166/image.png)
But the problem arises when user tries to retrieve the record using get endpoint:
`{
"code": 400,
"reason": "Validation error.",
"message": "{\"errors\":[\"Not a valid record id. Found: winter-aker-bp-super-sprint-5:reference-data--UnitOfMeasure:m%2fh\"]}"
}`
The same record does appear in the search result:
![image](/uploads/566be3486c02de5956b2c96e10709d2e/image.png)Chad LeongChad Leonghttps://community.opengroup.org/osdu/platform/system/storage/-/issues/220storage record with no acl owners become ghost record if OPA service is enabled.2024-03-28T06:20:14ZOm Prakash Guptastorage record with no acl owners become ghost record if OPA service is enabled.Storage records become inaccessible if OPA is enabled when there is no ACL group associated with the record.
# Scenario:
Usually, when we create a record we define the owners and viewers groups, and the members associated with those groups can access the record. However, it is possible to delete the groups and even disassociate ACL groups from the storage record; there is no validation as of now requiring at least one ACL group on a record. Eventually the record becomes a ghost record and nobody can access it.
There was a fix provided: users.data.root members can still access the record and add ACLs if needed.
It is discussed in this ADR:
https://community.opengroup.org/osdu/platform/security-and-compliance/entitlements/-/issues/141
# Findings
We have seen that the code works fine and users.data.root members can still access the record when there are no ACL members associated with the record, but if OPA is enabled we cannot access the record even when the member belongs to the users.data.root group.
The code below checks if OPA is enabled and gets access rights from the OPA service:
https://community.opengroup.org/osdu/platform/system/storage/-/blob/master/storage-core/src/main/java/org/opengroup/osdu/storage/service/IngestionServiceImpl.java#L198
The OPA service returns false access rights. However, if OPA is disabled, the flow works because code was added to return true if the member belongs to users.data.root.
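For reference, the non-OPA fallback described above amounts to something like the following sketch (illustrative names only, not the actual IngestionServiceImpl code). When OPA is enabled this code path is skipped, so an equivalent rule would need to exist in the OPA policy itself.

```java
import java.util.List;

// Sketch of the non-OPA fallback described above: users.data.root members
// bypass the ACL check, so even a record with an empty owners list stays
// reachable. Names are illustrative; see IngestionServiceImpl for the real
// logic.
public class AclCheck {

    static final String DATA_ROOT_GROUP = "users.data.root";

    public static boolean hasOwnerAccess(List<String> callerGroups,
                                         List<String> recordOwners) {
        if (callerGroups.contains(DATA_ROOT_GROUP)) {
            return true;  // data.root bypass, regardless of the record's ACL
        }
        // Otherwise the caller must be in at least one owner group
        return recordOwners.stream().anyMatch(callerGroups::contains);
    }

    public static void main(String[] args) {
        // "Ghost" record: no owners left on the ACL
        System.out.println(hasOwnerAccess(List.of("users.data.root"), List.of())); // true
        System.out.println(hasOwnerAccess(List.of("some.group"), List.of()));      // false
    }
}
```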
We have found this not working in the Azure OSDU instance and need to know whether it requires a policy file fix or should be handled in code to stop records from becoming ghosts when OPA is enabled.Dadong ZhouKelly ZhouShane HutchinsDeepa KumariDadong Zhouhttps://community.opengroup.org/osdu/platform/system/storage/-/issues/222SLB Feature Request - Need capability to write policy based on data records properties2024-03-15T13:21:48ZDadong ZhouSLB Feature Request - Need capability to write policy based on data records propertiesFrom Fabrice HAÜY \[SLB\] on Slack:
Hi Team, I'm looking for some updated information / roadmap, as from our latest conversations at the OSDU F2F in London, I understood that currently, the policy engine only knows about id, kind, legal tag, and acl, making it not possible to create policy entitlements based on the value of a property of the record. I'm looking for information surrounding this limitation and when it'll be unlocked. thank you in advance
Copied from Policy repo: https://community.opengroup.org/osdu/platform/security-and-compliance/policy/-/issues/95
cc @chad @hutchins @KellyZhou