Storage issues
https://community.opengroup.org/osdu/platform/system/storage/-/issues
2024-03-28T06:20:14Z

https://community.opengroup.org/osdu/platform/system/storage/-/issues/220
storage record with no acl owners become ghost record if OPA service is enabled.
2024-03-28T06:20:14Z
Om Prakash Gupta

Storage records become inaccessible if OPA is enabled and no ACL group is associated with the record.
# Scenario:
Usually, when we create a record we define the owners and viewers groups, and the members associated with those groups can access the record. However, it is possible to delete a group and even disassociate ACL groups from the storage record. There is currently no validation that at least one ACL group must remain on a record, so the record eventually becomes a ghost record that nobody can access.
A fix was provided so that users.data.root members can still access the record and add ACLs if needed.
This is discussed in the following ADR:
https://community.opengroup.org/osdu/platform/security-and-compliance/entitlements/-/issues/141
# Findings
We have seen that the code works as intended: users.data.root members can still access a record that has no associated ACL members. However, if OPA is enabled, the record cannot be accessed even by a member of the users.data.root group.
The code below checks whether OPA is enabled and gets the access rights from the OPA service:
https://community.opengroup.org/osdu/platform/system/storage/-/blob/master/storage-core/src/main/java/org/opengroup/osdu/storage/service/IngestionServiceImpl.java#L198
The OPA service returns false access rights. However, if OPA is disabled the flow works, because code was added to return true if the member belongs to users.data.root.
We have found this not working in the Azure OSDU instance and need to know whether it requires a policy file fix or should be handled in code to stop records from becoming ghost records when OPA is enabled.
Dadong Zhou, Kelly Zhou, Shane Hutchins, Deepa Kumari

https://community.opengroup.org/osdu/platform/system/storage/-/issues/215
Increase timeout for storage service requests
2024-02-01T12:46:24Z
Sudesh Tagadpallewar

When registering a dataset using `/registerDataset`, some users are getting a 400 error. As per the logs, this request is timing out (with the error **Unexpected error sending to URL http://storage/api/storage/v2/records METHOD PUT error java.net.SocketTimeoutException: Read timed out**) when it tries to upsertRecord in the Storage service.
We found that when the Dataset service calls the Storage service, the call takes more than 5 seconds, which results in a SocketTimeoutException.
When creating a `StorageService` instance using `StorageFactory`, a new `HttpClient()` instance is used, which has a default timeout of 5 seconds. An `HttpClientHandler` instance, which has a 60-second timeout, should have been used instead. This code is present in the core-common library. See the attached image for reference. ![storage](/uploads/5d81a52c9a968975ad40a538088a57dc/storage.JPG)

https://community.opengroup.org/osdu/platform/system/storage/-/issues/191
Add /liveness_check
2024-01-08T10:07:15Z
Riabokon Stanislav(EPAM)[GCP]

Need to add the endpoint '/liveness_check' to verify the operational status of the Storage Service.
M23 - Release 0.26
Riabokon Stanislav(EPAM)[GCP]

https://community.opengroup.org/osdu/platform/system/storage/-/issues/188
Normalizer: meta[].unitOfMeasureID should be preferred unit declaration
2024-01-18T16:01:20Z
Thomas Gehrmann [slb]

Reported by Marcus Ridgway:
The UoM Meta[] schema supports association of a Unit of Measure to one or more attributes in a JSON record. The core of the UoM schema is the _unitOfMeasureID_ attribute which associates attributes defined in _propertyNames_ to the ID of the UOM in the Unit of Measure Reference list e.g. for a Wellbore record
```json
{
"kind": "Unit",
"name": "ft",
"persistableReference": "",
"propertyNames": [
"FacilitySpecifications[0].FacilitySpecificationQuantity",
"VerticalMeasurements[0].VerticalMeasurement"
],
"unitOfMeasureID": "osdu:reference-data--UnitOfMeasure:ft:"
}
```
The persistableReference attribute in meta[] is there to support storage of the full UoM Definition when unitOfMeasureID is not populated. E.g. for metres:
"persistableReference": "{\"abcd\":{\"a\":0.0,\"b\":1.0,\"c\":1.0,\"d\":0.0},\"symbol\":\"m\",\"baseMeasurement\":{\"ancestry\":\"L\",\"type\":\"UM\"},\"type\":\"UAD\"}",
Populating persistableReference is no longer required if the UnitOfMeasure Reference List is now fully populated i.e. IDs exist for all used UoMs. This removes any need to populate persistableReference. Regardless, populating persistableReference is extremely onerous for a number of reasons:
- does not adhere to one version of the truth - UoM need only be defined in the UoM Reference List; storing UoM definition in persistableReference in all records is the most extreme opposite
- all ETLs would be required to populate all the meta[] UoM definitions for all record types - the UoM definition is maintained in every ETL
- all OSDU records unnecessarily bloated by carrying all this redundant, duplicate persistableReference metadata within Meta[] in each and every record when it is centrally stored in the Reference List. This impacts storage requirements for OSDU records.
Problem: The Normalizer for the Search API for numeric values does not support API > SI Search when JSON records do not have persistableReference populated. The only data needing to be populated is unitOfMeasureID, but this is ignored by the Normalizer and instead requires persistableReference to be populated.
Require: Normalizer to be extended to support the unitOfMeasureID populated in Meta. When unitOfMeasureID is populated, any persistableReference content, including blank content, is ignored; the Normalizer instead retrieves the persistableReference content from the UnitOfMeasure Reference List (source of truth for UoM definitions).
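The requested lookup order can be sketched as below. This is a minimal Python sketch under stated assumptions, not the Normalizer's code: `resolve_unit_definition` is a hypothetical helper, and the in-memory `reference_list` dictionary stands in for a lookup against the UnitOfMeasure Reference List.

```python
import json
from typing import Optional


def resolve_unit_definition(meta_item: dict, reference_list: dict) -> Optional[dict]:
    """Return the unit definition for one meta[] item.

    Prefers unitOfMeasureID (looked up in the reference list) and only falls
    back to the inline persistableReference when no ID is populated.
    """
    unit_id = meta_item.get("unitOfMeasureID")
    if unit_id:
        # Hypothetical lookup: reference_list maps a UoM ID to its definition.
        # Any persistableReference content on the item is ignored here.
        return reference_list.get(unit_id)
    raw = meta_item.get("persistableReference") or ""
    # Legacy path: parse the inline definition if one was provided.
    return json.loads(raw) if raw.strip() else None
```

With this ordering, a record that carries only `"unitOfMeasureID"` and `"persistableReference": ""` still resolves to a definition, which is the behavior requested above.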
---
Comment from @gehrmann - means the normalizer needs to be enhanced. From the schema side of things we have said that if `unitOfMeasureID` is populated it should supersede the `persistableReference` which is the future goal. The [AbstractMetaItem](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/Generated/abstract/AbstractMetaItem.1.0.0.json?ref_type=heads#L58) schema is historical and requires the `persistableReference` to be set. It should however be sufficient to set `"persistableReference": ""` when populating `unitOfMeasureID`.
Originally reported as [schema issue 624](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/issues/624)
M22 - Release 0.25

https://community.opengroup.org/osdu/platform/system/storage/-/issues/184
Storage Record query does not include record audit info
2023-12-06T15:31:26Z
An Ngo

Storage query/records API returns records without audit information such as createdUser, createTime, modifyUser, modifyTime.
This behavior is inconsistent with other Storage record queries such as the batch fetch and the record fetch APIs.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/182
Issues observed with logging
2023-12-01T06:47:32Z
Larissa Pereira

**Issue 1: Duplicate operation IDs**
We observed multiple dependency logs for disparate operations (based on record ids) with identical operation Id's for the POST QueryApi/getRecords API. Duplicate entries were observed when reading from BlobStore for operation READ_FROM_STORAGE_CONTAINER although these logs belonged to separate operations.
![image](/uploads/afc539574de597bba300b5d6b2a18b8a/image.png)
**Issue 2: Multiple dependency logs and missing Read log**
We observed multiple dependency logs with identical operation Id's for the POST QueryApi/fetchRecords. These entries were observed when querying CosmosStore, however the READ_FROM_STORAGE_CONTAINER dependency log is missing.
![image](/uploads/ce377f8bf6ee95646ca1ab5d910df167/image.png)
M22 - Release 0.25
VidyaDharani Lokam

https://community.opengroup.org/osdu/platform/system/storage/-/issues/175
Storage service triggered more than 1 time while ingesting 1 single record.
2023-06-09T01:27:02Z
Bruce Jin

Currently, when running manifest ingestion by reference, a single record triggers more than one `PUT` call to the Storage service. This is because the API returns `201 CREATED` on success, which is not treated as an `OK` response in the file `common-python-sdk/osdu_api/utils/request.py`. We need to include more acceptable status codes to avoid wasted time.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/174
Data authorization issue for Update/Patch operation
2024-01-29T19:22:30Z
Dadong Zhou

When the Storage service sends data authorization requests for an Update/Patch operation to the Policy service, only the new data record header info (ACLs and LegalTags) is sent; the existing data record header info is not included in the request. As a result, a user is able to update/patch a data record (based on the new ACLs/LegalTags) even when the user should have no permission to do so (based on the existing record's ACLs/LegalTags).
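One way to picture the missing check is to evaluate the update against both the existing and the incoming record header. The Python sketch below is illustrative only: `is_owner` and `can_update` are hypothetical helpers, the `acls`/`owners` key names are assumptions, and the real decision is made by the Policy service rather than by a local check like this.

```python
from typing import Iterable


def is_owner(user_groups: Iterable[str], record_header: dict) -> bool:
    """True if the user belongs to at least one owners group of the record."""
    owners = set(record_header.get("acls", {}).get("owners", []))
    return bool(owners & set(user_groups))


def can_update(user_groups: Iterable[str], existing_header: dict, new_header: dict) -> bool:
    """Allow the update only if the user owns the existing record as well as the new one.

    Checking only new_header is the gap described above: a user could grant
    themselves access simply by sending new ACLs.
    """
    groups = list(user_groups)  # materialize once so both checks see the same groups
    return is_owner(groups, existing_header) and is_owner(groups, new_header)
```

Requiring ownership on both headers is an assumed rule for illustration; the actual policy may weigh LegalTags and other attributes as well.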
cc @hmarkovic @chad @hutchins @MonicaJohns
M22 - Release 0.25
Chad Leong

https://community.opengroup.org/osdu/platform/system/storage/-/issues/165
Need example of how to use the POST /query/records:batch Fetch multiple records
2023-03-09T21:25:28Z
Kamlesh Todai

The Storage API documentation mentions
POST /query/records/batch Fetch multiple records. I would like a sample showing how this feature is expected to be used.
Need clarification on
Account ID is the active OSDU account (OSDU account or customer's account) which the users choose to use with the Search API.
frame-of-reference: This value indicates whether normalization applies, should be either 'none' or 'units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;'
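A request of the kind asked about might be built as in the sketch below. Several details are assumptions not confirmed in this issue: the `{"records": [...]}` body shape, the `records:batch` path form taken from the issue title, the `/api/storage/v2` prefix, and the helper name `build_batch_fetch_request`. The request is only constructed, not sent.

```python
import json
import urllib.request

# Value quoted in the API documentation for requesting normalization.
FRAME_OF_REFERENCE_SI = "units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;"


def build_batch_fetch_request(base_url, partition, token, record_ids, normalize=True):
    """Build (but do not send) a POST request for the batch fetch endpoint."""
    # Assumed body shape: a list of record ids under "records".
    body = json.dumps({"records": list(record_ids)}).encode("utf-8")
    headers = {
        "data-partition-id": partition,
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        # 'none' disables normalization; the SI string requests conversion.
        "frame-of-reference": FRAME_OF_REFERENCE_SI if normalize else "none",
    }
    return urllib.request.Request(
        f"{base_url}/api/storage/v2/query/records:batch",
        data=body,
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the host and token are placeholders to be replaced with real values.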
@chad @debasisc
M17 - Release 0.20

https://community.opengroup.org/osdu/platform/system/storage/-/issues/164
For AWS platform query to get all kinds is not returning any records.
2023-03-09T21:26:03Z
Kamlesh Todai

The query to retrieve all the kinds is not returning any results (records)
curl --location 'https://r3m16-ue1.preshiptesting.osdu.aws/api/storage/v2/query/kinds' \
--header 'data-partition-id: osdu' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer eyJraWQiOiJ...7kPscDabFJ3sEPeNA'
The response 200 OK (with results being empty)
{
"results": []
}
The collection used can be found at https://community.opengroup.org/osdu/platform/testing/-/blob/master/Postman%20Collection/12_CICD_Setup_StorageAPI/Storage%20API%20CI-CD%20v1.11.postman_collection.json
The request name is "01 Storage - Get all kinds success scenario"
@chad @debasisc
M16 - Release 0.19

https://community.opengroup.org/osdu/platform/system/storage/-/issues/158
AZURE: on reading version from storage we are checking only viewer permissions
2023-01-16T12:02:28Z
Yauheni Lesnikau

On reading version from storage we are checking only viewer permissions. It would be nice to check both: viewer and owner ones.
Yauheni Lesnikau

https://community.opengroup.org/osdu/platform/system/storage/-/issues/157
Storage Improperly local cached ORDC information from Legal service
2023-05-30T08:55:39Z
Kelly Zhou

Currently, Storage caches the first result of valid ORDC from the Legal service regardless of which data partition the user is trying to ingest a record into. This could be wrong, as we do support whitelisting countries for certain data partitions.
In order to fix that, we need to include the data partition id in the local cache for ORDC information.
M16 - Release 0.19

https://community.opengroup.org/osdu/platform/system/storage/-/issues/155
GCP failing with core-common v0.18.0-rc4
2023-01-02T11:18:05Z
Mina Otgonbold

osdu-gcp-anthos-test integration tests are consistently failing when the core-common version is upgraded to v0.18.0-rc4.
Currently, gcp consumes 0.17.0 version of core-common which contains vulnerable libraries. The storage MR "Update Storage to be Collaboration Context Aware" needs to consume a new version of core-common that exposes collaboration context. It is a blocker for this storage MR to be merged. As a quick fix for gcp test failure, we created a core-common that has collaboration context off of 0.17.0 version of core-common. The pipeline is passing with this version, which indicates that the gcp test failure is coming from the core-common version upgrade from 0.17.0 to 0.18.0-rc4.
References
* [Associated storage MR](https://community.opengroup.org/osdu/platform/system/storage/-/merge_requests/546)
* [Core-common MR](https://community.opengroup.org/osdu/platform/system/lib/core/os-core-common/-/merge_requests/183)
* [ADR for the storage and core-common MRs](https://community.opengroup.org/osdu/platform/system/storage/-/issues/149)
Yauhen Shaliou [EPAM/GCP]

https://community.opengroup.org/osdu/platform/system/storage/-/issues/153
Indexer fetch records requests should not be checked via OPA/Policy (Or any other service, that sends internal requests)
2023-03-06T10:20:12Z
Rustam Lotsmanenko (EPAM) rustam_lotsmanenko@epam.com

**Problem:**
Currently, the Storage service will evaluate policies for service requests of the Indexer service, which doesn't make sense since the indexer should be able to fetch any record ingested to the platform.
Indexer fetch requests use common requests authentication flow when OPA integration is enabled:
https://community.opengroup.org/osdu/platform/system/storage/-/blob/master/storage-core/src/main/java/org/opengroup/osdu/storage/opa/service/OPAServiceImpl.java#L104
~~~
http://localhost:8181/v1/data/osdu/partition/osdu/dataauthz/records
{
"input": {
"operation": "view",
"token": "indexer-service-token",
"datapartitionid": "osdu",
"records": [{
"id": "osdu:master-data--Well:999907686759",
"kind": "osdu:wks:master-data--Well:1.0.0",
"legal": {
"legaltags": ["osdu-demo-legaltag"],
"otherRelevantDataCountries": ["US"],
"status": "compliant"
},
"acls": {
"viewers": ["data.default.viewers@osdu.osdu-gcp.go3-nrg.projects.epam.com"],
"owners": ["data.default.owners@osdu.osdu-gcp.go3-nrg.projects.epam.com"]
}
}
]
}
}
~~~
And it is possible that Indexer will not be authorized to fetch records:
~~~
HttpResponse(headers = {
null = [HTTP / 1.1 200 OK],
Content - Length = [305],
Date = [Tue, 29 Nov 2022 10: 58: 31 GMT],
Content - Type = [application / json]
}, body = {
"result": [{
"errors": [{
"code": 401,
"id": "osdu:master-data--Well:999907686759",
"message": "Legal response 401 {\"code\":401,\"reason\":\"Unauthorized\",\"message\":\"The user is not authorized to perform this action\"}",
"reason": "Error from compliance service"
}
],
"id": "osdu:master-data--Well:999907686759"
}
]
}, contentType = application / json, responseCode = 200, exception = null, request = http: //localhost:8181/v1/data/osdu/partition/osdu/dataauthz/records, httpMethod=POST, latency=812)
~~~
And will receive an empty response:
~~~
{
"records": [],
"notFound": [
"osdu:master-data--Well:999907686759"
],
"conversionStatuses": []
}
~~~
This leaves the records not indexed and not searchable. Such scenarios are quite easy to reach, for example when a record uses ACLs that the service token does not belong to.
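The effect of the OPA response on the fetch result can be sketched as below. This Python sketch is inferred from the two responses shown above rather than taken from the Storage code: `split_by_authorization` is a hypothetical helper, and treating any entry with a non-empty `errors` list as denied is an assumption.

```python
def split_by_authorization(requested_ids, opa_body):
    """Split record ids into (authorized, denied) using the OPA "result" list.

    An entry with a non-empty "errors" list is treated as denied, mirroring
    the response shown above where the HTTP status is 200 but the record
    carries a 401 error from the compliance service.
    """
    denied = {
        entry["id"]
        for entry in opa_body.get("result", [])
        if entry.get("errors")
    }
    authorized = [rid for rid in requested_ids if rid not in denied]
    not_authorized = [rid for rid in requested_ids if rid in denied]
    return authorized, not_authorized
```

For the example above, the single requested Well id ends up in the denied list, which matches the empty `records` array and the id reported under `notFound`.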
**Solution:**
We need to bypass OPA/Policy authentication for internal service requests.
M16 - Release 0.19
Rustam Lotsmanenko (EPAM) rustam_lotsmanenko@epam.com, Riabokon Stanislav(EPAM)[GCP]

https://community.opengroup.org/osdu/platform/system/storage/-/issues/152
Upgrade azure-storage SDK
2022-11-28T14:39:21Z
Nur Sheikh

In the Storage service we are using the azure-storage SDK 8.6.5 from the com.microsoft.azure package, which is too old and has little support. It is advisable to use the latest SDK from the com.azure package.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/151
Storage service fails due to opa enabled value being true.
2022-12-20T04:08:07Z
Nikhil Singh[MicroSoft]

2022-11-16 10:51:53.832 ERROR storage-6446654dcd-5m7cm --- [-nio-80-exec-52] o.o.o.a.l.Slf4JLogger correlation-id=fd7c531b-f76c-4467-a502-8860097b79a9 data-partition-id=opendes api-method=PUT operation-name={PUT [/records], consumes [application/json], produces [application/json]} user-id=8b2a56ba-edf5-47ce-94b6-42c336ec8172 app-id=678fadf8-e5a8-46cd-a75d-4d6cc95d9bc9:storage.app error getting data authorization result {correlation-id=fd7c531b-f76c-4467-a502-8860097b79a9, data-partition-id=opendes} org.opengroup.osdu.core.common.model.http.AppException: error getting data authorization result
    at org.opengroup.osdu.storage.opa.service.OPAServiceImpl.evaluateDataAuthorizationPolicy(OPAServiceImpl.java:125) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    at org.opengroup.osdu.storage.opa.service.OPAServiceImpl.validateUserAccessToRecords(OPAServiceImpl.java:86) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    at org.opengroup.osdu.storage.service.IngestionServiceImpl.validateUserAccessAndCompliancePolicyConstraints(IngestionServiceImpl.java:415) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    at org.opengroup.osdu.storage.service.IngestionServiceImpl.getRecordsForProcessing(IngestionServiceImpl.java:176) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    at org.opengroup.osdu.storage.service.IngestionServiceImpl.createUpdateRecords(IngestionServiceImpl.java:98) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    at org.opengroup.osdu.storage.provider.azure.service.IngestionServiceAzureImpl.createUpdateRecords(IngestionServiceAzureImpl.java:27) ~[classes!/:?]
    at org.opengroup.osdu.storage.api.RecordApi.createOrUpdateRecords(RecordApi.java:80) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    at org.opengroup.osdu.storage.api.RecordApi$$FastClassBySpringCGLIB$$495e8f0c.invoke(<generated>) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    ... suppressed 11 lines
    at org.opengroup.osdu.storage.api.RecordApi$$EnhancerBySpringCGLIB$$a32ffde7.createOrUpdateRecords(<generated>) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    at org.opengroup.osdu.storage.api.RecordApi$$FastClassBySpringCGLIB$$495e8f0c.invoke(<generated>) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    ... suppressed 9 lines
    at org.opengroup.osdu.storage.api.RecordApi$$EnhancerBySpringCGLIB$$1ec1cefc.createOrUpdateRecords(<generated>) ~[storage-core-0.15.1-SNAPSHOT.jar!/:?]
    ... suppressed 2 lines
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_332]
    ... suppressed 18 lines
    at org.opengroup.osdu.storage.util.StorageFilter.doFilter(StorageFilter.java:86) [storage-core-0.15.1-SNAPSHOT.jar!/:?]
    ... suppressed 2 lines
    at org.opengroup.osdu.azure.filters.TransactionLogFilter.doFilter(TransactionLogFilter.java:74) [core-lib-azure-0.17.0-rc14.jar!/:?]
    ... suppressed 34 lines
    at org.opengroup.osdu.azure.filters.Slf4jMDCFilter.doFilter(Slf4jMDCFilter.java:69) [core-lib-azure-0.17.0-rc14.jar!/:?]
    ... suppressed 18 lines
    at com.microsoft.applicationinsights.web.internal.WebRequestTrackingFilter.doFilter(WebRequestTrackingFilter.java:142) [applicationinsights-web-2.6.4.jar!/:?]
    ... suppressed 18 lines
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_332]
Nikhil Singh[MicroSoft]

https://community.opengroup.org/osdu/platform/system/storage/-/issues/149
ADR: Namespacing storage records
2024-03-19T02:18:17Z
ashley kelham

# Background
The OSDU is agreeing on a new EA-level ADR for 'collaborations'. This is a wide-ranging problem that the ADR is trying to solve. You can see info at the EA level [here](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/adr-elaboration/-/issues/48).
At its heart is the idea that data must be separated between the system of record and system of engagement. Today the OSDU only supports the system of record. All data therefore by default resides in the system of record and the APIs we use read, write and delete from the system of record.
In this ADR we are looking at how we can separate data in Storage service into separate namespaces. These namespaces can in the future be linked to a specific collaboration, which will form the system of engagement.
The system of engagement is meant to be interacted with by any application wanting to add/update data into the OSDU. Therefore we should have some understanding of what application is making the requests into the system of engagement.
We are starting with storage service as all other changes needed for the system of engagement data separation will be driven by this change.
![image](/uploads/b269adeef9f11aa773480f96a4b7c7d7/image.png)
As shown, the system of engagement can have many namespaces, one for each collaboration.
A single storage record can reside in any number of namespaces. A namespace can also have 0 or many Records.
A storage record consists of 2 parts, the metadata and the data.
```
{
id: "opendes:mastered-wellbore:12345678",
kind: "osdu:wks:mastered-wellbore:1.0.0",
...
...
data: {
...
...
}
}
```
Everything inside the 'data' json object shown above is classed as the data and everything else is the 'metadata'.
These are stored separately by the storage service in a 1-many relationship. Every time a Record's data is updated, a new version of that data is created that points to a single metadata instance.
The reference is held directly in the metadata. We can think of the referencing of the data blocks to the metadata like this
Diagram 1
![image](/uploads/ecdb68f32ab861835cca78533ed0716f/image.png)
The latest data version referenced is the 'head' and is returned by default when no version is specified when using the Storage APIs.
If I retrieve an older version of the 'data' I am only ever returned the same version of the metadata.
With collaboration there is the possibility that many 'heads' exist at the same time, one per collaboration. There can be many collaborations and each collaboration can hold many entities.
Each collaboration should be treated independently. Therefore, any change to a Record in the context of a collaboration should be reflected only in that context and not affect any others.
# Out of scope
For this ADR we are looking only at how we separate data in Storage service between the System of Record (what exists today in OSDU) and System of engagement (collaborations).
We are **not** deciding on
- How DDMS will separate the data
- How Consumption services like search separate the data
- How data will transfer between the system of Record and system of engagement in Storage
- How collaborations will act on this or control this behavior or even what a collaboration entity looks like
- Any other service that might need to act on a collaboration context e.g. ingestion
# Solution
The suggestion is to create a different instance of the Storage metadata specific to the collaboration context. It is stored using a compound key of the record id + the collaboration id.
This collaboration id forms the namespace for a record, and combining the 2 means we have a unique metadata instance per collaboration.
Therefore if a Record is not assigned to a collaboration the namespace is the same as it is today (empty) and the id remains unchanged. This maintains current system behavior for existing data in the system of record.
>Note: The Record ID is never changed between namespaces and should be persisted and returned to the user the same as it is today no matter the context provided. The id of the document/row used in the database should **append** the namespace value so that multiple metadata instances can coexist for the same Record ID. This means the data model of the metadata needs to have a separate record id and row/document id value.
References to the data are held in each metadata instance, allowing the same data to be referenced by multiple namespaces while also allowing unique versions of a record ID to exist in individual namespaces. The reference is also quick and cheap to add to or remove from different namespaces.
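The split between the record ID and the row/document ID can be sketched as below. This is a minimal Python sketch, not the CSP data model: the `RecordMetadata` class and its field names are hypothetical, and appending the collaboration id with no separator is an assumption since the ADR does not fix the exact format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordMetadata:
    record_id: str                    # returned to the user, identical in every namespace
    collaboration_id: Optional[str]   # None means the system of record (empty namespace)
    data_versions: tuple = ()         # references to the 'data' blobs for this namespace

    @property
    def document_id(self) -> str:
        """Row/document id: the record id with the namespace value appended."""
        if not self.collaboration_id:
            # No collaboration: the id stays exactly as it is today.
            return self.record_id
        return f"{self.record_id}{self.collaboration_id}"
```

Two instances with the same `record_id` but different `collaboration_id` values get distinct `document_id` values, so both metadata instances can coexist in the database while the user always sees the same Record ID.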
Diagram 2
![image](/uploads/6df9c0249d22cf3cbdd34e3d9b1f096f/image.png)
>Note that multiple collaborations could be active at the same time and the 'data' versions does not have to be linear between them. For example changes from different collaborations could overlap one another. This is because the version is already defined as an epoch timestamp and so is versioned based on when it was created.
Diagram 3
![image](/uploads/d69b9d0fd9ffdfe6af3913c35bdc7b84/image.png)
### Behavior of retrieval APIs
If we take diagram 3 as the current state of a Record we can look at how different API requests to it should be handled with and without a collaboration context.
#### Getting latest in collaboration 1
```
curl -X 'GET' \
'<osdu>/api/storage/v2/records/<id>' \
--header 'x-collaboration: id=collaboration 1,application=<app-name>;'
```
Expected Result: V7 returned
#### Retrieving version 4 when no collaboration provided
```
curl -X 'GET' \
'<osdu>/api/storage/v2/records/<id>/versions/<version4>'
```
Expected Result: Error, version 4 does not exist
#### Retrieving version 4 when collaboration 2 provided
```
curl -X 'GET' \
'<osdu>/api/storage/v2/records/<id>/versions/<version4>' \
--header 'x-collaboration: id=collaboration 2,application=<app-name>;'
```
Expected Result: Error, version 4 does not exist
## Collaboration context header
The **x-collaboration** header is an optional HTTP header that holds directives instructing the Storage service to handle the request in the context of the provided collaboration instance and not in the context of the system of record. We are designing it using directives so that it is more extensible over time to incorporate other elements potentially needed by the collaboration feature set.
**NB: In the fullness of time many services will be impacted by the collaboration EA requirements. They could/should re-use this same header to support acting on a specific collaboration context for consistency and usability.**
### Syntax
Collaboration directives follow the validation rules below:
- Directives are case-insensitive but lowercase is recommended
- Multiple directives are comma-separated
### Request Directives
| Request | Description |
| ----------- | ----------- |
| id | Mandatory. The ID of the collaboration to handle the request against. |
| application | Mandatory. The name of the application sending the request. |
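The syntax rules and the two mandatory directives can be sketched as a small parser. This Python sketch is illustrative: `parse_collaboration_header` is a hypothetical helper, and tolerating the trailing `;` is an assumption based on the header values shown in the examples in this ADR.

```python
def parse_collaboration_header(value: str) -> dict:
    """Parse an x-collaboration header value into a dict of directives.

    Directive names are lowercased (they are case-insensitive); values are
    kept as given. Raises ValueError for malformed or missing directives.
    """
    directives = {}
    # Drop an optional trailing ';' and split the comma-separated directives.
    for part in value.strip().rstrip(";").split(","):
        name, sep, val = part.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"malformed directive: {part!r}")
        directives[name.strip().lower()] = val.strip()
    # Both directives are mandatory according to the table above.
    for required in ("id", "application"):
        if not directives.get(required):
            raise ValueError(f"missing mandatory directive: {required}")
    return directives
```

Returning a plain dict keeps the sketch open to further directives being added later, which is the stated reason for the directive-based design.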
### Examples
#### Retrieve a specific version of a Record that exists in a collaboration
```
curl -X 'GET' \
'<osdu>/api/storage/v2/records/<record-id>/versions/<version>' \
--header 'data-partition-id: opendes' \
--header 'authorization: Bearer <JWT>' \
--header 'Content-Type: application/json' \
--header 'x-collaboration: id=<collaboration-id>,application=<app-name>;'
```
#### Retrieve a specific version of a Record that exists in the system of record
No collaboration context is sent here because the request targets data in the system of record. This is the same request users make today.
```
curl -X 'GET' \
'<osdu>/api/storage/v2/records/<record-id>/versions/<version>' \
--header 'data-partition-id: opendes' \
--header 'authorization: Bearer <JWT>' \
--header 'Content-Type: application/json'
```
Note the given record id and version of the record must exist in both the system of record and the collaboration id for both API requests to return successfully.
### Record changed on namespace
To guarantee that the current system behavior is not changed, we will create a new record changed topic that is triggered only when a record is edited in some way in the context of a collaboration.
This means the existing record changed topic remains unchanged and is triggered only when changes are made in the system of record like they are today.
The new Record changed on namespace topic can then be bound to by downstream listeners over time, as and when they want to support the namespace concept.
The new message will also include the extra context information about the namespace. The message will be the same as the current record change message, except that it will include the new header:
```
x-collaboration: id=<id>,application=<app-name>;
...
```
On top of this the new topic should be exposed through the Notification service so it can be registered to by external consumers as needed.
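The topic split described in this section can be sketched as a routing function: changes made in the system of record keep going to the existing topic, while changes made in a collaboration go to the new topic and carry the extra header. The topic names and the function `route_record_change` are hypothetical placeholders, not names used by the Storage service.

```python
# Hypothetical topic names, for illustration only.
RECORDS_CHANGED_TOPIC = "records-changed"
RECORDS_CHANGED_COLLABORATION_TOPIC = "records-changed-collaboration"


def route_record_change(payload, collaboration=None):
    """Return (topic, attributes, payload) for a record change event."""
    attributes = {}
    if collaboration is None:
        # System-of-record change: existing topic, message unchanged.
        return RECORDS_CHANGED_TOPIC, attributes, payload
    # Collaboration change: same payload, plus the x-collaboration header.
    attributes["x-collaboration"] = (
        f"id={collaboration['id']},application={collaboration['application']};"
    )
    return RECORDS_CHANGED_COLLABORATION_TOPIC, attributes, payload
```

Existing listeners bound to the first topic would therefore never see collaboration changes unless they opt in to the second topic.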
# Consequences
The Storage service should support the new 'x-collaboration' header. Whenever a collaboration id is provided in this header, the Storage service should act only in that context. This means all Storage APIs need to act within the given collaboration context for creation, update, retrieval and deletion of records.
If no header is provided the Storage service should function the same as it does today and no change in behavior should be observed.
In the shared code section we will create a new 'collaboration context' class that is passed into the CSP-specific data layer. This class will hold the collaboration id and application name. Each CSP should combine the collaboration id with the record id to form the primary key of the metadata's data model. In this way the collaboration id forms the namespace of the record id, so multiple metadata entries for the same record id can exist simultaneously.
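As a minimal sketch of the namespacing idea, the snippet below models the collaboration context and derives a metadata key from it. The class name `CollaborationContext`, the function `metadata_key`, and the tuple layout of the key are assumptions made for illustration; each CSP would choose its own key encoding.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CollaborationContext:
    id: str
    application: str


def metadata_key(record_id: str, context: Optional[CollaborationContext] = None) -> Tuple[str, str]:
    """Return the primary key for a record's metadata.

    Assumption: an empty namespace stands for the system of record.
    """
    namespace = context.id if context is not None else ""
    return (namespace, record_id)


# The same record id can hold separate metadata per collaboration.
store = {}
store[metadata_key("opendes:doc:123")] = {"kind": "system of record"}
store[metadata_key("opendes:doc:123", CollaborationContext("c1", "app"))] = {"kind": "collaboration c1"}
```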
We need a new 'Record changed collaboration' message and need to have it exposed through the Notification service.
The hard delete API needs to validate all contexts before deleting the blob, because multiple contexts could be referencing the same blob instance.
M15 - Release 0.18, ashley kelham
https://community.opengroup.org/osdu/platform/system/storage/-/issues/148
ADR: Separate modifyTime and modifyUser for every version of OSDU storage record
2023-07-05T09:49:05Z, Mandar Kulkarni
Separate modifyTime and modifyUser for every version of OSDU storage record
## Status
- [X] Proposed
- [ ] Trialing
- [ ] Under review
- [X] Approved
- [ ] Retired
## Context & Scope
The concept is that one record should have one version of metadata.
However, the modifyUser and modifyTime attributes should be different for each version.
The current behavior matches what was implemented, but it is wrong with respect to the concept above.
The original issue that was raised is [here](https://community.opengroup.org/osdu/platform/system/storage/-/issues/126).
With the current behavior, the modifyTime and modifyUser values are the same for all versions of a record, because they are overwritten on every version each time the record is modified.
This means that for a record with only one version, the metadata looks like this:
|version1|
|:-------|
|createUser|
|createTime|
But when the record is modified and multiple versions are created, the metadata of the record for latest version is applied to all versions including the first version as well.
|version1|version2 |version3|
|:-------|:--------|:--------|
|createUser| createUser| createUser|
|createTime| createTime| createTime|
|modifyUser2|modifyUser2|modifyUser2|
|modifyTime2|modifyTime2|modifyTime2|
Due to this behavior, the record modification history is lost, and it is not possible to track which user created which version of the record.
## Tradeoff Analysis
The metadata, which contains the modifyUser and modifyTime attributes, will be stored separately for every record version.
This means the amount of metadata stored for storage records will increase.
In return, the record modification history can be tracked and the user who created each version of the record can be traced, which was not possible before.
## Decision
Version 1 should only have createUser and createTime. modifyUser and modifyTime should not exist in the first version.
Version 2+ should have different modifyUser and modifyTime for each version.
|version1|version2 |version3|
|:-------|:--------|:--------|
|createUser| createUser| createUser|
|createTime| createTime| createTime|
| |modifyUser1|modifyUser2|
| |modifyTime1|modifyTime2|
If the record metadata (i.e. the tags, legal tags and ACL blocks of the record) is modified using the storage **PATCH** API, the version number is not changed and only the **latest** values of modifyUser and modifyTime are maintained for that record version.
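The decision above can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the Storage service data model: the names `VersionMetadata` and `RecordHistory` are hypothetical, versions are numbered 1, 2, 3 for readability, and a PATCH is assumed to update the latest version even when that is the first version, which the ADR does not spell out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class VersionMetadata:
    create_user: str
    create_time: str
    modify_user: Optional[str] = None
    modify_time: Optional[str] = None


class RecordHistory:
    def __init__(self, user: str, time: str):
        # Version 1 only has createUser and createTime.
        self.versions: Dict[int, VersionMetadata] = {1: VersionMetadata(user, time)}

    def put(self, user: str, time: str) -> int:
        """Create a new version with its own modifyUser and modifyTime."""
        first = self.versions[1]
        version = max(self.versions) + 1
        self.versions[version] = VersionMetadata(
            first.create_user, first.create_time, user, time
        )
        return version

    def patch(self, user: str, time: str) -> None:
        """Metadata-only update: no new version, latest values overwrite."""
        latest = self.versions[max(self.versions)]
        latest.modify_user = user
        latest.modify_time = time
```

With this model, earlier versions keep the modifyUser and modifyTime they were created with, so the modification history is preserved.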
## Consequences
- Storage service behavior will change.
- Storage service documentation needs to be updated.
M17 - Release 0.20, Chad Leong
https://community.opengroup.org/osdu/platform/system/storage/-/issues/147
Current implementation doesn't delete all versions of the record with purging a record
2023-07-19T19:31:54Z, Alok Joshi
Repro steps:
- create a record with the PUT API
- create another version of the same record with the PUT API
- hard delete (purge) the record
Expected: All metadata and storage blobs should be purged
Actual: Metadata gets purged, but only the blob of the latest version gets purged. This leaves dangling blobs of the other versions in Blob Storage
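The expected behavior can be sketched with two in-memory stores. This is a hypothetical model, not the Azure implementation: `metadata_store` maps a record id to metadata holding the blob path of every version under the assumed key `version_paths`, and `blob_store` maps blob paths to content.

```python
def purge_record(record_id, metadata_store, blob_store):
    """Hard delete a record: remove the blob of every version, then the metadata."""
    metadata = metadata_store.get(record_id)
    if metadata is None:
        return []
    purged = []
    # Iterate over all versions, not only the latest one.
    for blob_path in metadata["version_paths"]:
        if blob_store.pop(blob_path, None) is not None:
            purged.append(blob_path)
    # Remove the metadata last so the blob paths stay known until blobs are gone.
    del metadata_store[record_id]
    return purged
```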
**Note**: The bug was observed for the Azure implementation, but other providers should confirm the behavior and put in a fix if required.
M15 - Release 0.18, Alok Joshi
https://community.opengroup.org/osdu/platform/system/storage/-/issues/145
Storage not consuming OSDU record id regex in RecordAncestry
2023-05-30T08:56:49Z, Kelly Zhou
Storage does not consume the OSDU record id regex properly when it comes to parent records. The current method of getting the parent record id and version number causes an error for values such as dp-id:test:parent::1234: the new OSDU record id regex allows a colon in the previous section, while Storage does not respect that rule yet.
Changes need to be made in the core common library to add a validator for RecordAncestry that consumes the OSDU record id regex properly, and Storage needs to update how it gets the parent record id and version number.
M15 - Release 0.18
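One way to get the parent record id and version number from a value such as dp-id:test:parent::1234 is to split on the last colon only, so that colons inside the record id are left alone. The sketch below assumes a parent reference has the form `<record-id>:<version>` with a numeric version; the function name `split_parent_reference` is hypothetical and this is not the validator from the core common library.

```python
def split_parent_reference(parent):
    """Split '<record-id>:<version>' on the last colon only."""
    record_id, sep, version = parent.rpartition(":")
    if not sep or not record_id or not (version.isascii() and version.isdigit()):
        raise ValueError(f"invalid parent reference: {parent!r}")
    return record_id, int(version)
```

A split that expects a fixed number of colon-separated parts would misread this example, because the record id itself ends with a colon.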