# Storage issues
https://community.opengroup.org/osdu/platform/system/storage/-/issues

# [System/Storage] Relax id validation to support OSDU relationship definitions/constraints
[Issue #26](https://community.opengroup.org/osdu/platform/system/storage/-/issues/26) · Thomas Gehrmann [slb] · 2022-05-05

OSDU defines entity-types as a compound reference `<group-type>/<individual-type>`. These OSDU entity-type specifications are used to constrain relationships, e.g. to identify a relationship target type via a pattern.
## Jump to [latest conclusion](#summary-january-26-2021)
The Storage service constrains the `id` using this regular expression in [ValidationDoc.java](https://community.opengroup.org/osdu/platform/system/lib/core/os-core-common/-/blob/master/src/main/java/org/opengroup/osdu/core/common/model/storage/validation/ValidationDoc.java#L25):
* `"[\\w-\\.]+:[\\w-\\.]+:[\\w-\\.]+"` describing the following parts:
* `<data-partition-id>:<entity-type>:<unique-instance-id>` where entity-type means group-type/individual-type.
The corresponding JSON schema pattern regex using ECMAScript style is
* `"^[\\w-\\.]+:[\\w-\\.]+:[\\w-\\.]+$"`
it should be changed to at least
* `"^[\\w-\\.]+:[\\w-\\.\\/]+:[\\w-\\.]+$"` -- see revision below.
to support `<data-partition-id>:<group-type>/<individual-type>:<unique-instance-id>`.
Furthermore, it should be decided which other characters to allow in the unique `<unique-instance-id>` part. My suggestion is to relax this to support GUIDs (already supported) and url-encoded strings. There are a number of use cases for deterministic `<unique-instance-id>` values for reference data.
# Decision as per November 3rd
The regex expression for id will change to:
* `"^[\\w-\\.]+:[\\w-\\.\\/]+:.+$"`
The actual validation regex must be published with the Storage service. In turn, OSDU data definitions must adopt the constraints in their schema definitions. At the moment the validation patterns for `id` are entirely unconstrained except for `:`, i.e. `[^:]+` for each of the `id` parts.
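As a sanity check, the relaxed pattern can be exercised with Python's `re` module. This is an illustrative sketch: the example ids are hypothetical, and the authoritative regex is the one published with the Storage service.

```python
import re

# Relaxed record-id pattern from the November 3rd decision (illustrative
# check only; the authoritative regex ships with the Storage service).
RECORD_ID_REGEX = r"^[\w\-\.]+:[\w\-\.\/]+:.+$"

ids = [
    "opendes:master-data/Wellbore:1013",  # group-type/individual-type: now accepted
    "opendes:wellbore:1013",              # legacy flat entity-type: still accepted
    "opendes:master-data/Wellbore:",      # empty unique-instance-id: rejected
]
for record_id in ids:
    print(record_id, bool(re.fullmatch(RECORD_ID_REGEX, record_id)))
```

Note that `.+` for the last part is deliberately permissive here; the later summaries in this issue tighten it again.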
# Addition December 6th:
The regex for the kind in [ValidationDoc.java line 27](https://community.opengroup.org/osdu/platform/system/lib/core/os-core-common/-/blob/master/src/main/java/org/opengroup/osdu/core/common/model/storage/validation/ValidationDoc.java#L27) seems to be incorrect as well: it lacks the `^` and `$` anchors at the beginning and end, so invalid characters can be added before or after an otherwise valid kind. The condition for the semantic version number also doesn't filter invalid separators (an unescaped `.` matches any character). Instead, this expression should work:
```regex
^[\w\-\.]+:[\w\-\.]+:[\w\-\.]+:[0-9]+\.[0-9]+\.[0-9]+$
or as string:
"^[\\w\\-\\.]+:[\\w\\-\\.]+:[\\w\\-\\.]+:[0-9]+\\.[0-9]+\\.[0-9]+$"
```
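The difference the anchors make can be demonstrated with a short Python sketch. The kind values are hypothetical; `UNANCHORED` mirrors the currently declared pattern, `ANCHORED` the proposed fix with escaped version separators.

```python
import re

# Without anchors, a search-style match accepts junk around an otherwise
# valid kind; with ^...$ (or fullmatch) it is rejected. Dots in the version
# are escaped so that e.g. "1x0x0" is not accepted as a semantic version.
UNANCHORED = r"[\w\-\.]+:[\w\-\.]+:[\w\-\.]+:(\d+.)?(\d+.)?(\d+)"
ANCHORED = r"^[\w\-\.]+:[\w\-\.]+:[\w\-\.]+:[0-9]+\.[0-9]+\.[0-9]+$"

kind = "opendes:wks:wellbore:1.0.0"   # hypothetical valid kind
bad = "##" + kind + "##"              # invalid characters at both ends

print(bool(re.search(UNANCHORED, bad)))    # junk passes the unanchored check
print(bool(re.fullmatch(ANCHORED, bad)))   # rejected once anchored
print(bool(re.fullmatch(ANCHORED, kind)))  # valid kind still accepted
```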
# Summary January 6, 2021
The following regex expressions have been tested in https://regex101.com/ using the ECMAScript option (JSON standard):
```regex
RECORD_ID_REGEX = "^[\\w\\-\\.]+:[\\w\\-\\.\\/]+:.+$"
as used in regex101: ^[\w\-\.]+:[\w\-\.\/]+:.+$
RECORD_ID_WITH_VERSION_REGEX = "^[\\w\\-\\.]+:[\\w\\-\\.\\/]+:.+:[0-9]+$"
as used in regex101: ^[\w\-\.]+:[\w\-\.\/]+:.+:[0-9]+$
KIND_REGEX = "^[\\w\\-\\.]+:[\\w\\-\\.]+:[\\w\\-\\.\\/]+:[0-9]+\\.[0-9]+\\.[0-9]+$"
as used in regex101: ^[\w\-\.]+:[\w\-\.]+:[\w\-\.\/]+:[0-9]+\.[0-9]+\.[0-9]+$
```
If we eventually support 'optionally versioned' id references in the Storage API, there is another regex required:
```regex
RECORD_ID_WITH_OPTIONAL_VERSION_REGEX = "^[\\w\\-\\.]+:[\\w\\-\\.\\/]+:.+:[0-9]*$"
as used in regex101: ^[\w\-\.]+:[\w\-\.\/]+:.+:[0-9]*$
```
It turned out that all these 'wishes' were made without seriously checking the implementations: `/` is a reserved character in at least one implementation. Therefore, change of plans again.
# Summary January 26, 2021
To preserve 'business' `id`s, like unit symbols, the desired IDs must be url-encoded, e.g. in reference-data. This escapes the otherwise reserved characters. `:` is already used as a separator in `kind` and `id`, and it is a desired symbol in certain business `id`s. This means the last part of the `id` should use this regex: `[\w\-\.\:\%]+`, i.e. alphanumeric characters, underscore, dash, dot, colon and percent.
```regex
RECORD_ID_REGEX = "^[\\w\\-\\.]+:[\\w\\-\\.]+:[\\w\\-\\.\\:\\%]+$"
as used in regex101: ^[\w\-\.]+:[\w\-\.]+:[\w\-\.\:\%]+$
RECORD_ID_WITH_VERSION_REGEX = "^[\\w\\-\\.]+:[\\w\\-\\.]+:[\\w\\-\\.\\:\\%]+:[0-9]+$"
as used in regex101: ^[\w\-\.]+:[\w\-\.]+:[\w\-\.\:\%]+:[0-9]+$
KIND_REGEX = "^[\\w\\-\\.]+:[\\w\\-\\.]+:[\\w\\-\\.]+:[0-9]+\\.[0-9]+\\.[0-9]+$"
as used in regex101: ^[\w\-\.]+:[\w\-\.]+:[\w\-\.]+:[0-9]+\.[0-9]+\.[0-9]+$
```
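A quick Python sketch (with a hypothetical unit-symbol id) shows how url-encoding makes a 'business' id fit the final pattern:

```python
import re
from urllib.parse import quote

# Final (January 26) record-id pattern: the last part allows word characters,
# dash, dot, colon and percent, so reserved characters in 'business' ids have
# to be url-encoded first.
RECORD_ID_REGEX = r"^[\w\-\.]+:[\w\-\.]+:[\w\-\.\:\%]+$"

raw = "ft/s"                   # hypothetical unit symbol containing a reserved '/'
encoded = quote(raw, safe="")  # -> 'ft%2Fs'
record_id = f"opendes:reference-data--UnitOfMeasure:{encoded}:"

print(bool(re.fullmatch(RECORD_ID_REGEX, record_id)))  # encoded id matches
print(bool(re.fullmatch(
    RECORD_ID_REGEX, "opendes:reference-data--UnitOfMeasure:ft/s:")))  # raw '/' rejected
```

The trailing `:` in the id mirrors the `osdu:reference-data--UnitOfMeasure:ft:` style shown later in this document; it is legal because the last part of the pattern admits colons.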
If we eventually support 'optionally versioned' id references in the Storage API, there is another regex required:
```regex
RECORD_ID_WITH_OPTIONAL_VERSION_REGEX = "^[\\w\\-\\.]+:[\\w\\-\\.]+:[\\w\\-\\.\\:\\%]+:[0-9]*$"
as used in regex101: ^[\w\-\.]+:[\w\-\.]+:[\w\-\.\:\%]+:[0-9]*$
```
Milestone: M1 - Release 0.1 · ethiraj krishnamanaidu

# With R3 Schemas, Storage Frame of Reference is (generally) broken
[Issue #55](https://community.opengroup.org/osdu/platform/system/storage/-/issues/55) · Gary Murphy · 2024-01-27

**Background**
In the R2 timeframe, SLB contributed a feature for Storage that allowed (optional) fetching of records that also performed conversion to a standard Frame of Reference (FoR) in terms of CRS (WGS84), Unit, and datetime (UTC).
_Here is the relevant Gitlab issue_: [Storage Issue #9](https://community.opengroup.org/osdu/platform/system/storage/-/issues/9)
The salient point is that in this FoR implementation, the markup that determined what frame of reference the source data was in resided in the "metadata" block of the storage record.

**Bug**
With the new R3 Schemas, the metadata for "AnyCRS" feature collections has moved to the JSON block that directly contains the coordinate points, thus rendering the metadata block obsolete for CRS conversions and **breaking** the Storage FoR service. Records input in a different schema that uses the metadata block to mark up the attributes with their source CRS, Unit, and datetime "units" will still work, but all R3 coordinates using AbstractAnyCrsFeatureCollection will not work with the Storage FoR and will be indexed in their source values, making them more or less meaningless.

Milestone: M8 - Release 0.11 · Anuj Gupta

# In-memory cache of schema resulting in unexpected behavior
[Issue #34](https://community.opengroup.org/osdu/platform/system/storage/-/issues/34) · Kishore Battula · 2020-12-15

Schemas are cached in memory of the running application. Because of this in-memory caching, the following scenario will fail. Assume we have two instances of the application running, I1 and I2.
1. Create schema - Lands on I1
2. Get schema - Lands on I1. This schema is cached on I1
3. Delete schema - Lands on I2. This will try to delete any cache entry for that schema in I2. So far the cache entry on I1 is still intact.
4. Get schema - Lands on I1. As the cache entry is not cleared it will return 200 instead of 404 not found.
The test below, along with other tests, is failing intermittently because of the above-mentioned issue.
- should_createSchema_and_returnHttp409IfTryToCreateItAgain_and_getSchema_and_deleteSchema_when_providingValidSchemaInfo
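The scenario can be reproduced with a minimal Python simulation; the class and method names here are invented for illustration, not taken from the service code.

```python
# Minimal simulation of the multi-instance scenario: per-instance in-memory
# caches go stale when a delete lands on the other instance.
class Instance:
    def __init__(self, shared_db):
        self.db = shared_db      # shared persistent store
        self.cache = {}          # per-instance in-memory cache

    def create(self, schema_id, schema):
        self.db[schema_id] = schema

    def get(self, schema_id):
        if schema_id in self.cache:          # stale entries are served from here
            return 200
        if schema_id in self.db:
            self.cache[schema_id] = self.db[schema_id]
            return 200
        return 404

    def delete(self, schema_id):
        self.db.pop(schema_id, None)
        self.cache.pop(schema_id, None)      # only clears *this* instance's cache

db = {}
i1, i2 = Instance(db), Instance(db)
i1.create("s1", {"kind": "test"})   # 1. create lands on I1
print(i1.get("s1"))                 # 2. get on I1 -> 200, now cached on I1
i2.delete("s1")                     # 3. delete lands on I2
print(i1.get("s1"))                 # 4. get on I1 -> 200 (stale), expected 404
```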
If the service is running on multiple VMs or Pods, this issue will be prominent; it will block pipelines and result in unexpected behavior.

# Normalizer: meta[].unitOfMeasureID should be preferred unit declaration
[Issue #188](https://community.opengroup.org/osdu/platform/system/storage/-/issues/188) · Thomas Gehrmann [slb] · 2024-01-18

Reported by Marcus Ridgway:
The UoM Meta[] schema supports association of a Unit of Measure to one or more attributes in a JSON record. The core of the UoM schema is the _unitOfMeasureID_ attribute which associates attributes defined in _propertyNames_ to the ID of the UOM in the Unit of Measure Reference list e.g. for a Wellbore record
```json
{
"kind": "Unit",
"name": "ft",
"persistableReference": "",
"propertyNames": [
"FacilitySpecifications[0].FacilitySpecificationQuantity",
"VerticalMeasurements[0].VerticalMeasurement"
],
"unitOfMeasureID": "osdu:reference-data--UnitOfMeasure:ft:"
}
```
The persistableReference attribute in meta[] is there to support storage of the full UoM Definition when unitOfMeasureID is not populated. E.g. for metres:
```json
"persistableReference": "{\"abcd\":{\"a\":0.0,\"b\":1.0,\"c\":1.0,\"d\":0.0},\"symbol\":\"m\",\"baseMeasurement\":{\"ancestry\":\"L\",\"type\":\"UM\"},\"type\":\"UAD\"}"
```
Populating persistableReference is no longer required if the UnitOfMeasure Reference List is fully populated, i.e. IDs exist for all used UoMs. Regardless, populating persistableReference is extremely onerous for a number of reasons:
- does not adhere to one version of the truth - UoM need only be defined in the UoM Reference List; storing UoM definition in persistableReference in all records is the most extreme opposite
- all ETLs would be required to populate all the meta[] UoM definitions for all record types - the UoM definition is maintained in every ETL
- all OSDU records unnecessarily bloated by carrying all this redundant, duplicate persistableReference metadata within Meta[] in each and every record when it is centrally stored in the Reference List. This impacts storage requirements for OSDU records.
Problem: The Normalizer for the Search API does not support API > SI conversion of numeric values when JSON records do not have persistableReference populated. The only data that needs to be populated is unitOfMeasureID, but this is ignored by the Normalizer, which instead requires persistableReference to be populated.
Required: the Normalizer should be extended to support the unitOfMeasureID populated in meta[]. When it is populated, the persistableReference content (including blank content) is ignored and the Normalizer instead retrieves the persistableReference from the UnitOfMeasure Reference List (the source of truth for UoM definitions).
---
Comment from @gehrmann - means the normalizer needs to be enhanced. From the schema side of things we have said that if `unitOfMeasureID` is populated it should supersede the `persistableReference` which is the future goal. The [AbstractMetaItem](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/Generated/abstract/AbstractMetaItem.1.0.0.json?ref_type=heads#L58) schema is historical and requires the `persistableReference` to be set. It should however be sufficient to set `"persistableReference": ""` when populating `unitOfMeasureID`.
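A possible shape of the requested precedence, sketched in Python; the function name, lookup table, and abbreviated persistable-reference strings are all assumptions for illustration.

```python
# Hypothetical subset of the UnitOfMeasure reference list, keyed by id.
REFERENCE_LIST = {
    "osdu:reference-data--UnitOfMeasure:ft:": '{"symbol": "ft"}',  # abbreviated
}

def resolve_persistable_reference(meta_item: dict) -> str:
    """Return the UoM definition, preferring a populated unitOfMeasureID."""
    uom_id = meta_item.get("unitOfMeasureID")
    if uom_id:  # a populated id wins, even when persistableReference is ""
        return REFERENCE_LIST[uom_id]
    return meta_item["persistableReference"]  # legacy fallback

meta_item = {
    "kind": "Unit",
    "persistableReference": "",  # blank is sufficient per the schema note above
    "unitOfMeasureID": "osdu:reference-data--UnitOfMeasure:ft:",
}
print(resolve_persistable_reference(meta_item))
```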
Originally reported as [schema issue 624](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/issues/624)

Milestone: M22 - Release 0.25

# ADR: Namespacing storage records
[Issue #149](https://community.opengroup.org/osdu/platform/system/storage/-/issues/149) · ashley kelham · 2024-03-19

# Background
The OSDU is agreeing on a new EA level ADR for 'collaborations'. This is a wide ranging and broad problem that is trying to be solved. You can see info at the EA level [here](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/adr-elaboration/-/issues/48).
At its heart is the idea that data must be separated between the system of record and system of engagement. Today the OSDU only supports the system of record. All data therefore by default resides in the system of record and the APIs we use read, write and delete from the system of record.
In this ADR we are looking at how we can separate data in Storage service into separate namespaces. These namespaces can in the future be linked to a specific collaboration, which will form the system of engagement.
The system of engagement is meant to be interacted with by any application wanting to add/update data into the OSDU. Therefore we should have some understanding of what application is making the requests into the system of engagement.
We are starting with storage service as all other changes needed for the system of engagement data separation will be driven by this change.
![image](/uploads/b269adeef9f11aa773480f96a4b7c7d7/image.png)
As shown, the system of engagement can have many namespaces, one for each collaboration.
A single storage record can reside in any number of namespaces. A namespace can also have 0 or many Records.
A storage record consists of 2 parts, the metadata and the data.
```
{
id: "opendes:mastered-wellbore:12345678",
kind: "osdu:wks:mastered-wellbore:1.0.0",
...
...
data: {
...
...
}
}
```
Everything inside the 'data' JSON object shown above is classed as the data, and everything else is the 'metadata'.
These are stored separately by the Storage service in a 1-to-many relationship. Every time a Record's data is updated, a new version of that data is created that points to a single metadata instance.
The reference is held directly in the metadata. We can think of the referencing of the data blocks to the metadata like this
Diagram 1
![image](/uploads/ecdb68f32ab861835cca78533ed0716f/image.png)
The latest data version referenced is the 'head' and is returned by default when no version is specified when using the Storage APIs.
If I retrieve an older version of the 'data' I am only ever returned the same version of the metadata.
With collaboration there is the possibility that many 'heads' exist at the same time, one per collaboration. There can be many collaborations and each collaboration can hold many entities.
Each collaboration should be treated independently; therefore any change to a Record in the context of a collaboration should be reflected only in that context and not affect any others.
# Out of scope
For this ADR we are looking only at how we separate data in Storage service between the System of Record (what exists today in OSDU) and System of engagement (collaborations).
We are **not** deciding on
- How DDMS will separate the data
- How Consumption services like search separate the data
- How data will transfer between the system of Record and system of engagement in Storage
- How collaborations will act on this or control this behavior or even what a collaboration entity looks like
- Any other service that might need to act on a collaboration context e.g. ingestion
# Solution
The suggestion is to create a different instance of the Storage metadata specific to the collaboration context. It is stored using a compound key of the record id + the collaboration id.
This collaboration id forms the namespace for a record, and combining the 2 means we have a unique metadata instance per collaboration.
Therefore if a Record is not assigned to a collaboration the namespace is the same as it is today (empty) and the id remains unchanged. This maintains current system behavior for existing data in the system of record.
>Note: The Record ID is never changed between namespaces and should be persisted and returned to the user the same as it is today no matter the context provided. The id of the document/row used in the database should **append** the namespace value so that multiple metadata instances can coexist for the same Record ID. This means the data model of the metadata needs to have a separate record id and row/document id value.
References to the data are held in each metadata allowing the same data to be referenced by multiple namespaces but also to have unique versions of a record Id to exist in individual namespaces. The reference is also quick and cheap to add/remove from different namespaces.
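The compound-key idea can be sketched as follows; the `:` separator and the function name are assumptions for illustration, since each CSP chooses its own row-key scheme.

```python
# Sketch of the compound key: the database row id appends the collaboration
# namespace while the Record ID itself never changes.
def row_id(record_id: str, collaboration_id: str = "") -> str:
    if not collaboration_id:       # empty namespace == system of record
        return record_id           # existing rows keep their ids unchanged
    return f"{record_id}:{collaboration_id}"

rid = "opendes:mastered-wellbore:12345678"
print(row_id(rid))                      # system-of-record row
print(row_id(rid, "collaboration-1"))   # namespaced metadata row
```

Because the empty namespace maps to the unmodified record id, existing system-of-record rows need no migration.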
Diagram 2
![image](/uploads/6df9c0249d22cf3cbdd34e3d9b1f096f/image.png)
>Note that multiple collaborations could be active at the same time and the 'data' versions do not have to be linear between them. For example, changes from different collaborations could overlap one another. This is because the version is already defined as an epoch timestamp, and so is versioned based on when it was created.
Diagram 3
![image](/uploads/d69b9d0fd9ffdfe6af3913c35bdc7b84/image.png)
### Behavior of retrieval APIs
If we take diagram 3 as the current state of a Record we can look at how different API requests to it should be handled with and without a collaboration context.
#### Getting latest in collaboration 1
```
curl -X 'GET' \
  '<osdu>/api/storage/v2/records/<id>' \
  --header 'x-collaboration: id=collaboration 1,application=<app-name>;'
```
Expected Result: V7 returned
#### Retrieving version 4 when no collaboration provided
```
curl -X 'GET' \
  '<osdu>/api/storage/v2/records/<id>/versions/<version4>'
```
Expected Result: Error, version 4 does not exist
#### Retrieving version 4 when collaboration 2 provided
```
curl -X 'GET' \
  '<osdu>/api/storage/v2/records/<id>/versions/<version4>' \
  --header 'x-collaboration: id=collaboration 2,application=<app-name>;'
```
Expected Result: Error, version 4 does not exist
## Collaboration context header
The **x-collaboration** header is an optional HTTP header that holds directives instructing the Storage service to handle the request in the context of the provided collaboration instance and not in the context of the system of record. We are designing it using directives so that it is more extensible over time, to incorporate other elements potentially needed by the collaboration feature set.
**NB: In the fullness of time many services will be impacted by the collaboration EA requirements. They could/should re-use this same header to support acting on a specific collaboration context for consistency and usability.**
### Syntax
Collaboration directives follow the validation rules below:
- Directives are case-insensitive but lowercase is recommended
- Multiple directives are comma-separated
### Request Directives
| Request | Description |
| ----------- | ----------- |
| id | Mandatory. The ID of the collaboration to handle the request against. |
| application | Mandatory. The name of the application sending the request. |
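A sketch of how a service might parse the header according to these rules (the function name is an assumption; comma-separated directives, case-insensitive names, `id` and `application` mandatory):

```python
# Parse an x-collaboration header value into a directive dict.
def parse_collaboration_header(value: str) -> dict:
    directives = {}
    for part in value.rstrip(";").split(","):
        name, _, val = part.strip().partition("=")
        directives[name.lower()] = val   # directive names are case-insensitive
    for required in ("id", "application"):
        if not directives.get(required):
            raise ValueError(f"missing mandatory directive: {required}")
    return directives

print(parse_collaboration_header("id=collab-1,Application=my-app;"))
# -> {'id': 'collab-1', 'application': 'my-app'}
```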
### Examples
#### Retrieve a specific version of a Record that exists in a collaboration
```
curl -X 'GET' \
  '<osdu>/api/storage/v2/records/<record-id>/versions/<version>' \
  --header 'data-partition-id: opendes' \
  --header 'authorization: Bearer <JWT>' \
  --header 'Content-Type: application/json' \
  --header 'x-collaboration: id=<collaboration-id>,application=<app-name>;'
```
#### Retrieve a specific version of a Record that exists in the system of record
We do not send a collaboration context here because we want to access data from the system of record. This is the same request the user would make today.
```
curl -X 'GET' \
  '<osdu>/api/storage/v2/records/<record-id>/versions/<version>' \
  --header 'data-partition-id: opendes' \
  --header 'authorization: Bearer <JWT>' \
  --header 'Content-Type: application/json'
```
Note that the given record id and version must exist in both the system of record and the collaboration for both API requests to return successfully.
### Record changed on namespace
To guarantee that the current system behavior is not changed, we will create a new record-changed topic that is triggered only when a record is edited in some way in the context of a collaboration.
This means the existing record-changed topic remains unchanged and is triggered only when changes are made in the system of record, as they are today.
The new record-changed-on-namespace topic can then be bound to by downstream listeners over time, as and when they want to support the namespace concept.
The new message will also include the extra context information about the namespace. It will be the same as the current record-change message except that it will include the new header:
```
x-collaboration: id=<id>,application=<app-name>;
...
```
On top of this the new topic should be exposed through the Notification service so it can be registered to by external consumers as needed.
# Consequences
The Storage service should support a new 'collaboration' header. Any time a collaboration id is provided in this header, the Storage service should act only in that context. This means all Storage APIs need to act specific to the given collaboration context, for creation, update, retrieval and deletion of records.
If no header is provided, the Storage service should function the same as it does today and no change in behavior should be observed.
In the shared code section we will create a new 'collaboration context' class that is passed into the CSP-specific data layer. This class will hold the collaboration id and application name. Each CSP should combine this with the record id for the primary key of the metadata's data model. In this way the collaboration id forms the namespace of the record id, so multiple metadata instances can exist simultaneously.
We need a new 'Record changed collaboration' message, exposed through the Notification service.
The hard delete API needs to validate all contexts before deleting the blob, as multiple contexts could be referencing the same blob instance.

Milestone: M15 - Release 0.18 · ashley kelham

# Cursors Should be Encoded On Server Side
[Issue #48](https://community.opengroup.org/osdu/platform/system/storage/-/issues/48) · Jason · 2022-08-23

Problem: Currently, the integration tests do not enforce the requirement that the cursors returned by the `query/kinds` and `query/records` endpoints are encoded on the server side. The integration tests are not assuming the returned cursors are encoded; they are manually encoding the cursors ([example](https://community.opengroup.org/osdu/platform/system/storage/-/blob/master/testing/storage-test-core/src/main/java/org/opengroup/osdu/storage/query/StorageQuerySuccessfulTest.java#L62)). I think we only expect clients to be using cursors to pass back into a subsequent request to these endpoints. Therefore, it makes sense to require encoding on the server side so that the cursors returned to clients are ready to use.
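One way to meet that requirement, sketched in Python with URL-safe base64; the cursor payload shown is invented, since actual cursor contents are CSP-specific.

```python
import base64

# Encode the opaque cursor with URL-safe base64 on the server side so
# clients can pass it straight back without encoding it themselves.
def encode_cursor(raw_cursor: bytes) -> str:
    return base64.urlsafe_b64encode(raw_cursor).decode("ascii")

def decode_cursor(cursor: str) -> bytes:
    return base64.urlsafe_b64decode(cursor.encode("ascii"))

raw = b'{"searchAfter": ["opendes:wks:wellbore:1.0.0"]}'  # invented payload
token = encode_cursor(raw)
assert decode_cursor(token) == raw  # round-trips losslessly
print(token)
```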
This change would require the CSPs verifying that their code is encoding the cursor on the server side and then updating the integration tests so that they assume the cursor they get in the response body is already encoded.

# Storage API behavior and the documentation discrepancies observed while testing
[Issue #36](https://community.opengroup.org/osdu/platform/system/storage/-/issues/36) · Kamlesh Todai · 2021-06-16

[StorageAPIDiscrepancies.txt](/uploads/ccb2af067085eb6ccd61d3c2d22f1715/StorageAPIDiscrepancies.txt)
Storage API discrepancies
Reference:
https://community.opengroup.org/osdu/platform/testing/-/blob/master/Postman%20Collection/12_CICD_Setup_StorageAPI/Storage%20API%20CI-CD%20v1.11.postman_collection.json
1) Storage - Get kinds using invalid cursor value OR Get all records for a kind with invalid cursor (Storage collection request #02b, #08iii)
GET: https://{{STORAGE_HOST}}/query/kinds?cursor=invalid_cursor&limit=10
or
GET: https://{{STORAGE_HOST}}/query/records?cursor=invalid_cursor&limit=10&kind={{data-partition-id}}:osdu:well-master:{{enriched_schema_version}}
The response body I get on all platforms is
```json
{
  "code": 400,
  "reason": "Cursor invalid",
  "message": "The requested cursor does not exist or is invalid"
}
```
The returned status code is 400, but the status description I get on Azure is null, whereas on AWS, GCP and IBM I get "Bad Request".
The Storage API documentation mentions that:
- for query/kinds one should get 200 (All kinds retrieved successfully.) or 500 (Unknown Error), and there is no mention of 400;
- for query/records one should get 200 (Record Ids retrieved successfully.) or 404 (Records or cursor not found), and again no mention of 400.
2) Storage - Create or update records with Invalid acl group (Storage collection request #13)
PUT: https://{{STORAGE_HOST}}/records?skipdupes=true
The response body I get on all platforms is
```json
{
  "code": 400,
  "reason": "Validation error.",
  "message": "createOrUpdateRecords.records[0].acl: Invalid group name '{{New_OwnerDataGroupInvalid}}@opendes.lk'"
}
```
The return code is 400 on all platforms, but the return status description on Azure is null and on the other platforms is "Bad Request".
The Storage API documentation mentions that for PUT records one should get a status return code of:
- 201 - Records created and/or updated successfully.
- 400 - Invalid record format.
- 403 - User not authorized to perform the action.
- 404 - Invalid acl group.

400 is mentioned, but the description does not match.
3) Storage - Create or update records without correct permission to access the API (Storage collection request #14)
PUT: https://{{STORAGE_HOST}}/records?skipdupes=true
The response body I get on AWS and Azure:
```json
{
  "code": 403,
  "reason": "Access denied",
  "message": "The user is not authorized to perform this action"
}
```
The response body I get on GCP and IBM:
```json
{
  "code": 401,
  "reason": "Access denied",
  "message": "The user is not authorized to perform this action"
}
```
The response on AWS and Azure matches the Storage API documentation, but the code returned on GCP and IBM does not.
So to resolve or report these kinds of discrepancies, what is the best way of doing it? Any input will be appreciated.
Thanks.
Kamlesh

Dania Kodeih (Microsoft) · Wladmir Frazao · Joe · Dmitriy Rudko

# [Storage] Huge slowdown in Storage Stability and Transaction Size
[Issue #35](https://community.opengroup.org/osdu/platform/system/storage/-/issues/35) · Gary Murphy · 2022-08-23

The Storage service Record creation API in the latest version on Azure (post-AKS refactoring, deployed to a client test environment but tested below in-house at SLB in dev) has two significant problems:
1. Overall loading speed seems much slower than the original R2+ contributed version on GCP. Using identical Python loading code, the difference (larger sizes extrapolated) is about (per Thomas Dombrowsky):

   Tentative timings on Azure OSDU EVQ and DELFI P4D:

   | Environment | 10,000 records | 1M records (extrapolated) |
   |---|---|---|
   | OSDU Azure | 7 min 40 sec | 12 h 46 min 40 sec |
   | DELFI DM | 30 sec | 50 min |
2. The number of records that can be reliably included as the payload per call has gotten very small compared to the GCP version. Per Thomas D. again:
The real killer is that the Azure Storage API fails with large payloads, so the number of records that can be ingested per API call is low. The failure is not because of a hard limit: with one-record payloads, you get a random failure in 1% of calls (the record is ingested just fine on the next attempt); with 20-record payloads, there are more failed requests than successful ones; with 100 records per payload, it takes dozens of attempts to get a single payload through.
By contrast, on DELFI GCP we do 100-record payloads and it succeeds every time.

Milestone: M1 - Release 0.1

# [System/Storage] Allow additional (optional) root properties
[Issue #27](https://community.opengroup.org/osdu/platform/system/storage/-/issues/27) · Thomas Gehrmann [slb] · 2022-08-19

The Storage service is not allowing additional root properties. The list currently supported according to the [class Record](https://community.opengroup.org/osdu/platform/system/lib/core/os-core-common/-/blob/master/src/main/java/org/opengroup/osdu/core/common/model/storage/Record.java#L43):
* `id`, string (see also [validation issue](https://community.opengroup.org/osdu/platform/system/storage/-/issues/26)), required
* `version`, long integer/int64, optional, system-generated
* `kind`, string, validation pattern `"^[A-Za-z0-9-_]+:[A-Za-z0-9-_]+:[A-Za-z0-9-_]+:[0-9]+.[0-9]+.[0-9]+$"` (in [Storage ](https://community.opengroup.org/osdu/platform/system/lib/core/os-core-common/-/blob/master/src/main/java/org/opengroup/osdu/core/common/model/storage/validation/ValidationDoc.java#L27)it is declared `"[\\w-\\.]+:[\\w-\\.]+:[\\w-\\.]+:(\\d+.)?(\\d+.)?(\\d+)"`)
* `acl`, structure of type `org.opengroup.osdu.core.common.model.entitlements.Acl`, required
* `legal`, structure of type `org.opengroup.osdu.core.common.model.legal.Legal`, required
* `data`, dictionary, required non-empty
* `ancestry`, structure of type [`RecordAncestry`](https://community.opengroup.org/osdu/platform/system/lib/core/os-core-common/-/blob/master/src/main/java/org/opengroup/osdu/core/common/model/storage/RecordAncestry.java#L24), optional
* `meta`, array of objects, optional
As a consequence of [Resolutions for GroupType, WorkProduct and Kind identity discussions](https://community.opengroup.org/osdu/documentation/-/wikis/Resolutions-for-GroupType,-WorkProduct-and-Kind-identity-discussions) and friction resolution, the following OSDU standard properties must be added. The root properties are currently defined in [AbstractResource](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/Templates/_TemplatesUsedByScripts/AbstractResource.json) by data definition, to be included in all OSDU schemas:
> * **_Dropped:_** `groupType`, string, optional for non-OSDU schemas, mandatory for OSDU schemas.
* `resourceHomeRegionID`, string, reference id to a `reference-data/OSDURegion` type instance, optional
* `resourceHostRegionIDs`, array of strings, reference ids to a `reference-data/OSDURegion` type instance, optional
* `resourceObjectCreationDateTime`, datetime, should be set by the system, optional
* `resourceVersionCreationDateTime`, datetime string, should be set by the system, optional
* `resourceCurationStatus`, string, reference id to a `reference-data/ResourceCurationStatus` type instance, optional
* `resourceLifecycleStatus`, string, reference id to a `reference-data/ResourceLifecycleStatus` type instance, optional
* `resourceSecurityClassification`, string, reference id to a `reference-data/ResourceSecurityClassification` type instance, optional
* `source`, string, reference id to a `master-data/Organisation` type instance, optional
* `existenceKind`, string, reference id to a `reference-data/ExistenceKind` type instance, optional
* `licenseState`, string, reference id to a `reference-data/LicenseState` type instance, optional
* `resourceObjectCreatedBy`, string, the name of the person responsible for the creation of this object, optional
* `resourceVersionCreatedBy`, string, the name of the person responsible for the creation of this object version, optional
(Style changes approved by Data Definitions Core Concepts August 11.)
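To make the record shape concrete, here is a hypothetical payload combining the required Storage properties with a few of the optional OSDU root properties listed above (every id, group name, and legal tag below is invented for illustration):

```python
# Hypothetical record payload; all identifiers are illustrative only.
record = {
    "id": "osdu:master-data--Wellbore:example-1234",
    "kind": "osdu:wks:master-data--Wellbore:1.0.0",
    "acl": {  # required: org.opengroup.osdu.core.common.model.entitlements.Acl
        "owners": ["data.default.owners@osdu.example.com"],
        "viewers": ["data.default.viewers@osdu.example.com"],
    },
    "legal": {  # required: org.opengroup.osdu.core.common.model.legal.Legal
        "legaltags": ["osdu-example-legal-tag"],
        "otherRelevantDataCountries": ["US"],
    },
    "data": {  # required, non-empty dictionary
        "resourceHomeRegionID": "osdu:reference-data--OSDURegion:Example:",
        "resourceSecurityClassification":
            "osdu:reference-data--ResourceSecurityClassification:Public:",
        "source": "osdu:master-data--Organisation:ExampleOrg:",
    },
}

# The required parts must all be present and 'data' must be non-empty.
assert record["acl"]["owners"] and record["legal"]["legaltags"] and record["data"]
```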
References:
* https://community.opengroup.org/osdu/documentation/-/issues/13
* https://community.opengroup.org/osdu/documentation/-/wikis/Resolutions-for-GroupType,-WorkProduct-and-Kind-identity-discussions

Milestone: M1 - Release 0.1

---

**[Storage record with no ACL owners becomes a ghost record if the OPA service is enabled](https://community.opengroup.org/osdu/platform/system/storage/-/issues/220)** (Om Prakash Gupta, 2024-03-28)

Storage records become inaccessible if OPA is enabled and there is no ACL group associated with the record.
# Scenario:
Usually, when we create a record we define the owners and viewers groups, and the members associated with those groups can access the record. However, it is possible to delete the group and even disassociate ACL groups from the storage record; there is currently no validation requiring at least one ACL group per record. Eventually the record becomes a ghost record and nobody can access it.
There was a fix provided: users.data.root members can still access the group and add ACLs if needed.
This is discussed in this ADR:
https://community.opengroup.org/osdu/platform/security-and-compliance/entitlements/-/issues/141
# Findings
We have seen that the code works fine: users.data.root members can still access a record that has no associated ACL members. However, if OPA is enabled, the record cannot be accessed even when the member belongs to the users.data.root group.
The code below checks whether OPA is enabled and gets access rights from the OPA service:
https://community.opengroup.org/osdu/platform/system/storage/-/blob/master/storage-core/src/main/java/org/opengroup/osdu/storage/service/IngestionServiceImpl.java#L198
The OPA service returns false access rights. However, if OPA is disabled, the flow works because we have code that returns true if the member belongs to users.data.root.
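The two code paths described in these findings can be sketched as follows (a hypothetical Python illustration; group and field names are invented, and the real logic lives in the Java Storage service and the OPA policy bundle):

```python
# Hypothetical sketch of the access check with and without OPA.
def can_access(user_groups, record_acl, opa_enabled, opa_decision=False):
    if opa_enabled:
        # Decision is delegated entirely to the policy service, which
        # (per this issue) applies no users.data.root fallback.
        return opa_decision
    acl_members = set(record_acl.get("owners", [])) | set(record_acl.get("viewers", []))
    if acl_members & set(user_groups):
        return True
    # In-code fallback: root members can always reach the record.
    return "users.data.root" in user_groups

ghost_acl = {"owners": [], "viewers": []}  # all ACL groups removed/deleted

print(can_access(["users.data.root"], ghost_acl, opa_enabled=False))  # True
print(can_access(["users.data.root"], ghost_acl, opa_enabled=True))   # False
```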
We have found this not working in the Azure OSDU instance and need to know whether it requires a policy file fix or should be handled in code to stop records from becoming ghost records when OPA is enabled.

Participants: Dadong Zhou, Kelly Zhou, Shane Hutchins, Deepa Kumari

---

**[Increase timeout for storage service requests](https://community.opengroup.org/osdu/platform/system/storage/-/issues/215)** (Sudesh Tagadpallewar, 2024-02-01)

When registering a dataset using `/registerDataset`, some users get a 400 error. According to the logs, this request times out (with the error **Unexpected error sending to URL http://storage/api/storage/v2/records METHOD PUT error java.net.SocketTimeoutException: Read timed out**) when it tries to upsert the record in Storage.
We have found that when the Dataset service calls the Storage service, the call takes more than 5 seconds, which results in a SocketTimeoutException.
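The failure mode can be reproduced in miniature (a hypothetical, scaled-down sketch using Python's standard `socket` module; the real services are Java, where the analogous exception is `java.net.SocketTimeoutException`):

```python
import socket

# Scaled-down reproduction: a client whose read timeout (0.1 s here, standing
# in for the 5 s default) is shorter than the server's response time.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
host, port = server.getsockname()

client = socket.create_connection((host, port), timeout=0.1)
try:
    client.recv(1024)  # the server never sends anything within the timeout
    outcome = "read ok"
except socket.timeout:  # Java analogue: java.net.SocketTimeoutException
    outcome = "Read timed out"
finally:
    client.close()
    server.close()

print(outcome)  # Read timed out
```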
When creating the `StorageService` instance using `StorageFactory`, a new `HttpClient()` instance is used, which has a default timeout of 5 seconds. Instead of a new `HttpClient` instance, an `HttpClientHandler` instance should have been used, which has a 60-second timeout. This code is present in the core-common library. See the attached image for reference. ![storage](/uploads/5d81a52c9a968975ad40a538088a57dc/storage.JPG)

---

**[Add /liveness_check](https://community.opengroup.org/osdu/platform/system/storage/-/issues/191)** (Riabokon Stanislav (EPAM) [GCP], 2024-01-08)

We need to add the endpoint `/liveness_check` to verify the operational status of the Storage Service.

Milestone: M23 - Release 0.26. Assignee: Riabokon Stanislav (EPAM) [GCP]

---

**[Storage Record query does not include record audit info](https://community.opengroup.org/osdu/platform/system/storage/-/issues/184)** (An Ngo, 2023-12-06)

The Storage query/records API returns records without audit information such as createdUser, createTime, modifyUser, modifyTime. This behavior is inconsistent with other Storage record queries such as the batch fetch and the record fetch APIs.

---

**[Issues observed with logging](https://community.opengroup.org/osdu/platform/system/storage/-/issues/182)** (Larissa Pereira, 2023-12-01)

**Issue 1: Duplicate operation IDs**
We observed multiple dependency logs for disparate operations (based on record IDs) with identical operation IDs for the POST QueryApi/getRecords API. Duplicate entries were observed when reading from BlobStore for the READ_FROM_STORAGE_CONTAINER operation, although these logs belonged to separate operations.
![image](/uploads/afc539574de597bba300b5d6b2a18b8a/image.png)
**Issue 2: Multiple dependency logs and missing Read log**
We observed multiple dependency logs with identical operation IDs for the POST QueryApi/fetchRecords. These entries were observed when querying CosmosStore; however, the READ_FROM_STORAGE_CONTAINER dependency log is missing.
![image](/uploads/ce377f8bf6ee95646ca1ab5d910df167/image.png)

Milestone: M22 - Release 0.25. Assignee: VidyaDharani Lokam

---

**[Storage service triggered more than one time while ingesting a single record](https://community.opengroup.org/osdu/platform/system/storage/-/issues/175)** (Bruce Jin, 2023-06-09)

Currently, when running manifest ingestion by reference, a single record triggers more than one `PUT` call to the Storage service. This is because the API returns `201 CREATED` on success, which is not treated as an `OK` response in `common-python-sdk/osdu_api/utils/request.py`. We need to include more acceptable status codes to avoid wasting time.

---

**[Data authorization issue for Update/Patch operation](https://community.opengroup.org/osdu/platform/system/storage/-/issues/174)** (Dadong Zhou, 2024-01-29)

When the Storage service sends data authorization requests for an Update/Patch operation to the Policy service, only the new data record header info (ACLs and LegalTags) is sent to the Policy service; the existing data record header info is not included in the request. So the user will be able to update/patch a data record (based on the new ACLs/LegalTags) when the user should have no permission to update/patch it (based on the existing record's ACLs/LegalTags).
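The authorization gap can be sketched as the difference between checking only the incoming record's ACL versus also checking the stored record's ACL (a hypothetical Python sketch with invented group names; in the real flow the decision comes from the Policy service):

```python
# Hypothetical sketch of the Update/Patch authorization gap.
def can_update_buggy(user_groups, new_acl):
    # Current behavior: only the *incoming* record's ACL is checked.
    return bool(set(user_groups) & set(new_acl["owners"]))

def can_update_fixed(user_groups, existing_acl, new_acl):
    # Intended behavior: the user must be an owner of the *stored* record too.
    return (bool(set(user_groups) & set(existing_acl["owners"]))
            and bool(set(user_groups) & set(new_acl["owners"])))

user = ["data.team-b.owners"]
existing = {"owners": ["data.team-a.owners"]}  # user is not an owner here
incoming = {"owners": ["data.team-b.owners"]}  # user grants themselves access

print(can_update_buggy(user, incoming))            # True  (escalation)
print(can_update_fixed(user, existing, incoming))  # False (correctly denied)
```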
cc @hmarkovic @chad @hutchins @MonicaJohns

Milestone: M22 - Release 0.25. Assignee: Chad Leong

---

**[Need example of how to use the POST /query/records:batch Fetch multiple records](https://community.opengroup.org/osdu/platform/system/storage/-/issues/165)** (Kamlesh Todai, 2023-03-09)

The Storage API documentation mentions
`POST /query/records:batch` (Fetch multiple records). We would like a sample of how this feature is expected to be used.

We need clarification on:

* Account ID: the active OSDU account (OSDU account or customer's account) which the users choose to use with the Search API.
* frame-of-reference: this value indicates whether normalization applies; it should be either 'none' or 'units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;'
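A hypothetical batch-fetch call, illustrating the two headers asked about above (the record id and partition values are invented, and the exact body shape is an assumption):

```python
import json

# Hypothetical headers and body for POST /query/records:batch.
headers = {
    "data-partition-id": "osdu",
    # 'none' disables normalization; the string below requests SI units etc.
    "frame-of-reference": "units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;",
    "Content-Type": "application/json",
}
body = {"records": ["osdu:master-data--Wellbore:example-1234"]}

print(json.dumps(body))
```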
@chad @debasisc

Milestone: M17 - Release 0.20

---

**[For AWS platform, query to get all kinds is not returning any records](https://community.opengroup.org/osdu/platform/system/storage/-/issues/164)** (Kamlesh Todai, 2023-03-09)

The query to retrieve all the kinds is not returning any results (records):
```shell
curl --location 'https://r3m16-ue1.preshiptesting.osdu.aws/api/storage/v2/query/kinds' \
  --header 'data-partition-id: osdu' \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer eyJraWQiOiJ...7kPscDabFJ3sEPeNA'
```

The response is 200 OK, with `results` being empty:

```json
{
  "results": []
}
```
The collection used can be found at https://community.opengroup.org/osdu/platform/testing/-/blob/master/Postman%20Collection/12_CICD_Setup_StorageAPI/Storage%20API%20CI-CD%20v1.11.postman_collection.json
The request name is "01 Storage - Get all kinds success scenario"
@chad @debasisc

Milestone: M16 - Release 0.19

---

**[AZURE: on reading version from storage we are checking only viewer permissions](https://community.opengroup.org/osdu/platform/system/storage/-/issues/158)** (Yauheni Lesnikau, 2023-01-16)

On reading a version from storage we are checking only viewer permissions. It would be nice to check both viewer and owner permissions.

Assignee: Yauheni Lesnikau

---

**[Storage improperly local-cached ORDC information from Legal service](https://community.opengroup.org/osdu/platform/system/storage/-/issues/157)** (Kelly Zhou, 2023-05-30)

Currently Storage caches the first result of valid ORDC from the Legal service regardless of which data partition the user is trying to ingest the record into. This could be wrong, as we do support whitelisting countries for certain data partitions.
In order to fix that, we need to have data-partition-id information in the local cache for ORDC information.

Milestone: M16 - Release 0.19
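The fix can be sketched as making the cache key partition-aware (a hypothetical Python sketch; the real cache lives in the Java Storage service, and all names below are invented):

```python
# Hypothetical sketch: keying the cached ORDC result by data partition so
# partition-specific country whitelists are respected.
class OrdcCache:
    def __init__(self):
        self._cache = {}

    def get_valid_ordc(self, data_partition_id, fetch):
        key = ("ordc", data_partition_id)  # partition id is part of the key
        if key not in self._cache:
            self._cache[key] = fetch(data_partition_id)
        return self._cache[key]

# Stand-in for the Legal service call, returning per-partition whitelists.
whitelists = {"partition-a": ["US", "NO"], "partition-b": ["US"]}
fetch = lambda partition: list(whitelists[partition])

cache = OrdcCache()
print(cache.get_valid_ordc("partition-a", fetch))  # ['US', 'NO']
print(cache.get_valid_ordc("partition-b", fetch))  # ['US']
```

With a global (partition-less) key, the second call would wrongly return the first partition's whitelist; including the partition id in the key prevents that.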