seismic-dms-service issues
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues
2024-03-19T14:59:18Z

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/130
IBM E2E tests fail
2024-03-19T14:59:18Z
Daniel Perez

E2E tests for IBM in SDMS V3 are failing with "no healthy upstream"; this seems to be an issue with the environment itself.
Anuj Gupta, Isha Kumari

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/129
DATASET SELECT LS POST: while putting invalid characters in select it is giving response code 200. it should give 400
2024-02-29T12:14:56Z
Isha Kumari

While putting invalid characters in "select", the endpoint returns response code 200. It should return 400.

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/128
Subproject creation accepts non-existing groups in ACLs
2024-02-26T17:21:16Z
Yan Sushchynski (EPAM)

## Description of the problem
It is possible to create a new subproject with non-existing groups in the `acls` field. After that, any action in the subproject, except deleting it, throws `403`.
## Steps to reproduce it
1. Create a new subproject with invalid acls:
```
curl --location --request POST 'https://<svc_url>/v3/subproject/tenant/osdu/subproject/test-123' \
--header 'x-api-key: {{SVC_API_KEY}}' \
--header 'Content-Type: application/json' \
--header 'ltag: osdu-demo-legaltag' \
--header 'appkey: {{DE_APP_KEY}}' \
--header 'Authorization: Bearer <token>' \
--data-raw '{
"storage_class": "REGIONAL",
"storage_location": "US-CENTRAL1",
"acls": {
"admins": [
"data.sdms.non-existing.admin@osdu.group"
],
"viewers": [
"data.sdms.non-existing.viewer@osdu.group"
]
}
}'
```
This request is executed without any error.
2. Try to upload any file to the subproject:
```shell
python sdutil cp somefile sd://osdu/test-123/somefile
```
Output:
```
[403] [seismic-store-service] User not authorized to perform this operation
```
Diego Molteni, Sacha Brants

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/127
Issue with Get Status API
2024-02-09T18:12:56Z
Jiman Kim

Hello, we are running some authentication testing and are running into some behaviors that may or may not be a bug.
For this endpoint:
`/seistore-svc/api/v4/status`
We have 3 tests running:
1. Sends an invalid token
2. Sends a valid token but signed with a wrong secret
3. Sends the HTTP request without an authorization header.
Tests 1 and 2 return a 401, but test 3 returns 200.
Is this a bug or intended behavior?
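For reference, a minimal sketch (an assumption about how the check could look, not the service's actual code) of an Express-style middleware that would make the missing-header case behave like cases 1 and 2:

```typescript
import express, { NextFunction, Request, Response } from 'express';

// Reject requests that carry no Authorization header at all,
// so the missing-header case also returns 401.
function requireAuthorizationHeader(req: Request, res: Response, next: NextFunction): void {
    if (!req.headers.authorization) {
        res.status(401).json({ error: 'Missing Authorization header' });
        return;
    }
    next(); // signature/expiry validation would happen in a later middleware
}

const app = express();
app.get('/seistore-svc/api/v4/status', requireAuthorizationHeader, (_req: Request, res: Response) => {
    res.status(200).json({ status: 'running' });
});
```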
Thank you!
M21 - Release 0.24

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/125
Patch dataset name issue
2024-01-22T16:21:23Z
Yan Sushchynski (EPAM)

We ran the [collection](https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M22/GC-M22/GC_OSDU_Smoke_Tests.postman_collection.json?ref_type=heads), and this request:
```bash
curl --location --request PATCH 'https://<host>/api/seismic-store/v3/dataset/tenant/m19/subproject/subprojectodi374308/dataset/AutoTest_dsetodi831125?path=autotest_path' \
--header 'Content-Type: application/json' \
--header 'data-partition-id: m19' \
--header 'Authorization: Bearer token' \
--data '{
"dataset_new_name": "autotest_new",
"metadata": {
"f1": "v1",
"f2": "v2",
"f3": "v3"
},
"filemetadata": {
"f1": "v1",
"f2": "v2",
"f3": "v3"
},
"last_modified_date": "Thu Jul 16 2020 04:37:41 GMT+0000 (Coordinated Universal Time)",
"gtags": [
"tag01",
"tag02",
"tag03"
],
"ltag": "m19-SeismicDMS-Legal-Tag-Test7649172",
"readonly": false,
"seismicmeta": {
"kind": "m19:seistore:seismic2d:1.0.0",
"legal": {
"legaltags": [
"m19-SeismicDMS-Legal-Tag-Test7649172"
],
"otherRelevantDataCountries": [
"US"
]
},
"data": {
"msg": "Auto Test sample data patched"
}
}
}'
```
And, we get the following error:
```bash
[seismic-store-service] The dataset sd://m19/subprojectodi374308/autotest_path/autotest_new already exists, even so there is no such a dataset in Seismic at the moment
```

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/123
[IBM] replace keycloak-admin with @keycloak/keycloak-admin-client
2024-03-05T01:08:14Z
Diego Molteni

Please replace the deprecated and vulnerable package [keycloak-admin](https://www.npmjs.com/package/keycloak-admin) with the new [@keycloak/keycloak-admin-client](https://www.npmjs.com/package/@keycloak/keycloak-admin-client).
M23 - Release 0.26
Isha Kumari

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/122
V4 API and Postman Collection showcasing the steps/sequence
2024-01-11T17:24:56Z
Debasis Chatterjee

We started to look at the collection provided by AWS:
https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M22/AWS-M22/DDMS%20Seismic/AWS_OSDUR3M22_Seismic_v4_Automated.postman_collection.json
This was apparently created from an initial example provided by the Dev team (Seismic DDMS).
We are a little unclear about the logical sequence and naming of the folder/requests.
Folder "Schema" is really to create some catalog record (Dataset FileCollection.SegY).
Folder "Connection" is apparently to upload some data files". Should this not be before we can create Dataset record?
Something similar to what we see here, as the sequence of steps.
![image](/uploads/e1579cc87851b5e8995c0892dde824f7/image.png)
Is the need for sdutil completely eliminated? Earlier, we had to upload the data file (SegY) using sdutil to a suitable tenant and sub-project.
Perhaps a **companion document** with the **Postman Collection** would help.
@chad earlier mentioned that the DEV team would probably provide a video showing the steps?
Thank you
cc @spoddar, @kimjiman and @ydzeng

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/121
[SAST] Client_Privacy_Violation in file queue.ts
2023-11-13T15:53:55Z
Yauhen Shaliou [EPAM/GCP]

**Description**
Method setup at line 42 of \\seismic-store-service\\app\\sdms\\src\\cloud\\shared\\queue.ts sends user information outside the application. This may constitute a Privacy Violation.
<table>
<tr>
<th> </th>
<th>Source</th>
<th>Destination</th>
</tr>
<tr>
<th>File</th>
<td>seismic-store-service/app/sdms/src/cloud/shared/queue.ts</td>
<td>seismic-store-service/app/sdms/src/cloud/providers/azure/insights.ts</td>
</tr>
<tr>
<th>Line number</th>
<td>42</td>
<td>129</td>
</tr>
<tr>
<th>Object</th>
<td>password</td>
<td>log</td>
</tr>
<tr>
<th>Code line</th>
<td>redisOptions.password = cacheParams.KEY;</td>
<td>console.log(data);</td>
</tr>
</table>

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/120
[SAST] SSL_Verification_Bypass in file cosmosdb.ts
2023-11-13T15:22:49Z
Yauhen Shaliou [EPAM/GCP]

# **Location:**
<table>
<tr>
<th> </th>
<th>
</th>
<th>Destination</th>
</tr>
<tr>
<th>File</th>
<td>
</td>
<td>seismic-store-service/app/sdms/src/cloud/providers/azure/cosmosdb.ts</td>
</tr>
<tr>
<th>Line number</th>
<td>
</td>
<td>67</td>
</tr>
<tr>
<th>Object</th>
<td>
</td>
<td>rejectUnauthorized</td>
</tr>
<tr>
<th>Code line</th>
<td>
</td>
<td>rejectUnauthorized: false</td>
</tr>
</table>
**Description**
\\seismic-store-service\\app\\sdms\\src\\cloud\\providers\\azure\\cosmosdb.ts relies on HTTPS requests in its constructor. The rejectUnauthorized parameter, at line 67, effectively disables verification of the SSL certificate trust chain.
JavaScript example of explicitly disabling certificate verification:
```js
var https = require('https');
var options = {
    hostname: 'domain.com',
    port: 443,
    path: '/',
    method: 'GET',
    rejectUnauthorized: false
};
options.agent = new https.Agent(options);
var req = https.request(options, function (res) {
    res.on('data', function (d) {
        handleRequest(d);
    });
});
req.end();
```

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/119
Rename "IStorage" methods for v4
2023-10-24T09:09:14Z
Yan Sushchynski (EPAM)

Hello,
I noticed that the cloud-storage [interface](https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/blob/master/app/sdms-v4/src/cloud/storage.ts?ref_type=heads#L19) has the following methods:
```
createBucket(bucketName: string): Promise<void>;
bucketExists(bucketName: string): Promise<boolean>;
deleteBucket(bucketName: string): Promise<void>;
```
These method names suggest that new buckets are getting created, checked for existence, or deleted within a single data-partition. However, the GC and Baremetal implementations are different -- a data-partition is expected to work with its own pre-created bucket instead of creating new ones. This discrepancy between the method names and their actual functionality could lead to confusion and misunderstanding.
A similar situation exists in the AWS implementation, where comments had to be added to clarify that 'bucketNames' are actually BLOBs, which can be seen [here](https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/blob/master/app/sdms-v4/src/cloud/providers/aws/storage.ts?ref_type=heads#L45).
I propose that we consider renaming these methods to more accurately reflect their functionality and create a better alignment with the actual implementation.
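For illustration only, with hypothetical names (this is not an agreed rename), the intent could be expressed along these lines:

```typescript
// Hypothetical renaming sketch: "storage destination" instead of "bucket",
// since GC/Baremetal bind a data-partition to a pre-created bucket while
// other providers may still create one per call.
export interface IStorage {
    ensureStorageDestination(destinationId: string): Promise<void>;
    storageDestinationExists(destinationId: string): Promise<boolean>;
    deleteStorageDestination(destinationId: string): Promise<void>;
    // ...other IStorage methods omitted
}
```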
Thank you.
Diego Molteni, Yunhua Koglin, Sacha Brants, Mark Yan

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/117
[ADR] Advanced filters for dataset search
2023-12-04T14:23:23Z
Alexandre Gattiker

# Introduction
We need additional filtering support to be able to filter the `POST /dataset/tenant/{tenantid}/subproject/{subprojectid}` and `PUT /operation/bulk-delete` (added in [!891](https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/merge_requests/891/diffs#fafb01a8314993d61fca390beef912c7813278eb)) operations by metadata fields with more complex expressions than a single key-value match.
# Status
* [x] Initiated
* [x] Proposed
* [x] Under Review
* [ ] Approved
* [ ] Rejected
# Problem statement
The SDMS API `POST /dataset/tenant/{tenantid}/subproject/{subprojectid}` currently accepts the following body parameters, among others:
* `search`, a single SQL-like search parameter, for example: `search=name=file%`
* `gtags`, an array of strings matching tags associated with dataset metadata.
The `search` field does not support more than one field, or more than one possible value for a field.
The SDMS API `PUT /operation/bulk-delete` (added in [!891](https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/merge_requests/891/diffs#fafb01a8314993d61fca390beef912c7813278eb)) requires a `path` parameter containing `tenantid`, `subprojectid` and `path` but does not support filtering by metadata fields or tags.
For both search and delete, we need to be able to filter by more than one field, or more than one possible value for a field.
Furthermore, we expect a need for more complex filter solutions, such as combining `AND`, `OR` and `NOT` operators. The proposed solution should ideally be extensible to support additional expressions and operators in the future if needed.
# Proposed solution
Add an optional `filter` parameter to the `POST /dataset/tenant/{tenantid}/subproject/{subprojectid}` and `PUT /operation/bulk-delete` API endpoints.
The `search` and `gtags` parameters are to be deprecated.
## Overview
The `filter` parameter can take a payload with a variable format, allowing expressing a simple filter on a single field, as well as logical combinations of filters with arbitrary complexity.
The `POST /dataset/tenant/{tenantid}/subproject/{subprojectid}` operation has been selected for extension because:
* Advanced metadata filtering, encompassing select and search functionalities, has already been incorporated into that operation.
* The SDMS API also accepts the `GET` method for the operation with parameters provided in the query string, as a legacy endpoint. The `POST` version of the endpoints has been introduced to address issues related to handling large request parameters, where sending the cursor as a query parameter can lead to oversized requests and subsequent failures.
## Examples
Example value for the `filter` parameter:
```json
{
"and": [
{
"not": {
"property": "gtags",
"operator": "CONTAINS",
"value": "tagA"
}
},
{
"or": [
{
"property": "name",
"operator": "LIKE",
"value": "test.%"
},
{
"property": "name",
"operator": "=",
"value": "dataset.sgy"
}
]
}
]
}
```
This is equivalent to the following pseudo-SQL statement:
```sql
SELECT * FROM datasets d WHERE
NOT (EXISTS (SELECT VALUE 1 FROM t IN d.data.gtags WHERE t = 'tagA')
OR (IS_STRING(d.data.gtags) AND STRINGEQUALS(d.data.gtags, 'tagA')))
AND (
d.name LIKE 'test.%'
OR d.name = 'dataset.sgy'
)
```
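A minimal illustrative sketch (not part of the proposal itself) of how such a filter object could be translated recursively into a SQL-like predicate; a real implementation would need parameterization against injection and provider-specific array handling rather than naive string interpolation:

```typescript
type Filter =
    | { property: string; operator: string; value: string }
    | { and: Filter[] }
    | { or: Filter[] }
    | { not: Filter };

// Recursively turn a filter tree into a predicate string (illustration only;
// values must be parameterized/escaped in real code).
function toPredicate(filter: Filter): string {
    if ('and' in filter) {
        return '(' + filter.and.map(toPredicate).join(' AND ') + ')';
    }
    if ('or' in filter) {
        return '(' + filter.or.map(toPredicate).join(' OR ') + ')';
    }
    if ('not' in filter) {
        return 'NOT (' + toPredicate(filter.not) + ')';
    }
    return `d.${filter.property} ${filter.operator} '${filter.value}'`;
}
```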
## Details
The `filter` parameter can be:
* A **property match filter**:
```json
{
"property": "...",
"operator": "...",
"value": "..."
}
```
The implementation will be extensible with additional keys if needed in the future, e.g. to specify case sensitivity.
* An **`and` or `or` filter**, i.e. an object containing only the key `and` or `or`, of which the value is an array of one or more filters (i.e. a property match filter or an `and`, `or` or `not` filter)
```json
{
"and": [...]
}
```
* A **`not` filter**, i.e. an object containing only the key `not`, of which the value is a filter (i.e. a property match filter or an `and`, `or` or `not` filter)
```json
{
"not": ...
}
```
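For illustration (a hypothetical helper, not part of the proposed API surface), a structural validation of the three forms described above could look like this:

```typescript
// Accepts exactly the three filter shapes: property match, and/or, not.
function isValidFilter(f: unknown): boolean {
    if (typeof f !== 'object' || f === null || Array.isArray(f)) {
        return false;
    }
    const obj = f as Record<string, unknown>;
    const keys = Object.keys(obj);
    if (keys.length === 1 && (keys[0] === 'and' || keys[0] === 'or')) {
        const children = obj[keys[0]];
        return Array.isArray(children) && children.length > 0 && children.every(isValidFilter);
    }
    if (keys.length === 1 && keys[0] === 'not') {
        return isValidFilter(obj['not']);
    }
    // property match filter; extra keys are tolerated for future extensions
    return typeof obj['property'] === 'string'
        && typeof obj['operator'] === 'string'
        && typeof obj['value'] === 'string';
}
```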
# Out of scope / limitations
The operations at `GET /utility/ls` and `POST /utility/ls` can also be used for retrieving datasets, but will not be extended with advanced filtering at the moment. That functionality can be added later if required.
Diego Molteni

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/116
delete user returns 400 on success instead of 200
2023-10-03T17:41:07Z
Zachary Keirn

The delete user from subproject endpoint (observed in m18/AWS) returns 400 even though the delete completes successfully. Then if you run it again it will correctly return 404.

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/115
path and sdpath not used consistently, error in /user with parameter path
2023-09-27T19:34:47Z
Zachary Keirn

There are, I think, two issues here. One is documentation: the yaml doc for the /user delete option has 'path' instead of 'sdpath', and I believe it should be 'sdpath'. The other is that when I try to delete someone that does not exist, I get 400 instead of 404 in response. This is regardless of whether I try 'path' or 'sdpath' for the parameter.

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/114
Implement dataset storage for IBM
2023-09-20T02:17:59Z
Mark Yan

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/113
Implement dataset storage for GCP
2023-09-20T02:17:21Z
Mark Yan

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/111
[ADR] Synching SDMS V4 datasets in SDMS V3
2023-09-29T12:05:42Z
Diego Molteni

# Introduction
We need a solution to make datasets ingested in SDMS V4 visible to and consumable by SDMS V3.
The purpose of this ADR is to describe how to enable a synchronization mechanism that allows users of SDMS V3 to consume seismic dataset entities ingested in SDMS V4, even though the two versions of the system have entirely different architectural logics.
# Status
* [x] Initiated
* [x] Proposed
* [ ] Under Review
* [ ] Approved
* [ ] Rejected
# Problem statement
The Seismic Data Management Service V4 (SDMS V4) stores and manages data types as defined by the Open Subsurface Data Universe (OSDU) Authority. The APIs (Application Programming Interfaces) provide robust data type checks and are fully integrated with the OSDU policy service. The goal is to minimize ambiguity in the authorization model and facilitate straightforward adoption through a consistent usage pattern. In contrast, the V3 version of the service defines, saves, and manages proprietary metadata records, interacts directly with the entitlement service, and organizes records into collections/data-groups named subprojects.
<div align="center">
<br/><img src="/uploads/5e1a58219ca35be9da530b0eba2ed9fa/arch-diagram.png"
alt="sdms-architectural-diagram"
style="display: block; margin: 0 auto" /><br/>
</div>
The key difference between the two versions of the service lies in the form of the record. In the case of the OSDU record adopted by SDMS V4, it is entirely managed by the storage service. However, the V3 metadata has its own format, and to locate a dataset ingested in SDMS V4 via V3, it is necessary to create a V3 proprietary record. The following section will describe how an OSDU record can be translated into a V3 record to enable the synchronization process between the systems
# Proposed solution
Create a new service capable of detecting when a new dataset is registered in SDMS V4 and creating the corresponding record in SDMS V3
## Overview
As previously noted, in SDMS V3 the dataset descriptor has a proprietary structure and is maintained in an internal catalog. However, in SDMS V4 the descriptor is a standard OSDU record managed by the storage service. To make a dataset ingested in SDMS V4 visible in SDMS V3, we must create corresponding V3 metadata. This section describes how an SDMS V3 record can be created, using the OSDU record details, to make the dataset ingested in V4 visible in V3.
### The SDMS V3 dataset descriptor
```json
{
"id": "the record id <used as key in the service journal catalogue>",
"data": {
"name": "the dataset name",
"tenant": "the tenant name",
"subproject": "the subproject name",
"path": "the dataset virtual folder path",
"acls": {
"admins": "list of entitlement groups with admin rights",
"viewers": "list of entitlement groups with viewer rights"
},
"ltag": "the associated legal tag",
"created-by": "the id of the user who ingested the dataset",
"created_date": "the date and time when the dataset was ingested",
"last_modified_date": "the date and time when the dataset was last modified",
"gcsurl": "the storage uri string where bulks are saved",
"ctag": "a coherency hash tag that changes every time this record is modified",
"readonly": "the access mode level",
"filemetadata": {
"nobjects": "the number of blobs composing the dataset",
"size": "the dataset bulk total size",
"type": "the type of the manifest",
"checksum": "the dataset bulk checksum",
"tier_class": "the dataset storage tier class"
},
"computed_size": "the computed dataset size",
"computed_size_date": "the date and time when the dataset size was computed",
"seismicmeta_guid": "the associated OSDU record id"
}
}
```
### The SDMS V4 record (simplified)
```json
{
"kind": "the osdu dataset kind",
"acl": {
"viewers": "list of entitlement groups with viewer rights",
"owners": "list of entitlement groups with admin rights",
},
"legal": {
"legaltags": "the list of legal tags",
"otherRelevantDataCountries": "the list of data countries",
"status": "the legal status"
},
"data": {
"Name": "the dataset name",
"Description": "the dataset description",
"TotalSize": "the dataset total size",
"DatasetProperties": {
"FileCollectionPath": "the dataset virtual folder path",
"FileSourceInfos": [
{
"FileSource": "the file component source",
"PreloadFilePath": "the file component origin",
"Name": "the file component name",
"FileSize": "the file component size",
"Checksum": "the file component checksum",
"ChecksumAlgorithm": "the checksum algorithm"
}
],
"Checksum": "the dataset checksum"
}
}
}
```
### ADR symbols definitions
To make it simpler for the reader to understand the examples in the following sections, we define the following symbols:
| Symbols | Description |
| --- | --- |
| RV3 | the SDMS V3 record |
| RV4 | the SDMS V4 record |
| RV4.DatasetProperties | the record_v4.data.DatasetProperties element |
| RV4.FileSourceInfos | the record_v4.data.DatasetProperties.FileSourceInfos element |
### The SDMS V3 record generation in detail
- `RV3.id`
The ID in SDMS V3 is autogenerated based on the values composing the SDMS V3 URI: `tenant`, `subproject`, `path` and `name`.
```python
hash_obj = hashlib.sha512()
hash_obj.update((RV3.data.path + RV3.data.name).encode('utf-8'))
hashed_value = hash_obj.hexdigest()
cosmos_record["id"] = 'ds-' + RV3.data.tenant + '-' + RV3.data.subproject + '-' + hashed_value
```
- `RV3.data.name`
The dataset name.
```python
if 'Name' in RV4.data:
    RV3.data.name = RV4.data.Name
elif len(RV4.FileSourceInfos) == 1 and 'Name' in RV4.FileSourceInfos[0]:
    RV3.data.name = RV4.FileSourceInfos[0].Name
else:
    RV3.data.name = RV4.id
```
- `RV3.data.tenant`
The dataset tenant name matches the data-partition-id in the OSDU model. This specific information cannot be automatically detected in a V4 record but can be easily determined by the syncing process.
```python
RV3.data.tenant = data_partition_id
```
- `RV3.data.subproject`
The dataset resource group name (referred to as subproject in SDMS V3) must exist in SDMS V3 with the `access_policy` property set to `dataset`. Essentially, each partition in SDMS V3 should have a default data group where all SDMS V4 datasets can be collected. This required data group can be automatically created by the syncing process. The name of the data group will default to `syncv4`.
```python
RV3.data.subproject = "syncv4"
```
- `RV3.data.path`
The dataset virtual path represents the logical folder structure in the data group (subproject) where the dataset is stored.
```python
RV3.data.path = RV4.DatasetProperties.FileCollectionPath
```
- `RV3.data.acls`
The Access Control List (ACL) defines the list of users with admin and viewer rights. The only difference is that in the SDMS V3 record, the `owners` list is named `admins`, while the `viewers` list has matching names.
```python
RV3.data.acls.admins = RV4.acls.owners
RV3.data.acls.viewers = RV4.acls.viewers
```
- `RV3.data.ltag`
In SDMS V3, legal tag information is represented by a unique value, whereas in SDMS V4, it is represented as a list. To simplify the record composition, we select the first valid legal tag from the V4 record list. If no valid legal tags are found in the V4 record, we should always set an invalid legal tag in V3. If this is not set, V3 will inherit a valid legal tag from the data group, risking the possibility of a non-accessible record in V4 being addressable in V3.
```python
RV3.data.ltag = None
for tag in RV4.legal.legaltags:
    if isValid(tag):
        RV3.data.ltag = tag
        break
if RV3.data.ltag is None:
    # no valid tag found: set a (non-valid) one anyway so V3 does not
    # inherit a valid legal tag from the data group
    RV3.data.ltag = RV4.legal.legaltags[0]
```
- `RV3.data.created-by`
The user who created/ingested the dataset.
```python
RV3.data['created-by'] = RV4.createUser
```
- `RV3.data.created_date`
The timestamp when the dataset was created/ingested.
```python
RV3.data.created_date = RV4.createTime
```
- `RV3.data.last_modified_date`
The timestamp when the dataset was last modified.
```python
RV3.data.last_modified_date = RV4.modifyTime
```
- `RV3.data.gcsurl`
The storage ID of the container/bucket where dataset bulk files are stored. This value is automatically generated based on the record ID value.
```python
hash_obj = hashlib.sha256()
hash_obj.update(RV4.id.encode('utf-8'))
RV3.data.gcsurl = hash_obj.hexdigest()[:-1]
```
- `RV3.data.ctag`
The Coherency Tag (ctag) is a hash code associated with the dataset descriptor that changes every time the metadata is updated. This property exists only in SDMS V3, and it is autogenerated.
```python
alphabet = string.ascii_letters + string.digits
RV3.data.ctag = ''.join(secrets.choice(alphabet) for _ in range(16))
```
- `RV3.data.readonly`
The `readonly` property defines the dataset's status regarding readability. If set to `false`, the dataset can be accessed in both read and write modes. If set to `true`, the dataset can only be accessed in read mode. In SDMS V4, a dataset cannot be marked as `readonly`, and for this reason, in the generated V3 record, the value will be defaulted to `false`.
```python
RV3.data.readonly = False
```
- `RV3.data.filemetadata`
The `filemetadata`, also known as the dataset manifest, is an object containing information about how the dataset's bulks are stored in the cloud storage resource. The only supported manifest in SDMS V3 is the `GENERIC`, which requires that all objects composing the dataset be saved in sequential order using the `0` to `N-1` naming convention, where `N` is the number of objects. The fields composing the dataset manifest are:
`nobjects`: the number of objects composing the dataset. This value can be computed by counting the objects composing the dataset.
`size`: the dataset total size, which can be computed by summing the sizes of all objects composing the dataset. Alternatively, if it exists, `RV4.data.TotalSize` can be used, but computing it will provide a better and clearer result.
`type`: the manifest type; `GENERIC` is the only one supported.
`checksum`: the dataset checksum.
`tier_class`: the dataset storage tiering class.
```python
blob_list = getBlobClient(connectionString)
size = 0
tier_class = None
objects_num = 0
error = False
for blob in blob_list:
    # objects must follow the sequential 0..N-1 naming convention of the GENERIC manifest
    if blob.name != str(objects_num):
        error = True
    if tier_class is None:
        tier_class = blob.blob_tier
    objects_num = objects_num + 1
    size = size + blob.size
if not error:
    RV3.data.filemetadata.type = 'GENERIC'
    RV3.data.filemetadata.nobjects = objects_num
    RV3.data.filemetadata.size = size
    if 'Checksum' in RV4.DatasetProperties:
        RV3.data.filemetadata.checksum = RV4.DatasetProperties.Checksum
    RV3.data.filemetadata.tier_class = tier_class
else:
    RV3.data.filemetadata = None
```
- `RV3.data.computed_size`
The `computed_size` is generated by SDMS V3 when the `/size` endpoint is triggered. This endpoint calculates the size of the datasets by summing the sizes of all composing objects. This field has been introduced because the dataset filemetadata object is an optional field created by client applications, such as sdapi or sdutil, and can only be trusted by them.
```python
blob_list = getBlobClient(connectionString)
size = 0
for blob in blob_list:
    size = size + blob.size
RV3.data.computed_size = size
```
- `RV3.data.computed_size_date`
This is the timestamp of when the dataset size has been computed by SDMS V3.
```python
RV3.data.computed_size_date = str(datetime.datetime.now())
```
- `RV3.data.seismicmeta_guid`
The `seismicmeta_guid` is the ID of a record linked with the SDMS V3 dataset. This can be associated with the SDMS V4 record so all extra properties can be downloaded by consumer applications.
```python
RV3.data.seismicmeta_guid = RV4.id
```
### The Script to validate the proposed conversion
- The script [sync-script.py](/uploads/2421d4b04fe2a6fdd560f1df321e5d36/sync-script.py) is provided with this ADR (for testing purposes only) to demonstrate and validate the synching flow between SDMS V4 and V3:
- Create a random data file of 16MB and compute the checksum
- Fill an OSDU record and register it in SDMS V4
- Upload the 16MB file as 4 objects of 4MB each using the connection string generated via SDMS V4
- Generate an V3 metadata record and register it in SDMS V3
- Ensure the dataset in SDMS V3 can be located after ingestion
- Download all objects using the connection string generated via SDMS V3
- Compare the initial object with the downloaded one to ensure they match
#### Example of an SDMS V4 ingested record
```json
{
"id": "opendes:dataset--FileCollection.SEGY:7fe06451787641c4953a06a63e44967a",
"kind": "osdu:wks:dataset--FileCollection.SEGY: 1.1.0",
"version": 1694519237996696,
"acl": {
"viewers": [
"data.sdms.opendes.tdata.fe6730f9-bb3d-46a3-9f03-3d529e32360d.viewer@opendes.domain.com"
],
"owners": [
"data.sdms.opendes.tdata.fe6730f9-bb3d-46a3-9f03-3d529e32360d.admin@opendes.domain.com"
]
},
"legal": {
"legaltags": [
"ltag-seistore-test-01"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"modifyUser": "test-user@domain.com",
"modifyTime": "2023-09-07T11:47:18.625Z",
"createUser": "test-user@domain.com",
"createTime": "2023-09-07T07:17:58.443Z",
"data": {
"Name": "data-sync.segy",
"TotalSize": "16777216",
"Description": "SDMS synching test record",
"DatasetProperties": {
"FileCollectionPath": "/f1/f2/f3/",
"FileSourceInfos": [
{
"FileSource": "data-sync.segy",
"Name": "data-sync.segy",
"FileSize": "16777216",
"Checksum": "8ce2025f9b27e3017ab15f15b261d599",
"ChecksumAlgorithm": "MD5"
}
],
"Checksum": "8ce2025f9b27e3017ab15f15b261d599"
}
}
}
```
#### Example of a generated SDMS V3 metadata
```json
{
"id": "ds-opendes-syncv4-c0699ac77bc64a5772ac7f6f455ce5a251e3686d87d26e91df2ecc73e7bfdf4b0a16ac757c2ec227c1a6814d097a0b6b759a01dc52753754a0a18dfaea53c7d0",
"data": {
"name": "data-sync.segy",
"tenant": "opendes",
"subproject": "syncv4",
"path": "/f1/f2/f3/",
"acls": {
"admins": [
"data.sdms.opendes.tdata.fe6730f9-bb3d-46a3-9f03-3d529e32360d.admin@opendes.domain.com"
],
"viewers": [
"data.sdms.opendes.tdata.fe6730f9-bb3d-46a3-9f03-3d529e32360d.viewer@opendes.domain.com"
]
},
"ltag": "ltag-seistore-test-01",
"created-by": "test-user@domain.com",
"created_date": "2023-09-07T07:17:58.443Z",
"last_modified_date": "2023-09-07T11:47:18.625Z",
"gcsurl": "a5993feef91df715c176452fe1a26d04ca70e88d0ccff268e92cd74c76dde61",
"ctag": "9STTAfiKl4iukKbp",
"readonly": "false",
"filemetadata": {
"nobjects": 4,
"size": 16777216,
"type": "GENERIC",
"checksum": "8ce2025f9b27e3017ab15f15b261d599",
"tier_class": "Hot"
},
"computed_size": 16777216,
"computed_size_date": "2023-09-12 13:47:45.877142",
"seismicmeta_guid": "opendes:dataset--FileCollection.SEGY:7fe06451787641c4953a06a63e44967a"
}
}
```
### SDMS V4 to V3 Synching Automation
The preceding section explains the process of creating a metadata descriptor for SDMS V3 using an OSDU record. This metadata descriptor enables access to a dataset ingested in SDMS V4 through SDMS V3.
In order to automate the process, we will deploy a new service called the `sdms-sync-service`, which will be responsible for generating an SDMS V3 record every time a new dataset is registered in SDMS V4. When a dataset is registered in SDMS V4, a message will be pushed into a Redis queue `insert-synch-v4:{record-id}:{partition}:{other-required-params}`. The new service will consume the messages from the Redis queue and initiate the synching process:
- retrieve the OSDU record from the storage service
- generate the corresponding SDMS V3 metadata descriptor
- save the generated metadata in the SDMS V3 journal.
<div align="center">
<br/><img src="/uploads/b2d6eb24b28516feb0908e5ef7232a2e/sdms-sync-service.png"
alt="sdms-sync-service"
style="display: block; margin: 0 auto" /><br/>
</div>
### Details
- If a dataset is patched in SDMS V4, the service should push an `insert` message into the Redis queue:
  - If the previous `insert` message is still in the queue (not yet consumed by the sync service), the existing entry will be overwritten in the queue, and the sync service will create the updated one.
  - If the previous version was already synced, when the new message is consumed, the updated record will be created, and because the generated key is identical, it will overwrite the existing record in the journal.
- If a dataset is deleted in SDMS V4, the service should push a `delete` message into the Redis queue.
  - When the delete message is consumed, the sync service will generate only the V3 record key and remove the entry from the journal.
  - If the `insert` message has not yet been consumed from the queue, the sync service should check, when it consumes the `insert`, whether a `delete` message is also present for the same record. If one is found in the queue, the sync service will skip the sync process and remove both the `insert` and the `delete` entries from the Redis queue (see the sketch below).
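A minimal sketch of that consume-and-dispatch loop, assuming an ioredis client, a simplified single-queue/JSON message layout (rather than the `insert-synch-v4:...` key format above), and hypothetical helpers standing in for the record generation and removal steps:

```typescript
import Redis from 'ioredis';

interface SyncMessage {
    action: 'insert' | 'delete';
    recordId: string;
    partition: string;
}

// Hypothetical helpers standing in for the V3 record generation / removal steps.
async function generateV3Record(recordId: string, partition: string): Promise<void> { /* ... */ }
async function deleteV3Record(recordId: string, partition: string): Promise<void> { /* ... */ }

const redis = new Redis(); // connection settings omitted

async function consumeSyncQueue(): Promise<void> {
    for (;;) {
        // blocks until a message is available on the (simplified) sync queue
        const entry = await redis.brpop('sdms-sync-queue', 0);
        if (!entry) { continue; }
        const message = JSON.parse(entry[1]) as SyncMessage;
        if (message.action === 'delete') {
            await deleteV3Record(message.recordId, message.partition);
        } else {
            await generateV3Record(message.recordId, message.partition);
        }
    }
}
```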
### Limitations
When a dataset is registered in V4 via a client app, the record is created instantaneously, while uploading the bulk data into the storage resource takes longer. If the `insert` message is consumed before the bulk data is uploaded, the file manifest cannot be computed due to missing objects. To address this issue, we can enable a background process in the `sync-service` that loops over the created SDMS V3 records and updates the manifest in cases where it does not exist or when the last modified time in the corresponding SDMS V4 record is greater than the one reported in the V3 entry. This approach should be re-discussed with the community to find an optimal strategy to apply.
M22 - Release 0.25
Sacha Brants, Mark Yan

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/109
Unsupported Feature in Dataset LS Get Endpoint Causing Test Failures on AWS and Anthos
2023-09-06T20:02:06Z
Pratiksha Shedge

A new feature has been introduced for the dataset LS get endpoint, comprising the Search (to select a single SQL-like search parameter) and Select (to choose multiple fields for retrieval) query parameters. The API is expected to return a list of datasets based on the search and select query parameters. However, AWS and Anthos do not support this new feature for this endpoint, leading to test failures during pipeline runs.
Pipeline runs:
AWS: https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/jobs/2200880
Anthos: https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/jobs/2200882

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/106
The getBloctkSizes services takes > 30 seconds to complete for a 20 GB file
2023-08-29T15:09:09Z
Michael

We uploaded a 20 GB file to the Azure pre-shipping Seismic DDMS environment.
When calling the getBlockSizes() method from the sdapi library, the call took over 30 seconds to complete.
We are worried about how long it takes to execute this method since it affects the user experience when trying to visualize data from these files. We are also concerned because we want to support larger files (100 GB + in size).
The file was uploaded to the following sd path: sd://opendes/michaelm19/ST0202R08_PZ_PrSDM_CIP_gathers_in_PP_Time.segy

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/104
'path' parameter should be optional not required
2023-08-29T15:07:30Z
Zachary Keirn

The API docs all state that 'path' is a required field. It looks like it is actually optional, and I am told by Mark Yan that the service inserts a "/" if it is not provided. Current collections for testing this all set the 'path' parameter to an empty string in a pre-request script. But they could be made clearer by just not entering this parameter at all. Also, I would like to see an example of when setting the path is needed and how it should be set in that circumstance. From reading the API doc, it seems to suggest that you would enter the path to the segy file, but if you do that you get the following error, which seems to insert slashes at the start and end of the provided 'path' parameter: "The 'path' parameter /sd://osdu/testtenant2/ST0202R08_PS_PSDM_RAW_PP_TIME.MIG_RAW.POST_STACK.3D.JS-017534.segy/ is in a wrong format. It should match the regex expression ^[/A-Za-z0-9_.-]*$." In this case, 'path' was set in the params as "sd://osdu/testtenant2/ST0202R08_PS_PSDM_RAW_PP_TIME.MIG_RAW.POST_STACK.3D.JS-017534.segy".
Mark Yan

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/102
When deleting a subproject, blobs are deleted individually before the container is removed
2023-07-20T11:20:30Z
Maggie Salak

Steps to reproduce:
* Call the endpoint to delete a subproject (DELETE /subproject/tenant/{tenantid}/subproject/{subprojectid})
* The blob container linked to the subproject should be deleted. In the current implementation all blobs inside the container are first deleted individually, before the container itself is removed. See the relevant [code section](https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/blob/master/app/sdms/src/cloud/providers/azure/seistore.ts#L74).
Suggestions:
* Remove the blob deletion from the implementation and only delete the entire container (see the sketch below).
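A minimal sketch of that suggestion, assuming the Azure implementation keeps using @azure/storage-blob (the helper name is illustrative):

```typescript
import { BlobServiceClient } from '@azure/storage-blob';

// Delete the subproject's container in one call; Azure removes the
// contained blobs together with the container, so no per-blob loop is needed.
async function deleteSubprojectContainer(connectionString: string, containerName: string): Promise<void> {
    const serviceClient = BlobServiceClient.fromConnectionString(connectionString);
    const containerClient = serviceClient.getContainerClient(containerName);
    await containerClient.delete();
}
```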