Seismic issues
https://community.opengroup.org/groups/osdu/platform/domain-data-mgmt-services/seismic/-/issues

[ADR] Advanced filters for dataset search (Alexandre Gattiker)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/117

# Introduction
We need additional filtering support to be able to filter the `POST /dataset/tenant/{tenantid}/subproject/{subprojectid}` and `PUT /operation/bulk-delete` (added in [!891](https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/merge_requests/891/diffs#fafb01a8314993d61fca390beef912c7813278eb)) operations by metadata fields with more complex expressions than a single key-value match.
# Status
* [x] Initiated
* [x] Proposed
* [x] Under Review
* [ ] Approved
* [ ] Rejected
# Problem statement
The SDMS API `POST /dataset/tenant/{tenantid}/subproject/{subprojectid}` currently accepts the following body parameters, among others:
* `search`, a single SQL-like search parameter, for example: `search=name=file%`
* `gtags`, an array of strings matching tags associated with dataset metadata.
The `search` field does not support more than one field, or more than one possible value for a field.
The SDMS API `PUT /operation/bulk-delete` (added in [!891](https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/merge_requests/891/diffs#fafb01a8314993d61fca390beef912c7813278eb)) requires a `path` parameter containing `tenantid`, `subprojectid` and `path` but does not support filtering by metadata fields or tags.
For both search and delete, we need to be able to filter by more than one field, or more than one possible value for a field.
Furthermore, we expect a need for more complex filter solutions, such as combining `AND`, `OR` and `NOT` operators. The proposed solution should ideally be extensible to support additional expressions and operators in the future if needed.
# Proposed solution
Add an optional `filter` parameter to the `POST /dataset/tenant/{tenantid}/subproject/{subprojectid}` and `PUT /operation/bulk-delete` API endpoints.
The `search` and `gtags` parameters are to be deprecated.
## Overview
The `filter` parameter takes a payload with a variable format, allowing a simple filter on a single field to be expressed, as well as logical combinations of filters of arbitrary complexity.
The `POST /dataset/tenant/{tenantid}/subproject/{subprojectid}` operation has been selected for extension because:
* Advanced metadata filtering, encompassing select and search functionalities, has already been incorporated into that operation.
* The SDMS API also accepts the `GET` method for this operation, with parameters provided in the query string, as a legacy endpoint. The `POST` version of the endpoint was introduced to address issues with handling large request parameters, where sending the cursor as a query parameter can lead to oversized requests and subsequent failures.
## Examples
Example value for the `filter` parameter:
```json
{
  "and": [
    {
      "not": {
        "property": "gtags",
        "operator": "CONTAINS",
        "value": "tagA"
      }
    },
    {
      "or": [
        {
          "property": "name",
          "operator": "LIKE",
          "value": "test.%"
        },
        {
          "property": "name",
          "operator": "=",
          "value": "dataset.sgy"
        }
      ]
    }
  ]
}
```
This is equivalent to the following pseudo-SQL statement:
```sql
SELECT * FROM datasets d WHERE
NOT (EXISTS (SELECT VALUE 1 FROM t IN d.data.gtags WHERE t = 'tagA')
OR (IS_STRING(d.data.gtags) AND STRINGEQUALS(d.data.gtags, 'tagA')))
AND (
d.name LIKE 'test.%'
OR d.name = 'dataset.sgy'
)
```
## Details
The `filter` parameter can be:
* A **property match filter**:
```json
{
"property": "...",
"operator": "...",
"value": "..."
}
```
The implementation will be extensible with additional keys if needed in the future, e.g. to specify case sensitivity.
* An **`and` or `or` filter**, i.e. an object containing only the key `and` or `or`, whose value is an array of one or more filters (i.e. property match filters or `and`, `or` or `not` filters):
```json
{
"and": [...]
}
```
* A **`not` filter**, i.e. an object containing only the key `not`, whose value is a single filter (i.e. a property match filter or an `and`, `or` or `not` filter):
```json
{
"not": ...
}
```
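To make the recursive semantics concrete, the sketch below shows one possible way a service could translate a `filter` payload into a SQL-like WHERE expression by walking the `and`, `or` and `not` nodes. It is illustrative only: the function name `build_where` and the quoting rules are assumptions, and a real implementation would use parameterized queries rather than string concatenation.
```python
# Illustrative sketch only: recursively translate a filter payload into a
# SQL-like WHERE expression. Not the proposed implementation; a real service
# would parameterize values instead of inlining them.
def build_where(node: dict) -> str:
    if "and" in node:
        return "(" + " AND ".join(build_where(child) for child in node["and"]) + ")"
    if "or" in node:
        return "(" + " OR ".join(build_where(child) for child in node["or"]) + ")"
    if "not" in node:
        return "NOT " + build_where(node["not"])
    # property match filter
    return f"d.{node['property']} {node['operator']} '{node['value']}'"

filter_payload = {
    "and": [
        {"not": {"property": "gtags", "operator": "CONTAINS", "value": "tagA"}},
        {"or": [
            {"property": "name", "operator": "LIKE", "value": "test.%"},
            {"property": "name", "operator": "=", "value": "dataset.sgy"},
        ]},
    ]
}

print(build_where(filter_payload))
# (NOT d.gtags CONTAINS 'tagA' AND (d.name LIKE 'test.%' OR d.name = 'dataset.sgy'))
```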
# Out of scope / limitations
The `GET /utility/ls` and `POST /utility/ls` operations can also be used for retrieving datasets, but they will not be extended with advanced filtering at the moment. That functionality can be added later if required.

The sd protocol is failing for IBM (Anuj Gupta)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/open-vds/-/issues/215

The sd protocol is failing for IBM: calling `vds = openvds.open(url, con)` results in a 404 error, and it seems the characters after `/` are getting escaped/skipped.
If the path is `ss-dev-seismic-dh2cqj2dwyr3tsz9/f013db48-47f5-430b-a10e-c5f6622712d2`,
the bucket name is: ss-dev-seismic-dh2cqj2dwyr3tsz9
and the expected subpath/key is: f013db48-47f5-430b-a10e-c5f6622712d2
whereas the subpath/key actually used is:
`013db48-47f5-430b-a10e-c5f6622712d2` (~~f~~013db48-47f5-430b-a10e-c5f6622712d2)

Job Failed #2191076. Openvds-ingestion image for tag 3.3.0 not created (Andrei Skorkin [EPAM / GCP])
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/open-vds/-/issues/214

Job [#2191076](https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/open-vds/-/jobs/2191076) failed for 50c3f90f370390cfd5d23defeb7603bbfa01374c.
During the release cycle, we automatically change the latest image tag to the latest one presented here for the **openvds-ingestion** image. In this case it was **3.3.0** ( https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-gcp-provisioning/-/blob/release/0.23/helm/osdu-infra-baremetal/values.yaml?ref_type=heads#L182 ). But due to the failure of the mentioned job, the image does not exist, and we need to manually fix the wrong version in the values.yaml file to make a deployment.
Could you please check this?
How to overcome similar problems in the future?
Thanks
CC: @Yauhen_Shaliou @Yan_Sushchynski

delete user returns 400 on success instead of 200 (Zachary Keirn)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/116

The delete user from subproject endpoint (observed in m18/AWS) returns 400 even though the delete completes successfully. If you then run it again, it correctly returns 404.

NEWBIE - installation on ARM 64 Linux graviton (Klaas Koster)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/open-vds/-/issues/213
Following the instructions, I executed:
1) cmake ..
2) make -j8
3) make install
No errors or warnings are generated, and five executables are placed in Dist/OpenVDS/bin.
Two issues:
1) The README file states that ./SEGYImport should show the Wavelet Compression option, but it does not.
2) I cannot find the .whl file anywhere that would allow me to use 'pip install' to get Python to work with OpenVDS.
What did I do wrong or forget?

VDSCopy hanging when uploading to Seismic DDMS (Vinicius Vicente Silva Rosa)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/open-vds/-/issues/212

I am attempting to upload a local VDS file (1.5 TB) to an SD path. After approximately an hour there is no visible progress in the upload, creating the impression that the process is stalled. No error messages are displayed. I suspect it may be related to the token refresh.
We are using the command line below:
OSDU/ADME M16
Lib: VDSCopy - OpenVDS+ 3.3.0 installed on Linux
```bash
VDSCopy -a 01 -a 02 -a 12 --tolerance=1.0 --compression-method=Wavelet -d 'sdAuthorityUrl=https://{HOST}.energy.azure.com/seistore-svc/api/v3;authTokenUrl=https://login.microsoftonline.com/{TENANT}/oauth2/v2.0/token/;client_id={APP_ID};client_secret={APP_SECRET};scopes={APP_ID}/.default;'
'/local_disk0/vds/FILE.vds' 'sd://{TENANT}/{SUBPROJECT}/dataset_name.vds'
```
The intention of the command above is to authenticate using the ClientID and ClientSecret.
The upload completes successfully when the file is processed within an hour or less.

Any advices for saving vds file? (nanting liu)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/open-vds/-/issues/211
- SEGYImport [OPTION...] <input file>
- using _--url <string>_ saves the VDS to a cloud environment; the VDS is then split into several objects (Dimensions_012LOD0, VolumeDataLayout, etc.)
- using _--vdsfile <string>_ saves the VDS to the local file system as a single file ending in .vds (test.vds)
- when I request data from a file ending in .vds, it is very fast.
- What is the difference between --url and --vdsfile? Both are used to set the path of the output VDS file. I would like to know how to use these two parameters correctly to improve the efficiency of querying data.
- thank you so much.

path and sdpath not used consistently, error in /user with parameter path (Zachary Keirn)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/115

I think there are two issues here. One is documentation: the YAML doc for the /user delete option has 'path' instead of 'sdpath', and I believe it should be 'sdpath'. The other is that when I try to delete someone that does not exist, I get 400 instead of 404 in response. This happens regardless of whether I use 'path' or 'sdpath' for the parameter.

Error uploading VDS into SD Path using OpenVDS+ (Juliana Fernandes, juliana.fernandes@iesbrazil.com.br)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/open-vds/-/issues/210
Hello,
I'm trying to upload a local VDS into an SD path in AWS M20 pre-shipping and get an SDMS error (wrong location).
Some values were provided by e-mail, so I will not paste them here, but if you need to test just let me know.
The command I'm using is:
```
VDSCopy.exe -d "SdAuthorityUrl=https://prsh.testing.preshiptesting.osdu.aws/api/seismic-store/v3;SdApiKey=ABC;AuthTokenUrl={{received_by_email}};client_id={{received_by_email}};client_secret={{received_by_email}};grant_type=refresh_token;refresh_token={{generated_in_the_login}};LegalTag=osdu-public-usa-dataset-1;scopes=openid email;Region=us-east-2" E:\Juliana\osdu\osdu_test\ST0202R08_PS_PSDM_RAW_PP_TIME_MIG_RAW_POST_STACK_3D_JS_017534_tol1_JFA.vds sd://osdu/vdstestsjfa/ST0202R08_PS_PSDM_RAW_PP_TIME_MIG_RAW_POST_STACK_3D_JS_017534_tol1_JFA.vds
```
And the error I'm getting is:
```
[Could not create VDS sd://osdu/vdstestsjfa/test/ST0202R08_PS_PSDM_RAW_PP_TIME_MIG_RAW_POST_STACK_3D_JS_017534_tol1_JFA.vds] Error on uploading VolumeDataLayout object: Http error response: 301 -> https://psosdu-shared-seismicddms-20230814174725984500000004.s3.us-east-1.amazonaws.com/3o7c5j88s1ko0oyg/2b9b212b-21b5-4ccf-aaed-67485c113ae4/VolumeDataLayout: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
```
It seems to be accessing the wrong location, since the instance is located in us-east-2.
Regards,
Juliana.

Adding CRS to a VDS generated by Openvds+ (Juliana Fernandes, juliana.fernandes@iesbrazil.com.br)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/open-vds/-/issues/209
Hello,
I was taking a look into the documentation in order to add a CRS to the VDS I'm generating with Openvds+.
In the documentation I saw the option "--crs-wkt <string>". WKT is Well-known Text and seems to be a geographical coordinate reference. Is there a way to add a UTM coordinate to the data?
Regards,
Juliana

Multiple queries of data will be much slower. (nanting liu)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/open-vds/-/issues/208
When data is queried multiple times at different depths, the response time becomes much slower (from the first few tens of milliseconds up to seven seconds).
`VolumeDataRequestDouble data = accessManager.requestVolumeSubsetDouble(managedBuffer.getByteBuffer(), Dimensions_012, 0, 0, min, max);`
`data.waitForCompletion();`
`getdata();`
`managedBuffer.close();`
`vds.close();`
I found that when calling _requestVolumeSubsetDouble_ in a loop, each response takes longer.

Is there any way to get min value of sample or max value? (nanting liu)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/open-vds/-/issues/207

I use `VolumeDataAccessManager.requestVolumeSubsetDouble()` to get sample data, and then I want to get the maximum and minimum values in all the data, so I have to write code to compare them. Is there any method I can invoke on _VolumeDataLayout_ or _VolumeDataAccessManager_ to get those two values?

SEGYImport will create a ramdon directory on my cloud environment? (nanting liu)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/open-vds/-/issues/206

I already defined a subpath on the command line ("s3://vds/1704505448285212672"), but I found a record in the log ("Successfully imported into s3://vds/1704505448285212672/CD352FB47324BA63"). How can I avoid this situation?

Implement dataset storage for IBM (Mark Yan)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/114

Implement dataset storage for GCP (Mark Yan)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/113

fail to copy file from cloud server (nanting liu)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/open-vds/-/issues/205
I am trying to read a VDS from MinIO (an open-source, S3-compatible object store).
As a first step, I tried to read a VDS file with _OpenVDS.open()_, but it failed with the
exception "Error on downloading VolumeDataLayout object: Http error response: 404 -> https://endpoint/bucket-name/test.vds/VolumeDataLayout: The specified key does not exist.".
I then realized that _open()_ cannot read a VDS file directly, because the file was uploaded manually.
As a second step, I tried to use VDSCopy to copy the VDS file to the cloud environment, which still fails with the error "Error on uploading VolumeDataLayout object: unexpected AWS signing failure". Here is my command: `VDSCopy.exe E:\PPCoef.vds s3://endpoint/bucket-name/testVDS -d "Region=us-west-rack-2;SecretKey=xxx;SecretAccessKey=xxx"`. My SecretKey and SecretAccessKey are correct, but I don't know why it prints this.
Could you please help me figure out how to deal with this situation?

Add verification that the Seismic's cloud provider matches the one from the config.yaml (Yan Sushchynski (EPAM))
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-sdutil/-/issues/29
Hello,
Recently we introduced a new implementation of Seismic for Google Cloud. It mostly follows the same workflow as the previous Google implementation, but there are some crucial differences, and we got unexpected results. As far as I remember, the service's responses contain information about the cloud provider.
What if some extra checks are added?
Thanks

[ADR] Synching SDMS V3 datasets in SDMS V4 (Diego Molteni)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/112

# Introduction
We need a solution to make datasets ingested in SDMS V3 visible to, and consumable by, SDMS V4.
The purpose of this ADR is to describe how to enable a synchronization mechanism that allows users of SDMS V4 to consume seismic dataset entities ingested in SDMS V3 via client applications, even though the two versions of the system have entirely different architectural logic.
# Status
* [x] Initiated
* [x] Proposed
* [ ] Under Review
* [ ] Approved
* [ ] Rejected
# Problem statement
The Seismic Data Management Service V4 (SDMS V4) stores and manages data types as defined by the Open Subsurface Data Universe (OSDU) Authority. The APIs (Application Programming Interfaces) provide robust data type checks and are fully integrated with the OSDU policy service. The goal is to minimize ambiguity in the authorization model and facilitate straightforward adoption through a consistent usage pattern. In contrast, the V3 version of the service defines, saves, and manages proprietary metadata records, interacts directly with the entitlement service, and organizes records into collections/data-groups named subprojects.
<div align="center">
<br/><img src="/uploads/5e1a58219ca35be9da530b0eba2ed9fa/arch-diagram.png"
alt="sdms-architectural-diagram"
style="display: block; margin: 0 auto;"/><br/>
</div>
The key difference between the two versions of the service lies in how the cloud storage URI is generated. In SDMS V4 it is derived from the record-id value, while in SDMS V3 the generated URI is a random UUID.
# Proposed solution
Update SDMS V4 by adding the capability to correctly retrieve the storage location for the dataset's bulk data if the dataset was ingested via SDMS V3.
## Scenarios
When a dataset is ingested in SDMS V3 from a seismic application, the latter also creates an OSDU Bulk record linked to a Work Product Component, as shown in the following diagram:
<div align="center">
<br/><img src="/uploads/3d73191098963a80675c2ed6e96472cc/image.png"
alt="sdms-architectural-diagram"
style="display: block; margin: 0 auto; height: 30%; width: 30%" /><br/>
</div>
The seismic application saves the SDMS V3 URI (also known as the `sdpath`) in the `FileSourceInfo` property of the created OSDU Bulk record. This is done to later facilitate communication of the URI to SDMS V3 for retrieving the storage connection string required to access the dataset's bulk data.
### Example of SDMS V3 dataset metadata
```json
{
"name": "test-data.zgy",
"tenant": "partition",
"subproject": "subproject",
"path": "/",
"ltag": "test-legal",
"created_by": "test-user@slb.com",
"last_modified_date": "Tue Sep 12 2023 11:04:29 GMT+0000 (Coordinated Universal Time)",
"created_date": "Tue Sep 12 10:54:10 GMT+0000 (Coordinated Universal Time)",
"gcsurl": "ss-weu-xkz32bjwg2425gn/bdf36c8a-3c62-3151-12b7-227af4727520",
"ctag": "sMTz0oWeId1nOnrx",
"readonly": true,
"sbit": null,
"sbit_count": 0,
"filemetadata": {
"type": "GENERIC",
"size": 1544552448,
"nobjects": 47
},
"seismicmeta_guid": "partition:work-product-component--SeismicTraceData:326bac9a-1fb2-5c73-9c64-6ca122c5025a",
"access_policy": "uniform"
}
```
### Example of OSDU storage associated Work Product Component
```json
{
"id": "partition:work-product-component--SeismicTraceData:326bac9a-1fb2-5c73-9c64-6ca122c5025",
"kind": "osdu:wks:work-product-component--SeismicTraceData:1.3.0",
"version": 1685099234631439,
"acl": {
"viewers": [
"data.test@domain.slb.com"
],
"owners": [
"data.test@domain.com"
]
},
"legal": {
"legaltags": [
"test-legal"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"data": {
"BinGridID": "partition:work-product-component--SeismicBinGrid:2a714f2b12aa346d16a08c5a2f4e157e:",
"Datasets": [
"partition:dataset--FileCollection.Slb.OpenZGY:1de532c2-4d1b-5316-ba4a-422342321d55"
],
"DDMSDatasets": [
"urn:dataset--FileCollection.Slb.OpenZGY:1de532c2-4d1b-5316-ba4a-422342321d55"
],
"Name": "test-data.zgy",
"Source": "osdu",
"SubmitterName": "test-user@domain.com"
},
"createUser": "test-user@domain.com",
"createTime": "2023-09-12T11:04:30.321Z",
"modifyUser": "test-user@domain.com",
"modifyTime": "2023-09-12T18:09:12.703Z"
}
```
### Example of OSDU storage associated File Collection
```json
{
"id": "partition:dataset--FileCollection.Slb.OpenZGY:1de532c2-4d1b-5316-ba4a-422342321d55",
"version": "4426199321664216",
"kind": "osdu:wks:dataset--FileCollection.Slb.OpenZGY:1.0.0",
"acl": {
"viewers": [
"data.test@domain.slb.com"
],
"owners": [
"data.test@domain.com"
]
},
"legal": {
"legaltags": [
"test-legal"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"createUser": "test-user@domain.com",
"createTime": "2023-09-12T11:04:02.705Z",
"data": {
"Endian": "BIG",
"SEGYRevision": "rev 1",
"TotalSize": "1544552448",
"Name": "test-data.zgy",
"DatasetProperties": {
"FileCollectionPath": "sd://tenant/subproject/",
"FileSourceInfos": [
{
"FileSource": "test-data.zgy",
"Name": "test-data.zgy",
"FileSize": "1544552448",
}
]
}
}
}
```
## Proposed Solution
To enable applications to access bulk datasets ingested in SDMS V3 through SDMS V4, we need to update the mechanism in SDMS V4 for retrieving the correct storage URI associated with the Bulk record. This update is necessary to generate a valid connection string for accessing the bulk data.
When a Bulk record is created, the SDMS V3 URI (also known as the 'sdpath') is typically saved in the `FileCollectionPath` and `FileSource` properties. In the most common scenarios, the `sd://tenant/subproject/path` portion of the URI is stored in the `FileCollectionPath` property, while the URI's name is stored in the `FileSource` property.
When a connection access string is requested for a Bulk record through SDMS V4, the service should detect whether the record's file source type refers to a V3 dataset's URI. If that is the case, the service should then:
1. extract the `subproject` name from the `FileCollectionPath`
```python
subproject = record.data.DatasetProperties.FileCollectionPath.replace("sd://", "").split('/')[1]
```
2. extract the `path` from the `FileCollectionPath`
```python
path = "/".join(record.data.DatasetProperties.FileCollectionPath.replace("sd://", "").split('/')[2:]).replace("//", "/")
```
3. extract the `name` from the `FileSource`
```python
name = record.data.DatasetProperties.FileSourceInfos[0].FileSource
```
4. retrieve the storage URL from the V3 journal
```sql
SELECT c.data.gcsurl
FROM c
WHERE
c.data.subproject="{subproject}"
AND c.data.path="{path}"
AND c.data.name="{name}"
```
5. generate the connection string using the retrieved storage URL
```python
storage_client = StorageClient("{storage-url}")
return storage_client.getConnectionString()
```
#### Notes
Seismic applications use different approaches to save the SDMS V3 URI in the Bulk record, and all of the following cases should be considered (a normalization sketch follows the list):
1. The sd://tenant/subproject/path is saved in the `FileCollectionPath`, and the name is saved in `FileSource`.
2. The full sd://tenant/subproject/path/name URI is saved in both `FileCollectionPath` and `FileSource`.
3. The sd://tenant/subproject/path URI is saved in `FileCollectionPath`, and the name in `FileSource`, but the latter starts with the `./` prefix (which should be removed).
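The following sketch illustrates how the three cases above could be normalized into the `subproject`, `path` and `name` parts needed for the journal lookup. It is a sketch only; the helper name `split_sdpath` and the exact normalization rules are assumptions, not the service implementation.
```python
# Hypothetical normalization of the FileCollectionPath/FileSource layouts listed above
# into (subproject, path, name). Helper name and rules are assumptions.
def split_sdpath(file_collection_path: str, file_source: str):
    collection = file_collection_path.replace("sd://", "").rstrip("/")
    name = file_source.replace("sd://", "")
    if name.startswith("./"):          # case 3: strip the leading ./ prefix
        name = name[2:]
    if "/" in name:                    # case 2: FileSource holds the full sdpath
        name = name.split("/")[-1]     # keep only the dataset name
    parts = collection.split("/")      # [tenant, subproject, path segments...]
    subproject = parts[1]
    path = "/" + "/".join(parts[2:]) + "/" if len(parts) > 2 else "/"
    return subproject, path, name

print(split_sdpath("sd://tenant/subproject/f1/f2/", "test-data.zgy"))
# ('subproject', '/f1/f2/', 'test-data.zgy')
```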
### Limitations
Applications that do not match the described flow should be reviewed with the application owner before defining the right strategy to enable the synchronization of datasets ingested in SDMS V3 with SDMS V4.

[ADR] Synching SDMS V4 datasets in SDMS V3 (Diego Molteni)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/111

# Introduction
We need a solution to make datasets ingested in SDMS V4 visible to, and consumable by, SDMS V3.
The purpose of this ADR is to describe how to enable a synchronization mechanism that allows users of SDMS V3 to consume seismic dataset entities ingested in SDMS V4, even though the two versions of the system have entirely different architectural logic.
# Status
* [x] Initiated
* [x] Proposed
* [ ] Under Review
* [ ] Approved
* [ ] Rejected
# Problem statement
The Seismic Data Management Service V4 (SDMS V4) stores and manages data types as defined by the Open Subsurface Data Universe (OSDU) Authority. The APIs (Application Programming Interfaces) provide robust data type checks and are fully integrated with the OSDU policy service. The goal is to minimize ambiguity in the authorization model and facilitate straightforward adoption through a consistent usage pattern. In contrast, the V3 version of the service defines, saves, and manages proprietary metadata records, interacts directly with the entitlement service, and organizes records into collections/data-groups named subprojects.
<div align="center">
<br/><img src="/uploads/5e1a58219ca35be9da530b0eba2ed9fa/arch-diagram.png"
alt="sdms-architectural-diagram"
style="display: block; margin: 0 auto" /><br/>
</div>
The key difference between the two versions of the service lies in the form of the record. In the case of the OSDU record adopted by SDMS V4, it is entirely managed by the storage service. However, the V3 metadata has its own format, and to locate a dataset ingested in SDMS V4 via V3, it is necessary to create a V3 proprietary record. The following section will describe how an OSDU record can be translated into a V3 record to enable the synchronization process between the systems.
# Proposed solution
Create a new service capable of detecting when a new dataset is registered in SDMS V4 and creating the corresponding record in SDMS V3.
## Overview
As previously noted, in SDMS V3 the dataset descriptor has a proprietary structure and is maintained in an internal catalog. However, in SDMS V4, the descriptor is a standard OSDU record managed by the storage service. To make a dataset ingested in SDMS V4 visible in SDMS V3, we must create a corresponding V3 metadata record. This section describes how an SDMS V3 record can be created, using the OSDU record details, to make a dataset ingested in V4 visible in V3.
### The SDMS V3 dataset descriptor
```json
{
"id": "the record id <used as key in the service journal catalogue>",
"data": {
"name": "the dataset name",
"tenant": "the tenant name",
"subproject": "the subproject name",
"path": "the dataset virtual folder path",
"acls": {
"admins": "list of entitlement groups with admin rights",
"viewers": "list of entitlement groups with viewer rights"
},
"ltag": "the associated legal tag",
"created-by": "the id of the user who ingested the dataset",
"created_date": "the date and time when the dataset was ingested",
"last_modified_date": "the date and time when the dataset was last modified",
"gcsurl": "the storage uri string where bulks are saved",
"ctag": "a coherency hash tag that changes every time this record is modified",
"readonly": "the access mode level",
"filemetadata": {
"nobjects": "the number of blobs composing the dataset",
"size": "the dataset bulk total size",
"type": "the type of the manifest",
"checksum": "the dataset bulk checksum",
"tier_class": "the dataset storage tier class"
},
"computed_size": "the computed dataset size",
"computed_size_date": "the date and time when the dataset size was computed",
"seismicmeta_guid": "the associated OSDU record id"
}
}
```
### The SDMS V4 record (simplified)
```json
{
"kind": "the osdu dataset kind",
"acl": {
"viewers": "list of entitlement groups with viewer rights",
"owners": "list of entitlement groups with admin rights",
},
"legal": {
"legaltags": "the list of legal tags",
"otherRelevantDataCountries": "the list of data countries",
"status": "the legal status"
},
"data": {
"Name": "the dataset name",
"Description": "the dataset description",
"TotalSize": "the dataset total size",
"DatasetProperties": {
"FileCollectionPath": "the dataset virtual folder path",
"FileSourceInfos": [
{
"FileSource": "the file component source",
"PreloadFilePath": "the file component origin",
"Name": "the file component name",
"FileSize": "the file component size",
"Checksum": "the file component checksum",
"ChecksumAlgorithm": "the checksum algorithm"
}
],
"Checksum": "the dataset checksum"
}
}
}
```
### ADR symbols definitions
To make it simpler for the reader to understand the examples in the following sections, we define the following symbols:
| Symbols | Description |
| --- | --- |
| RV3 | the SDMS V3 record |
| RV4 | the SDMS V4 record |
| RV4.DatasetProperties | the record_v4.data.DatasetProperties element |
| RV4.FileSourceInfos | the record_v4.data.DatasetProperties.FileSourceInfos element |
### The SDMS V3 record generation in detail
- `RV3.id`
The ID in SDMS V3 is autogenerated based on the values composing the SDMS V3 URI: `tenant`, `subproject`, `path` and `name`.
```python
hash_obj = hashlib.sha512()
hash_obj.update((RV3.data.path + RV3.data.name).encode('utf-8'))
hashed_value = hash_obj.hexdigest()
RV3.id = 'ds-' + RV3.data.tenant + '-' + RV3.data.subproject + '-' + hashed_value
```
- `RV3.data.name`
The dataset name.
```python
if 'Name' in RV4.data:
    RV3.data.name = RV4.data.Name
elif len(RV4.FileSourceInfos) == 1 and 'Name' in RV4.FileSourceInfos[0]:
    RV3.data.name = RV4.FileSourceInfos[0].Name
else:
    RV3.data.name = RV4.id
```
- `RV3.data.tenant`
The dataset tenant name matches the data-partition-id in the OSDU model. This specific information cannot be automatically detected in a V4 record but can be easily determined by the syncing process.
```python
RV3.data.tenant = data_partition_id
```
- `RV3.data.subproject`
The dataset resource group name (referred to as subproject in SDMS V3) must exist in SDMS V3 with the `access_policy` property set to `dataset`. Essentially, each partition in SDMS V3 should have a default data group where all SDMS V4 datasets can be collected. This required data group can be automatically created by the syncing process. The name of the data group will default to `syncv4`.
```python
RV3.data.subproject = "syncv4"
```
- `RV3.data.path`
The dataset virtual path represents the logical folder structure in the data group (subproject) where the dataset is stored.
```python
RV3.data.path = RV4.DatasetProperties.FileCollectionPath
```
- `RV3.data.acls`
The Access Control List (ACL) defines the lists of users with admin and viewer rights. The only difference is that in the SDMS V3 record the `owners` list is named `admins`, while the `viewers` list keeps the same name.
```python
RV3.data.acls.admins = RV4.acl.owners
RV3.data.acls.viewers = RV4.acl.viewers
```
- `RV3.data.ltag`
In SDMS V3, legal tag information is represented by a unique value, whereas in SDMS V4, it is represented as a list. To simplify the record composition, we select the first valid legal tag from the V4 record list. If no valid legal tags are found in the V4 record, we should always set an invalid legal tag in V3. If this is not set, V3 will inherit a valid legal tag from the data group, risking the possibility of a non-accessible record in V4 being addressable in V3.
```python
RV3.data.ltag = None
for tag in RV4.legal.legaltags:
    if isValid(tag):
        RV3.data.ltag = tag
        break
if RV3.data.ltag is None:
    # no valid tag found: keep an (invalid) tag so the record is not addressable in V3
    RV3.data.ltag = RV4.legal.legaltags[0]
```
- `RV3.data.created-by`
The user who created/ingested the dataset.
```python
RV3.data['created-by'] = RV4.createUser
```
- `RV3.data.created_date`
The timestamp when the dataset was created/ingested.
```python
RV3.data.created_date = RV4.createTime
```
- `RV3.data.last_modified_date`
The timestamp when the dataset was last modified.
```python
RV3.data.last_modified_date = RV4.modifyTime
```
- `RV3.data.gcsurl`
The storage ID of the container/bucket where dataset bulk files are stored. This value is automatically generated based on the record ID value.
```python
hash_obj = hashlib.sha256()
hash_obj.update(RV4.id.encode('utf-8'))
RV3.data.gcsurl = hash_obj.hexdigest()[:-1]
```
- `RV3.data.ctag`
The Coherency Tag (ctag) is a hash code associated with the dataset descriptor that changes every time the metadata is updated. This property exists only in SDMS V3, and it is autogenerated.
```python
alphabet = string.ascii_letters + string.digits
RV3.data.ctag = ''.join(secrets.choice(alphabet) for _ in range(16))
```
- `RV3.data.readonly`
The `readonly` property defines the dataset's status regarding readability. If set to `false`, the dataset can be accessed in both read and write modes. If set to `true`, the dataset can only be accessed in read mode. In SDMS V4, a dataset cannot be marked as `readonly`, and for this reason, in the generated V3 record, the value will be defaulted to `false`.
```python
RV3.data.readonly = False
```
- `RV3.data.filemetadata`
The `filemetadata`, also known as the dataset manifest, is an object containing information about how the dataset's bulks are stored in the cloud storage resource. The only supported manifest in SDMS V3 is the `GENERIC`, which requires that all objects composing the dataset be saved in sequential order using the `0` to `N-1` naming convention, where `N` is the number of objects. The fields composing the dataset manifest are:
`nobjects`: the number of objects composing the dataset. This value can be computed by counting the objects composing the dataset.
`size`: the dataset total size, which can be computed by summing the sizes of all objects composing the dataset. Alternatively, if it exists, `RV4.data.TotalSize` can be used, but computing it will provide a better and clearer result.
`type`: the manifest type, with `GENERIC` the only supported type.
`checksum`: the dataset checksum.
`tier_class`: the dataset storage tiering class.
```python
blob_list = getBlobClient(connectionString)
size = 0
tier_class = None
objects_num = 0
error = False
for index, blob in enumerate(blob_list):
    if blob.name != str(index):
        error = True
    if tier_class is None:
        tier_class = blob.blob_tier
    objects_num = objects_num + 1
    size = size + blob.size
if not error:
    RV3.data.filemetadata.type = 'GENERIC'
    RV3.data.filemetadata.nobjects = objects_num
    RV3.data.filemetadata.size = size
    if 'Checksum' in RV4.DatasetProperties:
        RV3.data.filemetadata.checksum = RV4.DatasetProperties.Checksum
    RV3.data.filemetadata.tier_class = tier_class
else:
    RV3.data.filemetadata = None
```
- `RV3.data.computed_size`
The `computed_size` is generated by SDMS V3 when the `/size` endpoint is triggered. This endpoint calculates the size of the datasets by summing the sizes of all composing objects. This field has been introduced because the dataset filemetadata object is an optional field created by client applications, such as sdapi or sdutil, and can only be trusted by them.
```python
blob_list = getBlobClient(connectionString)
size = 0
for blob in blob_list:
    size = size + blob.size
RV3.data.computed_size = size
```
- `RV3.data.computed_size_date`
This is the timestamp of when the dataset size has been computed by SDMS V3.
```python
RV3.data.computed_size_date = str(datetime.datetime.now())
```
- `RV3.data.seismicmeta_guid`
The `seismicmeta_guid` is the ID of a record linked with the SDMS V3 dataset. This can be associated with the SDMS V4 record so all extra properties can be downloaded by consumer applications.
```python
RV3.data.seismicmeta_guid = RV4.id
```
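Putting the field-by-field mappings above together, a conversion routine might look roughly like the sketch below. This is for illustration only: the dictionary-style access, the helper name `build_v3_record`, and the simplified legal-tag selection are assumptions, and the file manifest and computed size are omitted because they require access to the storage resource.
```python
# Illustrative sketch assembling an SDMS V3 descriptor from an SDMS V4 record,
# following the mappings described above. Manifest/computed size omitted.
import hashlib
import secrets
import string

def build_v3_record(rv4: dict, data_partition_id: str) -> dict:
    data = rv4["data"]
    props = data.get("DatasetProperties", {})
    sources = props.get("FileSourceInfos", [])

    name = data.get("Name") or (sources[0].get("Name") if len(sources) == 1 else rv4["id"])
    tenant = data_partition_id
    subproject = "syncv4"                       # default sync data group
    path = props.get("FileCollectionPath", "/")

    record_id = "ds-" + tenant + "-" + subproject + "-" + \
        hashlib.sha512((path + name).encode("utf-8")).hexdigest()
    gcsurl = hashlib.sha256(rv4["id"].encode("utf-8")).hexdigest()[:-1]
    ctag = "".join(secrets.choice(string.ascii_letters + string.digits) for _ in range(16))

    return {
        "id": record_id,
        "data": {
            "name": name,
            "tenant": tenant,
            "subproject": subproject,
            "path": path,
            "acls": {"admins": rv4["acl"]["owners"], "viewers": rv4["acl"]["viewers"]},
            # simplified: a real implementation selects the first *valid* legal tag
            "ltag": rv4["legal"]["legaltags"][0],
            "created-by": rv4.get("createUser"),
            "created_date": rv4.get("createTime"),
            "last_modified_date": rv4.get("modifyTime"),
            "gcsurl": gcsurl,
            "ctag": ctag,
            "readonly": False,
            "seismicmeta_guid": rv4["id"],
        },
    }
```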
### The Script to validate the proposed conversion
- The script [sync-script.py](/uploads/2421d4b04fe2a6fdd560f1df321e5d36/sync-script.py) is provided with this ADR (for testing purposes only) to demonstrate and validate the synching flow between SDMS V4 and V3:
- Create a random data file of 16MB and compute the checksum
- Fill an OSDU record and register it in SDMS V4
- Upload the 16MB file as 4 objects of 4MB each using the connection string generated via SDMS V4
- Generate an V3 metadata record and register it in SDMS V3
- Ensure the dataset in SDMS V3 can be located after ingestion
- Download all objects using the connection string generated via SDMS V3
- Compare the initial object with the downloaded one to ensure they match
#### Example of an SDMS V4 ingested record
```json
{
"id": "opendes:dataset--FileCollection.SEGY:7fe06451787641c4953a06a63e44967a",
"kind": "osdu:wks:dataset--FileCollection.SEGY: 1.1.0",
"version": 1694519237996696,
"acl": {
"viewers": [
"data.sdms.opendes.tdata.fe6730f9-bb3d-46a3-9f03-3d529e32360d.viewer@opendes.domain.com"
],
"owners": [
"data.sdms.opendes.tdata.fe6730f9-bb3d-46a3-9f03-3d529e32360d.admin@opendes.domain.com"
]
},
"legal": {
"legaltags": [
"ltag-seistore-test-01"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"modifyUser": "test-user@domain.com",
"modifyTime": "2023-09-07T11:47:18.625Z",
"createUser": "test-user@domain.com",
"createTime": "2023-09-07T07:17:58.443Z",
"data": {
"Name": "data-sync.segy",
"TotalSize": "16777216",
"Description": "SDMS synching test record",
"DatasetProperties": {
"FileCollectionPath": "/f1/f2/f3/",
"FileSourceInfos": [
{
"FileSource": "data-sync.segy",
"Name": "data-sync.segy",
"FileSize": "16777216",
"Checksum": "8ce2025f9b27e3017ab15f15b261d599",
"ChecksumAlgorithm": "MD5"
}
],
"Checksum": "8ce2025f9b27e3017ab15f15b261d599"
}
}
}
```
#### Example of a generated SDMS V3 metadata
```json
{
"id": "ds-opendes-syncv4-c0699ac77bc64a5772ac7f6f455ce5a251e3686d87d26e91df2ecc73e7bfdf4b0a16ac757c2ec227c1a6814d097a0b6b759a01dc52753754a0a18dfaea53c7d0",
"data": {
"name": "data-sync.segy",
"tenant": "opendes",
"subproject": "syncv4",
"path": "/f1/f2/f3/",
"acls": {
"admins": [
"data.sdms.opendes.tdata.fe6730f9-bb3d-46a3-9f03-3d529e32360d.admin@opendes.domain.com"
],
"viewers": [
"data.sdms.opendes.tdata.fe6730f9-bb3d-46a3-9f03-3d529e32360d.viewer@opendes.domain.com"
]
},
"ltag": "ltag-seistore-test-01",
"created-by": "test-user@domain.com",
"created_date": "2023-09-07T07:17:58.443Z",
"last_modified_date": "2023-09-07T11:47:18.625Z",
"gcsurl": "a5993feef91df715c176452fe1a26d04ca70e88d0ccff268e92cd74c76dde61",
"ctag": "9STTAfiKl4iukKbp",
"readonly": "false",
"filemetadata": {
"nobjects": 4,
"size": 16777216,
"type": "GENERIC",
"checksum": "8ce2025f9b27e3017ab15f15b261d599",
"tier_class": "Hot"
},
"computed_size": 16777216,
"computed_size_date": "2023-09-12 13:47:45.877142",
"seismicmeta_guid": "opendes:dataset--FileCollection.SEGY:7fe06451787641c4953a06a63e44967a"
}
}
```
### SDMS V4 to V3 Synching Automation
The preceding section explains the process of creating a metadata descriptor for SDMS V3 using an OSDU record. This metadata descriptor enables access to a dataset ingested in SDMS V4 through SDMS V3.
In order to automate the process, we will deploy a new service called the `sdms-sync-service`, which will be responsible for generating an SDMS V3 record every time a new dataset is registered in SDMS V4. When a dataset is registered in SDMS V4, a message will be pushed into a Redis queue `insert-synch-v4:{record-id}:{partition}:{other-required-params}`. The new service will consume the messages from the Redis queue and initiate the synching process:
- retrieve the OSDU record from the storage service
- generate the corresponding SDMS V3 metadata descriptor
- save the generated metadata in the SDMS V3 journal.
<div align="center">
<br/><img src="/uploads/b2d6eb24b28516feb0908e5ef7232a2e/sdms-sync-service.png"
alt="sdms-sync-service"
style="display: block; margin: 0 auto" /><br/>
</div>
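A minimal consumer sketch for the `sdms-sync-service` is shown below, assuming the message key format described above. The queue name, the `delete-synch-v4` prefix, and the helper functions (`parse_sync_payload`, `fetch_osdu_record`, `build_v3_record`, `save_v3_record`, `remove_v3_record`) are assumptions introduced for illustration, not an existing API.
```python
# Hypothetical sdms-sync-service consumer loop; queue name, delete prefix and
# helper functions are placeholders, not an existing API.
import redis

def run_sync_consumer():
    broker = redis.Redis(host="localhost", port=6379, decode_responses=True)
    while True:
        _, message = broker.blpop("sdms-sync-queue")         # blocks until a message arrives
        action, payload = message.split(":", 1)
        # payload carries "{record-id}:{partition}:{other-required-params}"; a real parser
        # needs an unambiguous delimiter, since OSDU record ids themselves contain ':'
        record_id, partition = parse_sync_payload(payload)   # placeholder helper
        if action == "insert-synch-v4":
            rv4 = fetch_osdu_record(record_id, partition)    # placeholder: storage service call
            rv3 = build_v3_record(rv4, partition)            # mapping described in this ADR
            save_v3_record(rv3, partition)                   # placeholder: write to the V3 journal
        elif action == "delete-synch-v4":
            remove_v3_record(record_id, partition)           # placeholder: remove the journal entry
```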
### Details
- If a dataset is patched in SDMS V4, the service should push an `insert` message into the Redis queue:
- If the previous `insert` message is still in the queue (not yet consumed by the sync service), the existing entry will be overwritten in the queue, and the sync service will create the updated one.
- If the previous version was already synced, when the new message is consumed, the updated record will be created, and because the generated key is identical, it will overwrite the existing record in the journal.
- If a dataset is deleted in SDMS V4, the service should push a `delete` message into the Redis queue.
- When the delete message is consumed, the sync service will generate only the V3 record key and remove the entry from the journal.
- If the `insert` message has not yet been consumed from the queue, the sync service should, when consuming it, check whether a `delete` message is also present for the same record. If one is found in the queue, the sync service will skip the sync process and remove both the `insert` and `delete` entries from the Redis queue.
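The insert/delete reconciliation described above could be realized along the lines of the following sketch; `pending_messages`, `drop_message` and `sync_record_to_v3` are hypothetical helpers used only to illustrate the intended behaviour.
```python
# Hypothetical sketch of the insert/delete reconciliation; helpers are placeholders.
def handle_insert(record_id: str, partition: str) -> None:
    insert_key = f"insert-synch-v4:{record_id}:{partition}"
    delete_key = f"delete-synch-v4:{record_id}:{partition}"
    # If a delete for the same record is already queued, the insert is obsolete:
    # skip the sync and drop both entries from the Redis queue.
    if delete_key in pending_messages():
        drop_message(insert_key)
        drop_message(delete_key)
        return
    sync_record_to_v3(record_id, partition)   # placeholder: normal sync path
```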
### Limitations
When a dataset is registered in V4 via a client app, the record is created instantaneously, while uploading the bulk data into the storage resource takes longer. If the `insert` message is consumed before the bulk data is uploaded, the file manifest cannot be computed due to missing objects. To address this issue, we can enable a background process in the `sync-service` that loops over the created SDMS V3 records and updates the manifest in cases where it does not exist or when the last modified time in the corresponding SDMS V4 record is greater than the one reported in the V3 entry. This approach should be re-discussed with the community to find an optimal strategy to apply.

Add a mechanism to use an external application to get credentials (Morten Ofstad)
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/open-vds/-/issues/204

In order to get credentials that require a user to log in, it will be useful to run a separate executable. This is e.g. how git works (the global config sets credential.helper to point to an executable, which can be configured per URL prefix; see the Git gitcredentials documentation at git-scm.com). Integrating this directly in OpenVDS makes it easy for other applications to take advantage of it.
The suggested implementation will add new global keys (valid for all cloud providers) credential_helper and credential_helper_args to the connection string format. If the credential_helper key is present, the executable pointed to will be run with the args from credential_helper_args and the URL as arguments, and the output will be parsed as a connection string and added to the remaining keys after removing the credential_helper and credential_helper_args keys. This allows other arguments like tolerance etc. to be passed on from the original connection string after using the credentials helper.
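As a rough illustration of the described flow (not the actual OpenVDS implementation), the sketch below parses a connection string, runs the configured credential helper with the URL, and merges its output back into the remaining keys; the exact parsing and the helper's output format are assumptions.
```python
# Illustrative sketch of the credential_helper flow described above; not OpenVDS code.
import subprocess

def resolve_connection_string(connection: str, url: str) -> str:
    # Parse "key=value;key=value" pairs (simplified reading of the connection string format).
    pairs = dict(item.split("=", 1) for item in connection.split(";") if item)
    helper = pairs.pop("credential_helper", None)
    helper_args = pairs.pop("credential_helper_args", "")
    if helper is None:
        return connection
    # Run the helper with its configured args plus the URL; its stdout is parsed
    # as another connection string (assumed format).
    output = subprocess.run([helper, *helper_args.split(), url],
                            capture_output=True, text=True, check=True).stdout.strip()
    credentials = dict(item.split("=", 1) for item in output.split(";") if item)
    # Merge: credentials are added to the remaining keys, so arguments like
    # tolerance pass through from the original connection string.
    pairs.update(credentials)
    return ";".join(f"{k}={v}" for k, v in pairs.items())
```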