OSDU Software issues
https://community.opengroup.org/groups/osdu/-/issues

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/rock-and-fluid-sample/rafs-ddms-services/-/issues/98
Postman Collection and parquet file sample - observations
2023-10-03, Debasis Chatterjee

@Mykhailo_Buriak and @Kseniya - I thought it best to create an issue here, as needed. It would be easier for tracking.
cc @bdawson and @Siarhei_Khaletski
This refers to your recent artifacts here
https://community.opengroup.org/osdu/qa/-/tree/main/Dev/48_CICD_Setup_RAFSDDMSAPI
Thanks for using IDs as per Kentish example. Page 453/1082.
Wellbore KKS1, Coring ID KKS1-Core1, Rock Sample ID - KKS1-Coreing1-Sample-10A, Rock Sample analysis wpc ID=**Test**
The RSA wpc record shows depth of 2423.26, density of 2.643, permeability of 4410 and porosity of 35.7.
Rock Sample Analysis Request-03 (Add data - parquet) - Sample parquet file in this folder does not show matching data.
3 rows. Depths 346m, 10m, 19999m. Grain Density 1.3, 1.3, 1.31.
Rock Sample Analysis Request-04 (Add data - JSON) - payload here is not matching with Rock Sample ID record **Test**
3 rows. Depths 346m,10m, 19999m. Grain Density 1.30, 1.30, 1.31.
JSON payload does match content of sample parquet file, as far as I can tell. So, that is good.
Some general questions/comments -
- Will numerical data in DDMS optimized content match single Rock Sample Analysis record?
- Will numerical data be duplicated from actual metadata (what is in the record of Rock Sample Analysis wpc)?
- Add data request refers to single record of Rock Sample Analysis. Do we not expect many-to-1 from Catalog record into Optimized content?
- Request-04 - "Add data JSON" still has wrong unit of measure for porosity, permeability, grain density and such properties.
- Request-03 - "Add data (parquet)" still has wrong unit of measure for porosity, permeability, grain density and such properties.
- In the current approach, the "unit of measure" value is repeated for each row. That is closer to the PPDM approach than to the OSDU Data Definition approach.
Copying to @bdawson and @Siarhei_Khaletski for information.

https://community.opengroup.org/osdu/platform/pre-shipping/-/issues/535
While Testing Augmented search feature in AWS R3 M18 Pre-ship environment - not able to get the expected results
2023-06-30, Kamlesh Todai

Following these steps to test the Augmented search feature:
1. Make sure that the schema of kind "osdu:wks:reference-data--IndexPropertyPathConfiguration:1.0.0" is deployed. It should be, as it is part of the M18 schema. (**Executed successfully**)
2. Make sure the feature flag "index-augmenter-enabled" is turned on in the tested data partition (**Do not have access to execute this step**)
3. Select a few kinds of data that users want to create extended properties from related objects (**Selected Well, Wellbore, WellLog, WellboreTrajectory, WellboreMarkerSet**)
4. Define the property extension configuration in the data block of the records with kind "osdu:wks:reference-data--IndexPropertyPathConfiguration:1.0.0".
5. Deploy the configuration records to the storage via storage API
<details><summary>Configuration records created</summary>
{
"recordCount": 5,
"recordIds": [
"osdu:reference-data--IndexPropertyPathConfiguration:work-product-component--WellLog:1.",
"osdu:reference-data--IndexPropertyPathConfiguration:work-product-component--WellboreTrajectory:1.",
"osdu:reference-data--IndexPropertyPathConfiguration:work-product-component--WellboreMarkerSet:1.",
"osdu:reference-data--IndexPropertyPathConfiguration:wks:master-data--Well:1.",
"osdu:reference-data--IndexPropertyPathConfiguration:wks:master-data--Wellbore:1."
],
"skippedRecordIds": [],
"recordIdVersions": [
"osdu:reference-data--IndexPropertyPathConfiguration:work-product-component--WellLog:1.:1687552840965025",
"osdu:reference-data--IndexPropertyPathConfiguration:work-product-component--WellboreTrajectory:1.:1687552840965025",
"osdu:reference-data--IndexPropertyPathConfiguration:work-product-component--WellboreMarkerSet:1.:1687552840965025",
"osdu:reference-data--IndexPropertyPathConfiguration:wks:master-data--Well:1.:1687552840965025",
"osdu:reference-data--IndexPropertyPathConfiguration:wks:master-data--Wellbore:1.:1687552840965025"
]
}
</details>
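For reference, step 5 was done through the Storage API's record upsert endpoint. A minimal sketch of that call is shown below (host taken from the later re-index request; the record body is heavily abbreviated and the `data` block is a placeholder, not the literal payload):

```
curl --location --request PUT 'https://osdu.r3m18.preshiptesting.osdu.aws/api/storage/v2/records' \
--header 'Content-Type: application/json' \
--header 'data-partition-id: osdu' \
--header 'Authorization: Bearer <token>' \
--data '[
  {
    "id": "osdu:reference-data--IndexPropertyPathConfiguration:wks:master-data--Well:1.",
    "kind": "osdu:wks:reference-data--IndexPropertyPathConfiguration:1.0.0",
    "acl": { "viewers": ["data.default.viewers@osdu.example.com"], "owners": ["data.default.owners@osdu.example.com"] },
    "legal": { "legaltags": ["osdu-AugmIdxExt-Legal-Tag-Test"], "otherRelevantDataCountries": ["US"], "status": "compliant" },
    "data": { "Name": "Well-IndexPropertyPathConfiguration", "Configurations": [ "...abbreviated, see record below..." ] }
  }
]'
```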
<details><summary>Retrieved Well configuration record for verification </summary>
{
"data": {
"Name": "Well-IndexPropertyPathConfiguration",
"Description": "The index property list for master-data--Well:1., valid for all master-data--Well kinds for major version 1.",
"Code": "osdu:wks:master-data--Well:1.",
"AttributionAuthority": "OSDU",
"Configurations": [
{
"Name": "CountryNamesKTJun23",
"Policy": "ExtractAllMatches",
"Paths": [
{
"RelatedObjectsSpec": {
"RelationshipDirection": "ChildToParent",
"RelatedObjectID": "data.GeoContexts[].GeoPoliticalEntityID",
"RelatedObjectKind": "osdu:wks:master-data--GeoPoliticalEntity:1.",
"RelatedConditionMatches": [
"osdu:reference-data--GeoPoliticalEntityType:Country:"
],
"RelatedConditionProperty": "data.GeoContexts[].GeoTypeID"
},
"ValueExtraction": {
"ValuePath": "data.GeoPoliticalEntityName"
}
}
],
"UseCase": "As a user I want to find objects by a country name, with the understanding that an object may extend over country boundaries."
},
{
"Name": "WellUWIKTJun23",
"Policy": "ExtractFirstMatch",
"Paths": [
{
"ValueExtraction": {
"RelatedConditionMatches": [
"osdu:reference-data--AliasNameType:UniqueIdentifier:",
"osdu:reference-data--AliasNameType:RegulatoryName:",
"osdu:reference-data--AliasNameType:PreferredName:",
"osdu:reference-data--AliasNameType:CommonName:",
"osdu:reference-data--AliasNameType:ShortName:"
],
"RelatedConditionProperty": "data.NameAliases[].AliasNameTypeID",
"ValuePath": "data.NameAliases[].AliasName"
}
}
],
"UseCase": "As a user I want to discover and match Wells by their UWI. I am aware that this is not globally reliable, however, I am able to specify a prioritized AliasNameType list to look up value in the NameAliases array."
}
]
},
"meta": [],
"modifyUser": "admin-main@testing.com",
"modifyTime": "2023-06-23T20:40:41.335Z",
"id": "osdu:reference-data--IndexPropertyPathConfiguration:wks:master-data--Well:1.",
"version": 1687552840965025,
"kind": "osdu:wks:reference-data--IndexPropertyPathConfiguration:1.0.0",
"acl": {
"viewers": [
"data.default.viewers@osdu.example.com"
],
"owners": [
"data.default.owners@osdu.example.com"
]
},
"legal": {
"legaltags": [
"osdu-AugmIdxExt-Legal-Tag-Test"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"createUser": "admin-main@testing.com",
"createTime": "2023-06-15T17:35:37.889Z"
}
</details>
6. Re-index all the kinds that have extended properties using the reindex API
<details><summary>re-index --Well:1.0.0, --Well:1.1.0, --Well:1.2.0</summary>
curl --location 'https://osdu.r3m18.preshiptesting.osdu.aws/api/indexer/v2/reindex?force_clean=true' \
--header 'Content-Type: application/json' \
--header 'data-partition-id: osdu' \
--header 'Authorization: Bearer eyJraWQiOi...truncated...CG4HUDHg' \
--data '{
"kind": "osdu:wks:master-data--Well:1.0.0"
}'
Response 200 OK
</details>
7. Test search with the extended properties
<details><summary>Search and its results</summary>
curl --location 'https://osdu.r3m18.preshiptesting.osdu.aws/api/search/v2/query' \
--header 'Authorization: Bearer eyJraWQiOi...Truncated...BAy-bDbtQ' \
--header 'data-partition-id: osdu' \
--header 'Content-Type: application/json' \
--data '{
"kind": "osdu:wks:master-data--Well:1.*",
"query": "_exists_:data.WellUWIKTJun23",
"returnedFields": ["id", "kind", "data.WellUWIKTJun23"]
}'
Response 200 OK
{
"results": [],
"aggregations": [],
"totalCount": 0
}
</details>
We also tried similar steps in GC, where we have access to check whether the feature is available. We found that the feature is not enabled and that we do not have permissions/access to enable it.
<details><summary>In GC R3 M18 Pre-ship environment</summary>
curl --location 'https://preship.gcp.gnrg-osdu.projects.epam.com/api/partition/v1/partitions/odesprod' \
--header 'Content-Type: application/json' \
--header 'data-partition-id: odesprod' \
--header 'Authorization: Bearer ya29.a0AWY7Ckmj...truncated...fpwHiQ0167' \
--data ''
Response 200 OK
{
"kubernetes-secret-name": {
"sensitive": false,
"value": "eds-odesprod"
},
"elasticsearch.password": {
"sensitive": true,
"value": "ELASTIC_PASS_ODESPROD"
},
"serviceAccount": {
"sensitive": false,
"value": "datafier@osdu-service-prod.iam.gserviceaccount.com"
},
"dataPartitionId": {
"sensitive": false,
"value": "odesprod"
},
"bucket": {
"sensitive": false,
"value": "osdu-data-prod-odesprod-records"
},
"index-augmenter-enabled": {
"sensitive": false,
"value": "false"
},
...Truncated...
{
"indexer.service.account": {
"sensitive": false,
"value": "workload-indexer-gcp@osdu-service-prod.iam.gserviceaccount.com"
},
"projectId": {
"sensitive": false,
"value": "osdu-data-prod"
}
}
When I try to enable the index-augmenter, I get the response of 403 Forbidden - RBAC: access denied.
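For reference, the enable attempt was a Partition API property update along the lines of the sketch below (the PATCH payload shape is assumed from the property format returned above):

```
curl --location --request PATCH 'https://preship.gcp.gnrg-osdu.projects.epam.com/api/partition/v1/partitions/odesprod' \
--header 'Content-Type: application/json' \
--header 'data-partition-id: odesprod' \
--header 'Authorization: Bearer <token>' \
--data '{
  "properties": {
    "index-augmenter-enabled": { "sensitive": false, "value": "true" }
  }
}'
# Response: 403 Forbidden - RBAC: access denied
```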
</details>

https://community.opengroup.org/osdu/platform/system/schema-service/-/issues/130
Schema service whitesource issue
2024-01-11, Sudesh Tagadpallewar

There are vulnerabilities in the schema service. I have fixed the vulnerability-relevant libraries, but skipped upgrading libraries which need Java 17+. Right now the pipeline is failing at gc-deploy and gc-baremetal-deploy. We need to fix these vulnerabilities.
MR link - https://community.opengroup.org/osdu/platform/system/schema-service/-/merge_requests/504
![image](/uploads/dfa39bf12a986d682cf04e05da9e4896/image.png)

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/130
[GCP] Search Service does not work with cursors
2024-02-22, Riabokon Stanislav (EPAM)
This issue was observed when the GC team was running various requests on Search Service.
for example,
```
curl --location 'https://community.gcp.gnrg-osdu.projects.epam.com/api/search/v2/query_with_cursor' \
--header 'Content-Type: application/json' \
--header 'data-partition-id: osdu' \
--header 'accept: application/json' \
--header 'Authorization: Bearer ey' \
--data '{
"kind": "*:*:*:*",
"query": "data.DatasetProperties.FileSourceInfo.PreloadFilePath: (\"s3://osdu-seismic-test-data*\")",
"trackTotalCount": true
}'
```
with an answer
```
{
"cursor": null,
"results": [],
"totalCount": 0
}
```
Investigation:
**Search Service** will create a request on ElasticSearch:
`POST /*-*-*-*,-.*/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=true&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&scroll=90s&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true`
The parameter **scroll=90s** means we will use the Scroll API, with a cursor **lifetime of 90 seconds**.
However, we will create a cursor for every index and can get an error for an index:
`Trying to create too many scroll contexts. Must be less than or equal to: [500]. This limit can be set by changing the [search.max_open_scroll_context] setting.`
After a while, we decided to investigate this issue more deeply.
When we check the node stats with the following request:
https://gc_elastic_search:9243/_nodes/stats/indices/search
answer was
```
"indices": {
"search": {
"open_contexts": 1,
"query_total": 15271488,
"query_time_in_millis": 9385974,
"query_current": 0,
"fetch_total": 9567767,
"fetch_time_in_millis": 590770,
"fetch_current": 0,
"scroll_total": 4399252,
"scroll_time_in_millis": 7768131243,
"scroll_current": 1,
"suggest_total": 0,
"suggest_time_in_millis": 0,
"suggest_current": 0
}
}
```
As we can see, ElasticSearch has **1 scroll_current**.
Let's run our request again
```
{
"kind": "*:*:*:*",
"query": "data.DatasetProperties.FileSourceInfo.PreloadFilePath: (\"s3://osdu-seismic-test-data*\")",
"trackTotalCount": true
}
```
The answer from node stats was
```
"indices": {
"search": {
"open_contexts": 1193,
"query_total": 15272901,
"query_time_in_millis": 9386238,
"query_current": 0,
"fetch_total": 9567932,
"fetch_time_in_millis": 590779,
"fetch_current": 0,
"scroll_total": 4399329,
"scroll_time_in_millis": 7768231132,
"scroll_current": 1193,
"suggest_total": 0,
"suggest_time_in_millis": 0,
"suggest_current": 0
}
}
```
We will get **"scroll_current": 1193**. Thus, thanks to our request, we will create 1193 cursors with the same ID for every indice.
Solution:
- Try to avoid such broad requests when we want to use search_with_cursors.
- According to the official Elasticsearch documentation, we should use
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/paginate-search-results.html#search-after instead of the Scroll API:
`We no longer recommend using the scroll API for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the search_after parameter with a point in time (PIT).`
More details: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/scroll-api.html.
In our case, we would have to support the **search_after** parameter in the Search Service (see the sketch after this list).
- Run request
```
curl -i -X PUT \
-H "Authorization:Basic ****" \
-H "data-partition-id:osdu" \
-H "Content-Type:application/json" \
-d '{
  "persistent": { "search.max_open_scroll_context": 5000 },
  "transient": { "search.max_open_scroll_context": 5000 }
}' \
'https://elastic_search:9243/_cluster/settings'
```
to increase max_open_scroll_context.
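For illustration, a search_after-based flow against Elasticsearch 7.17 could look roughly like the sketch below (host, index pattern and page size are placeholders; the Search Service would need to wrap this logic itself):

```
# 1. Open a point in time (PIT) on the target indices
curl -X POST 'https://elastic_search:9243/osdu-*/_pit?keep_alive=1m' \
  -H "Authorization: Basic ****"

# 2. Page through results with the PIT id; omit "search_after" on the first page,
#    then pass the sort values of the last hit from the previous page
curl -X POST 'https://elastic_search:9243/_search' \
  -H "Authorization: Basic ****" \
  -H "Content-Type: application/json" \
  -d '{
    "size": 1000,
    "query": { "query_string": { "query": "data.DatasetProperties.FileSourceInfo.PreloadFilePath: (\"s3://osdu-seismic-test-data*\")" } },
    "pit": { "id": "<pit id from step 1>", "keep_alive": "1m" },
    "sort": [ { "_shard_doc": "asc" } ],
    "search_after": [ "<last sort value from the previous page>" ]
  }'
```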
https://community.opengroup.org/osdu/platform/system/reference/schema-upgrade/-/issues/3
Is there a workflow on top of the service calls?
2023-07-20, Chad Brockman

The low level service calls make sense here as long as the library supports what's needed for the JSON conversion.
What is the plan for the "orchestration" of the upgrade?
Roll your own? Another service that leverages these services? Airflow?

https://community.opengroup.org/osdu/platform/pre-shipping/-/issues/529
AWS M18 Preship - eds_ingest unable to fetch data records from Azure Preship
2023-06-26, Priyanka Bhongade

https://5999f1b0-bc27-432c-b834-b3e0814512d0.c8.us-east-2.airflow.amazonaws.com/log?dag_id=eds_ingest&task_id=fetch_client&execution_date=2023-06-19T08%3A21%3A44%2B00%3A00&map_index=-1
eds_ingest - AWS M18 Preship is the target and Azure Preship is the Provider. AWS makes a Search API call to Azure for Well data.
Search query is as below:
[2023-06-19, 08:21:55 UTC] {{src_dags_fetch_and_ingest.py:334}} INFO - QUERY BODY:{**'kind': 'osdu:wks:master-data--Well:1.0.0', 'query': '* AND ((createTime: [2020-01-01T00:00:00 TO 2023-06-19T08:21:54]) OR (modifyTime: [2020-01-01T00:00:00 TO 2023-06-19T08:21:54]))', 'sort': {'field': ['createTime'], 'order': ['ASC']}, 'limit': 100}**
Issue - it doesn't fetch any results, as shown below, whereas the same query fetches approximately 1300 data records in Postman.
[2023-06-19, 08:21:55 UTC] {{src_dags_fetch_and_ingest.py:158}} **ERROR - {'status': 'error', 'message': KeyError('results')}**

https://community.opengroup.org/osdu/platform/security-and-compliance/legal/-/issues/43
Legal post migration TODOs (Gson, module access)
2023-07-05, Rustam Lotsmanenko (EPAM)

- Unit test refactoring is required, to remove --add-opens params from build run args.
~~~
<argLine>--add-opens java.base/java.lang=ALL-UNNAMED</argLine>
~~~
- Gson usage should be revised and refactored; currently, the issue is mitigated with run args:
~~~
--add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.lang.reflect=ALL-UNNAMED
~~~
But without them, the Azure and GC modules hit issues such as:
~~~
legal.app: Error when publishing legaltag status change to pubsub
com.google.gson.JsonIOException: Failed making field 'java.lang.Enum#name' accessible; either change its visibility or write a custom TypeAdapter for its declaring type
at com.google.gson.internal.reflect.ReflectionHelper.makeAccessible(ReflectionHelper.java:23)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory.getBoundFields(ReflectiveTypeAdapterFactory.java:203)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory.create(ReflectiveTypeAdapterFactory.java:112)
at com.google.gson.Gson.getAdapter(Gson.java:531)
~~~https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/rock-and-fluid-sample/rafs-ddms-services/-/issues/69Expired JWT2023-06-29T14:11:54ZKseniya BarkouskayaExpired JWT**Preconditions:**
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/rock-and-fluid-sample/rafs-ddms-services/-/issues/69
Expired JWT
2023-06-29, Kseniya Barkouskaya

**Preconditions:**
- Obtain an access token.
- Download the Postman Collection from the following location: [Postman Collection](https://community.opengroup.org/osdu/qa).
**Steps:**
1. Wait until the access token has expired (typically after 1 hour).
2. Make any DDMS request using the expired access token.
**Expected result:**
1. HTTP status code: 401 (Unauthorized).
2. Response text: "Jwt is expired."

https://community.opengroup.org/osdu/platform/system/lib/core/os-core-common/-/issues/68
Core-common post migration TODOs (Powermock, module access)
2023-06-28, Rustam Lotsmanenko (EPAM)

- Unit test refactoring is required, to remove --add-opens params from build run args.
~~~
<argLine>--add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED</argLine>
~~~
Tests failed without args:
~~~
Failed tests:
shouldThrowExceptionWhenPropertyNotPresentInEnv(org.opengroup.osdu.core.common.partition.PartitionPropertyResolverTest)
shouldThrowExceptionWhenPropertyInMapNull(org.opengroup.osdu.core.common.partition.PartitionPropertyResolverTest)
shouldThrowExceptionWhenPropertyValueInMapNull(org.opengroup.osdu.core.common.partition.PartitionPropertyResolverTest)
shouldThrowExceptionWhenPropertyNotInMap(org.opengroup.osdu.core.common.partition.PartitionPropertyResolverTest)
Tests in error:
shouldNotThrowExceptionWhenPropertyValueInMapIsEmpty(org.opengroup.osdu.core.common.partition.PartitionPropertyResolverTest)
shouldNotThrowExceptionWhenEnvReturnEmptyVal(org.opengroup.osdu.core.common.partition.PartitionPropertyResolverTest)
shouldGetTypedProperties(org.opengroup.osdu.core.common.partition.PartitionPropertyResolverTest)
getPropertyValueFromEnv(org.opengroup.osdu.core.common.partition.PartitionPropertyResolverTest)
shouldThrowExceptionWhenEnvReturnNull(org.opengroup.osdu.core.common.partition.PartitionPropertyResolverTest)
getPropertyValueFromMapValue(org.opengroup.osdu.core.common.partition.PartitionPropertyResolverTest)
~~~
- Powermock dependency removal is highly recommended: Powermock is not getting updates to work with Java 17 and has conflicts with Mockito:
~~~
<dependency>
<groupId>org.powermock</groupId>
<artifactId>powermock-core</artifactId>
<version>2.0.9</version>
<scope>test</scope>
</dependency>
~~~
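For the migration itself, Mockito's own static mocking (mockito-inline, Mockito 3.4+) usually covers the PowerMock use cases. A minimal, self-contained sketch with a hypothetical helper class, not the actual os-core-common test code:

```java
import org.junit.Test;
import org.mockito.MockedStatic;
import org.mockito.Mockito;

import static org.junit.Assert.assertEquals;

public class EnvProviderTest {

    // Hypothetical stand-in for a class whose static method used to be mocked with PowerMock.
    static class EnvProvider {
        static String get(String name) {
            return System.getenv(name);
        }
    }

    @Test
    public void shouldReturnMockedValueForProperty() {
        // try-with-resources keeps the static mock scoped to this test only.
        try (MockedStatic<EnvProvider> env = Mockito.mockStatic(EnvProvider.class)) {
            env.when(() -> EnvProvider.get("PARTITION_PROPERTY")).thenReturn("value");
            assertEquals("value", EnvProvider.get("PARTITION_PROPERTY"));
        }
    }
}
```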
https://community.opengroup.org/osdu/platform/system/lib/core/os-core-common/-/issues/67
inefficient/non-performant crs-conversions causing reliability/performance issues on ingestion workflow
2023-06-21, Yurii Kondakov

We are seeing reliability and performance issues because of inefficient/non-performant crs-conversions in the ingestion workflow. If crs conversions take a long time (on the Storage /batch API), they slow down the entire ingestion workflow.
There is a need to set a timeout for crs-conversion requests that run longer than a certain time. For the requests to crs-conversion-service, the java.net.HttpURLConnection class is currently used, which only has connectionTimeout and readTimeout properties; these do not help us set the timeout we need.
It is suggested to use the Apache CloseableHttpClient, which has a socketTimeout property.
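A minimal Java sketch of that suggestion, using Apache HttpClient 4.x RequestConfig (endpoint and timeout values are placeholders, not the actual core-common configuration):

```java
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class CrsConverterClientSketch {
    public static void main(String[] args) throws Exception {
        // socketTimeout caps how long the client waits for data on the socket,
        // so a hanging crs-conversion call fails fast instead of stalling ingestion.
        RequestConfig config = RequestConfig.custom()
                .setConnectTimeout(5_000)
                .setConnectionRequestTimeout(5_000)
                .setSocketTimeout(30_000)
                .build();

        try (CloseableHttpClient client = HttpClients.custom()
                .setDefaultRequestConfig(config)
                .build()) {
            HttpPost post = new HttpPost("https://example.com/api/crs/converter/v2/convert"); // placeholder endpoint
            try (CloseableHttpResponse response = client.execute(post)) {
                System.out.println(response.getStatusLine());
            }
        }
    }
}
```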
core-common MR - https://community.opengroup.org/osdu/platform/system/lib/core/os-core-common/-/merge_requests/213
storage MR - https://community.opengroup.org/osdu/platform/system/storage/-/merge_requests/712

https://community.opengroup.org/osdu/platform/deployment-and-operations/helm-charts-azure/-/issues/24
Airflow2 old apiResource not compatible with AKS 1.25
2023-06-19, Arturo Hernandez [EPAM]

The Helm chart for the airflow2 installation contains
[apiVersion: policy/v1beta1](https://github.com/airflow-helm/charts/blob/airflow-8.5.2/charts/airflow/templates/webserver/webserver-pdb.yaml#LL1C1-L1C1)
```yaml
apiVersion: policy/v1beta1 # << Not existing anymore in AKS 1.25
```
This is not compatible with AKS 1.25.
To overcome this issue we can add the following specs:
```yaml
airflow:
web:
podDisruptionBudget:
enabled: false
scheduler:
podDisruptionBudget:
enabled: false
workers:
podDisruptionBudget:
enabled: false
```
We can either add this to the documentation to overcome the issue, or upgrade the community airflow2 helm chart to a recent one [Recommended airflow-8.7.1](https://github.com/airflow-helm/charts/tree/airflow-8.7.1).
Or we can just change this apiVersion inside the tar file (easier option).
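If the chart-upgrade route is taken, the upgrade itself would be roughly as follows (release name, namespace and values file are placeholders):

```
helm repo add airflow-stable https://airflow-helm.github.io/charts
helm repo update
helm upgrade --install airflow airflow-stable/airflow \
  --version 8.7.1 \
  --namespace airflow \
  --values values.yaml
```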
cc. @lucynliu

https://community.opengroup.org/osdu/platform/system/dataset/-/issues/55
FileSourceInfos array not getting populated or not returned by search/get record
2023-07-06, Zachary Keirn

Per the schema [AbstractFileCollection](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/Generated/abstract/AbstractFileCollection.1.0.0.json), there is an array property FileSourceInfos that should contain information about each file in a File Collection. When I create a multiple-file File Collection and upload each file using signed URLs as in the core services Postman collection example, and then try to either search for or retrieve the dataset collection, the only properties from DatasetProperties that are returned are the IndexFilePath and the FileCollectionPath, NOT the FileSourceInfos array. I believe that this property should be getting populated. Using the core services AWS M18 pre-ship collection here: [AWS M18 Core Services Postman](https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M18/AWS-M18/Core%20Services/AWS_OSDUR3M18_CoreServices_Collection.postman_collection.json)

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/open-vds/-/issues/191
Getting histogram/statistics from OpenVDS data
2023-06-22, Qiang Fu

Does OpenVDS data have embedded metadata for histogram/statistics?
Going through all bricks to calculate the histogram or statistics could be quite expensive.

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/wellbore/wellbore-domain-services/-/issues/73
ADR: Worker Service for Wellbore Bulk Data Access
2024-01-10, Kin Jin Ng

## Status
- [X] Proposed
- [X] Trialing
- [X] Under review
- [ ] Approved
- [ ] Retired
## Context & Scope
Currently, as of M16, Wellbore DDMS is experiencing performance challenges involving WellLogs operations with large bulk data (>1 Gb), especially on data reading.
It was also observed that Wellbore DDMS requires a significant amount of memory in comparison to the amount of data manipulated to serve incoming requests.
See issues #21 and #27.
Wellbore DDMS is composed of a general main service, which is responsible both for handling client-facing API requests and for data access operations against the underlying bulk data store.
In turn, the bulk data management implementation in WDDMS is highly based on [Dask](https://www.dask.org/).
For instance, for a large WellLog dataset stored in Wellbore DDMS, the associated data will not be located in an individual parquet file, but rather distributed across several distinct parquet files.
When a request is received to retrieve the bulk data associated with a specific subset of WellLog curves, with or without the optional reference range,
Dask is used to process all required parquet files, across which the queried data is stored,
and to extract the cropped data corresponding to the selected curves and range from the WellLog dataset.
All operations in the described workflow are executed end to end in the same container for a given request.
Though the main service approach and Dask capabilities provide a simple and straightforward deployment,
it was identified from previous analysis that this pairing poses considerable limitations on
Wellbore DDMS performance and scalability.
## Trade-off Analysis
The standard Python framework already offers good support for I/O-bound operations (see [asyncio](https://docs.python.org/3/library/asyncio.html));
however, dealing with CPU-bound operations and data transformation operations is more complex, and Dask brings a first answer to that.
For instance, when reading and writing large WellLog datasets, Dask provides a concise and straightforward implementation to reconcile data from multiple parquet files.
Nevertheless, while Dask appears to be a good solution for heavy computation, in most of WDDMS' supported data query/filter scenarios
Wellbore DDMS is primarily constrained by I/O operations rather than by data transformation operations.
Additionally, Dask proved not to be efficient when handling several queries involving smaller amounts of data,
as its minimum required memory footprint does not scale down for smaller volumes of data.
The Dask cluster is implemented as a process-based local cluster, which also brings several issues:
- Dask workers are internal to the pods and therefore cannot be shared with other WDDMS service instances.
- The scaling/resources request are indirectly done through WDDMS, not the Dask workers.
- Dask workers are actually process forks of WDDMS which leads to unnecessary memory usage even at startup or when idle.
Finally, we spotted several memory leaks within Dask, and there are [several memory-management related issues open in Dask's GitHub](
https://github.com/dask/distributed/issues?q=is%3Aissue+is%3Aopen+label%3Amemory+).
## Decision
Dask remains a great tool, but it does not fit the needs of WDDMS. Therefore Dask will be removed and
replaced by a new dedicated service responsible for bulk data access only called _wddms bulk data worker service_.
The _wddms bulk data worker service_ will be specialized in bulk I/O and bulk data manipulation (transformation, filtering), while the WDDMS main service will keep all domain knowledge/responsibility, such as metadata manipulation or consistency rules, but
will delegate bulk data operations to the _wddms bulk worker service_.
The _wddms bulk worker service_ will not use Dask at all. This means the [current bulk data access layer](https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/wellbore/wellbore-domain-services/-/tree/master/app/bulk_persistence)
in WDDMS will not be moved as-is into the new dedicated service, but reworked and tailored to WDDMS specific needs.
The image below illustrates side by side how scaling and workload distribution occur in the current and the target designs.
In the current implementation, an incoming request to retrieve a large amount of data is limited to the Dask worker resources
of a single WDDMS pod, even though Dask workers from other WDDMS instances might be available.
In the target design, unlike the current architecture, all processing capacity of the _wddms bulk workers_ instances is available to be used by any WDDMS instances. That arrangement unlocks a better scaling capability as it is done directly on bulk data workers upon needs.
![scaling_view_worker_next](/uploads/921e4f3f506570bafabf38a917dbc3c7/scaling_view_worker_next.jpg){: width="60%"}
### Security Implications
In the current design, the authorization (ACL/policy) checks and the bulk data access operations in WDDMS are performed in the same service instance. Bulk data will only be served to valid users entitled to access the associated work product component record.
The changes proposed in this ADR separate the data access control layer, located in the main WDDMS service, from the bulk data access itself, located in the new _wddms bulk worker service_. See below the changes in the communication patterns in the current vs target design diagrams.
Allowing users or other services to directly access the _wddms bulk worker service_ endpoint would permit bypassing the data access control checks in the main WDDMS service.
Therefore, with the new topology, additional deployment configuration settings will be required to preserve compliant and secure data access control in WDDMS:
- _wddms bulk worker service_ must not be accessible from the external network
- _wddms bulk worker service_ will only accept requests from WDDMS main service instances
#### Current
![threat_model_current](/uploads/8ef5bf06976e23ad45d2a243064c3e8c/threat_model_current.jpg){: width="60%"}
#### Target
![threat_model_target](/uploads/de4da75c833d44e8503eba6647d0ec98/threat_model_target.jpg){: width="60%"}

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/45
Data API - File size transferred - provide option for File/Dataset service
2023-06-16, Debasis Chatterjee

The current implementation gives timing for one specific ingestion, and also for Seismic DDMS.
Provide a similar option for files transferred through the normal File/Dataset service.
It would also be helpful if we could obtain stats for a given period, for all ingestions performed during that time.
That would help us to understand which ingestion added how much data.
You may use the same construct as you did for some Platform APIs.
```
{
"metric_id":"cpu/usage_time",
"interval" : "3600",
"start_time":"01/06/23 21:55:19",
"end_time":"02/06/23 01:30:19"
}
```

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/44
Data API - File size transferred - provide alternative to gather stats over certain time period
2023-06-16, Debasis Chatterjee

The current implementation gives the size of one or more datasets from Seismic Store (SD_STORE).
It would be more helpful if we could obtain stats for a given period, for all data uploaded to SD_STORE during that time.
That would help us to understand which ingestion added how much data over time.
You may use the same construct as you did for some Platform APIs.
```
{
"metric_id":"cpu/usage_time",
"interval" : "3600",
"start_time":"01/06/23 21:55:19",
"end_time":"02/06/23 01:30:19"
}
```

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/43
Data API - Time to ingest - provide alternative to obtain stats showing multiple ingestions during a period
2023-06-16, Debasis Chatterjee

The current implementation gives timing for one specific ingestion.
It would be more helpful if we could obtain stats for a given period, for all ingestions performed during that time.
That would help us to understand which ingestion is taking how much time.
You may use the same construct as you did for some Platform APIs.
```
{
"metric_id":"cpu/usage_time",
"interval" : "3600",
"start_time":"01/06/23 21:55:19",
"end_time":"02/06/23 01:30:19"
}
```

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/42
Data API - Search speed - provide alternative to gather statistics for certain period
2023-06-16, Debasis Chatterjee

The current implementation gives timing for one specific search.
It would be more helpful if we could obtain stats for a given period, for all searches performed during that time.
That would help us to detect cases where some users' searches take long.
You may use the same construct as you did for some Platform APIs.
```
{
"metric_id":"cpu/usage_time",
"interval" : "3600",
"start_time":"01/06/23 21:55:19",
"end_time":"02/06/23 01:30:19"
}
```

https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/issues/30
ADR: Wellbore DDMS - Ingestion of bulk data into OSDU instance
2023-10-17, Nisha Thakran

**Introduction:**
The purpose of this ADR is to address the process of importing curve data into Operator’s OSDU instance from the Provider end using the Wellbore DDMS API. The Wellbore DDMS API provides programmatic access to wellbore-related data, including curve data, allowing for efficient integration and retrieval.
**Objective:**
We aim to establish a direct import mechanism that enables the seamless ingestion of curve data from the provider's end into the Operator's OSDU ecosystem. By leveraging the Wellbore DDMS set of APIs, this approach will streamline the data ingestion process and enhance the capabilities of OSDU by directly incorporating valuable curve data from the provider's end.
**Status**
- [x] Proposed
- [ ] Trialing
- [ ] Under review
- [ ] Approved
- [ ] Retired
**Scope**
The scope of this ADR includes the following scenarios:
- Data Retrieval: Retrieval of curve data from the provider's Wellbore DDMS using the available set of Wellbore DDMS APIs.
- Ingestion Mechanism: Development and implementation of an ingestion mechanism to process and store the imported curve data.
**Given /assumptions:**
- Operator and Provider are supporting the wellbore DDMS functionality.
- Well log metadata is already present at the operator end.
- This is a more advanced requirement.
**Required Changes:**
- eds_wellbore_ddms: New API endpoints will be introduced to handle the workings of the Wellbore DDMS.
- Below parameters will be added in the CSRE:
- DDMS URL: variable to hold the provider’s DDMS URL.
**Implementation:**
a. Utilize CSRE (ConnectedSource.Generic) to extract the provider details and establish a connection with the provider.
b. Initiate the wellbore DDMS URL of the provider (as provided in CSRE) with the given well log ID, retrieve the well log data and curve data at the operator's end.
c. Employ the wellbore DDMS API to ingest the curve data associated with the well log ID.
**Sequence Diagram:**
![image](/uploads/e66fe3c881cec87d004642f6897c89f6/image.png)
**Functional Requirements:**
- Data Retrieval:
- The set of Wellbore DDMS API should be utilized to retrieve curve data from the provider's end.
- It should support querying capabilities to fetch specific curve data.
- Data Ingestion:
- The import mechanism should provide a reliable and efficient process to ingest the curve data into the Operator’s OSDU ecosystem.
- It should handle the creation or update of curve data records within the OSDU data repositories, associating them with the appropriate wellbore entities.
**Non-functional Requirements**
- Performance:
- The system should be capable of handling large volumes of wellbore data efficiently, providing fast response times for data retrieval and analysis.
- It should be able to handle concurrent user interactions and maintain performance under peak load conditions.
- Scalability:
- The system should be scalable to accommodate increasing amounts of wellbore data and growing user bases.
- It should be able to handle additional data sources and support a high number of concurrent users without significant degradation in performance.

https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/issues/29
ADR : EDS DMS - To Reverse the Naturalization
2023-08-09, Nisha Thakran

**Introduction:**
The purpose of this ADR is to address the use case of Reversing Dataset Naturalization for Cleaning Up Operator's Cloud Storage.
**Objective:**
The objective of this ADR is to outline the approach and decision made to reverse the dataset naturalization (external to internal) process for the WPC dataset IDs to clean up the storage space on the operator's cloud storage. By reverting the dataset IDs to their original state, the aim is to optimize storage utilization, ensure efficient data management, and maintain consistency with the external dataset IDs.
**Status**
- [x] Proposed
- [x] Trialing
- [x] Under review
- [ ] Approved
- [ ] Retired
**Scope**
The scope of this ADR includes the following scenarios:
- Fetch and Verify the Dataset id:
- Retrieve the internal dataset IDs that need to be converted back to external ones from the WPC inputs; also validate that the Operation given as input is valid by verifying that the dataset type is internal and DatasetExternal is True.
- Fetch the External dataset id:
- Fetch the externaldatasetid (original external dataset ID) from the extension property of the internal Dataset schema.
- Reversion Process and Cloud Storage Update
- Utilize the mapping obtained in step 2, to reverse the internal dataset IDs back to
their original external dataset IDs for the connectedsource.generic dataset.
- Update the associated wellLog with the reversed external dataset IDs.
**Given /assumptions:**
- Well log metadata is already present at the operator end.
**Required Changes:**
- eds_dataset: Reverse Naturalization will be added to the same eds_dataset DAG already there for naturalization.
- The DAG will be executed on-demand without the involvement of a scheduler.
- An additional boolean parameter should be introduced in the internal dataset schemas (File.Generic) to indicate whether the dataset ID is internal or naturalized, to keep track of the naturalized dataset IDs. We can achieve this by using the ExtensionProperties of the dataset schemas.
**DatasetExternal:True**
- During the naturalization process, it is important to maintain a similar dataset ID while allowing for a conversion of the data type, i.e. from external to internal; this makes reverse mapping easier.
Example: opendes:dataset--ConnectedSource.Generic:test123 will be converted to:
opendes:dataset--File.Generic:test123
We can also have additional parameters within the
ExtensionProperties, like a pointer to the external dataset ID
(ConnectedSource.Generic) and the original version.
eg: external_dataset_id:opendes:dataset--ConnectedSource.Generic:test123
external_dataset_id_version: 1614105463059152
**Inputs:**
The inputs for the reverse naturalization process should be an array of WPC IDs and the operation that needs to be performed. Each WPC ID represents a specific work-product-component. Here is an example of the input structure:
{"ids": ["osdu:work-product-component--WellLog:testawsWPC"],"operation":"reverse"}
**Implementation:**
- Initiate the process of reverse naturalization on the list of dataset IDs, based on the given operation, if it is "reverse".
- Verify whether the dataset ID corresponds to an internal data type (File.Generic) for the provided inputs in the DAG.
- Convert the internal dataset ID to the external dataset ID (ConnectedSource.Generic), which is already present at the operator's end, by fetching it from the extension property of the internal dataset schema.
- Re-ingest the schema ID (WellLog ID) along with the external dataset id (ConnectedSource.Generic) to ensure its updated presence.
- Remove file from the blob storage.
**Sequence Diagram**
![image](/uploads/74a8f6754d214ed3955d98cd5a99ed87/image.png)
**Functional Requirements:**
- Dataset ID Mapping:
- Implement a mechanism to map the internal dataset IDs to their corresponding external dataset
IDs.
- Type Conversion:
- A universal solution should be developed to facilitate seamless conversion of internal dataset
IDs to their corresponding external IDs, regardless of the data type.
- Consistent ID Preservation:
- The same ID should be used throughout the entire process; only type casting is done.
- Data Validation and Integrity:
- Implement validation checks to ensure the correctness and integrity of the dataset IDs during the
naturalization process.
**Non-functional Requirements**
- Performance and Scalability:
- Ensure that the solution can handle a large volume of data and can perform conversions efficiently
within acceptable time limits.
- Design the solution to be scalable, allowing it to handle increasing data loads and accommodate
future growth.
- Reliability and Error Handling:
- Implement robust error handling mechanisms to gracefully handle exceptions and errors during the
conversion process.
- Ensure the solution has built-in resilience to recover from failures and minimize disruptions to the overall system.