OSDU Software issues — https://community.opengroup.org/groups/osdu/-/issues

---

https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/issues/51
A proposal to solve "Version" Mapping uncompatibility between Energistics Abstract Objects and OSDU Id version (jean-francois RAINAUD)

**DIFFICULTY ENCOUNTERED**
This difficulty became apparent when we tried to operate the WITSML parser in the Ingestion Workflow: we observed an inconsistency during integrity checking.
The "Version" is handled differently in the OSDU data definition (Id.version) and in the Energistics standard (Uuid.Version), which leads to an incompatibility between the two ways of managing these "versions".
In OSDU, the version number appears to be an integer which is the "translation" of the "lastUpdate" date of the object's metadata creation into a number of seconds from 01/01/1900.
In Energistics Common // Eml23, the version is an attribute of AbstractObject named objectVersion: String64. As you can see, this is a String, which is incompatible with the OSDU version number, and the attribute is not mandatory.
In this case we cannot perform a straightforward mapping between these two attributes.
**PROPOSED SOLUTION**
The proposal is to map another attribute of Energistics Common // eml23 onto the OSDU version number. This attribute, "lastUpdate: TimeStamp", is a date, not mandatory in AbstractObject.Citation, which could easily be translated into an OSDU version number.
If this attribute exists in the "Energistics Entity" when the manifest is created, the id version number can be calculated from it.
If this attribute does not exist in the "Energistics Entity", it could be generated when the manifest is created and added into the original Energistics XML file before storing it in the persistent store.
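For illustration only, a minimal sketch of the mapping described above (the function name is hypothetical; the "seconds since 01/01/1900" interpretation follows the description of the OSDU version number given earlier):

```python
from datetime import datetime, timezone

EPOCH_1900 = datetime(1900, 1, 1, tzinfo=timezone.utc)

def version_from_last_update(last_update_iso: str) -> int:
    # Translate an Energistics Citation.LastUpdate timestamp (ISO 8601)
    # into an integer id version number, interpreted as seconds since 1900-01-01.
    last_update = datetime.fromisoformat(last_update_iso)
    if last_update.tzinfo is None:
        last_update = last_update.replace(tzinfo=timezone.utc)
    return int((last_update - EPOCH_1900).total_seconds())

# Example: version_from_last_update("2021-06-01T12:00:00+00:00")
```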
This way we will have a fully consistent management of the id version number between the Energistics Entities and their metadata in the OSDU platform, which will help resolve the inconsistency captured during integrity checking.
If we follow this method, the id.version and the uuid's "numerical translation of the last update date" will be fully consistent.

---

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/137
Osdu_Ingest - Provide additional integrity check to catch inconsistencies in denormalized data (Ex: Master Entity "Play") (Debasis Chatterjee)

When looking at an example provided by the Development team (CSV Ingestion), I have this question regarding the Master Data (Play) definition.
https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/blob/master/E-R/master-data/Play.1.0.0.md
data.GeoContexts[].BasinID -> Basin
data.GeoContexts[].GeoTypeID -> BasinType
Isn’t the second field unnecessary, as one can find that information from the Basin master record itself?
"BasinTypeID": "namespace:reference-data--BasinType:ArcWrenchOceanContinent:"
In addition, we open up the possibility of conflicting information by offering two separate fields in the “Play” master record.
So, a suitable integrity check is required.
-----------------------------------
See notes from @gehrmann -
Hi Debasis,
The schema is de-normalised to support queries by BasinType/GeoPoliticalEntityType/...
Yes, every de-normalisation carries the risk of introducing contradictions. This was considered a trade-off - and judged worthwhile in the interest of easier query handling.
Finally, it is possible to organise the master-data as parent-child structures with self-references. This is most easily understood with the GeoPoliticalEntity hierarchy: country, state, county,...
Best regards,
Thomas
________________________________________
Additional notes from Thomas
Hi Debasis,
whether or not the extra validation during ingestion is sufficient - I am not so sure. Basin, Play, Prospect, GeoPoliticalEntity are all master-data and therefore subject to continuous improvement. I would think a generic set of data quality rules, which can be re-evaluated after any change, might be a better choice.
The schema, by the way, does mark derived properties (=de-normalised properties) - please check the schema definitions with the dedicated extension tag x-osdu-is-derived:
Example for AbstractGeoBasinContext:
```
"x-osdu-is-derived": {
"RelationshipPropertyName": "BasinID",
"TargetPropertyName": "BasinTypeID"
}
```
In other words: the property GeoTypeID is derived via the sibling property BasinID linking to the target object's property BasinTypeID.
This decoration has been done in other places as well. It should be possible to create a generic implementation of a quality rule covering all of the derived/de-normalised values.
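As an illustration only, a minimal sketch of what such a generic rule might look like; `fetch_record` and the argument shapes are assumptions, not an existing OSDU helper:

```python
def check_derived_value(geo_context, fetch_record, derived_name, decoration):
    # decoration is the "x-osdu-is-derived" tag taken from the schema, e.g. for
    # AbstractGeoBasinContext:
    #   {"RelationshipPropertyName": "BasinID", "TargetPropertyName": "BasinTypeID"}
    relationship_prop = decoration["RelationshipPropertyName"]  # e.g. "BasinID"
    target_prop = decoration["TargetPropertyName"]              # e.g. "BasinTypeID"

    target_id = geo_context.get(relationship_prop)
    if target_id is None:
        return True  # nothing to derive from, so nothing can contradict

    # fetch_record is assumed to resolve an OSDU id to its record (Storage/Search)
    target_record = fetch_record(target_id)
    expected = target_record["data"].get(target_prop)
    return geo_context.get(derived_name) == expected

# check_derived_value(play["data"]["GeoContexts"][0], fetch_record,
#                     "GeoTypeID", decoration) -> False signals a contradiction
```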
Best regards,
Thomas

---

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/136
ADR: Workflow Versioning and Update Workflow API (Vineeth Guna [Microsoft])

# Workflow Versioning
Workflow versioning is a feature to enable seamless running of a newer version of an existing workflow via the ingestion workflow. Below are the design challenges/questions around workflow versioning that we will discuss going forward.
## How to create a new version of Airflow OSDU DAGs?
As Airflow does not have a way to distinguish between two different versions of the same DAG, we will build this functionality around Airflow's capabilities.
Airflow distinguishes DAGs based on the DAG name; hence we can create multiple versions of a single DAG by adding the version as a suffix to the DAG name. For example:
|Workflow Name|Workflow Version|DAG Name|
|-------------|----------------|--------|
|CSV Parser| 1.0.0 |csv-parser-1.0.0|
|CSV Parser| 2.0.0 |csv-parser-2.0.0|
|CSV Parser| 1.3.1 |csv-parser-1.3.1|
The Workflow Version can be one or more of the following:
- Git SHA
- Release Version
We can leverage the pipelines which build the final DAG/packaged DAG to suffix this version onto the Airflow DAG name before generating the final artifact, which Airflow then consumes to get the new version of an existing DAG up and running.
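For illustration, a minimal sketch of how a build pipeline could suffix the version onto the `dag_id` before publishing the artifact (the helper names are assumptions; a real pipeline might template the DAG file instead of patching it):

```python
import re

def versioned_dag_name(base_name: str, version: str) -> str:
    # ("csv-parser", "2.0.0") -> "csv-parser-2.0.0"
    return f"{base_name}-{version}"

def patch_dag_id(dag_source: str, base_name: str, version: str) -> str:
    # Rewrite dag_id="csv-parser" to dag_id="csv-parser-2.0.0" in the DAG source.
    return re.sub(
        rf'dag_id\s*=\s*["\']{re.escape(base_name)}["\']',
        f'dag_id="{versioned_dag_name(base_name, version)}"',
        dag_source,
    )
```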
## How does the ingestion workflow understand different versions of an existing workflow/DAG?
A workflow metadata in ingestion consists of the following properties
- Workflow ID
- Workflow Name
- Version
- Registration Instructions
- DAG Name (In Airflow)
We can use the combination of version and DAG name to identify different versions of a workflow. For example:
|Workflow Name| Workflow Version| DAG Name| Explanation|
|-------------|-----------------|---------------|------------------|
|csv-parser| 1 |csv-parser-1.0.0| This corresponds to a workflow with name “csv-parser” with version “1” which when triggered will use “csv-parser-1.0.0” as the DAG to create a DAG run|
|csv-parser| 2 |csv-parser-1.2.0| This corresponds to a workflow with name “csv-parser” with version “2” which when triggered will use “csv-parser-1.2.0” as the DAG to create a DAG run, note that this is a minor version change |
|csv-parser| 3 |csv-parser-2.0.0| This corresponds to a workflow with name “csv-parser” with version “3” which when triggered will use “csv-parser-2.0.0” as the DAG to create a DAG run, note that this is a major version change|
## Can we trigger different versions of a workflow?
There will always be only one active version of a workflow that can be triggered. To answer the question: we cannot trigger different versions of the same workflow; we can only trigger the active version.
The existing trigger workflow API does not support triggering different versions of a workflow.
To illustrate, consider the example below:
|Workflow Name| Workflow Version| DAG Name| ACTIVE?|
|-------------|-----------------|---------------|--------------|
|Foo| 1| Foo-1.0.0| Yes|
|Foo| 2| Foo-2.0.0| No|
|Bar| 2| Bar-2.0.0| Yes|
|Bar| 1| Bar-1.0.0| No|
In this case, triggering works as follows:
- When the “Foo” workflow is triggered, the ingestion workflow triggers the DAG associated with the active version, i.e. “1”, so it triggers the “Foo-1.0.0” DAG on Airflow
- When the “Bar” workflow is triggered, the ingestion workflow triggers the DAG associated with the active version, i.e. “2”, so it triggers the “Bar-2.0.0” DAG on Airflow
**There can always be only one active version for a workflow**
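A minimal sketch of how the trigger path could resolve the DAG name from the single active version (the metadata shape and `metadata_store` interface are assumptions, not the service's actual code):

```python
def resolve_dag_for_trigger(workflow_name: str, metadata_store) -> str:
    # metadata_store.get_versions is assumed to return one record per version,
    # e.g. {"version": 1, "dagName": "Foo-1.0.0", "active": True}
    versions = metadata_store.get_versions(workflow_name)
    active = [v for v in versions if v["active"]]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active version for {workflow_name}")
    return active[0]["dagName"]

# In the example above, resolve_dag_for_trigger("Foo", store) -> "Foo-1.0.0"
```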
## How to add a new version of a workflow?
We can use the update workflow API to add a new version of a workflow; the details of the API are discussed below.
## How to mark a version of a workflow as active?
We can use the update workflow API to mark a version of a workflow as active; the details of the API are discussed below.
## How to get all versions of a workflow?
Another API is introduced to get all the versions of a workflow. This API should return the workflow metadata for all versions present in the system.
Refer to the get versions API in this specification - [API Specification](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/update_api_spec/docs/api/openapi.workflow.yaml)
## How to mark a version of a workflow as active?
By default, once you add a new version of a workflow using the update workflow API, it becomes active. To make an older version of a workflow active, use the mark workflow version active API.
Steps to activate an older version of a workflow:
1. Call get all versions API to fetch the existing versions of workflow
2. Determine the version of workflow which needs to be activated
3. Call mark workflow version active API by passing the version and workflow name
We can use the mark workflow version active API to make an older version active; refer to the API in this [specification](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/update_api_spec/docs/api/openapi.workflow.yaml)
## Will updates to a workflow affect the existing in-progress workflow runs?
Existing workflow runs will not be affected by this change; they will run and reach a completion state.
Any new workflow runs triggered after this change will trigger the DAG associated with the active version, as discussed above.
## Any changes to the existing API specifications?
|API| Any Changes?| API Specification Changes| Behavioral Changes|
|---|---------------|----------------------------|------------------------|
|Register workflow| No| N/A| N/A|
|Get all workflows| Yes| N/A| It should return active version of workflow|
|Delete workflow| Yes| N/A| It should delete all versions of a workflow|
|Get workflow by name| Yes| N/A| It should return active version of workflow|
|Trigger workflow| Yes| N/A| It should only trigger the active version of a workflow|
|Get all workflow runs| Yes| N/A| It should return all workflow runs across all versions of a workflow|
|Get specific workflow run| Yes| N/A| It should get the status of the workflow run based on the DAG associated to the version|
|Update workflow run| No| N/A| N/A|
|Info| No| N/A| N/A|
## How does this change affect existing workflows in the system?
All existing workflows only have one version; hence we treat existing workflows as having a single version and use it to trigger the respective DAGs.
If any new metadata is missing from an existing workflow, the workflow should be updated with this metadata if it provides some benefit; otherwise we can keep it as is.
## Any limitations on the number of versions supported per workflow?
For now, there are no limitations set, but we can revisit this part if we see any issues
## Can we disable DAGs in airflow for inactive versions?
We cannot disable DAGs in Airflow outright, as doing so would stop all in-progress DAG runs, which is not acceptable.
If we can build a solution that asynchronously checks whether all DAG runs for an inactive DAG are completed, we can then disable that DAG; this is only needed if it helps improve Airflow performance.
## How does workflow versioning apply for system workflows?
It is similar to normal workflows: the concept of versioning applies to system workflows in the same way. Since system workflows apply to all data partitions, any change in the version of a system workflow will affect all data partitions.
# Workflow Update API
Check the update API in this specification - [API Specification](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/update_api_spec/docs/api/openapi.workflow.yaml)
## Update API supports the following
- To add a new version of workflow and activate it
```bash
curl --location --request PUT 'https://<osdu_endpoint>/v1/workflow/csv-parser' \
--header 'Content-Type: application/json' \
--header 'Authorization: <API Key>' \
--data-raw '{
"registrationInstructions": {
"dagName": "csv-parser-2.0.0",
"dagContent": ""
}
}'
```
- To activate an older version (version 1) of workflow
```bash
curl --location --request PUT 'https://<osdu_endpoint>/v1/workflow/csv-parser/version/1/active' \
--header 'Content-Type: application/json' \
--header 'Authorization: <API Key>'
```
## Update API Limitations
- Cannot update dagName for an already existing version of workflow
- Cannot update description of workflow
- Cannot disable any version of workflow
# Sequence Diagrams for API's after introducing this feature
## Get workflow by name
![Get_Workflow_By_Name](/uploads/8b39b7ceaab1169c03da21309687c800/Get_Workflow_By_Name.png)
## Get all workflows
![Get_All_Workflows](/uploads/626124ec56f069e39ad6c342dbaf4426/Get_All_Workflows.png)
## Trigger workflow
![Trigger_workflow](/uploads/455ac50e420c910f1df519d1a0ef292f/Trigger_workflow.png)
## Get workflow run
![Get_workflow_run](/uploads/20b719d081b6ab83f821a0e2719923ea/Get_workflow_run.png)
## Delete workflow
![Delete_Workflow](/uploads/c7bcc09428fa07bc1af456cda63b0636/Delete_Workflow.png)

---

https://community.opengroup.org/osdu/platform/data-flow/ingestion/csv-parser/csv-parser/-/issues/68
CSV Parser does not check integrity for referenced data (Debasis Chatterjee)

I made a test case with this source file.
[Sample_data-ContractorType-DC-IBM-test-error.csv](/uploads/7e308903b0f075aabfdd6c301c6d63f5/Sample_data-ContractorType-DC-IBM-test-error.csv)
See Airflow log
[CSV-Ingestion-isssue-no-Integrity-check-Airflow-log.txt](/uploads/062c44baff7e61dcd4c5dd6fefe12e10/CSV-Ingestion-isssue-no-Integrity-check-Airflow-log.txt)
This happily creates the record although the referenced data does not exist: ResourceCurationStatus:CREATED123
```
"data": {
"AttributionAuthority": "AAPG",
"Description": "Debasis test1",
"ID": "LoggerDC2",
"ResourceCurationStatus": "opendes:reference-data--ResourceCurationStatus:CREATED123",
"Code": "LoggerDC2",
"Name": "LoggerDC2"
},
```
See details here -
[CSV-Ingestion-isssue-no-Integrity-check.txt](/uploads/95ba8626b9020a832b23bb3235f10122/CSV-Ingestion-isssue-no-Integrity-check.txt)
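For context, a hedged sketch of the kind of referential check the parser could run before writing a record (the query shape is illustrative; a real implementation would batch ids and handle paging):

```python
import requests

def reference_exists(record_id: str, search_url: str, headers: dict) -> bool:
    # Ask the Search API whether the referenced record exists at all.
    body = {
        "kind": "*:*:*:*",
        "query": f'id:"{record_id.rstrip(":")}"',
        "limit": 1,
    }
    resp = requests.post(f"{search_url}/query", json=body, headers=headers)
    resp.raise_for_status()
    return resp.json().get("totalCount", 0) > 0

# reference_exists("opendes:reference-data--ResourceCurationStatus:CREATED123", ...)
# should come back False here, so the record ought to be rejected or flagged.
```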
@frubio for information

---

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/76
Search API failing due to violation of OWASP rules (Fabien Bosquet)
The current search query API fails if the OWASP rules are applied to your Azure Application Gateway WAF.
Here is an example of a search query:
`POST https://{{OSDU_HOST}}/api/search/v2/query`
with the following search body:
```
{
  "kind": "opendes:wks:work-product-component--*:1.0.0",
  "query": "data.WellboreID:(\"opendes:master-data--Wellbore:1234\" OR \"opendes:master-data--Wellbore:1235\" OR \"opendes:master-data--Wellbore:1236\" OR \"opendes:master-data--Wellbore:1237\")"
}
```
The issue is in the search body of the query.
The usage of `-`, `--` and `OR` symbol sequences in the OSDU SRN breaks several OWASP rules.
(Note that you need to search for more than one SRN to break WAF rules).
A workaround is to disable the following OWASP 3.1 rules from the Web Application Firewall.
- 942370 Detects classic SQL injection probings 2/2
- 942430 Restricted SQL Character Anomaly Detection (args): # of special characters exceeded (12)
- 942440 SQL Comment Sequence Detected.
An alternative is to replace `-` characters with the \u002D Unicode escape in the "query" parameter, but there is no substitution for `OR` (rule 942370 will still fail).
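A minimal sketch of that substitution at the wire level (illustrative only; it assumes the body contains no negative numbers outside string values, and as noted there is no equivalent escape for `OR`):

```python
import json

def escape_hyphens(body: dict) -> str:
    # Serialize first, then replace '-' with its JSON unicode escape so the raw
    # payload no longer contains '--' sequences; the server-side JSON parser
    # decodes \u002D back to '-' before the query reaches the search backend.
    return json.dumps(body).replace("-", "\\u002D")

body = {
    "kind": "opendes:wks:work-product-component--*:1.0.0",
    "query": 'data.WellboreID:("opendes:master-data--Wellbore:1234" '
             'OR "opendes:master-data--Wellbore:1235")',
}
raw_payload = escape_hyphens(body)
# requests.post(f"https://{OSDU_HOST}/api/search/v2/query", data=raw_payload,
#               headers={"Content-Type": "application/json", ...})
```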
Do you see a better way to handle `-`, `OR`, `AND` in the search query?

---

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/101
Performance review of main ingestion functions' improvements from M6 to M10 (Yan Sushchynski (EPAM))

<details><summary>GCP results - Click to expand</summary>
I was testing the main functions of Manifest Based Ingestion on my local machine from M6 to M10 releases.
Results are provided in the following table.
| Function | Manifest | M10_optimized (sec) | M9 (sec) | M8 (sec) | M7 (sec) | M6 (sec) |
|-----------------------------------------------|------------------------------|---------------------|----------|----------|----------|----------|
| schema_validator.ensure_manifest_validity | LogCurveType (42917 records) | 113 | 1453 | 1453 | 1453 | 1453 |
| | LogCurveType (800 records) | 2.6 | 25.38 | 25.38 | 25.38 | 25.38 |
| | WorkProduct | 2.5 | 2.685 | 2.895 | 2.895 | 2.895 |
| manifest_integrity_validator.ensure_integrity | LogCurveType (42917 records) | 14.94 | 15.07 | 14.67 | 40.2 | ** |
| | LogCurveType (800 records) | 5.494 | 4.677 | 5.82 | 5.141 | 3751 |
| | WorkProduct | 0.0013 | 0.001 | 0.001446 | 0.001852 | 0.001781 |
| single_manifest_processor.process_manifest | LogCurveType (42917 records) | | 2056* | ** | ** | ** |
| | LogCurveType (800 records) | | 43.18* | 439.3 | 439.3 | 439.3 |
| | WorkProduct | | 2.544 | 2.454 | 2.887 | 2.6 |
*_Sent batches of 400 records to Storage Service_
**_Can't execute this test for reasonable time (it may last more than 24h)_
### Performance improvements throughout M6-M10 releases.
#### M10 (?)
After analyzing the previous releases, some bottlenecks were found.
The slowest part of Manifest Ingestion, besides Process Manifest, was Schema Validation. After some research, it was found that the common way of using `jsonschema.validate` has a lot of overhead from creating validator classes and instances on each schema validation.
The solution was to create a `jsonschema` validator once per unique schema and reuse it against the corresponding records. This approach is roughly 10 times faster than calling `jsonschema.validate` per record.
E.g., `M9 Schema Validation` of 42917 LogCurveType records was _1453_ seconds, and it is _113.1_(!) seconds in the `M10` release.
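A minimal sketch of the approach (illustrative only, not the DAG's actual code):

```python
import jsonschema

_validator_cache = {}

def get_validator(schema: dict, schema_key: str):
    # Build the validator once per unique schema and reuse it afterwards.
    if schema_key not in _validator_cache:
        validator_cls = jsonschema.validators.validator_for(schema)
        validator_cls.check_schema(schema)
        _validator_cache[schema_key] = validator_cls(schema)
    return _validator_cache[schema_key]

def validate_records(records, schema, schema_key):
    validator = get_validator(schema, schema_key)
    for record in records:
        # Much cheaper than jsonschema.validate(record, schema) per record,
        # which rebuilds the validator every time.
        validator.validate(record)
```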
#### M9
In the previous releases, each Manifest record was saved in the Storage Service one by one; this caused a lot of requests to Storage.
After switching to the Storage Service's batch saving (up to 500 records per request) for storing Manifest records, it is possible to avoid extra requests to Storage.
E.g., `M8 manifest processing` of 800 LogCurveType records took _439_ seconds, whereas `M9 manifest processing` with batches of **400 records** took _43_ seconds.
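A minimal sketch of the batching idea (illustrative; the real DAG code and endpoint details differ):

```python
import requests

def save_records_in_batches(records, storage_records_url, headers, batch_size=500):
    # One Storage call per batch instead of one call per record.
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        response = requests.put(storage_records_url, json=batch, headers=headers)
        response.raise_for_status()
```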
#### M8
Improved `Manifest Integrity Validation` performance by sending batches of the external OSDU Ids of all Manifest records to the Search Service. Before, these Ids were searched one by one, which caused extra calls to the Search Service.
E.g., `M7 manifest integrity check` of 42917 LogCurveType records took _40.2_ seconds, whereas `M8 manifest integrity check` of the same Manifest took _14.67_ seconds.
#### M7
Improved `Manifest Integrity Validation` performance by first extracting all external references into a single set of unique Ids and only then searching them in the OSDU Search Service. This significantly reduced the number of requests to the Search Service; earlier, each Manifest record's external references were searched separately, which caused the Search Service to be called with the same requests many times.
E.g., `M6 manifest integrity check` of 800 LogCurveType records took _3751_ seconds, whereas `M7 manifest integrity check` of the same Manifest took _5.141_ seconds.
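A combined sketch of the M7/M8 improvements (illustrative; `extract_refs` and `search_ids` stand in for the real extraction and Search calls):

```python
def find_missing_references(manifest_records, extract_refs, search_ids, batch_size=100):
    # M7: collect every external reference once (the set removes duplicates).
    external_ids = {ref for record in manifest_records for ref in extract_refs(record)}

    # M8: query the Search Service in batches instead of id by id.
    missing = set(external_ids)
    ids = list(external_ids)
    for start in range(0, len(ids), batch_size):
        found = search_ids(ids[start:start + batch_size])
        missing -= set(found)
    return missing  # ids referenced by the manifest but absent from the platform
```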
</details>
<details><summary>AWS results - Click to expand</summary>
#### M12
* Manifest by Reference implementation in validation and integrity check stages performs only marginally faster than current implementation. If the ADR’s design of adding a POST request to the stage was accepted, these marginal improvements might actually be slower.
* The reference implementation showed a 9x performance decrease over existing implementation for the process_manifest step.
| Function | Manifest | Original Manifest Ingestion (avg sec) | Manifest By Reference (avg sec) |
|-----------------------------------------------|----------------|---------------------------------------|---------------------------------|
| schema_validator.ensure_manifest_validity | 4kb Manifest | 2.85 | 3.12 |
| | 128kb Manifest | 6 | 6 |
| | 4mb Manifest | 6.12 | 5.9 |
| manifest_integrity_validator.ensure_integrity | 4kb Manifest | 2.85 | 2.98 |
| | 128kb Manifest | 4 | 3.5 |
| | 4mb Manifest | 4.27 | 4 |
| single_manifest_processor.process_manifest | 4kb Manifest | 2.73 | 24.7 |
| | 128kb Manifest | 4 | 25 |
| | 4mb Manifest | 2.9 | N/A** |
| Total time | 4kb Manifest | 8.43 | 30.8 |
| | 128kb Manifest | 14 | 34.5 |
| | 4mb Manifest | 13.29 | N/A** |
</details>

---

https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/217
Azure Ad Application URIIdentifiers new restrictions added causing deployment failure (Vivek Ojha)

Creating OSDU Azure instance central resources gives the following error:
```
Error: graphrbac.ApplicationsClient#Create: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="Unknown" Message="Unknown service error" Details=[{"odata.error":{"code":"Request_BadRequest","date":"2021-12-15T14:49:28","message":{"lang":"en","value":"Values of identifierUris property must use a verified domain of the organization or its subdomain: 'http://osdu-mvp-cr022-0bsd-app'"},"requestId":"84ba8e6f-224b-4b88-9a0c-587a52afc283","values":[{"item":"PropertyName","value":"identifierUris"},{"item":"PropertyErrorCode","value":"HostNameNotOnVerifiedDomain"},{"item":"HostName","value":"http://osdu-mvp-cr022-0bsd-app"}]}}]

on ../../../modules/providers/azure/ad-application/main.tf line 20, in resource "azuread_application" "main":
  20: resource "azuread_application" "main" {
```

---

https://community.opengroup.org/osdu/platform/pre-shipping/-/issues/189
[IBM R3M9](Segy to Ovds conversion) Air flow DAG failure after run Segy to Vds” conversation (kenneth liew)
I had run a [R3M9 IBM] “Segy to Vds” conversion task, but I found that the DAG (openvds_import) failed to process my workflow.
```
Request :
Post Method: {{WORKFLOW_HOST}}/workflow/{{workflow_name_openvds}}/workflowRun
Body:
{
"executionContext": {
"url_connection": "sdauthorityurl=https://{{osdu-cpd}}/osdu-seismic/api/v3;sdapikey=xx;sdtoken={{access_token}};EndpointOverride={{minio-url}}",
"input_connection": "sdauthorityurl=https://{{osdu-cpd}}/osdu-seismic/api/v3;sdapikey=xx;sdtoken={{access_token}};EndpointOverride={{minio-url}}",
"segy_file": "sd://opendes/kenneth-test/ST10010ZC11_PZ_PSDM_KIRCH_NEAR_D.MIG_FIN.POST_STACK.3D.JS-017536_Version4.segy",
"url": "sd://opendes/kenneth-test/"
}
}
```
```
Response:
{
"workflowId": "b133573249894456a32813caac1bf692",
"runId": "e831ba7b-6e80-4c0a-ba83-05cfa55cf309",
"startTimeStamp": 1639650166798,
"status": "submitted",
"submittedBy": "preshipteama@osdu.opengroup.org"
}
```
Airflow DAG log - openvds_import:
```
[2021-12-16 10:24:51,867] {taskinstance.py:1501} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 366, in execute
final_state, remote_pod, result = self.create_new_pod_for_operator(labels, launcher)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 519, in create_new_pod_for_operator
launcher.start_pod(self.pod, startup_timeout=self.startup_timeout_seconds)
File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 329, in wrapped_f
return self.call(f, *args, **kw)
File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 409, in call
do = self.iter(retry_state=retry_state)
File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 356, in iter
return fut.result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 412, in call
result = fn(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_launcher.py", line 131, in start_pod
raise AirflowException("Pod took too long to start")
airflow.exceptions.AirflowException: Pod took too long to start
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1157, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1331, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1361, in _execute_task
result = task_copy.execute(context=context)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 373, in execute
raise AirflowException(f'Pod Launching failed: {ex}')
airflow.exceptions.AirflowException: Pod Launching failed: Pod took too long to start
[2021-12-16 10:24:51,873] {taskinstance.py:1544} INFO - Marking task as FAILED. dag_id=openvds_import, task_id=OPENVDS, execution_date=20211216T102247, start_date=20211216T102249, end_date=20211216T102451
[2021-12-16 10:24:51,959] {local_task_job.py:151} INFO - Task exited with return code 1
```
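For what it's worth, the failure comes from the operator's pod startup timeout. If the pod simply needs more time to schedule or pull its image, the timeout can be raised where the DAG defines the task; a sketch under that assumption (the image name and other arguments are placeholders, not the real DAG's values):

```python
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

openvds_task = KubernetesPodOperator(
    task_id="OPENVDS",
    name="openvds-import",
    image="<openvds-ingestion-image>",   # placeholder, not the actual image
    startup_timeout_seconds=600,         # raise from the provider default (120s)
    # ... other arguments as in the actual openvds_import DAG
)
```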
[AirFlow_IBM_r3m9_Error_.txt](/uploads/af9d88da349f1b210a9e529f5c623252/AirFlow_IBM_r3m9_Error_.txt)
[VDS_Convert_Request_IBM_R3m9_.txt](/uploads/466a043ce3070bbedb4a7f5071f13796/VDS_Convert_Request_IBM_R3m9_.txt)
cc @anujgupta, @dsouzawalter, @shamazum

---

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/49
Create new subproject request. Strange behavior of "admin" field. (Denis Karpenok (EPAM))

Request:
```bash
curl --location --request POST 'https://preship.osdu.club/api/seismic-store/v3/subproject/tenant/autotestTenantid977226/subproject/subprojectodi912864' \
--header 'Content-Type: application/json' \
--header 'data-partition-id: odesprod' \
--header 'ltag;' \
--header 'Authorization: ***' \
--data-raw '{
"admin": "preprod_tester@osdu-gcp.go3-nrg.projects.epam.com",
"storage_class": "MULTI_REGIONAL",
"storage_location": "US",
"acls": {
"admins": [
"data.sdms.autotestTenantid977226.subprojectodi912864.admin@odesprod.osdu-gcp.go3-nrg.projects.epam.com"
],
"viewers": [
"data.sdms.autotestTenantid977226.subprojectodi912864.viewer@odesprod.osdu-gcp.go3-nrg.projects.epam.com"
]
},
"legal": {
"legaltags": [
"odesprod-OpenVDS-Legal-Tag-Test4730019"
],
"otherRelevantDataCountries": [
"US"
]
}
}'
```
If "admin" field is preprod_tester@osdu-gcp.go3-nrg.projects.epam.com - works well.
If "admin" field is preprod_tester@**odesprod**.osdu-gcp.go3-nrg.projects.epam.com:
400 Bad Request
[entitlement-service] Group can only be MEMBER of another group.
If "admin" field is **admin@odesprod**.osdu-gcp.go3-nrg.projects.epam.com:
```html
<html>
<head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>502 Server Error</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Server Error</h1>
<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.
</h2>
<h2></h2>
</body>
</html>
```

---

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/48
Subproject list endpoint does not return the full list (Konstantin Khottchenkov)

The IBM implementation of the SUBPROJECT LIST endpoint does not return the full list of subprojects.
So E2E tests are failing trying to find the subproject that was created by them in the list.
The logs are here: [pipeline logs](https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/jobs/764673)

---

https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/215
Required secrets for postgresql in keyvault (Aalekh Jain)
The Workflow ingestion service needs to connect to the PostgreSQL database (that is primarily used by Airflow). This is required in order to implement the feature where we have to query the PostgreSQL dataset.
As of now, there's no clear way to obtain the hostname and username (for the db) that will allow us to connect to PostgreSQL for running the custom queries.
These changes are added as part of the following MR in workflow ingestion service -
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/199
The corresponding MR (in infra azure provisioning) that adds these changes is - !549
cc: @kibattul

---

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/reservoir/home/-/issues/4
Reservoir DDMS: POC for RESQML ingestion (Marcus Apel)

POC for RESQML file ingestion to DDMS, via Airflow DAG.

---

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/reservoir/home/-/issues/3
Reservoir DDMS: Indexing POC (Marcus Apel)

POC for indexing of WPC in DocumentDB.

---

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/reservoir/home/-/issues/2
Reservoir DDMS: Deployment (Marcus Apel)

POC for deployment of RDDMS with CSP. Requires postgres service.

---

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/reservoir/home/-/issues/1
Reservoir DDMS Entitlement (Marcus Apel)

Use platform entitlement for RDDMS access.

---

https://community.opengroup.org/osdu/platform/pre-shipping/-/issues/179
[IBM R3M9](Segy to Ovds conversion) Cannot find the oVDS file from Sdutil (kenneth liew)

I had uploaded my segy file using sdutil, and the file path is "sd://opendes/kenneth-test/ST0202R08_PZ_PSDM_RAW_STACK_DEPTH.MIG_RAW.POST_STACK.3D.JS-017534.segy". I then ran the triggered workflow to process the "Segy to oVDS" conversion, but I cannot find the VDS file under sd://opendes/kenneth-test.
Based on the Airflow log for that process, it does not list which VDS file was generated.
"runId": "9565a380-1ab2-4632-b7a7-0b3127f7dfc5",
I am using the postman collection: https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M9/IBM-M9/IBM_ODI_R3_v2.0.1_SEGY-to-Open_VDS_Conversion_Collection.postman_collection__1_.json
The attachment contains all information/details regarding the transaction
[_IBM_R3M9__Segy_to_Vdos_Test_Result.docx](/uploads/2486f136845ea9529420ca4d423d17a1/_IBM_R3M9__Segy_to_Vdos_Test_Result.docx)
cc @anujgupta, @debasisc, @shamazum

---

https://community.opengroup.org/osdu/platform/system/schema-service/-/issues/82
Provide validation of schema and avoid surprise from indexed data (lacking fields) (Debasis Chatterjee)

Because of a gap in the schema definition script, this is one case of user experience.
We can successfully populate a record and retrieve the data using the Storage service, but the Search service (query) misses many fields.
Even if we do the usual troubleshooting (Storage - Get - id and index), there is no obvious clue anywhere about the failure.
See related issue https://community.opengroup.org/osdu/platform/data-flow/ingestion/csv-parser/csv-parser/-/issues/64
Hence we propose that strict validation be added to avoid future surprises.
cc @nthakur for information

---

https://community.opengroup.org/osdu/ui/data-loading/osdu-cli/-/issues/10
Integration with wellbore ddms loader (Chad Leong)

We are developing in parallel a loader for the Wellbore DDMS. This ought to be integrated with osdu-cli as the base framework for data loading exercises in OSDU as the community tool.
https://community.opengroup.org/osdu/platform/data-flow/data-loading/wellbore-ddms-las-loader/-/issues/29

---

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-sdutil/-/issues/16
sdutil is unable to recognize sub-project which was created from Domain API (Debasis Chatterjee)

@sacha - I am working with @Yan_Sushchynski from the GCP/EPAM team in the Preship/R3M9 environment.
We created a tenant from the Domain API. (Made sure "ltag" in the header contains a valid legal tag.)
We created a sub-project for that tenant.
Next we see two failures.
1. Domain API itself cannot retrieve the freshly created sub-project.
https://community.opengroup.org/osdu/platform/pre-shipping/-/issues/142
2. sdutil from the command line can see the tenant but not the sub-project.
This means we cannot progress with the "sdutil cp" step.
https://community.opengroup.org/osdu/platform/pre-shipping/-/issues/158
Can you please provide some more insight so that we can do some troubleshooting?
Thank you

---

https://community.opengroup.org/osdu/ui/data-loading/wellbore-ddms-data-loader/-/issues/39
Launch LAS loader (into Wellbore DDMS) from DAG (Airflow) (Debasis Chatterjee)

Can you consider creating a DAG component for just the LOAD aspect?
Use case -
1. The user has already uploaded the LAS file into Cloud storage and created the wp, wpc, and dataset/file records for the LAS file and the WellLog WPC.
2. The intent now is to launch the LAS-into-WDMS-loader with all required parameters in the body of "trigger workflow".
Also, as catalog records have already been created, this should not attempt to create a new Wellbore, nor a new WellLog work-product component record.