# OSDU Software issues
Source feed: https://community.opengroup.org/groups/osdu/-/issues

## Change PreLoadFilePath to aws-osdu-demo-r1
https://community.opengroup.org/osdu/data/open-test-data/-/issues/14 · Keith Wall · 2019-11-13

Needs to be `aws-osdu-demo-r1` rather than `aws-osdu-r1`.

## Support SAML/OAuth for authentication
https://community.opengroup.org/osdu/platform/security-and-compliance/home/-/issues/14 · Paco Hope (AWS) · 2021-06-16

Some operators do not support OpenID Connect and are not prepared to take on a significant app that uses it.
### Operator Inputs
Operators have expressed unwillingness or lack of support for OpenID Connect:
- **Total**: does not support OpenID Connect at this time.
- **Petronas**: prefers SAML but is willing to consider OpenID Connect.
- **ConocoPhillips**: uses OAuth, not OpenID Connect.
- **Repsol**: uses SAML now, but is open to OpenID Connect in the future.
Operators have expressed support and acceptance of OpenID Connect:
- **ExxonMobil**
- **Chevron**
- **Shell**
- **BP**

Milestone: M1 - Release 0.1 · Ferris Argyle, Dania Kodeih (Microsoft), Wladmir Frazao, Joe

## UnitofMeasure appears as UnitsofMeasure in manifest
https://community.opengroup.org/osdu/data/open-test-data/-/issues/15 · Keith Wall · 2019-11-13

The reference value for UnitofMeasure appears as UnitsofMeasure in the manifest for work product components.

## Collect Schema and Data for CSV ingestion
https://community.opengroup.org/osdu/platform/data-flow/ingestion/csv-parser/csv-parser/-/issues/2 · Stephen Whitley (Invited Expert) · 2020-09-10

Milestone: M1 - Release 0.1 · Shreyas Mehta, Todd Dixon

## Issues with POMs in the repo (circular dependency from Core to Test-Core; POM dependencies structured differently than other services)
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/40 · Matt Wise · 2020-08-20

The POMs in this service are structured differently than in other services. In other services, the parent POM contains almost no dependencies and allows the Core and Test-Core POMs to specify dependencies individually.
In addition, the Test project is tightly coupled to the build of the Core, creating a circular dependency.
In the root POM, the following is observed:
```xml
<modules>
<module>workflow-core</module>
<module>provider/workflow-azure</module>
<module>provider/workflow-gcp</module>
<!-- <module>provider/workflow-ibm</module> Fix: Missing classes-->
<module>provider/workflow-gcp-datastore</module>
<module>testing/workflow-test-core</module>
</modules>
```
Note that the module `testing/workflow-test-core` is referenced in the modules list. The test modules should know about the core modules, but not the other way around.
If the test module is removed from the build list, the project fails to compile successfully.

Assignees: Dmitriy Rudko, Oleksandr Kosse (EPAM), Riabokon Stanislav (EPAM) [GCP], Artem Nazarenko (EPAM) · 2020-08-21

## Put workflowRun as value for Airflow DAG run_id
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/103 · Siarhei Khaletski (EPAM) · 2021-03-17

## For DataCollection.json schema, add properties to object
https://community.opengroup.org/osdu/data/open-test-data/-/issues/16 · Keith Wall · 2019-11-18

In DataCollection.json we needed to add `"properties": {}` to `Data.IndividualTypeProperties.FilterSpecification`.
Comment: this should be done as standard JSON.

## Schema service
https://community.opengroup.org/osdu/platform/system/storage/-/issues/5 · An Ngo · 2020-10-16

Milestone: M1 - Release 0.1

## DELETE /v1/workflow/{workflow_name} (deleteWorkflowById) fails to delete workflows that have been executed
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/97 · Monalisa Srivastava · 2022-08-23

DELETE /v1/workflow/{workflow_name} (deleteWorkflowById) fails to delete a workflow that has been executed; a newly created workflow that has not been executed is deleted successfully.
Actual result: we get the following error:
```json
{
  "timestamp": 1615385317197,
  "status": 404,
  "error": "Not Found",
  "message": "Workflow: csv_OneStep_wf doesn't exist",
  "path": "/api/workflow/v1/workflow/csv_OneStep_wf"
}
```
Expected result: the workflow should be deleted successfully.
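As a sketch of the expected semantics (a minimal in-memory model with invented names, not the actual workflow service code), deleting a workflow should cascade to its run history so that executed workflows can still be removed:

```python
# Hypothetical model of the expected DELETE behaviour: a workflow with run
# records should still be deletable; the delete cascades to its runs.
class WorkflowStore:
    def __init__(self):
        self.workflows = {}  # workflow_name -> metadata record
        self.runs = {}       # workflow_name -> list of run records

    def create(self, name):
        self.workflows[name] = {"workflowName": name}
        self.runs.setdefault(name, [])

    def trigger(self, name):
        # Simulate a workflow run being recorded.
        self.runs[name].append({"status": "finished"})

    def delete(self, name):
        if name not in self.workflows:
            raise KeyError(f"Workflow: {name} doesn't exist")
        # Cascade: remove run history first, then the metadata record.
        self.runs.pop(name, None)
        del self.workflows[name]

store = WorkflowStore()
store.create("csv_OneStep_wf")
store.trigger("csv_OneStep_wf")   # the workflow has been executed
store.delete("csv_OneStep_wf")    # should still succeed, per the expected result
```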
Also note: although the API path uses {workflow_name}, the operation's description says deleteWorkflowById; this should be corrected.

Assignees: Aalekh Jain, Monalisa Srivastava

## Redundant steps executed: validate schemas and ensure referential integrity
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/54 · Brady Spiva [AWS] · 2022-08-23

### Expected behavior
The validate schemas and ensure referential integrity operations should only need to be executed once per manifest.
### Observed behavior
In the [top-level DAG definition](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/blob/master/src/dags/osdu-ingest-r3.py#L131), you can see “validate schema” and “ensure integrity” operators are executed as part of the DAG:
`branch_is_batch_op >> validate_schema_operator >> ensure_integrity_op >> process_single_manifest_file >> update_status_finished_op`
But then diving deeper into the `process_single_manifest_file` operator, it ALSO [validates schemas and ensures referential integrity](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/blob/master/src/dags/libs/processors/single_manifest_processor.py#L81), resulting in redundant API calls.
This problem will go unnoticed for small workloads, but for larger workloads the increased latency will start to quickly add up. Using the TNO and Volve sample ingestion dataset as an example, there are about **24,000 manifest files**. If this redundancy adds just 2 extra API calls per manifest ( one for schema validation, one for referential integrity checks ), and each API request takes 250 milliseconds, then this would increase the overall ingestion time by:
24,000 manifests * ( .25 seconds redundancy * 2 requests ) / 60 seconds per minute = **200 minutes, or 3.3 hours**.
As the ingestion workload size increases, this redundancy becomes a non-trivial amount of time. Naturally, your mileage may vary! I'm sure we'll see different latency results for different networks, different customers, different Cloud Providers, etcetera. I'm confident customer experiences will be improved by reducing this latency.
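The arithmetic above can be checked directly (figures from the issue text: 24,000 manifests, 2 redundant calls each, 250 ms per call):

```python
# Back-of-the-envelope check of the latency estimate in the issue text.
manifests = 24_000
redundant_calls_per_manifest = 2
seconds_per_call = 0.25

extra_seconds = manifests * redundant_calls_per_manifest * seconds_per_call
extra_minutes = extra_seconds / 60

print(extra_minutes)               # 200.0 minutes
print(round(extra_minutes / 60, 1))  # 3.3 hours
```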
### Some proposed solutions
1. Remove the [duplicated referential integrity call](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/blob/master/src/dags/libs/processors/single_manifest_processor.py#L97); the results of this operation aren't used anyway.
2. Change the way the manifest is obtained for the `process_single_manifest_file` operator, allowing the removal of the [duplicated schema validation call](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/blob/master/src/dags/libs/processors/single_manifest_processor.py#L98). We could use Airflow mechanisms (XComs, Variables, etc.) to reuse the manifest from the `validate_schema_operator`, but that might affect the atomicity of the operator.
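Proposed solution 2 (reusing the already-validated manifest instead of re-validating it) can be sketched in plain Python. The function names mirror the operator names in the issue, but the code itself is illustrative, not the actual DAG implementation:

```python
# Illustrative sketch: validate each manifest exactly once and pass the
# validated result forward (in a real DAG this hand-off could be an XCom),
# instead of re-validating inside process_single_manifest_file.
call_count = {"validate": 0}

def validate_schema(manifest):
    """Stand-in for the schema-validation API call."""
    call_count["validate"] += 1
    return {"manifest": manifest, "valid": True}

def process_single_manifest_file(validated):
    # Receives the already-validated manifest rather than calling
    # validate_schema again, avoiding the redundant API round trip.
    assert validated["valid"]
    return f"ingested:{validated['manifest']}"

results = []
for manifest in ["m1", "m2", "m3"]:
    validated = validate_schema(manifest)   # exactly one call per manifest
    results.append(process_single_manifest_file(validated))

print(call_count["validate"])  # 3: one per manifest, no redundancy
```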
What do you think?

Assignees: Joe, Siarhei Khaletski (EPAM), Kateryna Kurach (EPAM), Alan Henson

## Syntax updates for seismic schema
https://community.opengroup.org/osdu/data/open-test-data/-/issues/17 · Keith Wall · 2019-11-18

2. In SeismicAcquisitionProject.json, SeismicProcessingProject.json, and Well.json we needed to change "$ID" to "$id" to be JSON Schema compliant.
3. Also in SeismicAcquisitionProject.json we needed to remove the '/' from the Data.IndividualTypeProperties.SeismicLines.LineNames reference URL.
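A small helper in the spirit of these fixes (hypothetical; the schema fragment below is invented, the real files live in the open-test-data repo) could normalize the `$ID` keys across the affected schema files:

```python
def fix_ids(node):
    """Recursively rename '$ID' keys to '$id' (JSON Schema keywords are lowercase)."""
    if isinstance(node, dict):
        return {("$id" if key == "$ID" else key): fix_ids(value)
                for key, value in node.items()}
    if isinstance(node, list):
        return [fix_ids(item) for item in node]
    return node

# Invented fragment for illustration only.
schema = {"$ID": "SeismicAcquisitionProject", "properties": {}}
fixed = fix_ids(schema)
print(fixed)  # {'$id': 'SeismicAcquisitionProject', 'properties': {}}
```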
## GET /v1/workflow/{workflow_name}/workflowRun (getAllRunInstances) doesn't respond correctly
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/98 · Monalisa Srivastava · 2021-04-09

GET /v1/workflow/{workflow_name}/workflowRun (getAllRunInstances) requires params; even with a blank JSON body it returns 200 OK, but with no details in the response.
The following filters are also missing:
```java
String prefix = (String) params.get("prefix");
String startDate = (String) params.get("startDate");
String endDate = (String) params.get("endDate");
String limit = (String) params.get("limit");
```

Assignees: Kishore Battula, Aalekh Jain

## Airflow 2.0 WIP
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/71 · Ben Lasscock · 2022-03-09

Tracking the task list for the Airflow 2.0 port described [here](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/65).
### Install Airflow 2.0 to all environments
Azure
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
AWS
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
GCP
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
IBM [issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/66) M9
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
### Airflow 2.0 required DAG code changes (level of effort estimated at 5-days)
The strategy is for (EPAM) to deliver code changes for the manifest-based ingestion DAGs [issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/60) and the WITSML parser [issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/61), which will demonstrate the required changes to the Python code to support Airflow 2.x, together with a deprecation strategy to continue support for Airflow 1.10.x.
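For dual support of Airflow 1.10.x and 2.x, one widely used pattern (shown here as an illustration; this may or may not be the approach the project adopts) is a guarded import that prefers the 2.x module path and falls back to the deprecated 1.10.x path:

```python
# Guarded import supporting both Airflow major versions in one DAG file.
try:
    from airflow.operators.python import PythonOperator            # Airflow 2.x
except ImportError:
    try:
        from airflow.operators.python_operator import PythonOperator  # Airflow 1.10.x
    except ImportError:
        PythonOperator = None  # Airflow not installed (e.g. docs/CI sandbox)
```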
2.1 Manifest-based ingestion (Python) [issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/60) @Kateryna_Kurach
* [ ] Not Started
* [ ] In Progress - Planned delivery tagged for M7
* [ ] Blocking
* [x] Completed
2.2 WITSML parser (Python) [issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/61) @Kateryna_Kurach
* [ ] Not Started
* [ ] In Progress - Planned delivery tagged for M7
* [ ] Blocking
* [x] Completed
2.3 SEGY -> OpenVDS - Planned delivery tagged for M8
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
SEGY -> ZGY [issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/segy-to-zgy-conversion/-/issues/4) @sacha
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
CSV Parser [issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/csv-parser/csv-parser/-/issues/33) @tdixon @sdubey7 @frubio (M8 by EPAM)
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
### Workflow Services (level of effort estimated at 10-days)
Plans for updating the workflow services to be compatible with Airflow 2.0 is detailed here [issue](https://community.opengroup.org/osdu/platform/data-flow/data-workflow-framework/data-workflow/-/issues/1). ~~(EPAM) has taken on this work.~~
Implement the Airflow 2.0 stable API in workflow-core, and make it optional.
* [ ] Not Started - Planned delivery tagged for ~~M8~~ M9
* [x] In Progress
* [ ] Blocking
* [ ] Completed
Azure Migration @kibattul delivery M9
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
AWS Migration @Wibben
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
GCP Migration @Kateryna_Kurach M9
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
IBM Migration @shrikgar M8
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
### 4. Testing and pre-shipping
The dev team (??) will need to provide clear instructions for the Airflow upgrade which each CSP can follow in both QA and pre-shipping environments, so that they are prepped for testing by the respective teams.
Create Airflow upgrade instructions.
* [x] Not Started
* [ ] In Progress
* [ ] Blocking
* [ ] Completed
QA - Airflow 2.+ readiness
* [x] Not Started
* [ ] In Progress
* [ ] Blocking
* [ ] Completed
Pre-shipping - Airflow 2.+ readiness
* [x] Not Started
* [ ] In Progress
* [ ] Blocking
* [ ] Completed

## Manifest ingestion fails with large number of wells
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/48 · Alan Henson · 2022-08-23

This issue is a mirror of the issues created by Pre-Shipping Team A, which can be found here: https://gitlab.opengroup.org/osdu/subcommittees/ea/projects/pre-shipping/home/-/issues/64

Assignee: Kateryna Kurach (EPAM)

## WIP - Airflow 2+ Adoption
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/87 · Ben Lasscock · 2022-03-16

This issue is a place to track adoption of Airflow 2+ by the various CSPs.
Running Airflow 2+ with the experimental API is backward compatible with the current workflow services; however, it does provide potential performance improvements, particularly around the scheduler. Please update your current status here.
AWS - M9 timeline
* [ ] Not Started
* [ ] Airflow 2+ (experimental API)
* [x] Airflow 2+ (stable API)
Azure - M10 timeline
* [ ] Not Started
* [ ] Airflow 2+ (experimental API)
* [x] Airflow 2+ (stable API)
IBM - M9 timeline
* [ ] Not Started
* [ ] Airflow 2+ (experimental API)
* [x] Airflow 2+ (stable API)
GCP - M8 timeline
* [ ] Not Started
* [ ] Airflow 2+ (experimental API)
* [x] Airflow 2+ (stable API)

## Integration E2E Tests for manifest ingestion - AWS
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/94 · Chris Zhang · 2022-08-24

This is to track the AWS team's work on integration E2E tests for manifest ingestion.
Related to issue 85: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/85

Milestone: M10 - Release 0.13 · Gustavo Urdaneta

## Integration E2E Tests for manifest ingestion - IBM
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/96 · Chris Zhang · 2022-11-11

This is to track the IBM team's work on integration E2E tests for manifest ingestion.
Related to issue 85: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/85

Milestone: M12 - Release 0.15 · Shrikant Garg, jingdong sun

## [R3-M8] Issue with "frame of reference" handling (Unit of Measure) by Manifest-based Ingestion [GONRG-3041]
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/77 · kenneth liew · 2021-10-05

The case is related to unit conversion in the R3-M5 environment (Katalyst private AWS setup).
Related to the "master-data--SeismicAcquisitionSurvey" entity.
Test case 1:
I verified that the Normalizer (Indexer) does perform auto-conversion as expected when we persist records using the Storage service.
Test case 2:
However, it does not perform auto-conversion when we persist data using manifest-based ingestion.
My test case uses length values in feet, and in the first test case I do see converted values (meters) in the Search service query results.
I also noticed that the "meta" details are missing on the record (retrieved via the Storage GET method) after inserting it through manifest ingestion.
I have attached some examples related to this issue.
Please focus on the blue highlights in the "ManifestIngestionMethod.docx" document.
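For context, the missing "meta" block is how OSDU records declare a frame of reference for unit normalization. The sketch below shows the general shape only; the data property, values, and record kind are invented for illustration, and the `persistableReference` payload (which carries the full unit definition) is elided:

```python
# Illustrative OSDU record with a frame-of-reference "meta" block, the part
# reported missing after manifest-based ingestion. All values are made up.
record = {
    "kind": "osdu:wks:master-data--SeismicAcquisitionSurvey:1.0.0",
    "data": {"CableLength": 8000.0},  # stored in the unit declared below
    "meta": [
        {
            "kind": "Unit",
            "name": "ft",
            "propertyNames": ["CableLength"],  # properties this unit applies to
            # "persistableReference": "..."    # full unit definition, elided
        }
    ],
}

# The Normalizer (Indexer) reads meta[].propertyNames to know which data
# properties to auto-convert (e.g. feet -> meters) at indexing time.
print(record["meta"][0]["name"])  # ft
```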
FYI @debasisc
[ManifestIngestionMethod.docx](/uploads/4b774f1ccc88db210882ab79a49d70fb/ManifestIngestionMethod.docx)
[StorageMethod.docx](/uploads/2ce57a7bacbee29f3a66b2db07923ebf/StorageMethod.docx)

Milestone: M8 - Release 0.11 · Siarhei Khaletski (EPAM)

## IBM R3M8 - Failure to ingest Wellbore data from WITSML source
https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/issues/43 · Debasis Chatterjee · 2022-08-23

Reported by @epeysson.
From the Airflow log, we see the following as the reason for the failure:
```
"Missing referential id: {
'opendes:reference-data--VerticalMeasurementType:TotalDepth:',
'opendes:reference-data--VerticalMeasurementPath:MeasuredDepth:',
'opendes:reference-data--VerticalMeasurementPath:TrueVerticalDepth:',
```
Input source [Etienne-Wellbore.xml](/uploads/dcc04d22a77be52599a40ef3ae6f487d/Etienne-Wellbore.xml)
After some investigation, it was determined that the OSDU reference-value convention has changed to use codes instead:
- `reference-data--VerticalMeasurementType:TD` instead of `TotalDepth`
- `reference-data--VerticalMeasurementPath:TVD` instead of `TrueVerticalDepth`
- `reference-data--VerticalMeasurementPath:MD` instead of `MeasuredDepth`
One possible solution is to change the code to reflect these changes.
An alternative is to hold the field mapping in a configuration file, so that this kind of change is easy to handle in the future.
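The configuration-file alternative could look like the following sketch. The mapping values come from the issue text; the helper function and its structure are hypothetical:

```python
# Legacy-name -> code mapping kept in data (in practice loaded from a
# JSON/YAML config file) so convention changes don't require code changes.
REFERENCE_CODE_MAP = {
    "reference-data--VerticalMeasurementType": {"TotalDepth": "TD"},
    "reference-data--VerticalMeasurementPath": {
        "TrueVerticalDepth": "TVD",
        "MeasuredDepth": "MD",
    },
}

def remap_reference_id(ref_id):
    """Rewrite e.g. 'opendes:reference-data--VerticalMeasurementType:TotalDepth:'
    to 'opendes:reference-data--VerticalMeasurementType:TD:'."""
    parts = ref_id.split(":")
    # parts = [partition, group-type, value, optional version/empty tail]
    if len(parts) >= 3:
        code = REFERENCE_CODE_MAP.get(parts[1], {}).get(parts[2])
        if code:
            parts[2] = code
    return ":".join(parts)

print(remap_reference_id("opendes:reference-data--VerticalMeasurementPath:MeasuredDepth:"))
# opendes:reference-data--VerticalMeasurementPath:MD:
```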
For other data types (Well, Marker, Log, Trajectory, Tubular), there will potentially be other mismatches too.
Copying @janas712 for her input on the subject.
Also adding @gehrmann for his awareness.
cc @ChrisZhang, @chad, @Keith_Wall for information.

Milestone: M10 - Release 0.13 · etienne peysson

## WITSML parser fails - expects parent entity with a specified version
https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/issues/48 · Kateryna Kurach (EPAM) · 2022-01-19

I am trying to process the attached file. WITSML ingestion fails with the following error message:
` {'provide_manifest_integrity_task': [{'id': 'odesprod:work-product--WorkProduct:20C60DDC-D36D-4A3C-800F-504CE0B5605D', 'kind': 'odesprod:wks:work-product--WorkProduct:1.0.0', 'reason': 'Missing parents: {SRN: odesprod:work-product-component--WellboreTrajectory:20C60DDC-D36D-4A3C-800F-504CE0B5605D}'}, {'id': 'odesprod:work-product-component--WellboreTrajectory:20C60DDC-D36D-4A3C-800F-504CE0B5605D', 'kind': 'odesprod:wks:work-product-component--WellboreTrajectory:1.0.0', 'reason': 'Missing parents: {SRN: odesprod:dataset--File.WITSML:20C60DDC-D36D-4A3C-800F-504CE0B5605D:1}'}]} `
The problem is that the parser expects a dataset record with a version,
`odesprod:dataset--File.WITSML:20C60DDC-D36D-4A3C-800F-504CE0B5605D:1`,
but creates the dataset record without a version.

Attachment: [Trajectory.xml](/uploads/a43e17d20edcab84859f7cb49f72687c/Trajectory.xml)

Milestone: M10 - Release 0.13 · Laurent Deny, etienne peysson