# Ingestion Workflow issues

Issue feed from https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues (last updated 2022-06-28).

## #58 [Validation] [Master and Reference Data] Manifest Validation - check Reference Data records

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/58 · Kateryna Kurach (EPAM) · updated 2022-06-28

Validation that Reference Data values (SRNs) in the manifest point to existing records.
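A minimal sketch of what this check could look like, assuming a hypothetical `storage_client` with a `get_record` method; the actual OSDU Storage/Search API calls would differ, and a real implementation would batch the lookups:

```python
# Hypothetical sketch: verify that each SRN referenced in a manifest
# resolves to an existing record. The storage_client interface is an
# illustrative assumption, not the actual OSDU Storage API.
from typing import Iterable, List

def find_missing_srns(srns: Iterable[str], storage_client) -> List[str]:
    """Return the SRNs that do not resolve to an existing record."""
    missing = []
    for srn in srns:
        # Assumed client method; a real implementation would batch
        # lookups against the Storage or Search service.
        if storage_client.get_record(srn) is None:
            missing.append(srn)
    return missing
```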
## #57 [Non-functional Requirements] Ability to parallelize data ingestion activities

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/57 · Kateryna Kurach (EPAM) · updated 2021-06-16

This feature has to be added to ingest large volumes of data more efficiently.

## #56 [Validation] Support of Access Control Lists and Legal Tags in Manifest

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/56 · Kateryna Kurach (EPAM) · updated 2021-06-16

This issue is related to integration with the policy-based Entitlements Service. We need to include the following functionality:

- Validation of the ACL from the Manifest
- Validation of Legal Tags
- Validation of the Document Author and of ACL values from the Manifest against the Legal Tags present in the manifest (e.g. that the Document Author has permission to put a specific ACL value into the Manifest, that the Document Author has the right to use specific Legal Tags, etc.); detailed requirements still have to be developed. See the sketch after this list.
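A rough illustration of the checks listed above. The `acl.viewers`/`acl.owners` and `legal.legaltags` fields follow the common OSDU record layout, while the `entitlements_client` and `legal_client` methods are hypothetical placeholders:

```python
# Illustrative sketch of the manifest-level checks described above.
# The entitlements/legal client calls are assumed placeholders.
def validate_acl_and_legal(record: dict, entitlements_client, legal_client) -> list:
    errors = []
    acl = record.get("acl", {})
    legal = record.get("legal", {})
    for group in acl.get("viewers", []) + acl.get("owners", []):
        if not entitlements_client.group_exists(group):  # assumed method
            errors.append(f"unknown ACL group: {group}")
    for tag in legal.get("legaltags", []):
        if not legal_client.is_valid_tag(tag):  # assumed method
            errors.append(f"invalid legal tag: {tag}")
    return errors
```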
## #55 [Parsers] Add support of libraries not copied to Airflow DAG repository

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/55 · Kateryna Kurach (EPAM) · updated 2021-06-16

It is possible that some of the parsers integrated with OSDU will be commercial solutions, and the owners of these parsers will not donate their code to OSDU. Parser developers should follow an established standard and project structure and do parser development in their own repository. The Ingestion team should document this process and project structure. The same should be done for Airflow DAGs.

## #54 Some Integration Tests test for Unauthorized when no token is provided, should be Forbidden

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/54 · Spencer Sutton (suttonsp@amazon.com) · updated 2021-06-16 · assignee: Spencer Sutton

See line 100 of PostStartWorkflowIntegrationTests.java in the test "should_returnUnauthorized_when_notGivenAccessToken". All of the other services return a 403 when no token is provided; the Workflow service should follow suit. I want to simply change this test, and the couple of other places where it occurs, to expect Forbidden instead.

## #53 [Parsers] Add support of Airflow DAGs based on non-python code

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/53 · Kateryna Kurach (EPAM) · updated 2021-06-16

Currently, Airflow DAGs can be based only on Python code. We need to add support for integrating non-Python code into Airflow DAGs.
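One plausible way to wrap non-Python code, sketched here with Airflow's stock `BashOperator`; the parser binary and its arguments are invented for illustration (a `DockerOperator` or `KubernetesPodOperator` would be alternatives for containerized parsers):

```python
# Sketch: wrapping a non-Python parser executable in an Airflow task.
# The binary path and arguments are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="non_python_parser_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,  # triggered externally, not on a schedule
    catchup=False,
) as dag:
    run_parser = BashOperator(
        task_id="run_parser",
        # Hypothetical external executable; the file id is pulled from
        # the triggering request via standard dag_run templating.
        bash_command="/opt/parsers/segy-parser --input {{ dag_run.conf['file_id'] }}",
    )
```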
## #52 [Parsers] Add support of complex Data Type

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/52 · Kateryna Kurach (EPAM) · updated 2021-06-16

In the Submit Request, the "Data Type" parameter is currently of type "string". It should be changed to a "custom object" type to support complex data structures, e.g. as needed for CSV file ingestion or for events.
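An illustration of the requested change, with made-up field names based on the issue text; only the string-versus-object shape of `DataType` is the point here:

```python
# Illustrative payloads only; field names are assumptions.
current_request = {
    "DataType": "well_log",            # plain string today
    "FileID": "srn:file:/csv/abc123",  # illustrative SRN
}

proposed_request = {
    "DataType": {                      # structured "custom object"
        "name": "csv",
        "delimiter": ",",              # assumed CSV-specific attributes
        "hasHeader": True,
    },
    "FileID": "srn:file:/csv/abc123",
}
```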
## #51 [Parsers] Ability to register new Data Type

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/51 · Kateryna Kurach (EPAM) · updated 2021-06-16

Lessons learned from the Energistics demo. We need to create an end-point for easy creation of a configuration record for a new parser component (Workflow Service configuration table), with the possibility to associate an Airflow DAG with this new Data Type. Ideally, a user should send a package that contains all the information necessary for this registration to happen automatically. This registration was done manually during the Energistics demo.
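A hypothetical sketch of what such a registration call could accept; the endpoint path and field names are invented, since the issue only asks for a configuration record that associates an Airflow DAG with a new Data Type:

```python
# Hypothetical registration request; the Workflow Service does not
# currently expose this endpoint.
import requests

registration = {
    "dataType": "witsml_trajectory",     # new Data Type being registered
    "dagName": "witsml_trajectory_dag",  # Airflow DAG to associate with it
    "description": "Parser delivered as an external package",
}

resp = requests.post(
    "https://osdu.example.com/api/workflow/v1/datatypes",  # assumed URL
    json=registration,
    timeout=30,
)
resp.raise_for_status()
```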
## #49 [R3 Schemas Support] Load R3 Schemas into the system

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/49 · Kateryna Kurach (EPAM) · updated 2021-06-16 · assignee: Dmitriy Rudko

Lessons learned from the Energistics demo. We need to support R3 Schemas in a Manifest.

## #48 [R3 Schemas Support] Ingestion workflow should support R3 Schema structure of a Manifest

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/48 · Kateryna Kurach (EPAM) · updated 2021-06-16 · assignee: Dmitriy Rudko

Lessons learned from the Energistics demo: currently the Ingestion Workflow doesn't support the R3 Schema structure of a Manifest.
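For reference, a rough skeleton of the R3 manifest layout the workflow would need to understand, based on the commonly published OSDU R3 manifest shape; the exact `kind` value and section names are assumptions to verify against the current schema:

```python
# Rough R3 manifest skeleton; section names assumed from the published
# OSDU R3 manifest layout.
r3_manifest = {
    "kind": "osdu:wks:Manifest:1.0.0",  # assumed kind string
    "ReferenceData": [],                 # reference-data records
    "MasterData": [],                    # master-data records
    "Data": {
        "WorkProduct": {},
        "WorkProductComponents": [],
        "Datasets": [],
    },
}
```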
## #47 [Non-functional Requirements] Ability to register non-python workflow components as Airflow DAG components

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/47 · Kateryna Kurach (EPAM) · updated 2021-06-16 · assignee: Dmitriy Rudko

Some parsers may be based on non-Python code. We need the ability to integrate them into the Ingestion workflow (see the wrapper sketch under issue #53 above).

## #46 Ability to register a workflow not associated with a specific Data Type (Higher Level Workflow)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/46 · Kateryna Kurach (EPAM) · updated 2021-06-16 · assignee: Dmitriy Rudko

For example, it may be a workflow that does something according to a schedule. The following questions should be considered (a sketch of such a workflow follows this list):

- How do we run this type of workflow?
- What reusable components do we need to build?
- What interface will these components have?
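A minimal sketch of such a higher-level workflow: an Airflow DAG driven purely by a schedule rather than by a Data Type; the task body is a placeholder:

```python
# Schedule-driven DAG, not tied to any Data Type.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def nightly_maintenance():
    # Placeholder for schedule-driven work (cleanup, re-indexing, etc.).
    print("running scheduled workflow")

with DAG(
    dag_id="scheduled_higher_level_workflow",
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 2 * * *",  # every night at 02:00
    catchup=False,
) as dag:
    PythonOperator(
        task_id="nightly_maintenance",
        python_callable=nightly_maintenance,
    )
```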
## #45 Validate Reference Data

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/45 · Stephen Whitley (Invited Expert) · updated 2021-03-10

(No description provided.)

## #44 CSV Ingestion - Horizon 1 - Workflow Service Tasks

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/44 · Stephen Whitley (Invited Expert) · updated 2021-06-16 · assignee: Todd Dixon
- [x] Create an endpoint to create a DAG by passing a .py file.
- [ ] Ability to validate a DAG for syntactical issues: check for valid Airflow constructs and check for cyclicity in DAGs (see the sketch after this list).
- [ ] Ability to save the .py file (DAG) in the Airflow /dag mount.
- [ ] Ability to check if a DAG is successfully registered in Airflow.
- [ ] Ability to restore the old DAG in case of a DAG update.
- [ ] Ability to delete a DAG.
- [ ] Ability to view an Airflow DAG.
- [ ] Ability to trigger multiple DAGs.
- [x] Ability to trigger a DAG.
- [ ] Ability to stop a DAG run.
- [ ] Ability to pause/unpause a DAG.
- [ ] Ability to get previous executions of a DAG.
- [ ] Ability to get details of a DAG run.
- [ ] Ability to clear and rerun a failed DAG from where it failed.
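One possible way to implement the validation item above is to load the uploaded `.py` file through Airflow's `DagBag` and inspect its import errors; bagging also runs a cycle check, so cyclic DAGs surface here as well. This is a sketch, not the Workflow Service implementation:

```python
# Validate uploaded DAG files by loading them through Airflow's DagBag.
# Syntax errors, invalid constructs, and cycles all appear as import
# errors on the bag.
from airflow.models import DagBag

def validate_dag_files(dag_folder: str) -> dict:
    """Return a mapping of file path -> error for DAGs that fail to load."""
    bag = DagBag(dag_folder=dag_folder, include_examples=False)
    return dict(bag.import_errors)

# Example usage: errors = validate_dag_files("/tmp/uploaded_dags")
# An empty dict means every DAG in the folder loaded cleanly.
```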
## #43 POM dependencies are structured differently than other services

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/43 · Matt Wise · updated 2021-02-09 · assignee: Kateryna Kurach (EPAM)

Copied from 'Issues with POMs in the repo'; the circular dependencies were fixed, but one issue was still outstanding, so it is moved to a lower-priority issue as tech debt.

The POMs in this service are structured differently than in other services. In other services, the parent POM contains almost no dependencies and allows the Core & Test-Core POMs to specify dependencies individually.

## #42 FOSSA NOTICE out of date

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/42 · David Diederich (d.diederich@opengroup.org) · updated 2020-08-20 · milestone: M1 - Release 0.1 · assignee: David Diederich

As of ad2f1ffa, the FOSSA NOTICE file is out of date ([Job Output](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/jobs/41208)). This may be related to the recent upgrade in FOSSA version -- osdu/platform/ci-cd-pipelines!40.

## #40 Issues with POMs in the repo (Circular dependency from Core to Test-Core and POM dependencies are structured differently than other services)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/40 · Matt Wise · updated 2020-08-20 · assignees: Dmitriy Rudko, Oleksandr Kosse (EPAM), Riabokon Stanislav (EPAM) [GCP], Artem Nazarenko (EPAM) · due 2020-08-21

The POMs in this service are structured differently than in other services. In other services, the parent POM contains almost no dependencies and allows the Core & Test-Core POMs to specify dependencies individually.
In addition, the Test project is tightly coupled to the build of the Core, creating a circular dependency.
In the root POM, the following is observed:
```xml
<modules>
<module>workflow-core</module>
<module>provider/workflow-azure</module>
<module>provider/workflow-gcp</module>
<!-- <module>provider/workflow-ibm</module> Fix: Missing classes-->
<module>provider/workflow-gcp-datastore</module>
<module>testing/workflow-test-core</module>
</modules>
```
Note that the module `testing/workflow-test-core` is referenced in the modules list. The test modules should know about the core modules, but not the other way around.
If the test module is removed from the build list, the project fails to compile successfully.

## #39 Ability to support several file_ids in 1 workflow

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/39 · Kateryna Kurach (EPAM) · updated 2021-06-16

It is possible that several files should be used to populate one WPC (Work Product Component).
E.g. processing of SEGY files: there is a need to add auxiliary information from CSV or UKOOA files to the information extracted from SEGY to populate one WPC.
Please review the attached diagram.
One of the easiest ways of ingesting both files in one workflow is to implement two parsers (SEGY and CSV) at Step 3.3. In this case, the workflow should support the ability to work with two file_ids. We don't have this feature now, so this scenario cannot be implemented.
There is a workaround: populate the additional CSV information in the CONTEXT part of the SubmitWithManifest request (in case there are not a lot of attributes to add), or ingest the additional information extracted from the CSV during the Enrichment flow. But adding several parsers to the Ingestion flow is a cleaner way of implementing this. A sketch of a multi-file submit payload follows.

![Ingestion_diagram](/uploads/971feab38057db9b8b376de74d149731/Ingestion_diagram.png)
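A sketch of what a multi-file submit payload could look like if the workflow accepted a list of file ids; the field names are illustrative, modeled on the single-file request this issue describes:

```python
# Illustrative multi-file payload; today only one file_id is supported.
submit_request = {
    "DataType": "segy",
    "FileIDs": [                         # hypothetical list-valued field
        "srn:file:/segy/survey-001",     # primary SEGY file
        "srn:file:/csv/survey-001-aux",  # auxiliary CSV/UKOOA data
    ],
    "Context": {"surveyName": "Survey 001"},
}
```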
## #38 Ingestion - Files Support - Unsupported file format

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/38 · Kateryna Kurach (EPAM) · updated 2020-09-01

Ability to load a file of an unsupported format into OSDU.
This can be implemented using 3 approaches:
1. Opaque ingestion (using generic manifest file)
2. Lots of files of this type have to be processed:
   - User has to develop their own parser
   - Parser should be able to automatically generate a Manifest file
   - Parser should be built into a custom workflow
3. Few files have to be processed:
   - Generic Manifest file is created manually
   - Regular Ingestion workflow has to be executed

## #37 [Non-functional Requirements] Ingestion - Ability to run long-running jobs

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/37 · Kateryna Kurach (EPAM) · updated 2021-06-16 · assignee: Dmitriy Rudko

Some of the ingestion jobs can be long-running; the Ingestion Framework should support such cases (the authentication aspect, etc.).
Ingestion jobs should not use a user token to run; some Airflow/service token should be used for running workflows (a generic sketch follows). OpenDES supports such a scenario for up to 30 days.
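A generic sketch of the token handling this asks for: the workflow obtains and refreshes its own service credential via a standard OAuth2 client-credentials grant instead of holding a user token for the life of a long job; the endpoint and client details are illustrative:

```python
# Generic OAuth2 client-credentials token fetch; URL and credentials
# are illustrative placeholders.
import requests

def fetch_service_token(token_url: str, client_id: str, client_secret: str) -> str:
    """Obtain a fresh service token for the workflow to act under."""
    resp = requests.post(
        token_url,
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

# A long-running Airflow task would call this before each service
# request rather than reusing a token captured at submission time.
```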