# Ingestion Workflow issues

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues

## Preservation of Lineage

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/15
2021-06-16 · Kateryna Kurach (EPAM)

This user story comes from the core OSDU principle: data lineage is tracked.
It can be split into two sub-stories:

### 1. Source lineage

The ability to track the source of the data: which file it came from, and all versions of subsequent transformations.

### 2. Attribute lineage

The ability to track attribute lineage: how each attribute was transformed.

**Questions:**

- It seems that a Lineage API has to be created.
- How should all versions of the data be displayed?

OpenDES supports both of these scenarios.
It seems that this will be a CSP-specific implementation.
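For the source-lineage sub-story, the OSDU Storage record model supports an optional `ancestry` block whose `parents` list points at the records (and versions) the data was derived from. A minimal sketch; the ids and kind below are made up for illustration:

```python
# A minimal sketch of source lineage via the Storage record "ancestry" block;
# the ids, kind, acl and legal values below are placeholders for illustration.
derived_record = {
    "kind": "opendes:wks:work-product-component--WellLog:1.0.0",
    "acl": {
        "viewers": ["data.default.viewers@opendes.example.com"],
        "owners": ["data.default.owners@opendes.example.com"],
    },
    "legal": {
        "legaltags": ["opendes-public-usa-dataset"],
        "otherRelevantDataCountries": ["US"],
    },
    # Source lineage: each parent is "<record id>:<version>", so both the
    # originating file record and the exact version it was derived from
    # remain traceable through further transformations.
    "ancestry": {
        "parents": ["opendes:file:las-input-123:1614258746845152"],
    },
    "data": {"Name": "Well log derived from an uploaded LAS file"},
}
```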
## [Parsers] Develop SEG-Y - Seismic Parser

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/20
2021-06-16 · Kateryna Kurach (EPAM)

## Cloud Datasource Support - GCP

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/24
2021-06-16 · Kateryna Kurach (EPAM)

This user story covers the scenario where a datasource is located in the same cloud instance as OSDU (GCP).

## Cloud Datasource Support - AWS

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/25
2021-06-16 · Kateryna Kurach (EPAM) · Assignee: Joe

This user story covers the scenario where a datasource is located in the same cloud instance as OSDU (AWS).

## Cloud Datasource Support - IBM Cloud

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/27
2021-06-16 · Kateryna Kurach (EPAM) · Assignees: Wladmir Frazao, Alan Braz

This user story covers the scenario where a datasource is located in the same cloud instance as OSDU (IBM Cloud).

## Cloud Datasource Support - Support non-native datasource

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/28
2021-06-16 · Kateryna Kurach (EPAM)

This user story covers the scenario where an OSDU instance is installed in one type of cloud and sources data from a location in another type of cloud (e.g. the OSDU instance is in MSFT Azure while the file storage is in GCP).
It is possible that the implementation will be different for each CSP.

## Datasources Support - FTP

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/29
2021-06-16 · Kateryna Kurach (EPAM)

Files are located in the on-prem environment and have to be sourced into OSDU.
**Questions:**

An architectural decision on the implementation has to be made.
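While that decision is pending, the pull itself is straightforward; a minimal sketch using Python's standard `ftplib`, with placeholder host, paths and credentials (the hand-off to the OSDU File service is only indicated):

```python
# A minimal sketch of pulling a file from an on-prem FTP server before
# handing it to OSDU; host, paths and credentials are placeholders.
from ftplib import FTP


def fetch_from_ftp(host: str, remote_path: str, local_path: str) -> None:
    """Download remote_path from an FTP server to local_path."""
    with FTP(host) as ftp:
        ftp.login()  # anonymous login; real sources will need credentials
        with open(local_path, "wb") as out:
            ftp.retrbinary(f"RETR {remote_path}", out.write)


fetch_from_ftp("ftp.example.com", "wells/run-42.las", "/tmp/run-42.las")
# Next step (not shown): upload /tmp/run-42.las via the OSDU File service
# and trigger the ingestion workflow with the returned file id.
```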
## Datasources Support - External Database

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/30
2021-06-16 · Kateryna Kurach (EPAM)

This user story describes the ability to source data from a database.
There may be some overlap in scope and work with the External Datasources stream:
https://community.opengroup.org/osdu/program/-/wikis/Release_Planning/R3/External_Data_Sources

## Ability to create a DAG Operators Library

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/34
2021-06-16 · Kateryna Kurach (EPAM) · Assignee: Dmitriy Rudko

Users should have the ability to create a library of DAG operators. These operators may be:

- a published service
- a third-party component library
- another DAG, etc.

Start building a library of reusable components after the Energistics WITSML parser demo; a sketch of such a reusable operator follows below.
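A minimal sketch of a reusable "published service" operator, assuming Airflow 2.x; the service URL and payload handling are placeholders, not an agreed OSDU interface:

```python
# A minimal sketch of a library operator that calls a published HTTP service;
# the service URL and payload shape are placeholders for illustration.
import requests
from airflow.models import BaseOperator


class CallPublishedServiceOperator(BaseOperator):
    """Calls a published HTTP service and returns its JSON response."""

    def __init__(self, service_url: str, payload: dict, **kwargs):
        super().__init__(**kwargs)
        self.service_url = service_url
        self.payload = payload

    def execute(self, context):
        response = requests.post(self.service_url, json=self.payload, timeout=60)
        response.raise_for_status()
        return response.json()  # the return value is stored in XCom
```

Returning the response body from `execute` stores it in XCom, so downstream tasks, or other DAGs composed from the library, can consume the result.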
## [Non-functional Requirements] Ingestion - Support of big volumes of data

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/36
2021-06-16 · Kateryna Kurach (EPAM) · Assignees: Dmitriy Rudko, Kateryna Kurach (EPAM)

**Acceptance criteria:**

We need to define the acceptance criteria.

## [Non-functional Requirements] Ingestion - Ability to run long-running jobs

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/37
2021-06-16 · Kateryna Kurach (EPAM) · Assignees: Dmitriy Rudko, Kateryna Kurach (EPAM)

Some ingestion jobs can be long-running; the Ingestion Framework should support such cases (the authentication aspect, etc.).
Ingestion jobs should not use a user token to run; some Airflow token should be used for running workflows.
OpenDES supports such a scenario - for up to 30 days.

## Ability to support several file_ids in 1 workflow

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/39
2021-06-16 · Kateryna Kurach (EPAM)

It is possible that several files are needed to populate one WPC.
E.g. when processing SEG-Y files, auxiliary information from CSV or UKOOA files has to be added to the information extracted from the SEG-Y to populate one WPC.
Please review the attached diagram.
One of the easiest ways of ingesting both files in one workflow is to implement two parsers (SEG-Y and CSV) on Step 3.3. In this case, the workflow should support the ability to work with two file_ids. We don't have this feature now, so this scenario cannot be implemented.
There is a workaround: populate the additional CSV information in the CONTEXT part of the SubmitWithManifest request (if there are not many attributes to add), or ingest the additional information extracted from the CSV during the enrichment flow. But adding several parsers to the ingestion flow is the cleaner implementation.

![Ingestion_diagram](/uploads/971feab38057db9b8b376de74d149731/Ingestion_diagram.png)
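To make the missing feature concrete, here is a hypothetical request body for triggering one workflow with two inputs; the `FileIDs` and `Context` field names are illustrative only, since the current request accepts a single file id:

```python
# Hypothetical shape of a multi-file submit request; the FileIDs and Context
# field names are illustrative, not the current API contract.
submit_request = {
    "DataType": "seismic_trace",          # illustrative data type
    "FileIDs": [
        "opendes:file:segy-survey-001",   # SEG-Y carrying the trace data
        "opendes:file:ukooa-survey-001",  # UKOOA/CSV with auxiliary data
    ],
    # Today's workaround instead squeezes the extra CSV attributes into the
    # CONTEXT part of the SubmitWithManifest request.
    "Context": {"NavigationReference": "WGS84"},
}
```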
## CSV Ingestion - Horizon 1 - Workflow Service Tasks

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/44
2021-06-16 · Stephen Whitley (Invited Expert) · Assignee: Todd Dixon

- [x] Create an endpoint to create a DAG by passing a .py file.
- [ ] Ability to validate a DAG for syntactical issues: check for valid Airflow constructs, check for cyclicity in DAGs.
- [ ] Ability to save the .py file (DAG) in the Airflow /dag mount.
- [ ] Ability to check if a DAG is successfully registered in Airflow.
- [ ] Ability to restore the old DAG in case of a DAG update.
- [ ] Ability to delete a DAG.
- [ ] Ability to view an Airflow DAG.
- [ ] Ability to trigger multiple DAGs.
- [x] Ability to trigger a DAG.
- [ ] Ability to stop a DAG run.
- [ ] Ability to pause/unpause a DAG.
- [ ] Ability to get previous executions of a DAG.
- [ ] Ability to get details of a DAG run.
- [ ] Ability to clear and re-run a failed DAG from where it failed.
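Several of these tasks (trigger a DAG, get details of a run, stop a run) map directly onto Airflow 2's stable REST API; a minimal sketch of the trigger call, with placeholder base URL, credentials and DAG id:

```python
# A minimal sketch of the "trigger a dag" task via Airflow 2's stable REST
# API; the base URL, credentials and DAG id below are placeholders.
import requests

AIRFLOW_URL = "http://airflow.example.com"  # placeholder


def trigger_dag(dag_id: str, conf: dict) -> str:
    """Trigger a DAG run and return its run id."""
    response = requests.post(
        f"{AIRFLOW_URL}/api/v1/dags/{dag_id}/dagRuns",
        json={"conf": conf},
        auth=("user", "password"),  # placeholder basic auth
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["dag_run_id"]


run_id = trigger_dag("csv_ingestion", {"fileID": "opendes:file:example-csv"})
```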
## Ability to register a workflow not associated with a specific Data Type (Higher Level Workflow)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/46
2021-06-16 · Kateryna Kurach (EPAM) · Assignees: Dmitriy Rudko, Kateryna Kurach (EPAM)

E.g. it may be a workflow that does something according to a schedule.
The following questions should be considered (see the sketch after this list):

- How do we run this type of workflow?
- What reusable components do we need to build?
- What interface will these components have?
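As a starting point for the first question, a minimal sketch of a schedule-driven DAG that takes no file_id at all, assuming Airflow 2.x; the DAG id, schedule and task body are illustrative only:

```python
# A minimal sketch of a higher-level workflow driven purely by a schedule;
# the DAG id, schedule and task body are placeholders for illustration.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def scan_for_new_files(**context):
    """Placeholder for a periodic housekeeping/scanning step."""
    print("checking the staging area for newly landed files")


with DAG(
    dag_id="scheduled_housekeeping",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@hourly",  # runs on a schedule, no file_id input
    catchup=False,
) as dag:
    PythonOperator(task_id="scan", python_callable=scan_for_new_files)
```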
## [Non-functional Requirements] Ability to register non-python workflow components as Airflow DAG components

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/47
2021-06-16 · Kateryna Kurach (EPAM) · Assignees: Dmitriy Rudko, Kateryna Kurach (EPAM)

Some parsers may be based on non-Python code. We need the ability to integrate them into the ingestion workflow.

## [R3 Schemas Support] Ingestion workflow should support R3 Schema structure of a Manifest

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/48
2021-06-16 · Kateryna Kurach (EPAM) · Assignees: Dmitriy Rudko, Kateryna Kurach (EPAM)

Lessons learned from the Energistics demo:
Currently the ingestion workflow doesn't support the R3 Schema structure of a Manifest.

## [R3 Schemas Support] Load R3 Schemas into the system

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/49
2021-06-16 · Kateryna Kurach (EPAM) · Assignees: Dmitriy Rudko, Kateryna Kurach (EPAM)

Lessons learned from the Energistics demo. We need to support R3 Schemas in a Manifest.

## [Parsers] Ability to register new Data Type

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/51
2021-06-16 · Kateryna Kurach (EPAM)

Lessons learned from the Energistics demo.
We need to create an endpoint for easy creation of a configuration record for a new parser component (the Workflow Service configuration table), with the possibility to associate an Airflow DAG with this new Data Type. Ideally, a user should send a package that contains all the information necessary for this registration to happen automatically.
This registration was done manually during the Energistics demo.
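As an illustration of what such a registration package could carry; the field names below are hypothetical, since this endpoint is exactly what the story asks for:

```python
# Hypothetical payload for the proposed data-type registration endpoint;
# all field names are illustrative - the Energistics demo created this
# mapping by hand in the Workflow Service configuration table.
register_data_type_request = {
    "DataType": "witsml_log",                # new data type to register
    "DAGName": "energistics_witsml_parser",  # Airflow DAG to associate
    "Description": "Parses WITSML logs into work product components",
}
```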
## [Parsers] Add support of complex Data Type

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/52
2021-06-16 · Kateryna Kurach (EPAM)

In the Submit request, the "Data Type" parameter is currently of type "string". It should be changed to a "custom object" type to support complex data structures, e.g. as needed for CSV file ingestion or events.

## [Parsers] Add support of Airflow DAGs based on non-python code

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/53
2021-06-16 · Kateryna Kurach (EPAM)

Currently, Airflow DAGs can only be based on Python code. We need to add support for integrating non-Python code into Airflow DAGs.
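One common route, assuming Airflow 2.x, is to wrap the external binary in a BashOperator (a containerised operator such as KubernetesPodOperator is the other usual option); the parser binary and its arguments below are placeholders:

```python
# A minimal sketch of bringing non-python code into a DAG by wrapping the
# external binary in a BashOperator; the binary path and its arguments are
# placeholders for illustration.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="non_python_parser",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,  # triggered per ingestion, not on a schedule
) as dag:
    run_parser = BashOperator(
        task_id="run_native_parser",
        # bash_command is templated, so the file id can be passed in via
        # the dag_run configuration at trigger time.
        bash_command="/opt/parsers/segy-parser --input {{ dag_run.conf['fileID'] }}",
    )
```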