Manifest Ingestion DAG issues
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues

Issue #91: Date-Time validation causing ingestion failure
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/91
2022-03-21T15:29:30Z, Keith Wall

Date values are failing schema validation on ingestion if the dates are not in UTC or do not contain a time-zone offset.
This is a new validation that requires date-times to conform to RFC 3339. The intent is good, but it conforms neither to the schemas nor to our data.
There is a large volume of data in which a date is known, but for which no time or time zone is provided. Recognizing this, the OSDU schemas only require that dates be strings.
Since we manage a large volume of data without time-zone information, our options are either to reject all these dates, or to ingest and maintain them in their original format.
If we force a time zone onto data by putting it into UTC format when we really do not know the time zone, we are corrupting the data.
I have consulted the Enterprise Architecture Geomatics team and asked whether we should (1) not load dates with unknown time zones, or (2) maintain the dates in as-provided form. There was complete agreement that the industry has a large volume of data with dates without time zones, and that we can still make use of those dates but must not modify them by adding a default time zone.
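As an illustration of what fails, here is a minimal sketch of an RFC 3339 date-time check. The regex is my own approximation of the rule, not the actual validation code used by ingestion:

```python
import re

# Approximate RFC 3339 "date-time" shape: full date, "T", full time,
# and a mandatory "Z" or numeric offset. Illustrative only, not the
# ingestion service's real validator.
RFC3339_DATETIME = re.compile(
    r"^\d{4}-\d{2}-\d{2}[Tt]\d{2}:\d{2}:\d{2}(\.\d+)?([Zz]|[+-]\d{2}:\d{2})$"
)

def is_rfc3339_datetime(value: str) -> bool:
    return bool(RFC3339_DATETIME.match(value))

# A date-only value, common in legacy data, fails the strict check:
print(is_rfc3339_datetime("1998-07-16"))            # False -> rejected
# Only a fully offset-qualified timestamp passes:
print(is_rfc3339_datetime("1998-07-16T00:00:00Z"))  # True

# Rewriting the first value as "1998-07-16T00:00:00Z" to make it pass
# would assert a UTC midnight that was never in the source data --
# exactly the corruption that option (2) avoids by keeping the string
# in its as-provided form.
```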
Please remove the date validation from ingestion.

Milestone: M9 - Release 0.12
Participants: Kishore Battula, Shrikant Garg, Spencer Sutton (suttonsp@amazon.com), Yan Sushchynski (EPAM)
Assignee: Kishore Battula

Issue #89: Move Airflow common logic to osdu-airflow-lib project
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/89
2021-09-27T03:39:50Z, Siarhei Khaletski (EPAM)

Some DAGs have dependencies on code from the Ingestion DAGs repository.
For instance, these MRs bring updates for the parser DAGs:
- https://community.opengroup.org/osdu/platform/data-flow/ingestion/segy-to-vds-conversion/-/merge_requests/24
- https://community.opengroup.org/osdu/platform/data-flow/ingestion/segy-to-zgy-conversion/-/merge_requests/36
- https://community.opengroup.org/osdu/platform/data-flow/ingestion/csv-parser/csv-parser/-/merge_requests/149
These require the `UpdateStatusOperator` class from the Ingestion DAGs project, which means the Ingestion DAGs code must be deployed into an environment before the DAGs from the MRs above can use it.
The real case now is the WITSML Parser, where we have to add the `osdu_manifest` code to use operators for the WITSML Parser DAG steps.
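As a sketch of the desired outcome, the shared operator would live in a pip-installable library that DAGs import instead of the Ingestion DAGs repository. The class below is illustrative only (a stand-in base class replaces `airflow.models.BaseOperator` to keep the example self-contained), not the project's real `UpdateStatusOperator`:

```python
# Hedged sketch of what an independently installable osdu-airflow-lib
# could expose. Names are illustrative; a real operator would subclass
# airflow.models.BaseOperator rather than this stand-in stub.

class BaseOperator:  # stand-in for airflow.models.BaseOperator
    def __init__(self, task_id: str):
        self.task_id = task_id

class UpdateStatusOperator(BaseOperator):
    """Reports a workflow-run status back to the Workflow service."""

    def __init__(self, task_id: str, status: str):
        super().__init__(task_id)
        self.status = status

    def execute(self, context: dict) -> str:
        # A real implementation would call the Workflow service API here.
        return f"run {context.get('run_id')} -> {self.status}"

# Once published to a package index, any parser DAG could depend on the
# library via pip instead of deploying the whole Ingestion DAGs repo:
op = UpdateStatusOperator(task_id="update_status", status="finished")
print(op.execute({"run_id": "abc-123"}))  # run abc-123 -> finished
```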
**Expects**: All Airflow-related logic (operators, hooks, etc.) can be installed into an environment (using pip) independently of the Ingestion DAGs code base.

Milestone: M9 - Release 0.12
Assignee: Siarhei Khaletski (EPAM)

Issue #82: Manifest ingestion does not show any updates in airflow when backslash character used in json body
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/82
2021-10-19T19:16:21Z, Naufal Mohamed Noori

**Description**:
Using the manifest ingestion (DAG) workflow service, when a user inserts a backslash (`\`) into the JSON manifest body, the workflow run gets stuck in SUBMITTED status. There is also no trace of the runId in the Airflow log.
**Steps to reproduce:**
a) Insert the JSON body into the DAG workflow body. [With_Backslash_BodyData.json](/uploads/c2f2e8e8241df526830a73cc9ba2336a/With_Backslash_BodyData.json)
b) When the JSON body is submitted to base_url/api/workflow/v1/workflow/Osdu_ingest/workflowRun, the workflow is submitted successfully with the following response:
```json
{
  "workflowId": "dev:Osdu_ingest",
  "runId": "4327f575-e7b3-490f-a1ee-b1e2e950c2a4",
  "startTimeStamp": 1627041278115,
  "status": "submitted",
  "submittedBy": "naufal.noori@katalystdm.com"
}
```
c) After a while, check the DAG run status: the workflow still shows the run in submitted status, and there is no trace of the run ID in the Airflow log (this follow-up check was done after 24 hours):
_Endpoint_: base_url/api/workflow/v1/workflow/Osdu_ingest/workflowRun/4327f575-e7b3-490f-a1ee-b1e2e950c2a4
_Response_:
```json
{
  "workflowId": "dev:Osdu_ingest",
  "runId": "4327f575-e7b3-490f-a1ee-b1e2e950c2a4",
  "startTimeStamp": 1627041278115,
  "status": "submitted",
  "submittedBy": "naufal.noori@katalystdm.com"
}
```
d) In a second trial run, with the `\` character removed, the workflow ran perfectly and its run shows up in the Airflow log. [With_NO_Backslash_BodyData.json](/uploads/9fdbc2a59a930444feeb6bfacd1e1200/With_NO_Backslash_BodyData.json)
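The follow-up check in step (c) can be sketched with a small client. The URL shape comes from the endpoints quoted above; `base_url`, the token, and the helper names are placeholders, not an official SDK:

```python
import json
import urllib.request

# Hedged sketch of polling the workflowRun status endpoint quoted in
# steps (b) and (c). base_url and token are placeholders.

def run_status_url(base_url: str, run_id: str) -> str:
    return f"{base_url}/api/workflow/v1/workflow/Osdu_ingest/workflowRun/{run_id}"

def get_run_status(base_url: str, run_id: str, token: str) -> str:
    req = urllib.request.Request(
        run_status_url(base_url, run_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["status"]

# The bug: for a manifest containing a backslash, this keeps returning
# "submitted" indefinitely (still "submitted" after 24 hours) instead
# of ever moving on to a running, finished, or failed state.
```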
**Expectation**:
We expect the workflow run to fail the request with a clear and meaningful error message, e.g. "Request failed: there are disallowed special characters between line #something and line #something in your JSON body."
**Reason**
It is confusing for users to have a run submitted successfully but stuck in processing without any log trace whatsoever.
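For context on why an early, explicit failure is feasible: a lone backslash is not a valid JSON escape sequence, so a strict parser can reject the body immediately with a line and column. A minimal sketch using only the standard library (the payload is illustrative, not the attached manifest):

```python
import json

# A lone backslash such as the "\d" below is not a valid JSON escape,
# so a strict parser rejects the body up front with a precise location
# -- the kind of error message this issue asks the Workflow service to
# surface instead of silently stranding the run in SUBMITTED.
bad = r'{"Description": "C:\data\manifest.json"}'
try:
    json.loads(bad)
except json.JSONDecodeError as exc:
    print(f"rejected: {exc}")  # message includes the offending position

# Doubling the backslashes produces valid JSON that round-trips cleanly:
good = r'{"Description": "C:\\data\\manifest.json"}'
print(json.loads(good)["Description"])  # C:\data\manifest.json
```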
cc @debasisc

Milestone: M9 - Release 0.12

Issue #66: IBM support to move to airflow 2.0
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/66
2021-11-18T15:06:48Z, jingdong sun

Milestone: M9 - Release 0.12
Participants: Anuj Gupta, Shaon, jingdong sun
Assignee: Anuj Gupta