# Manifest Ingestion DAG issues
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues

## Issue #105: Performance testing in R3 M11 - Need to determine the maximum payload size allowed during ingestion using the Osdu_ingest DAG
Author: Kamlesh Todai · Updated: 2024-03-20
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/105

All,
For R3 M11, I performed performance load testing using the Osdu_ingest DAG running in Airflow v2.0.
The environment I used was IBM Pre-ship R3 M11. Here is the summary:
As expected, when batch_upload is used the time required to ingest the data goes down (a performance gain).
Some observations on the process used:
There is a difference between the Python scripts used to generate the payloads for ingestion and for batch_upload.
The script that generates the ingestion payload creates records of kind "opendes:wks:master-data--Organisation:1.0.0". When the user specifies 5 records, it generates 5 records of kind Organisation.
The script that generates the batch_upload payload creates records of kind "osdu:wks:master-data--Organisation:1.0.0" and "osdu:wks:reference-data--ContractorType:1.0.0". When the user specifies 5 records, it generates records of both Organisation and ContractorType, so it actually produces twice the number of records specified.
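The difference can be sketched as follows. The function names are hypothetical; only the record kinds come from the issue:

```python
# Sketch of the observed generator behavior; function names are hypothetical.
def gen_ingest_payload(n):
    # Ingestion generator: n records of a single kind
    return [{"kind": "opendes:wks:master-data--Organisation:1.0.0"} for _ in range(n)]

def gen_batch_upload_payload(n):
    # batch_upload generator: one Organisation plus one ContractorType per count
    records = []
    for _ in range(n):
        records.append({"kind": "osdu:wks:master-data--Organisation:1.0.0"})
        records.append({"kind": "osdu:wks:reference-data--ContractorType:1.0.0"})
    return records

print(len(gen_ingest_payload(5)), len(gen_batch_upload_payload(5)))  # 5 10
```

So any record-count comparison between the two paths is skewed by a factor of two before the timing is even measured.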
At present we use the number of records to establish the performance benchmark, probably because it is convenient to tell users that ingesting, for example, a certain number of wells takes x amount of time.
But the size of a well record may vary from one user environment to another, so performance numbers derived from a record count may not hold in all situations.
How much one can ingest in a single job is bounded by the size of the payload in KB. So I think we should use payload size in KB to establish the benchmark; the number of records that fit in a payload then depends on the size of the records.
I have done the testing in the IBM environment, but the test for 50,000 records in batch_upload seems to be failing in all environments.
I do not know where the size limit is coming from (REST API, network, Airflow, or the DAG implementation), nor whether it is configurable.
It is important for us to understand where that limitation comes from and whether it is a hard limit or a configurable one.
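The size-based benchmark proposed above could also drive the generators directly: split the records into multiple payloads so that each stays under a byte limit. A minimal sketch; the 5 MB default is a placeholder assumption, since the actual limit is exactly what this issue asks us to determine:

```python
import json

# Split a record list into payload chunks that each serialize to at most
# max_bytes of JSON. The 5 MB default is a placeholder; the real limit is
# what this issue asks us to determine.
def chunk_records(records, max_bytes=5 * 1024 * 1024):
    chunks, current, size = [], [], 2          # 2 bytes for the enclosing []
    for rec in records:
        rec_size = len(json.dumps(rec).encode("utf-8")) + 2  # + ", " separator
        if current and size + rec_size > max_bytes:
            chunks.append(current)
            current, size = [], 2
        current.append(rec)
        size += rec_size
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be written to its own payload file and submitted as a separate workflow run.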
The Python script should honor that limit and generate multiple payload files, each containing a number of records that fits within the limit, to avoid failures.

## Issue #81: While using manifest ingestion (Osdu_ingest), the tags field populated in the Wellbore payload is not getting ingested
Author: Kamlesh Todai · Updated: 2022-08-28
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/81

The issue is that when the tags field is populated in the payload while ingesting Wellbore data, the tags field does not appear to be ingested. There are no warnings or errors about this in the Airflow logs; the Wellbore data is ingested, but the tags field is missing.
Note: when inserting Wellbore data with the tags field populated via the Storage API, it works fine.
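For comparison, this is roughly the shape of a Storage API record body carrying a tags block, the element the manifest path appears to drop. All identifiers, ACL groups, and legal tags below are placeholders, not values from the attached documents:

```python
# Minimal sketch of a Storage API record body with a "tags" block.
# All identifiers, ACLs, and legal tags are placeholders.
record = {
    "kind": "osdu:wks:master-data--Wellbore:1.0.0",
    "acl": {
        "viewers": ["data.default.viewers@opendes.example.com"],
        "owners": ["data.default.owners@opendes.example.com"],
    },
    "legal": {
        "legaltags": ["opendes-example-legal-tag"],
        "otherRelevantDataCountries": ["US"],
    },
    "tags": {"testtag": "testvalue"},
    "data": {"FacilityName": "Example Wellbore"},
}

# Sent directly to the Storage API, "tags" is preserved; the same record
# inside a manifest appears to lose it.
print("tags" in record)
```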
The details are attached in the Word documents:
- tagsFieldIngestIssue.docx contains the payload used during ingestion and the queries run to check the tags field data.
- tagsFieldStorageSearch.docx contains the payload used while creating the Wellbore record with the tags field via the Storage API.
The test was done on two platforms (AWS and GCP).
The DAG run details for GCP:
```json
{
  "workflowId": "ef82cba0-0e45-4df3-91bf-4df1553102d3",
  "runId": "22821aa9-82a2-4910-9e3f-d1e27addb49d",
  "startTimeStamp": 1627328994856,
  "endTimeStamp": 1627329575098,
  "status": "finished",
  "submittedBy": "kamlesh_todai@osdu-gcp.go3-nrg.projects.epam.com"
}
```
The DAG run details for AWS: runId 57a9adc0-aabb-4bb9-8154-561b5c12412f.
I have not tried IBM or Azure to see whether the behavior is the same.
[tagsFieldIngestIssue.docx](/uploads/83fbd805ec66927bb850abc683ef076b/tagsFieldIngestIssue.docx)
[tagsFieldStorageSearch.docx](/uploads/9a494e21f5d55ea9a9d337974b1eb6f7/tagsFieldStorageSearch.docx)
cc @ChrisZhang @ethiraj @debasisc @Wibben @Kateryna_Kurach @anujgupta @manishk
Assignee: Kamlesh Todai

## Issue #94: Integration E2E Tests for manifest ingestion - AWS
Author: Chris Zhang · Updated: 2022-08-24 · Milestone: M10 - Release 0.13 · Assignee: Gustavo Urdaneta
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/94

This issue tracks the AWS team's work on integration E2E tests for manifest ingestion.
Related to issue #85: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/85

## Issue #82: Manifest ingestion does not show any updates in Airflow when a backslash character is used in the JSON body
Author: Naufal Mohamed Noori · Updated: 2021-10-19
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/82

**Description**:
Using the manifest ingestion (DAG) workflow service, when a user inserts a backslash (\) into the JSON manifest body, the workflow run gets stuck in SUBMITTED status. There is also no trace of the runId in the Airflow log.
**Steps to reproduce:**
a) Insert the body JSON into the DAG workflow body: [With_Backslash_BodyData.json](/uploads/c2f2e8e8241df526830a73cc9ba2336a/With_Backslash_BodyData.json)
b) When the body JSON is submitted to base_url/api/workflow/v1/workflow/Osdu_ingest/workflowRun, the workflow is submitted successfully with the following response:
```json
{
  "workflowId": "dev:Osdu_ingest",
  "runId": "4327f575-e7b3-490f-a1ee-b1e2e950c2a4",
  "startTimeStamp": 1627041278115,
  "status": "submitted",
  "submittedBy": "naufal.noori@katalystdm.com"
}
```
c) After a while, check the DAG run status: the workflow still shows the run in submitted status, and there is no trace of the run ID in the Airflow log (this follow-up check was done after 24 hours):
_Endpoint_: base_url/api/workflow/v1/workflow/Osdu_ingest/workflowRun/4327f575-e7b3-490f-a1ee-b1e2e950c2a4
_Response_:
```json
{
  "workflowId": "dev:Osdu_ingest",
  "runId": "4327f575-e7b3-490f-a1ee-b1e2e950c2a4",
  "startTimeStamp": 1627041278115,
  "status": "submitted",
  "submittedBy": "naufal.noori@katalystdm.com"
}
```
d) In a second trial run with the \ characters removed, the workflow ran perfectly and left a trace in the Airflow log: [With_NO_Backslash_BodyData.json](/uploads/9fdbc2a59a930444feeb6bfacd1e1200/With_NO_Backslash_BodyData.json)
**Expectation**:
We expect the workflow to fail the request with a clear and meaningful error message, e.g. "Request failed: there are non-allowed special characters between line X and line Y of your JSON body."
**Reason**:
It is confusing for users to have a run submit successfully but get stuck in the process, without any log trace whatsoever.
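One way to provide that behavior would be to validate the body as JSON before handing it to the workflow engine; Python's `json.JSONDecodeError` already carries the line and column of the bad escape. A minimal sketch, with an illustrative payload string:

```python
import json

# An unescaped backslash such as \d is not a legal JSON escape, so parsing
# fails fast with a position, instead of the run hanging in SUBMITTED.
raw_body = '{"Payload": {"path": "C:\\data\\wells"}}'  # contains \d and \w

try:
    json.loads(raw_body)
except json.JSONDecodeError as err:
    print(f"Rejecting request: invalid JSON at line {err.lineno}, "
          f"column {err.colno}: {err.msg}")
```

A check like this in the Workflow service's request handling would turn the silent hang into an immediate, actionable 400 response.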
cc @debasisc

Milestone: M9 - Release 0.12

## Issue #30: Move ingestion DAGs and operators under a folder named osdu
Author: Kishore Battula · Updated: 2021-06-15
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/30

Currently the DAGs and operators live in the top-level folder `src`. Clients deploying them copy the DAGs into their DAGs folder and the operators into their operators folder.
In a customer environment there will be more DAGs and operators, and the Python module names may conflict with the names used in this repository.
Can we move the DAGs, operators, and hooks into an `osdu` folder (or another folder name) so that they are easier to manage? This has to be done in this repository, because the DAGs use import statements for the operators and libs that would fail if someone placed them under a different folder structure.
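The conflict described above can be demonstrated with a small sketch: a customer module named `operators` collides with a flat OSDU `operators` module, while a copy namespaced under an `osdu` package stays addressable. All paths and module contents here are illustrative:

```python
import pathlib
import sys
import tempfile

# Build a toy deployment: a customer "operators.py" at the top level and the
# OSDU copy namespaced under an "osdu" package.
root = pathlib.Path(tempfile.mkdtemp())
(root / "operators.py").write_text("SOURCE = 'customer'\n")
pkg = root / "osdu"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "operators.py").write_text("SOURCE = 'osdu'\n")

sys.path.insert(0, str(root))
import operators                                # resolves to the customer's module
from osdu import operators as osdu_operators   # OSDU copy, no name clash

print(operators.SOURCE, osdu_operators.SOURCE)  # customer osdu
```

With a flat layout both files would have to share one `operators` name on `sys.path`, and one of them would silently shadow the other.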
**Benefits**
- Avoids naming conflicts
- Easy to propagate updates from this repository into Airflow: we can replace the entire folder in the destination.

Milestone: M1 - Release 0.1

## Issue #52: Determine if Airflow logs are piped to cloud logs
Author: Alan Henson · Updated: 2021-04-20
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/52

Ideally, logs generated by Airflow are piped to the underlying cloud service provider's (CSP) logging framework. Once there, these logs are accessible via the CSP's respective consoles.
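For reference, Airflow 2 exposes a remote-logging switch that each deployment can point at its CSP's storage. A sketch of the relevant `airflow.cfg` section; the bucket path and connection id are placeholders:

```ini
[logging]
# Ship task logs to cloud storage so the CSP's console can surface them.
remote_logging = True
remote_base_log_folder = s3://my-airflow-logs/dag-logs
remote_log_conn_id = my_cloud_logging_conn
```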
This issue is meant to validate which CSPs have implemented this capability:
- [ ] AWS
- [X] Azure
- [X] GCP
- [X] IBM

## Issue #28: Deploy Manifest Ingestion
Author: Alan Henson · Updated: 2021-03-23
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/28