Manifest Ingestion DAG merge requests
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests
2023-08-18T11:14:18Z

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/52
GONRG-2213: Fix different validation results
2023-08-18T11:14:18Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Fix the issues:
- Batch manifest ingestion rejected manifests with a surrogate id.
- Single manifest ingestion was strict and didn't allow storing parent records if their children were not present in the Manifest, while Batch ingestion was less strict and allowed storing parents with no children. Now neither Batch nor Single ingestion is strict; an option to choose between these two types of validation will be added later.

M6 - Release 0.9
Yan Sushchynski (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/54
GONRG-2366: Cursor changed to offsets
2023-08-18T11:14:15Z
Siarhei Khaletski (EPAM)
## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [x] AWS
- [x] Azure
- [x] GCP
- [x] IBM
## Updates description?
Because of issues with Search with cursor, it was replaced with queries that use offsets (GONRG-2366).

M6 - Release 0.9
Siarhei Khaletski (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/56
GONRG-2320: Switch file handler API to v2
2023-08-18T11:14:12Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
As File service's API /v1 is deprecated, we need to change its version to /v2
https://community.opengroup.org/osdu/platform/system/file/-/blob/master/docs/file-service_openapi.yaml

M6 - Release 0.9
Siarhei Khaletski (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/55
Preeti/ManifestIngestionPipeline ado
2023-08-18T11:14:13Z
preeti singh[Microsoft]
## Type of change
- Feature
## Does this introduce a change in the core logic?
- [No]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
Azure
## Updates description?
This change creates a pipeline for deploying the Manifest ingestion DAGs.
It copies the required files (the Python file for the DAGs and the files in the plugins and DAGs folders) to the Airflow file share, then registers the DAG.

M7 - Release 0.10
preeti singh[Microsoft]

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/57
Preeti/pipeline ado unit test
2023-08-18T11:14:10Z
preeti singh[Microsoft]
## Type of change
- [ ] Feature
## Does this introduce a change in the core logic?
- [No]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
No
## Updates description?
Describe your code changes in details for reviewers (links on Gitlab issues, etc.)
Adds a task that runs the unit tests in the ADO pipelines for this repo.

M7 - Release 0.10
preeti singh[Microsoft]

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/58
GONRG-2327: Added possibility to specify custom SA-file for credentials logic
2021-06-24T13:33:57Z
Siarhei Khaletski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [x] GCP
- [ ] IBM
## Updates description?
Added the ability to specify a custom SA file path for GCP credentials.

M7 - Release 0.10
Siarhei Khaletski (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/59
GONRG-2537: Add cache for Search_records_ids in ManifestIntegrityChecker
2023-08-18T11:14:09Z
Yan Sushchynski (EPAM)
## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
This MR introduces Search cache for external references in the Manifest's entities in Ensure Manifest Integrity step.
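A minimal sketch of how such a cache might work, not the code from this MR. The class name `CachedReferenceSearch` and the `search_fn` callable are hypothetical; `search_fn` is assumed to take a set of record ids and return the subset that the Search service found.

```python
from typing import Callable, Iterable, Set


class CachedReferenceSearch:
    """Hypothetical helper that remembers which record ids were already searched."""

    def __init__(self, search_fn: Callable[[Set[str]], Set[str]]):
        # search_fn is an assumption: it receives ids and returns the ids that exist.
        self._search_fn = search_fn
        self._found: Set[str] = set()
        self._missing: Set[str] = set()

    def find(self, ids: Iterable[str]) -> Set[str]:
        """Return the requested ids that exist, querying only ids not seen before."""
        requested = set(ids)
        unseen = requested - self._found - self._missing
        if unseen:
            found_now = set(self._search_fn(unseen))
            self._found |= found_now
            self._missing |= unseen - found_now
        return requested & self._found
```

In this sketch both found and missing ids are cached, so a reference is sent to the search backend at most once per checker instance.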
Once a reference has appeared in a search query, it is not included in subsequent requests.

M7 - Release 0.10
Siarhei Khaletski (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/61
GONRG-2591: Airflow 2 backward compatibility
2023-08-18T11:14:05Z
Yan Sushchynski (EPAM)
## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
These changes make it possible to use the same Manifest Based Ingestion code base for both Airflow >=1.10.10 and Airflow 2.

M7 - Release 0.10
Siarhei Khaletski (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/60
Included EDS Dataset pattern check
2023-08-18T11:14:07Z
Rajesh Bollineni
## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [x] AWS
- [x] Azure
- [x] GCP
- [x] IBM
## Updates description?
Closes #75

M7 - Release 0.10
Siarhei Khaletski (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/62
GONRG-2726: Move libs to SDK
2021-07-20T15:27:50Z
Yan Sushchynski (EPAM)
## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Move the `libs` and `providers` folders to the Python SDK. The SDK must be installed via `pip` into Airflow's environment.
To access the code in these folders, use:
`import osdu_api.libs`
`import osdu_api.providers`

M7 - Release 0.10
Siarhei Khaletski (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/63
Azure - Adding Gitlab pipelines
2023-08-18T11:14:04Z
harshit aggarwal
## Type of change
- [ ] Bug Fix
- [X] Feature
## Does this introduce a change in the core logic?
- [NO]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [X] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Adding the Azure pipeline; the end_to_end_test_dag stage will be implemented in a subsequent MR.

M8 - Release 0.11
harshit aggarwal

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/65
Azure - Updating ADO pipeline
2023-08-18T11:14:02Z
harshit aggarwal

M8 - Release 0.11
harshit aggarwal

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/68
Azure - Adding E2E Test Stage
2023-08-18T11:14:01Z
harshit aggarwal
## Type of change
- [ ] Bug Fix
- [X] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [X] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Describe your code changes in details for reviewers (links on Gitlab issues, etc.)

M8 - Release 0.11
harshit aggarwal

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/69
Azure - Add Test Stage ADO pipeline
2023-08-18T11:13:59Z
harshit aggarwal

M8 - Release 0.11
harshit aggarwal

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/67
Enable Support for Packaged DAGs
2021-08-26T11:42:24Z
harshit aggarwal
## Type of change
- [ ] Bug Fix
- [X] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
This MR makes changes to support packaged DAGs for manifest ingestion; the [ADR](https://community.opengroup.org/osdu/platform/data-flow/home/-/issues/47) for this change has been approved.
**New folder structure**
```
├── osdu_manifest
│   ├── __init__.py
│   ├── libs
│   │   ├── __init__.py
│   │   └── utils.py
│   ├── operators
│   │   ├── __init__.py
│   │   └── customOperator1.py
│   ├── hooks
│   │   └── __init__.py
│   └── configs
│       └── __init__.py
└── osdu-ingest-r3.py
```
The changes in this MR include:
- Restructuring the folders
- Fixing any import statements
- Minor changes to run existing tests
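
As an illustration only, and assuming "packaged DAGs" here refers to Airflow's zip packaging (where the DAG file must sit at the root of the archive next to its supporting packages), a folder laid out as above could be zipped roughly like this. The function name `package_dag` and the archive name in the usage note are hypothetical and are not taken from this MR.

```python
import zipfile
from pathlib import Path
from typing import List


def package_dag(src_dir: Path, zip_path: Path) -> List[str]:
    """Zip every .py file under src_dir, keeping paths relative to src_dir.

    With the layout above, osdu-ingest-r3.py ends up at the archive root and
    the osdu_manifest package keeps its folder structure.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(src_dir.rglob("*.py")):
            arcname = path.relative_to(src_dir).as_posix()
            archive.write(path, arcname)
            names.append(arcname)
    return names
```

For example, `package_dag(Path("src"), Path("osdu_manifest_dag.zip"))` would produce an archive that could then be copied into the Airflow DAGs folder.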
**Related Issue - https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/86**
**ADR - https://community.opengroup.org/osdu/platform/data-flow/home/-/issues/47**

M8 - Release 0.11
harshit aggarwal

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/64
GONRG-2913: Added support for whitelist reference patterns to exclude from...
2022-12-01T11:24:31Z
Aleksandr Spivakov (EPAM)
## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Introduces the ability to whitelist references using custom regexp patterns, so that those references are excluded from referential integrity validation.
New Airflow variable example:
![image](/uploads/771f4da7bde79cfef6d228777ff5dd7d/image.png)
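
The general idea of pattern-based whitelisting can be sketched as follows. This is an assumption-laden illustration, not the SDK's implementation: the function name `filter_whitelisted` and the sample ids and pattern in the usage note are hypothetical, and the actual Airflow variable name and format appear only in the screenshot above.

```python
import re
from typing import Iterable, List


def filter_whitelisted(references: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return only the references that match none of the whitelist patterns.

    The returned references are the ones that would still go through
    referential integrity validation.
    """
    compiled = [re.compile(pattern) for pattern in patterns]
    return [
        ref for ref in references
        if not any(regex.search(ref) for regex in compiled)
    ]
```

With a made-up pattern such as `^external:`, the call `filter_whitelisted(["opendes:master-data--Well:1:", "external:reference-data--Unit:m:"], [r"^external:"])` keeps only the first id for validation.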
See related MR for more details:
https://community.opengroup.org/osdu/platform/system/sdks/common-python-sdk/-/merge_requests/22
closes https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/external-data-framework/-/issues/180

M8 - Release 0.11
Siarhei Khaletski (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/66
GONRG-2696: Manifest integrity batch search
2021-08-23T15:57:34Z
Yan Sushchynski (EPAM)
## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [x] GCP
- [ ] IBM
## Updates description?
Adds the ability to get the list of entities already skipped by previous tasks and use it in the Manifest Integrity Check.

M8 - Release 0.11
Siarhei Khaletski (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/71
Azure - pipeline script fix
2023-08-18T11:13:55Z
harshit aggarwal
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Describe your code changes in details for reviewers (links on Gitlab issues, etc.)

M8 - Release 0.11
harshit aggarwal

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/70
Ingestion DAG pipeline updated to deploy for both Airflow 1.10.* & 2.0 environments [GONRG-2869]
2023-08-18T11:13:57Z
Armen Gasparyan (EPAM)
Added syncs for DAGs and plugins for Composer Airflow V2.1.1 in GCP for the Community and Preshipping environments.

M8 - Release 0.11
Oleksandr Kosse (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/73
Azure - Update Test Dag stage
2023-08-18T11:13:54Z
harshit aggarwal
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [X] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Describe your code changes in details for reviewers (links on Gitlab issues, etc.)

M9 - Release 0.12
harshit aggarwal