Manifest Ingestion DAG merge requests
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests
Updated: 2023-08-18T11:14:02Z

MR !65: Azure - Updating ADO pipeline
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/65
Updated: 2023-08-18T11:14:02Z
Author: harshit aggarwal
Milestone: M8 - Release 0.11
Assignee: harshit aggarwal

MR !64: GONRG-2913: Added support for whitelist reference patterns to exclude from...
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/64
Updated: 2022-12-01T11:24:31Z
Author: Aleksandr Spivakov (EPAM)

## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Introduces the ability to whitelist references with custom regexp patterns, so that matching references are excluded from referential integrity validation.
New Airflow variable example:
![image](/uploads/771f4da7bde79cfef6d228777ff5dd7d/image.png)
See related MR for more details:
https://community.opengroup.org/osdu/platform/system/sdks/common-python-sdk/-/merge_requests/22
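A minimal sketch of how regexp whitelisting could split references before the integrity check runs. The function name `split_whitelisted`, the pattern list, and the use of `re.fullmatch` (rather than a partial match) are assumptions for illustration. The real Airflow variable format is the one shown in the screenshot and the related SDK MR.

```python
import re
from typing import Iterable, List, Tuple


def split_whitelisted(
    references: Iterable[str], patterns: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split references into (to_validate, whitelisted).

    Hypothetical helper: a reference is whitelisted when it fully matches any
    of the given regexp patterns; whitelisted references skip the integrity check.
    """
    compiled = [re.compile(pattern) for pattern in patterns]
    to_validate: List[str] = []
    whitelisted: List[str] = []
    for reference in references:
        if any(regex.fullmatch(reference) for regex in compiled):
            whitelisted.append(reference)
        else:
            to_validate.append(reference)
    return to_validate, whitelisted


# Example: skip every reference-data reference, validate everything else.
refs = [
    "opendes:reference-data--UnitOfMeasure:m:",
    "opendes:master-data--Well:123:",
]
print(split_whitelisted(refs, [r".*:reference-data--.*"]))
```

Compiling the patterns once per call keeps the per-reference cost to a single match attempt per pattern.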
closes https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/external-data-framework/-/issues/180
Milestone: M8 - Release 0.11
Assignee: Siarhei Khaletski (EPAM)

MR !63: Azure - Adding Gitlab pipelines
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/63
Updated: 2023-08-18T11:14:04Z
Author: harshit aggarwal

## Type of change
- [ ] Bug Fix
- [X] Feature
## Does this introduce a change in the core logic?
- [NO]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [X] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Adds the Azure pipeline; the `end_to_end_test_dag` stage will be implemented in a subsequent MR.
Milestone: M8 - Release 0.11
Assignee: harshit aggarwal

MR !62: GONRG-2726: Move libs to SDK
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/62
Updated: 2021-07-20T15:27:50Z
Author: Yan Sushchynski (EPAM)

## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Moves the `libs` and `providers` folders to the Python SDK. The SDK must be installed via `pip` into Airflow's environment.
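Because the code now lives in the SDK, a missing `pip` install shows up only as an import error in the DAGs. A minimal sketch for checking whether the SDK modules can be located in the current environment follows. The helper name `modules_available` is hypothetical; `importlib.util.find_spec` is the standard-library call.

```python
import importlib.util
from typing import Dict, Iterable


def modules_available(names: Iterable[str]) -> Dict[str, bool]:
    """Report whether each (possibly dotted) module can be located.

    For a dotted name, importlib.util.find_spec imports the parent package
    first, which raises ModuleNotFoundError when the parent is not installed.
    """
    result: Dict[str, bool] = {}
    for name in names:
        try:
            result[name] = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            result[name] = False
    return result


# In Airflow's environment both entries should be True once the SDK is pip-installed.
print(modules_available(["osdu_api.libs", "osdu_api.providers"]))
```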
To access the code in these folders, use:
`import osdu_api.libs`
`import osdu_api.providers`
Milestone: M7 - Release 0.10
Assignee: Siarhei Khaletski (EPAM)

MR !61: GONRG-2591: Airflow 2 backward compatibility
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/61
Updated: 2023-08-18T11:14:05Z
Author: Yan Sushchynski (EPAM)

## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
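A common way to keep one code base working on both Airflow 1.10.x and Airflow 2 is to try the Airflow 2 import path first and fall back to the 1.10 path. Airflow 2, for example, moved `PythonOperator` to `airflow.operators.python`. The sketch below shows that pattern with a small hypothetical helper, `import_first`. It is not necessarily how this MR implements compatibility.

```python
import importlib
from typing import Any, List, Sequence, Tuple


def import_first(candidates: Sequence[Tuple[str, str]]) -> Any:
    """Return the first attribute that imports successfully from (module, attribute) pairs."""
    errors: List[str] = []
    for module_name, attribute in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attribute)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}.{attribute}: {exc}")
    raise ImportError("no candidate could be imported: " + "; ".join(errors))


# Usage inside a DAG file (requires Airflow, so it is left commented out here):
# PythonOperator = import_first([
#     ("airflow.operators.python", "PythonOperator"),           # Airflow 2
#     ("airflow.operators.python_operator", "PythonOperator"),  # Airflow 1.10.x
# ])
```

Collecting every failure message makes the final `ImportError` explain which paths were tried.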
These changes make it possible to use the same Manifest Based Ingestion code base with both Airflow >=1.10.10 and Airflow 2.
Milestone: M7 - Release 0.10
Assignee: Siarhei Khaletski (EPAM)

MR !60: Included EDS Dataset pattern check
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/60
Updated: 2023-08-18T11:14:07Z
Author: Rajesh Bollineni

## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [x] AWS
- [x] Azure
- [x] GCP
- [x] IBM
## Updates description?
Closes #75
Milestone: M7 - Release 0.10
Assignee: Siarhei Khaletski (EPAM)

MR !59: GONRG-2537: Add cache for Search_records_ids in ManifestIntegrityChecker
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/59
Updated: 2023-08-18T11:14:09Z
Author: Yan Sushchynski (EPAM)

## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
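A minimal sketch of a per-run search cache. Record ids that were already sent to Search are remembered and never sent again; only new ids trigger a request. `SearchIdCache` and `search_fn` are hypothetical names. `search_fn` stands in for the actual Search service call and is assumed to return the ids that exist.

```python
from typing import Callable, Iterable, List, Set


class SearchIdCache:
    """Remember which record ids were already searched so they are not sent again."""

    def __init__(self, search_fn: Callable[[List[str]], Iterable[str]]) -> None:
        self._search_fn = search_fn
        self._searched: Set[str] = set()  # every id ever sent to Search
        self._found: Set[str] = set()     # ids Search reported as existing

    def find_existing(self, ids: Iterable[str]) -> Set[str]:
        requested = set(ids)
        new_ids = sorted(requested - self._searched)  # only ids not searched before
        if new_ids:
            self._found.update(self._search_fn(new_ids))
            self._searched.update(new_ids)
        return requested & self._found


calls = []


def fake_search(ids):
    calls.append(ids)
    return [i for i in ids if i.endswith("1")]


cache = SearchIdCache(fake_search)
print(cache.find_existing(["a1", "b2"]))  # Search receives ["a1", "b2"]
print(cache.find_existing(["a1", "c1"]))  # Search receives only ["c1"]
```

Ids that Search did not find are also remembered, so a missing reference is not re-queried later in the same run.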
This MR introduces a Search cache for external references in the Manifest's entities during the Ensure Manifest Integrity step.
If a reference has already appeared in a search query, it is not sent again in subsequent requests.
Milestone: M7 - Release 0.10
Assignee: Siarhei Khaletski (EPAM)

MR !58: GONRG-2327: Added possibility to specify custom SA-file for credentials logic
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/58
Updated: 2021-06-24T13:33:57Z
Author: Siarhei Khaletski (EPAM)

## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [x] GCP
- [ ] IBM
## Updates description?
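The MR does not say how the custom path is configured. As a hedged sketch, assume it arrives as an optional setting; the parameter name `custom_path` and the function name are hypothetical. The resolution order might then be: explicit path first, then `GOOGLE_APPLICATION_CREDENTIALS`, the standard Google Cloud environment variable for a key file.

```python
import os
from typing import Mapping, Optional


def resolve_sa_file(
    custom_path: Optional[str], env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return the service-account key file to use for GCP credentials.

    An explicitly configured path (hypothetical `custom_path` setting) wins;
    otherwise fall back to the GOOGLE_APPLICATION_CREDENTIALS variable.
    Returns None when neither is set.
    """
    if custom_path:
        return custom_path
    return env.get("GOOGLE_APPLICATION_CREDENTIALS")


# With google-auth installed, the resolved path could then be loaded with:
# from google.oauth2 import service_account
# credentials = service_account.Credentials.from_service_account_file(path)
```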
Adds the possibility to specify a custom service account (SA) file path for GCP credentials.
Milestone: M7 - Release 0.10
Assignee: Siarhei Khaletski (EPAM)

MR !57: Preeti/pipeline ado unit test
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/57
Updated: 2023-08-18T11:14:10Z
Author: preeti singh[Microsoft]

## Type of change
- [ ] Feature
## Does this introduce a change in the core logic?
- [No]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
No
## Updates description?
Adds a task that runs the unit tests for this repo in the ADO pipelines.
Milestone: M7 - Release 0.10
Assignee: preeti singh[Microsoft]

MR !55: Preeti/ManifestIngestionPipeline ado
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/55
Updated: 2023-08-18T11:14:13Z
Author: preeti singh[Microsoft]

## Type of change
- Feature
## Does this introduce a change in the core logic?
- [No]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
Azure
## Updates description?
This change creates a pipeline for deploying the Manifest Ingestion DAGs.
It copies the required files (the Python DAG files and the files in the plugins and DAGs folders) to the Airflow file share, then registers the DAG.
Milestone: M7 - Release 0.10
Assignee: preeti singh[Microsoft]

MR !56: GONRG-2320: Switch file handler API to v2
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/56
Updated: 2023-08-18T11:14:12Z
Author: Yan Sushchynski (EPAM)

## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Because the File service's /v1 API is deprecated, the file handler is switched to /v2:
https://community.opengroup.org/osdu/platform/system/file/-/blob/master/docs/file-service_openapi.yaml
Milestone: M6 - Release 0.9
Assignee: Siarhei Khaletski (EPAM)

MR !54: GONRG-2366: Cursor changed to offsets
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/54
Updated: 2023-08-18T11:14:15Z
Author: Siarhei Khaletski (EPAM)

## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [x] AWS
- [x] Azure
- [x] GCP
- [x] IBM
## Updates description?
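Offset-based paging re-issues the same query with a growing `offset` until all results are collected. A minimal sketch follows. It assumes a hypothetical `query_fn` that posts a body to the Search query endpoint and returns a dict with `results` and `totalCount`. Those field names, like `offset` and `limit`, follow the OSDU Search query API, but verify them against your deployment.

```python
from typing import Any, Callable, Dict, List


def search_all_with_offsets(
    query_fn: Callable[[Dict[str, Any]], Dict[str, Any]],
    base_query: Dict[str, Any],
    page_size: int = 100,
) -> List[Dict[str, Any]]:
    """Collect every result of a Search query by paging with offset/limit."""
    results: List[Dict[str, Any]] = []
    offset = 0
    while True:
        body = {**base_query, "offset": offset, "limit": page_size}
        response = query_fn(body)
        page = response.get("results", [])
        results.extend(page)
        offset += len(page)
        total = response.get("totalCount")
        # Stop on a short (or empty) page, or once the reported total has been reached.
        if len(page) < page_size or (total is not None and offset >= total):
            break
    return results
```

Unlike a cursor, each request here is self-contained, so a failed page can be retried with the same `offset`.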
Due to issues with cursor-based Search queries, they were replaced with offset-based queries (GONRG-2366).
Milestone: M6 - Release 0.9
Assignee: Siarhei Khaletski (EPAM)

MR !53: GONRG-2017: Add manifest_ingestion test file to repo
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/53
Updated: 2023-08-18T11:14:17Z
Author: Yan Sushchynski (EPAM)

## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Adds a `manifest_ingestion` DAG for testing the Workflow service. The DAG does nothing except return an OK status to the Workflow service.
Milestone: M6 - Release 0.9

MR !52: GONRG-2213: Fix different validation results
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/52
Updated: 2023-08-18T11:14:18Z
Author: Yan Sushchynski (EPAM)

## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Fixes the issues:
- Batch manifest ingestion rejected manifests with surrogate ids.
- Single manifest ingestion was strict and did not allow storing parent records whose children were not present in the Manifest, whereas Batch ingestion was less strict and allowed storing parents with no children. Now both Batch and Single ingestion are not strict; an option to choose between these two types of validation will be added later.
Milestone: M6 - Release 0.9
Assignee: Yan Sushchynski (EPAM)

MR !51: fix/GONRG-2291: Skipped reason truncated
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/51
Updated: 2023-08-18T11:14:20Z
Author: Yan Sushchynski (EPAM)

## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Fixes the issue:
- Information about skipped entities was truncated in XComs.
Milestone: M6 - Release 0.9

MR !50: fix/GONRG-2103: Pass wrong date-time format
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/50
Updated: 2023-08-18T11:14:22Z
Author: Yan Sushchynski (EPAM)

## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Fixes the issue:
- Any entity could pass validation against the date-time format (e.g. "mar 11" was accepted, although it does not follow the date-time format).
!!!NOTE: To turn on date-time validation, the package `strict-rfc3339` must be installed (https://python-jsonschema.readthedocs.io/en/latest/validate/#jsonschema.FormatError).
Date-time fields must follow RFC 3339:
- `2020-12-16T11:46:20.163Z` - ok
- `2019-10-12T14:20:50.52+07:00` - ok
- `2020-12-16T11:46:20Z` - ok
- `2019-10-12T14:20:50.52` - not ok (no time-zone offset)
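For illustration only, the rule the examples above follow can be written as a small standalone check. This is not the `strict-rfc3339` package or jsonschema's format checker that the MR relies on. It is a hedged sketch of RFC 3339 section 5.6 `date-time`: full-date, "T", time, and a mandatory "Z" or ±hh:mm offset. The function name is hypothetical.

```python
import re
from datetime import datetime

# RFC 3339, section 5.6: date-time = full-date "T" partial-time time-offset,
# where time-offset is "Z" or +hh:mm / -hh:mm and is mandatory ("t"/"z" also allowed).
_DATE_TIME = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})[Tt]([0-9]{2}):([0-9]{2}):([0-9]{2})"
    r"(\.[0-9]+)?([Zz]|[+-][0-9]{2}:[0-9]{2})"
)


def is_rfc3339_date_time(value: str) -> bool:
    match = _DATE_TIME.fullmatch(value)
    if match is None:
        return False
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    offset = match.group(8)
    if second > 60:  # RFC 3339 allows a leap second (60), nothing higher
        return False
    try:
        # datetime() rejects impossible dates and times such as 2021-02-30 or 25:00;
        # a leap second is clamped to 59 only for this range check.
        datetime(year, month, day, hour, minute, min(second, 59))
    except ValueError:
        return False
    if offset not in ("Z", "z"):
        offset_hours, offset_minutes = int(offset[1:3]), int(offset[4:6])
        if offset_hours > 23 or offset_minutes > 59:
            return False
    return True


for sample in ["2020-12-16T11:46:20.163Z", "2019-10-12T14:20:50.52", "mar 11"]:
    print(sample, is_rfc3339_date_time(sample))
```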
See: https://pypi.org/project/strict-rfc3339/
Milestone: M6 - Release 0.9

MR !49: GONRG-2293: Fix Missing colon in parent srn with surrogate-key
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/49
Updated: 2023-08-18T11:14:24Z
Author: Yan Sushchynski (EPAM)

## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
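In OSDU reference patterns, a record reference ends in `:<version>`, and the version may be empty. An unversioned reference therefore still ends with a colon (for example `opendes:master-data--Well:123:`). Below is a minimal sketch of replacing a surrogate-key reference while keeping that trailing colon. The `surrogate-key:...` format, the mapping, and the function name are assumptions for illustration.

```python
from typing import Dict


def resolve_surrogate_reference(reference: str, generated_ids: Dict[str, str]) -> str:
    """Replace a surrogate-key reference with its generated record id.

    Hypothetical helper: `reference` is assumed to be "<surrogate key>:" (unversioned)
    and `generated_ids` maps surrogate keys to OSDU-generated record ids.
    The result always ends with ":" so it remains a valid unversioned reference.
    """
    surrogate_key = reference[:-1] if reference.endswith(":") else reference
    record_id = generated_ids[surrogate_key]
    return f"{record_id}:"


ids = {"surrogate-key:wpc-1": "opendes:work-product-component--WellLog:abc123"}
print(resolve_surrogate_reference("surrogate-key:wpc-1:", ids))
```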
Fixes the issue:
- Surrogate-key references were replaced with OSDU-generated ids that lacked the trailing colon.
Milestone: M6 - Release 0.9

MR !48: GONRG-2292: Find references with no version in Search
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/48
Updated: 2023-08-18T11:14:25Z
Author: Yan Sushchynski (EPAM)

## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
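A reference with no specific version ends in an empty `:<version>` suffix. It should therefore be looked up by record id alone, not as an exact `id:version` pair. A hedged sketch of separating the two follows. The function name is hypothetical and this is not the DAG's actual code. It assumes the version separator is always present, as the OSDU reference patterns require.

```python
from typing import Optional, Tuple


def split_reference(reference: str) -> Tuple[str, Optional[str]]:
    """Split "<record id>:<version>" into (record id, version or None).

    An empty version (trailing colon) means "no specific version", so a lookup
    should match on the record id only.
    """
    record_id, _, version = reference.rpartition(":")
    return record_id, (version or None)


print(split_reference("opendes:master-data--Well:123:"))
print(split_reference("opendes:master-data--Well:123:1612345678"))
```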
Fixes the issue:
- Ingesting an entity failed if it contained references with **no specific version** to data already ingested into OSDU.
Milestone: M6 - Release 0.9

MR !47: GONRG-2170: Added cursor for search service requests
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/47
Updated: 2023-08-18T11:14:27Z
Author: Siarhei Khaletski (EPAM)

## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
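Cursor-based paging sends each follow-up request with the cursor returned by the previous response, until no cursor comes back. A later MR in this list (!54) replaced this approach with offsets. A minimal sketch follows. It assumes a hypothetical `query_fn` that posts a body to a cursor-enabled Search endpoint and returns a dict with `results` and `cursor`; verify the field names against your deployment.

```python
from typing import Any, Callable, Dict, List, Optional


def search_all_with_cursor(
    query_fn: Callable[[Dict[str, Any]], Dict[str, Any]],
    base_query: Dict[str, Any],
    page_size: int = 100,
) -> List[Dict[str, Any]]:
    """Collect every result by following the cursor returned with each page."""
    results: List[Dict[str, Any]] = []
    cursor: Optional[str] = None
    while True:
        body = {**base_query, "limit": page_size}
        if cursor:
            body["cursor"] = cursor  # continue where the previous page ended
        response = query_fn(body)
        page = response.get("results", [])
        results.extend(page)
        cursor = response.get("cursor")
        # No cursor or an empty page means there is nothing left to fetch.
        if not cursor or not page:
            break
    return results
```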
- Fixes an issue with the limit of Search query requests (closes https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/35)
Milestone: M6 - Release 0.9
Assignee: Siarhei Khaletski (EPAM)

MR !46: GONRG-2142: Add lru cache
URL: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/46
Updated: 2023-08-18T11:14:29Z
Author: Siarhei Khaletski (EPAM)

## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [No]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
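A minimal sketch of memoizing schema lookups with `functools.lru_cache`. `SchemaClient` and `fetch` are hypothetical stand-ins for the class that owns `get_schema` and for the Schema service call. Decorating a method caches on `(self, kind)` and keeps instances alive while they are in the cache. All callers also share the same returned dict, so they should not mutate it.

```python
import functools
from typing import Any, Callable, Dict


class SchemaClient:
    """Hypothetical client; `fetch` stands in for the Schema service call."""

    def __init__(self, fetch: Callable[[str], Dict[str, Any]]) -> None:
        self._fetch = fetch

    @functools.lru_cache(maxsize=128)
    def get_schema(self, kind: str) -> Dict[str, Any]:
        # Only the first call per (instance, kind) reaches the service;
        # repeated calls return the cached dict.
        return self._fetch(kind)


fetched = []


def fake_fetch(kind):
    fetched.append(kind)
    return {"kind": kind}


client = SchemaClient(fake_fetch)
client.get_schema("osdu:wks:master-data--Well:1.0.0")
client.get_schema("osdu:wks:master-data--Well:1.0.0")
print(fetched)  # the fake service was called only once
```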
- Added an LRU cache for the `get_schema` method.
Milestone: M6 - Release 0.9
Assignee: Siarhei Khaletski (EPAM)