Manifest Ingestion DAG merge requests
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests
2023-08-18T11:14:57Z

Fixed issue in environment variable
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/19
2023-08-18T11:14:57Z
Kishore Battula
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [X] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Fixed an issue where the wrong environment variable was read.

M4 - Release 0.7

Ingestion updates
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/44
2023-08-18T11:14:31Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
This MR comes with a few updates:
Features:
* Add report about skipped and processed ids to XComs (Issue: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/35). (GONRG-1934)
Bugfixes:
* Fix the issue where an integer part of an id was treated as a version, which prevented WP manifest ingestion. (Issue: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/55). (GONRG-2144)
* Fix the issue where references to ingested entities with real ids got an extra ":" (e.g. `"Datasets": ["osdu:dataset--File.Generic:feb02::"]` instead of `"Datasets": ["osdu:dataset--File.Generic:feb02:"]`) (Issue: https://gitlab.opengroup.org/osdu/subcommittees/ea/projects/pre-shipping/home/-/issues/142). (GONRG-2147)

M5 - Release 0.8
Siarhei Khaletski (EPAM), Rostislav Dublin (EPAM)

Aws impl, id version bug fix
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/40
2023-08-18T11:14:32Z
Spencer Sutton (suttonsp@amazon.com)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [x] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
AWS implementation code.
Also includes a bug fix that addresses this issue: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/55

M5 - Release 0.8
Spencer Sutton (suttonsp@amazon.com)

GONRG-2185: Single manifest validation hidden under the flag
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/37
2023-08-18T11:14:34Z
Siarhei Khaletski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
- Single manifest validation hidden under the flag

M5 - Release 0.8
Siarhei Khaletski (EPAM)

GONRG-2008: Documentation has been updated
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/36
2023-08-18T11:14:36Z
Siarhei Khaletski (EPAM)
## Type of change
- [ ] Bug Fix
- [ ] Feature
- [x] Documentation
## Does this introduce a change in the core logic?
- [No]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
- Documentation has been updated

M5 - Release 0.8
Siarhei Khaletski (EPAM)

Added BaseTokenRefresher
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/34
2023-08-18T11:14:37Z
Siarhei Khaletski (EPAM)
## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
- Added BaseTokenRefresher

M5 - Release 0.8
Siarhei Khaletski (EPAM)

Fix for FileSource; fix for pipelines with invalid manifest
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/32
2023-08-18T11:14:39Z
Siarhei Khaletski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [x] GCP
- [ ] IBM
## Updates description?
- Fix for issue with FileSource as a string with spaces
- Fix for the pipeline failing on a generally invalid manifest

M5 - Release 0.8
Siarhei Khaletski (EPAM)

Refactor referential integrity (GONRG-1932)
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/30
2023-08-18T11:14:40Z
Siarhei Khaletski (EPAM)
## Type of change
- [ ] Bug Fix
- [ ] Feature
- [x] Refactoring
## Does this introduce a change in the core logic?
- [No]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
- The logic that ensures manifest integrity has been refactored.

M5 - Release 0.8
Siarhei Khaletski (EPAM)

Added readme for azure
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/29
2023-08-18T11:14:42Z
Kishore Battula
## Type of change
- [ ] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [No]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [X] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Updated documentation on how to deploy manifest ingestion DAGs into Airflow.

M5 - Release 0.8

GONRG-2320: Switch file handler API to v2
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/56
2023-08-18T11:14:12Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
As File service's API /v1 is deprecated, we need to change its version to /v2
https://community.opengroup.org/osdu/platform/system/file/-/blob/master/docs/file-service_openapi.yaml

M6 - Release 0.9
Siarhei Khaletski (EPAM)

GONRG-2366: Cursor changed to offsets
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/54
2023-08-18T11:14:15Z
Siarhei Khaletski (EPAM)
## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [x] AWS
- [x] Azure
- [x] GCP
- [x] IBM
## Updates description?
Due to issues with cursor-based Search queries, they were replaced with offset-based queries (GONRG-2366).

M6 - Release 0.9
Siarhei Khaletski (EPAM)

GONRG-2017: Add manifest_ingestion test file to repo
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/53
2023-08-18T11:14:17Z
Yan Sushchynski (EPAM)
## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Add a manifest_ingestion DAG to test the Workflow service. The DAG does nothing but return an Ok status to the Workflow service.

M6 - Release 0.9

GONRG-2213: Fix different validation results
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/52
2023-08-18T11:14:18Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Fix the issues:
- Batch manifest ingestion rejected manifests with a surrogate id.
- Single manifest ingestion was strict and didn't allow parent records to be stored if their children were not present in the manifest, whereas Batch ingestion was less strict and allowed parents with no children to be stored. Now neither Batch nor Single ingestion is strict; the option to choose between these two types of validation will be added later.

M6 - Release 0.9
Yan Sushchynski (EPAM)

fix/GONRG-2291: Skipped reason truncated
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/51
2023-08-18T11:14:20Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Fix the issue:
- Skipped entities info was truncated in XComs.

M6 - Release 0.9

fix/GONRG-2103: Pass wrong date-time format
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/50
2023-08-18T11:14:22Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Fix the issue:
- Any entity was able to pass validation against the date-time format (e.g. "mar 11" was considered valid, though it doesn't follow the date-time format).
!!!NOTE: To turn on date-time validation, the package `strict-rfc3339` must be installed (https://python-jsonschema.readthedocs.io/en/latest/validate/#jsonschema.FormatError).
Date-time fields must follow the RFC 3339 standard:
- `2020-12-16T11:46:20.163Z` - ok
- `2019-10-12T14:20:50.52+07:00` - ok
- `2020-12-16T11:46:20Z` - ok
- `2019-10-12T14:20:50.52` - not ok
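
The accepted and rejected values above can be reproduced with a small standard-library sketch. This is only an approximation for illustration: the MR itself relies on `jsonschema` format checking together with the `strict-rfc3339` package, and the function name `is_rfc3339_datetime` and the regular expression below are hypothetical, not taken from the repository. The sketch does not handle leap seconds or validate the offset range.

```python
import re
from datetime import datetime

# Approximate RFC 3339 date-time shape: full date, "T", time, optional
# fractional seconds, and a mandatory "Z" or numeric UTC offset.
_RFC3339_PATTERN = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(\.\d+)?(Z|[+-]\d{2}:\d{2})$"
)


def is_rfc3339_datetime(value: str) -> bool:
    """Return True if value looks like an RFC 3339 date-time (hypothetical helper)."""
    match = _RFC3339_PATTERN.match(value)
    if match is None:
        return False
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    try:
        # Reject impossible calendar dates and times such as month 13 or hour 25.
        datetime(year, month, day, hour, minute, second)
    except ValueError:
        return False
    return True
```

The value without a time zone designator is rejected because the pattern requires either `Z` or a numeric offset, which matches the "not ok" case in the list.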
see: https://pypi.org/project/strict-rfc3339/

M6 - Release 0.9

GONRG-2293: Fix Missing colon in parent srn with surrogate-key
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/49
2023-08-18T11:14:24Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Fix the issue:
- Surrogate-key references were replaced with OSDU-generated ids that had no trailing colon.

M6 - Release 0.9

GONRG-2292: Find references with no version in Search
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/48
2023-08-18T11:14:25Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Fix the issue:
- An attempt to ingest an entity failed if it contained references with **no specific version** to data already ingested in OSDU.

M6 - Release 0.9

GONRG-2170: Added cursor for search service requests
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/47
2023-08-18T11:14:27Z
Siarhei Khaletski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
- Fixes an issue with the limit on Search query requests (closes https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/35)

M6 - Release 0.9
Siarhei Khaletski (EPAM)

GONRG-2142: Add lru cache
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/46
2023-08-18T11:14:29Z
Siarhei Khaletski (EPAM)
## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [No]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
- Added an LRU cache for the `get_schema` method.

M6 - Release 0.9
Siarhei Khaletski (EPAM)

GONRG-2726: Move libs to SDK
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/62
2021-07-20T15:27:50Z
Yan Sushchynski (EPAM)
## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Move the `libs` and `providers` folders to the Python SDK. The SDK must be installed via `pip` into Airflow's environment.
To access the code from these folders, use:
`import osdu_api.libs`
`import osdu_api.providers`

M7 - Release 0.10
Siarhei Khaletski (EPAM)