Manifest Ingestion DAG merge requests
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests
2023-08-18T11:14:12Z

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/56
GONRG-2320: Switch file handler API to v2
2023-08-18T11:14:12Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Because the File service's /v1 API is deprecated, we need to switch to /v2:
https://community.opengroup.org/osdu/platform/system/file/-/blob/master/docs/file-service_openapi.yaml
M6 - Release 0.9
Siarhei Khaletski (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/55
Preeti/ManifestIngestionPipeline ado
2023-08-18T11:14:13Z
preeti singh[Microsoft]
## Type of change
- Feature
## Does this introduce a change in the core logic?
- [No]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
Azure
## Updates description?
This change creates the pipeline that deploys the Manifest ingestion DAGs.
It copies the required files (the Python files for the DAGs and the files in the plugins and DAGs folders) to the Airflow file share, then registers the DAG.
M7 - Release 0.10
preeti singh[Microsoft]

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/54
GONRG-2366: Cursor changed to offsets
2023-08-18T11:14:15Z
Siarhei Khaletski (EPAM)
## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [x] AWS
- [x] Azure
- [x] GCP
- [x] IBM
## Updates description?
Because of issues with cursor-based Search queries, they were replaced with offset-based queries (GONRG-2366).
M6 - Release 0.9
Siarhei Khaletski (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/53
GONRG-2017: Add manifest_ingestion test file to repo
2023-08-18T11:14:17Z
Yan Sushchynski (EPAM)
## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Add a manifest_ingestion DAG for testing the Workflow service. The DAG does nothing but return an Ok status to the Workflow service.
M6 - Release 0.9

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/52
GONRG-2213: Fix different validation results
2023-08-18T11:14:18Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Fix the issues:
- Batch manifest ingestion rejected manifests with surrogate ids.
- Single manifest ingestion was strict and did not allow parent records to be stored if their children were not present in the Manifest, whereas Batch ingestion was less strict and allowed parents with no children to be stored. Now neither Batch nor Single ingestion is strict; an option to choose between these two types of validation will be added later.
M6 - Release 0.9
Yan Sushchynski (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/51
fix/GONRG-2291: Skipped reason truncated
2023-08-18T11:14:20Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Fix the issue:
- Information about skipped entities was truncated in XComs.
M6 - Release 0.9

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/50
fix/GONRG-2103: Pass wrong date-time format
2023-08-18T11:14:22Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Fix the issue:
- Any entity could pass validation against the date-time format (e.g. "mar 11" was accepted even though it is not a valid date-time).
NOTE: To turn on date-time validation, the package `strict-rfc3339` must be installed (https://python-jsonschema.readthedocs.io/en/latest/validate/#jsonschema.FormatError).
Date-time fields must follow the RFC 3339 standard:
- `2020-12-16T11:46:20.163Z` - ok
- `2019-10-12T14:20:50.52+07:00` - ok
- `2020-12-16T11:46:20Z` - ok
- `2019-10-12T14:20:50.52` - not ok
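To make the rule concrete, here is a minimal stdlib-only sketch of an RFC 3339 date-time check. It is not the validator used by the DAGs (which, per the note above, rely on `jsonschema` with `strict-rfc3339` installed); the helper name `is_rfc3339_datetime` and the regular expression are assumptions made for illustration, and they only approximate the standard.

```python
import re
from datetime import datetime

# Hypothetical helper for illustration only; it approximates RFC 3339 and is
# not the check performed by jsonschema or strict-rfc3339.
_RFC3339_PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2})[Tt](\d{2}:\d{2}:\d{2})(\.\d+)?([Zz]|[+-]\d{2}:\d{2})"
)


def is_rfc3339_datetime(value: str) -> bool:
    """Return True if value looks like an RFC 3339 date-time with an offset."""
    match = _RFC3339_PATTERN.fullmatch(value)
    if match is None:
        # Wrong shape, or the mandatory "Z" / "+hh:mm" offset is missing.
        return False
    date_part, time_part, _fraction, offset = match.groups()
    try:
        # Reject impossible calendar dates and clock times such as 2020-02-30.
        datetime.strptime(f"{date_part}T{time_part}", "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    if offset not in ("Z", "z"):
        hours, minutes = int(offset[1:3]), int(offset[4:6])
        if hours > 23 or minutes > 59:
            return False
    return True


if __name__ == "__main__":
    for sample in (
        "2020-12-16T11:46:20.163Z",
        "2019-10-12T14:20:50.52+07:00",
        "2020-12-16T11:46:20Z",
        "2019-10-12T14:20:50.52",
        "mar 11",
    ):
        print(sample, "ok" if is_rfc3339_datetime(sample) else "not ok")
```

The key point the sketch shows is that the time-zone designator is mandatory, which is why `2019-10-12T14:20:50.52` is rejected.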
See: https://pypi.org/project/strict-rfc3339/
M6 - Release 0.9

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/49
GONRG-2293: Fix Missing colon in parent srn with surrogate-key
2023-08-18T11:14:24Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Fix the issue:
- Surrogate-key references were replaced with OSDU-generated ids that lacked the trailing colon.
M6 - Release 0.9

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/48
GONRG-2292: Find references with no version in Search
2023-08-18T11:14:25Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Fix the issue:
- An attempt to ingest an entity failed if it contained references with **no specific version** to data already ingested on OSDU.
M6 - Release 0.9

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/47
GONRG-2170: Added cursor for search service requests
2023-08-18T11:14:27Z
Siarhei Khaletski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
- Fixes an issue with the limit on Search query requests (closes https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/35)
M6 - Release 0.9
Siarhei Khaletski (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/46
GONRG-2142: Add lru cache
2023-08-18T11:14:29Z
Siarhei Khaletski (EPAM)
## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [No]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
- Added an LRU cache for the `get_schema` method.
M6 - Release 0.9
Siarhei Khaletski (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/45
Cherry-pick "Ingestion updates" into the release branch
2021-05-15T00:56:16Z
David Diederich (d.diederich@opengroup.org)
Original MR: osdu/platform/data-flow/ingestion/ingestion-dags!44
Also added some essential code from: osdu/platform/data-flow/ingestion/ingestion-dags!38
David Diederich (d.diederich@opengroup.org)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/44
Ingestion updates
2023-08-18T11:14:31Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
This MR comes with a few updates:
Features:
* Add report about skipped and processed ids to XComs (Issue: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/35). (GONRG-1934)
Bugfixes:
* Fix the issue where an integer part of an id was treated as a version, which prevented WP manifest ingestion. (Issue: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/55). (GONRG-2144)
* Fix the issue where references to ingested entities with real ids got an extra ":" (e.g. `"Datasets": ["osdu:dataset--File.Generic:feb02::"]` instead of `"Datasets": [ "osdu:dataset--File.Generic:feb02:"]`) (Issue: https://gitlab.opengroup.org/osdu/subcommittees/ea/projects/pre-shipping/home/-/issues/142). (GONRG-2147)
M5 - Release 0.8
Siarhei Khaletski (EPAM), Rostislav Dublin (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/43
WIP: Add report of skipped ids
2021-04-19T15:14:58Z
Yan Sushchynski (EPAM)
## Type of change
- [ ] Bug Fix
- [x] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Add a report about skipped and processed ids to XComs (https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/35).
Siarhei Khaletski (EPAM), Rostislav Dublin (EPAM), Michael Tarasov (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/42
Bugfix/gonrg 2144 id integer part considered version
2021-04-19T15:15:35Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
Fix the issue where an integer part of an id was treated as a version, which prevented WP manifest ingestion.
(https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/55)
Siarhei Khaletski (EPAM), Rostislav Dublin (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/41
GONRG-2147: Fix double colon in refs
2021-04-19T15:15:11Z
Yan Sushchynski (EPAM)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
These changes fix the issue where references to ingested entities got an extra ":"
(e.g. `"Datasets": ["osdu:dataset--File.Generic:feb02::"]` instead of `"Datasets": [ "osdu:dataset--File.Generic:feb02:"]`).
Siarhei Khaletski (EPAM), Rostislav Dublin (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/40
Aws impl, id version bug fix
2023-08-18T11:14:32Z
Spencer Sutton (suttonsp@amazon.com)
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [x] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
AWS implementation code.
Also includes a bug fix that addresses this issue: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/55
M5 - Release 0.8
Spencer Sutton (suttonsp@amazon.com)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/39
removing with_validation boolean flag
2021-07-05T11:53:01Z
Brady Spiva [AWS]
## Type of change
- [x] Bug Fix
- [ ] Feature
## Does this introduce a change in the core logic?
- [Yes]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
This addresses issue #54, removing redundant validation steps to speed up manifest parsing. The manifest parser DAG performs validations for referential integrity and schema conformity in separate operators, so there is no need to repeat the validations here in the manifest parsing stage of this DAG.

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/38
Bugfix/remove duplicate steps
2021-04-22T19:44:21Z
Siarhei Khaletski (EPAM)
@divido merge please (already merged into `master`)
David Diederich (d.diederich@opengroup.org)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/merge_requests/36
GONRG-2008: Documentation has been updated
2023-08-18T11:14:36Z
Siarhei Khaletski (EPAM)
## Type of change
- [ ] Bug Fix
- [ ] Feature
- [x] Documentation
## Does this introduce a change in the core logic?
- [No]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Updates description?
- Documentation has been updated
M5 - Release 0.8
Siarhei Khaletski (EPAM)