Ingestion Workflow merge requests
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests
2023-08-18T11:19:56Z

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/79
Changes for azure based on adr 74
2023-08-18T11:19:56Z
Vineeth Guna [Microsoft]
M3 - Release 0.5

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/92
Fixed issues in getworkflowRuns API
2023-08-18T11:19:37Z
Kishore Battula
Checking existence of workflow when getting workflow runs
M4 - Release 0.7

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/90
Get all workflows tenant
2023-08-18T11:19:41Z
Mayank Saggar [Microsoft]
Added a new workflow api with UTs and ITs.
Get All Workflows starting with *prefix* API, with REST endpoint: */workflow/?prefix="test"*
One parameter can be specified for this GET request: *prefix*.
Queries the database for all the workflows having the specified prefix. The resulting items are sorted by timestamp in descending order.
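The lookup described above can be pictured with a small in-memory sketch. The record shape (`workflow_name`, `created_at`) and the function name are hypothetical and only illustrate the prefix match and the descending sort; the service itself queries its database rather than a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    # Hypothetical record shape; the field names are assumptions for illustration.
    workflow_name: str
    created_at: int  # creation timestamp, e.g. epoch milliseconds


def get_workflows_by_prefix(workflows, prefix):
    """Return the workflows whose name starts with `prefix`, newest first."""
    matching = [w for w in workflows if w.workflow_name.startswith(prefix)]
    return sorted(matching, key=lambda w: w.created_at, reverse=True)


stored = [
    Workflow("test_ingest", 100),
    Workflow("prod_ingest", 200),
    Workflow("test_export", 300),
]
result = get_workflows_by_prefix(stored, "test")
```

Here `result` holds the two `test*` workflows with the most recently created one first.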
ITs for the api:
* Success cases when:
* prefix is provided
* Unauthorized case with no access token and no data access token.
* Unauthorized when given invalid partition id
M4 - Release 0.7

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/89
Fixed issues in delete workflow API
2023-08-18T11:19:43Z
Kishore Battula
The Azure implementation has a flow to deploy the DAG through the workflow service. When deleting a workflow, we should only delete the DAGs that are registered through the workflow service. A flag is sent to decide whether to delete a DAG from Airflow or not.
It includes core changes, but `isDeployedThroughWorkflowService` defaults to false because other CSPs haven't implemented this feature.
M4 - Release 0.7

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/85
(GONRG-1608) GCP: Impl audit events
2023-08-18T11:19:48Z
Igor Filippov (EPAM)
## Type of change
- [ ] Bug Fix
- [X] Feature
## Does this introduce a change in the core logic?
- [YES]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [ ] AWS
- [ ] Azure
- [ ] GCP
- [ ] IBM
## Does this introduce a breaking change?
- [NO]
## What is the current behavior?
Audit events are not implemented.
## What is the new/expected behavior?
Audit events are implemented.
## Have you added/updated Unit Tests and Integration Tests?
- [YES]
M4 - Release 0.7
Dmitriy Rudko, Rostislav Dublin (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/82
IBM implementation
2023-08-18T11:19:51Z
Bhushan Rade
IBM Changes :
- Added IBM Implementation
Core changes :
- one change in **workflow-core** at /workflow-core/src/main/java/org/opengroup/osdu/workflow/service/**AirflowWorkflowEngineServiceImpl.java**
Issue link for core change - https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/87
M4 - Release 0.7
Anuj Gupta

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/111
Getting 500 internal server error when try to retrieve to workflow status (GONRG-2273)
2023-08-18T11:19:08Z
Riabokon Stanislav(EPAM)[GCP]
# Description:
A GET request does not contain a body, so we got a 500 internal server error when trying to retrieve the workflow status.
Original issue: https://gitlab.opengroup.org/osdu/subcommittees/ea/projects/pre-shipping/home/-/issues/164
# How to test:
Trigger the workflow POST https://workflow-drgfbg5txq-uc.a.run.app/v1/workflow/Osdu_ingest/workflowRun
Retrieve the status of the workflow GET [https://workflow-drgfbg5txq-uc.a.run.app/v1/workflow/Osdu_ingest/workflowRun/{{runid}}](https://workflow-drgfbg5txq-uc.a.run.app/v1/workflow/Osdu_ingest/workflowRun/%7B%7Brunid%7D%7D)
# Changes include:
- [ ] Refactor (a non-breaking change that improves code maintainability).
- [ ] Bugfix (a non-breaking change that solves an issue).
- [ ] New feature (a non-breaking change that adds functionality).
- [x] Breaking change (a change that is not backward-compatible and/or changes current functionality).
# Changes in:
- [x] GCP
- [x] Azure
- [x] AWS
- [x] IBM
- [x] Common code
# Dev Checklist:
- [ ] Added Unit Tests, wherever applicable.
- [ ] Updated the Readme, if applicable.
- [ ] Existing Tests pass
- [x] Verified functionality locally
- [x] Self Reviewed my code for formatting and complex business logic.
# Other comments:
M5 - Release 0.8
Rostislav Dublin (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/101
Workflow service failing with Out Of Memory Error (GONRG-2100)
2023-08-18T11:19:22Z
Artem Dobrynin (EPAM)
# Description:
Fixed an Out of Memory error that occurred while loading batch files with a large amount of data.
See discussion here: https://gitlab.opengroup.org/osdu/subcommittees/ea/projects/pre-shipping/home/-/issues/64
# How to test:
Ingestion Workflow service can be tested with the help of business cases.
# Changes include:
- [ ] Refactor (a non-breaking change that improves code maintainability).
- [x] Bugfix (a non-breaking change that solves an issue).
- [ ] New feature (a non-breaking change that adds functionality).
- [ ] Breaking change (a change that is not backward-compatible and/or changes current functionality).
# Changes in:
- [x] Core
- [x] GCP
- [ ] Azure
- [ ] AWS
- [ ] IBM
# Dev Checklist:
* [ ] Added Unit Tests, wherever applicable.
* [ ] Updated the Readme, if applicable.
* [x] Existing Tests pass
* [x] Verified functionality locally
* [x] Self Reviewed my code for formatting and complex business logic.
M5 - Release 0.8
Rostislav Dublin (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/100
Workflow Service: Can’t trigger workflow with ‘dagName’ property (GONRG-2056)
2023-08-18T11:19:23Z
Anastasiia Gelmut
# Description:
Fixed triggering workflow with 'dagName' property.
# How to test:
Ingestion Workflow service can be tested with the help of business cases.
# Changes include:
- [ ] Refactor (a non-breaking change that improves code maintainability).
- [x] Bugfix (a non-breaking change that solves an issue).
- [ ] New feature (a non-breaking change that adds functionality).
- [ ] Breaking change (a change that is not backward-compatible and/or changes current functionality).
# Changes in:
- [x] Core
- [x] GCP
- [ ] Azure
- [ ] AWS
- [ ] IBM
# Dev Checklist:
* [x] Added Unit Tests, wherever applicable.
* [ ] Updated the Readme, if applicable.
* [x] Existing Tests pass
* [x] Verified functionality locally
* [x] Self Reviewed my code for formatting and complex business logic.
M5 - Release 0.8
Rostislav Dublin (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/107
[Core] Updated Delete API to check workflow run status
2023-08-18T11:19:13Z
Mayank Saggar [Microsoft]
Updated the Delete API to check workflow run status before returning an error when active workflow runs are present.
M6 - Release 0.9
Mayank Saggar [Microsoft]

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/106
[Core] Added response filter in workflow service
2023-08-18T11:19:14Z
Aalekh Jain
Original issue: #99
M6 - Release 0.9

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/105
[Core] POST /workflow - Invalid workflow name - Bad Request
2023-08-18T11:19:16Z
Aalekh Jain
Original issue: #102
M6 - Release 0.9
Aalekh Jain

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/103
[Core] Updated Authorization filter for validating mandatory headers
2023-08-18T11:19:18Z
Aalekh Jain
Refer issue #113
M6 - Release 0.9
Aalekh Jain

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/136
move workflow name validation to core code
2023-08-18T11:18:46Z
Yunhua Koglin
move workflow name validation to core code
M7 - Release 0.10
Yunhua Koglin

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/124
Fixed issue in swagger input parameter by upgrading Swagger 2 to Swagger 3
2021-06-22T12:28:26Z
Abhiman Neelakanteswara
## Motivation
A bug was raised reporting that the input parameters in the `params` field for the GET /{workflow_name}/workflowRun API were not accepted. The API did not work from the Swagger page, but it worked with Postman.
## Changes
The issue was that Swagger 2 is unable to serialize parameters from JSON input to a GET API. This feature is only supported in OpenAPI Specification (OAS) 3.0 and later. Hence, the springfox-swagger version has been updated to 3.0.0.
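To make the serialization problem concrete, the sketch below shows one way a client can place a JSON object into the `params` query parameter of a GET request. It is an illustration only: the host name and the `limit` key are hypothetical, and the source does not show how the service parses `params`, so the decoding step is a plain round trip and not the service's logic.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_workflow_runs_url(base_url, workflow_name, params):
    """Serialize `params` as compact JSON and URL-encode it into the query string."""
    query = urlencode({"params": json.dumps(params, separators=(",", ":"))})
    return f"{base_url}/{workflow_name}/workflowRun?{query}"


# Hypothetical host and parameter key, used for illustration only.
url = build_workflow_runs_url(
    "https://example.invalid/v1/workflow", "Osdu_ingest", {"limit": 10}
)

# Round trip: read the JSON object back out of the query string.
decoded = json.loads(parse_qs(urlsplit(url).query)["params"][0])
```

A tool such as Postman lets the caller type the encoded value directly, which is consistent with the report that the call worked there but not from the Swagger page.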
## Known Issues
Recommended library for OAS 3.0 is springdoc-openapi-ui. However, the latest version of this library has an inline script in the index.html of the OAS page. This is currently not allowed according to the security policy in our Azure deployment.
M7 - Release 0.10
Abhiman Neelakanteswara

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/119
[737153:] fix whitesource vulnerabilities
2021-07-01T15:07:19Z
Maksim Malkov
## Motivation
Recently we conducted a WhiteSource vulnerability report review for the **non-open-source** version of the Workflow service.
It was suggested to make the same changes for the **open-sourced** version of the service.
That is how this PR was created.
## Changes
Updated dependencies for `Core` and `Azure` modules.
## Issue raised
[issue_120](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/120)
## WhiteSource Reports
The reports are attached to this PR below.<br>
[workflow-azure-vulnerability-report.xlsx](/uploads/8491838b06a3549876e0ab3da2e399c4/workflow-azure-vulnerability-report.xlsx)
<br>
[workflow-core-vulnerability-report.xlsx](/uploads/752245ca0fd027b014adda1bce155db5/workflow-core-vulnerability-report.xlsx)
M7 - Release 0.10
Maksim Malkov

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/151
Return deleted plugins(version info)
2021-09-03T02:26:51Z
Rustam Lotsmanenko (EPAM)
rustam_lotsmanenko@epam.com
M8 - Release 0.11
Rostislav Dublin (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/144
Add GSM
2021-08-30T18:28:57Z
Maksim Malkov
[issue#126](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/126)
## Contents
* bugfixes
* GSM integration
### GSM implementation details
In the current iteration, we implemented both services for publishing `STATUS` types of messages. For actual use, we have integrated only the status part according to our architectural inputs.
After this MR is merged, the following messages will be sent (or at least a send will be attempted if the required resource is absent):
- Once a user calls `triggerWorkflow`, the service will send a GSM with `SUBMITTED` status
- Once a user calls `updateWorkflowRun`, the service will send a GSM with `<Any of logically correct and supported>` status retrieved from the update payload (a message is sent only if the status was changed)
Currently, we provide all providers with a default dummy implementation of the Message Sender (`IEventPublisher`); it can be easily overridden, as we did for the Azure provider.
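The behaviour described above can be sketched as follows. This is a Python stand-in and not the service's Java code: `EventPublisher` mirrors the role of `IEventPublisher`, while `NoOpEventPublisher`, `RecordingEventPublisher`, `WorkflowRunService`, and the simplified message shape are all hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod


class EventPublisher(ABC):
    """Stand-in for the role of `IEventPublisher` (hypothetical Python shape)."""

    @abstractmethod
    def publish(self, messages, attributes):
        ...


class NoOpEventPublisher(EventPublisher):
    """Default implementation that drops every message."""

    def publish(self, messages, attributes):
        pass


class RecordingEventPublisher(EventPublisher):
    """Provider-specific override; here it only records what was sent."""

    def __init__(self):
        self.sent = []

    def publish(self, messages, attributes):
        self.sent.extend(messages)


class WorkflowRunService:
    """Hypothetical service that tracks run statuses and emits status messages."""

    def __init__(self, publisher):
        self.publisher = publisher
        self.statuses = {}

    def trigger_workflow(self, run_id):
        self.statuses[run_id] = "SUBMITTED"
        self._send(run_id, "SUBMITTED")

    def update_workflow_run(self, run_id, new_status):
        if self.statuses.get(run_id) == new_status:
            return  # status unchanged, so no message is sent
        self.statuses[run_id] = new_status
        self._send(run_id, new_status)

    def _send(self, run_id, status):
        # Simplified message; the real payload carries more properties.
        self.publisher.publish([{"runId": run_id, "status": status}], {})


publisher = RecordingEventPublisher()
service = WorkflowRunService(publisher)
service.trigger_workflow("run-1")
service.update_workflow_run("run-1", "SUBMITTED")    # unchanged: nothing sent
service.update_workflow_run("run-1", "IN_PROGRESS")  # changed: message sent
```

Swapping `RecordingEventPublisher` for `NoOpEventPublisher` leaves the service logic untouched, which is the point of keeping the sender behind an interface.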
## About GSM
### Introduction <a name="introduction"></a>
Global Status monitoring is a mechanism to track the status of data journey/dataflows on the data platform.
The infrastructure would help in tracking the status of file/data/record ingested through File Service/ Storage API/ Specific DOMS until it is consumed by dependent services.
Every stage publishes one status message to the message queue. From there `Status Collector` picks up messages and normalizes them to store them in persistent storage for future reference. Then `Status Processor` provides an API to query and check the status of past datasets.
### Status Data Model <a name="dataModel"></a>
Data Model properties help any user to search for status with multiple or specific properties. Every request will be tracked through specific `dataSetId` & its associated `correlationId`.
Status Data Model is being distributed across multiple tables for tracking whether dataflow has finished or not and if it is Successful or Failed.
One table holds DataSet Details and another table holds the overall Status of that dataflow journey.
* **DataSet Details** - A dataset can be anything that contains data. For example, a File is one type of dataset, and a File Collection could be another dataset that contains a set of files.
* **Status** - Holds the overall status of that data flow. `correlationId` is used as a unique id to capture a single request going through different stages of our Data Platform.
### How to publish status and dataset details events <a name="howToPublishStatusEvents"></a>
Any service that wants to publish status and dataset details has to follow the steps below:
1. **Add `core common lib` as dependency** - There are models, classes, and an interface defined in `core common lib` from Azure.
Make sure to select a version of the library that includes these classes.
2. **All possible scenarios to publish Status/Dataset Details** - It is advised to find out all possible scenarios in which either Status or Dataset Details can be published.
A service can publish multiple sets of both Status and Dataset Details.
3. **Cloud Implementation to publish Status/Dataset Details** - You need to provide an implementation of `IEventPublisher` interface from core common lib.
The publish method in this interface accepts an array of Messages and a map of string attributes. `Message` is an interface implemented by both Status and Dataset Details.
So this method expects an array of either Status or Dataset Details. This method of `IEventPublisher` has to be implemented with cloud-specific code to publish events to `statuschangedtopic`.
Note: We have Azure implementation of Global Status Monitoring. Services that are not part of OSDU AKS cluster have to use /status and /datasetDetails endpoints of `Status Processor` service. `Status Processor` service will publish status and dataset details in `statuschangedtopic`.
#### Sample of status and dataset details message <a name="sample"></a>
The status messages are one of two kinds - DataSet Details and Status, but they are published into the same `statuschangedtopic`.
* **DataSet Details**
```json
[
  {
    "kind": "datasetDetails",
    "properties": {
      "correlationId": "12345",
      "datasetId": "12345",
      "datasetVersionId": "1",
      "datasetType": "FILE",
      "recordCount": 10,
      "timestamp": 1625221800
    }
  }
]
```
* **Status**
```json
[
  {
    "kind": "status",
    "properties": {
      "correlationId": "12345",
      "recordId": "12334",
      "recordIdVersion": "123ff",
      "stage": "STORAGE_SYNC",
      "status": "FAILED",
      "message": "acl is not valid",
      "errorCode": 400,
      "userEmail": "test@email.com",
      "timestamp": 1625221800
    }
  }
]
```
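As a rough illustration of the status sample above, the following sketch assembles the same JSON array and rejects a status that is not in the supported list given later in this description. The function name `build_status_payload` is hypothetical, and the check is an assumption about a sensible client-side guard, not a documented behaviour of the library.

```python
import json

# Statuses taken from the "Supported Statuses" table in this description.
SUPPORTED_STATUSES = {
    "SUBMITTED",
    "SUCCESS",
    "FAILED",
    "IN_PROGRESS",
    "SKIPPED",
    "PARTIAL_SUCCESS",
}


def build_status_payload(properties):
    """Wrap `properties` in the sample's envelope and return it as a JSON string."""
    status = properties.get("status")
    if status not in SUPPORTED_STATUSES:
        raise ValueError(f"unsupported status: {status!r}")
    return json.dumps([{"kind": "status", "properties": properties}])


# Values copied from the sample message above.
payload = build_status_payload(
    {
        "correlationId": "12345",
        "recordId": "12334",
        "recordIdVersion": "123ff",
        "stage": "STORAGE_SYNC",
        "status": "FAILED",
        "message": "acl is not valid",
        "errorCode": 400,
        "userEmail": "test@email.com",
        "timestamp": 1625221800,
    }
)
```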
#### Core Common Library contents for GSM <a name="coreCommonLibContents"></a>
1. **Models** - `StatusDetails` and `DatasetDetails` - These 2 models should be used to publish status and dataset details.
2. **Utility** - `AttributesBuilder` - This will help to create an attributes map which is required in publishing method of `IEventPublisher` to publish status or dataset details. Attributes map will consist of `data partition id` and `correlation id`.
3. **Publisher Interface** - `IEventPublisher` - This is the interface that a cloud provider has to implement to produce status and dataset details. It contains a method that accepts the Message array and Attributes maps. The message is an interface implemented by both Status and Dataset Details.
### Supported Stages and Statuses <a name="stagesAndStatuses"></a>
#### Stages and Services Mapping
| Stage | Service |
|-------|---------|
| DATASET_SYNC | File Service, Dataset |
| INGESTOR | All ingestors, e.g., CSV, LAS/DLIS/Document |
| INGESTOR_SYNC | All ingestors, e.g., CSV, LAS/DLIS/Document |
| WKS_SYNC | All services that create WKS source records in the Data Platform, e.g., WKS Transformation Service |
| WKE_SYNC | WKE Service |
| STORAGE_SYNC | Storage Service |
| ES_SYNC | Indexer Service |
#### Supported Statuses
| Status |
|--------|
| SUBMITTED |
| SUCCESS |
| FAILED |
| IN_PROGRESS |
| SKIPPED |
| PARTIAL_SUCCESS |

M8 - Release 0.11

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/140
Added service version endpoint (GONRG-2887)
2021-08-26T11:18:35Z
Anastasiia Gelmut
## Type of change
- [ ] Bug Fix
- [x] Feature
osdu/platform/system/lib/core/os-core-common#47
## Does this introduce a change in the core logic?
- [YES]
## Does this introduce a change in the cloud provider implementation, if so which cloud?
- [x] AWS
- [x] Azure
- [x] GCP
- [x] IBM
## Does this introduce a breaking change?
- [YES]
## What is the current behavior?
Provides info about maven build and git
M8 - Release 0.11
Riabokon Stanislav(EPAM)[GCP], Rostislav Dublin (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/131
[Core] Adding integration tests to core
2021-08-09T09:35:20Z
Aalekh Jain
M8 - Release 0.11
Aalekh Jain