# Data Ingestion issues

Issue feed: https://community.opengroup.org/groups/osdu/platform/data-flow/ingestion/-/issues

---

# [ingestion-workflow#41: Support for Workflow Roles - Currently Leveraging Storage Roles](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/41)
*Matt Wise, last updated 2020-11-18*

## Status
- [X] Proposed
- [x] Trialing
- [x] Under review
- [x] Approved
- [ ] Retired
## Context & Scope
The ingestion-workflow service currently uses the common model `StorageRole` for authorization.
The following is observed in the code:
```java
import org.opengroup.osdu.core.common.model.storage.StorageRole;
...
// Both workflow endpoints are guarded by storage-service roles:
@PreAuthorize("@authorizationFilter.hasPermission('" + StorageRole.CREATOR + "')")
public GetStatusResponse getWorkflowStatus(@RequestBody GetStatusRequest request) {...}

@PreAuthorize("@authorizationFilter.hasPermission('" + StorageRole.CREATOR + "')")
public UpdateStatusResponse updateWorkflowStatus(@RequestBody UpdateStatusRequest request) {...}
```
Note that `StorageRole.*` is used for authorization.
## Decision
A new role model called `WorkflowRole` should be created and used to assign privileges.
Sample code:
```java
public final class WorkflowRole {
  public static final String VIEWER = "service.workflow.viewer";
  public static final String CREATOR = "service.workflow.creator";
  public static final String ADMIN = "service.workflow.admin";
}
```
## Rationale
Each individual core service should have its own separate roles, allowing granularity when granting user entitlements.
## Consequences
Need to change Core Common and the Entitlements Service? Need groups support?

---

# [ingestion-workflow#87: Parsing error at AirflowWorkflowEngineServiceImpl class](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/87)
*Bhushan Rade, last updated 2021-02-11*

Issue: a parsing [error](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/master/workflow-core/src/main/java/org/opengroup/osdu/workflow/service/AirflowWorkflowEngineServiceImpl.java#L86) occurs while converting the Airflow response JSON string into the [TriggerWorkflowResponse.java](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/master/workflow-core/src/main/java/org/opengroup/osdu/workflow/model/ClientResponse.java) class.
Reason:
[At the time](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/master/workflow-core/src/main/java/org/opengroup/osdu/workflow/service/AirflowWorkflowEngineServiceImpl.java#L86) of parsing, a JSON string in the `TriggerWorkflowResponse.java` format is expected, but [this line](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/master/workflow-core/src/main/java/org/opengroup/osdu/workflow/service/AirflowWorkflowEngineServiceImpl.java#L141) collects the Airflow response in the wrong format.
Proposed change:
Replace [line 141](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/master/workflow-core/src/main/java/org/opengroup/osdu/workflow/service/AirflowWorkflowEngineServiceImpl.java#L141) of AirflowWorkflowEngineServiceImpl.java with the following:
```java
.responseBody(response.getEntity(String.class));
```

---

# [ingestion-workflow#90: Integration test issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/90)
*Bhushan Rade, last updated 2021-07-14*

The following three test cases are failing for IBM.
- Test 1 - [shouldReturn400WhenGetDetailsForSpecificWorkflowRunInstance](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/master/testing/workflow-test-core/src/main/java/org/opengroup/osdu/workflow/workflow/v3/WorkflowRunV3IntegrationTests.java#L78) - expected: <400> but was: <404>
This test case expects a 400 status code, but as per our understanding the code should always return 404 when the workflow run instance does not exist.
Proposed change to the [assertion](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/master/testing/workflow-test-core/src/main/java/org/opengroup/osdu/workflow/workflow/v3/WorkflowRunV3IntegrationTests.java#L91) statement:
```java
assertEquals(HttpStatus.NOT_FOUND, getResponse.getStatus());
```
- Test 2 - [shouldReturnInternalServerErrorWhenIncorrectWorkflowNameWorkflowCreate](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/master/testing/workflow-test-core/src/main/java/org/opengroup/osdu/workflow/workflow/v3/WorkflowV3IntegrationTests.java#L63) - expected: <500> but was: <200>/<409>
Workflow name validation is missing at the controller level; as of now it accepts any format.
- Test 3 - [shouldReturnBadRequestWhenInvalidDagNameWorkflowCreate](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/master/testing/workflow-test-core/src/main/java/org/opengroup/osdu/workflow/workflow/v3/WorkflowV3IntegrationTests.java#L51) - expected: <400> but was: <200>/<409>
Validation is missing; this issue is similar to Test 2.
- As per our understanding, the _workflow name_ is unique, and all test cases create a workflow and workflow run with the same name without deleting the old one, which affects subsequent tests (409 conflict errors). Also, at the CSP level we tried to delete the created workflow, but it did not allow us to delete it immediately and threw a 412 error: the integration test tries to delete the workflow immediately, while the test DAG generally takes some time to update its status from **SUBMITTED** to **FINISHED** in the DB. Please give us clarity on this.

---

# [ingestion-workflow#91: AirflowConfig should not be in Core, it should be a provider specific implementation](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/91)
*Matt Wise, last updated 2022-08-23*

We do not use user/password authentication to hit the Airflow API. In fact, AWS's Managed Airflow does not even expose direct API access, only CLI access via a protected API. We need the flexibility to choose how we interface with Airflow, so this requirement to create a config with an auth mechanism for Airflow should be moved into provider logic. Any other logic regarding direct requests to Airflow should also be provider-implemented.
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/master/workflow-core/src/main/java/org/opengroup/osdu/workflow/config/AirflowConfig.java

---

# [ingestion-dags#53: Errors with running tests for Ingestion DAGs](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/53)
*Brady Spiva [AWS], last updated 2021-03-23*

## Expected behavior:
Unit tests and end-to-end tests should work for core code out of the box.
## Observed behavior:
When attempting to run unit and end-to-end tests for the Ingestion DAGs core code, the tests in /plugin-unit-tests/ fail due to an issue related to Airflow's "variable" table (see the error text below).
From researching this issue, it appears this could be related to the Airflow DB needing to be initialized.
```
❯ pytest /Users/spivbrad/Documents/aws-osdu-code/os-ingestion-dags/tests/plugin-unit-tests
===================================================== test session starts ======================================================
platform darwin -- Python 3.8.6, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /Users/spivbrad/Documents/aws-osdu-code/os-ingestion-dags
plugins: anyio-2.2.0, dash-1.19.0, mock-3.5.1
collected 79 items / 1 error / 78 selected
============================================================ ERRORS ============================================================
_____________________________ ERROR collecting tests/plugin-unit-tests/test_process_manifest_r2.py _____________________________
../../../.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py:1705: in _execute_context
self.dialect.do_execute(
../../../.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py:681: in do_execute
cursor.execute(statement, parameters)
E sqlite3.OperationalError: no such table: variable
The above exception was the direct cause of the following exception:
tests/plugin-unit-tests/test_process_manifest_r2.py:27: in <module>
from operators import process_manifest_r2
src/plugins/operators/process_manifest_r2.py:37: in <module>
config.read(Variable.get("core__config__dataload_config_path"))
../../../.local/lib/python3.8/site-packages/airflow/models/variable.py:123: in get
var_val = Variable.get_variable_from_secrets(key=key)
../../../.local/lib/python3.8/site-packages/airflow/models/variable.py:181: in get_variable_from_secrets
var_val = secrets_backend.get_variable(key=key)
../../../.local/lib/python3.8/site-packages/airflow/utils/session.py:65: in wrapper
return func(*args, session=session, **kwargs)
../../../.local/lib/python3.8/site-packages/airflow/secrets/metastore.py:66: in get_variable
var_value = session.query(Variable).filter(Variable.key == key).first()
../../../.local/lib/python3.8/site-packages/sqlalchemy/orm/query.py:2695: in first
return self.limit(1)._iter().first()
../../../.local/lib/python3.8/site-packages/sqlalchemy/orm/query.py:2779: in _iter
result = self.session.execute(
../../../.local/lib/python3.8/site-packages/sqlalchemy/orm/session.py:1653: in execute
result = conn._execute_20(statement, params or {}, execution_options)
../../../.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py:1520: in _execute_20
return meth(self, args_10style, kwargs_10style, execution_options)
../../../.local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py:313: in _execute_on_connection
return connection._execute_clauseelement(
../../../.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py:1389: in _execute_clauseelement
ret = self._execute_context(
../../../.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py:1748: in _execute_context
self._handle_dbapi_exception(
../../../.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py:1929: in _handle_dbapi_exception
util.raise_(
../../../.local/lib/python3.8/site-packages/sqlalchemy/util/compat.py:198: in raise_
raise exception
../../../.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py:1705: in _execute_context
self.dialect.do_execute(
../../../.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py:681: in do_execute
cursor.execute(statement, parameters)
E sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: variable
E [SQL: SELECT variable.val AS variable_val, variable.id AS variable_id, variable."key" AS variable_key, variable.is_encrypted AS variable_is_encrypted
E FROM variable
E WHERE variable."key" = ?
E LIMIT ? OFFSET ?]
E [parameters: ('core__config__dataload_config_path', 1, 0)]
E (Background on this error at: http://sqlalche.me/e/14/e3q8)
=================================================== short test summary info ====================================================
ERROR tests/plugin-unit-tests/test_process_manifest_r2.py - sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no su...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================================================= 1 error in 1.18s =======================================================
```
## A recommended solution:
The [README](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/tree/master/tests) for the tests mentions a Docker image that is used for testing.
1. Could this image be made available for cloud providers to run these tests in a consistent environment?
2. Could we alter the Dockerfile used to run the `unit_tests.sh`, `set_airflow_env.sh`, and `test_dags.sh` scripts?

Is this feasible?
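As a possible interim workaround (our assumption, not something stated in the issue): Airflow resolves `Variable.get` from `AIRFLOW_VAR_*` environment variables before querying the metastore, so the failing import can be satisfied without an initialized Airflow DB, e.g.:

```python
# Hypothetical conftest.py for tests/plugin-unit-tests.
# Airflow checks AIRFLOW_VAR_<KEY> environment variables before the
# metastore, so Variable.get("core__config__dataload_config_path") at
# import time no longer needs the "variable" table to exist.
import os

os.environ.setdefault(
    "AIRFLOW_VAR_CORE__CONFIG__DATALOAD_CONFIG_PATH",
    "/tmp/dataload_test_config.ini",  # illustrative path to a test config file
)
```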
---

# [ingestion-workflow#108: registerCustomOperator API (/customOperator) returns success if an empty "properties" block is passed but throws 400 if "properties" block is missing](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/108)
*Monalisa Srivastava, last updated 2022-06-30*

Tested on Azure. API: `https://{HOST}/api/workflow/v1/customOperator`
Hit the API with the payload below:
```json
{
  "name": "string",
  "className": "string",
  "description": "string",
  "content": "string",
  "properties": []
}
```
Response: the API returns 200 OK.
Now hit the same API with the "properties" block missing and observe that a 400 response code is returned.
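A minimal reproduction sketch of both calls (the host, token, and partition value below are placeholders, not from the issue):

```python
# Reproduce both calls: empty "properties" (currently 200) vs. missing
# "properties" (currently 400).
import requests

HOST, TOKEN = "https://host.example.com", "<access-token>"  # placeholders
url = f"{HOST}/api/workflow/v1/customOperator"
headers = {"Authorization": f"Bearer {TOKEN}", "data-partition-id": "opendes"}

body = {"name": "string", "className": "string", "description": "string",
        "content": "string", "properties": []}
print(requests.post(url, json=body, headers=headers).status_code)  # 200 today

del body["properties"]  # omit the block entirely
print(requests.post(url, json=body, headers=headers).status_code)  # 400
```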
Expected:
If the "properties" block is mandatory, then an empty block should also return 400 instead of 200, with a message giving the exact reason.

---

# [ingestion-workflow#110: No error message from GET /v1/workflow/{workflow_name} getWorkflowByName for broken or incorrect DAGs](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/110)
*Monalisa Srivastava, last updated 2022-06-30*

Tested on Azure:
Steps followed:
1. Register a DAG/workflow using the POST /workflow create API.
2. Use GET /workflow/{id} getWorkflowById to check the status.

Expected result: if the workflow is created successfully, its information should be displayed.
Actual result: even if the workflow is broken and not created successfully, the API displays no error, while Airflow shows a broken-DAG error.

---

# [ingestion-workflow#114: POST /v1/workflow/{workflow_name}/workflowRun (Trigger Workflow) with empty runId throws 500 internal server error](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/114)
*Aalekh Jain, last updated 2021-06-14*

## Description
**Current Behaviour**
For the given request body
```json
{
"runId": "",
"executionContext": {
}
}
```
Error thrown is
```json
{
"code": 500,
"reason": "Unexpectedly failed to insert item into CosmosDB",
"message": "[\"The input name '' is invalid. Ensure to provide a unique non-empty string less than '1024' characters.\"], {\"userAgent\":\"azsdk-java-cosmos/4.7.1 Windows10/10.0 JRE/1.8.0_265\",\"requestLatencyInMs\":212,\"requestStartTimeUTC\":\"2021-04-19T09:01:33.929Z\",\"requestEndTimeUTC\":\"2021-04-19T09:01:34.141Z\",\"connectionMode\":\"DIRECT\",\"responseStatisticsList\":[{\"storeResult\":{\"storePhysicalAddress\":\"rntbd://cdb-ms-prod-eastus2-fd7.documents.azure.com:14178/apps/a78846d5-27aa-45e8-bef0-0950c8a3c1d2/services/9a98fb60-a0fb-43e4-be05-3fe8dc8d6498/partitions/e158dec0-4caf-42e6-b7e8-1eb9dc9b7c84/replicas/132593652864958972p/\",\"lsn\":84405,\"globalCommittedLsn\":84405,\"partitionKeyRangeId\":\"1\",\"isValid\":true,\"statusCode\":400,\"subStatusCode\":0,\"isGone\":false,\"isNotFound\":false,\"isInvalidPartition\":false,\"requestCharge\":1.24,\"itemLSN\":-1,\"sessionToken\":\"-1#84405\",\"exception\":\"[\\\"The input name '' is invalid. Ensure to provide a unique non-empty string less than '1024' characters.\\\"]\",\"transportRequestTimeline\":[{\"eventName\":\"created\",\"durationInMicroSec\":\"0\",\"startTime\":\"2021-04-19T09:01:33.931Z\"},{\"eventName\":\"queued\",\"durationInMicroSec\":\"0\",\"startTime\":\"2021-04-19T09:01:33.931Z\"},{\"eventName\":\"channelAcquisitionStarted\",\"durationInMicroSec\":\"3000\",\"startTime\":\"2021-04-19T09:01:33.931Z\"},{\"eventName\":\"pipelined\",\"durationInMicroSec\":\"1000\",\"startTime\":\"2021-04-19T09:01:33.934Z\"},{\"eventName\":\"transitTime\",\"durationInMicroSec\":\"204000\",\"startTime\":\"2021-04-19T09:01:33.935Z\"},{\"eventName\":\"received\",\"durationInMicroSec\":\"1000\",\"startTime\":\"2021-04-19T09:01:34.139Z\"},{\"eventName\":\"completed\",\"durationInMicroSec\":\"1000\",\"startTime\":\"2021-04-19T09:01:34.140Z\"}],\"rntbdRequestLengthInBytes\":714,\"rntbdResponseLengthInBytes\":325,\"requestPayloadLengthInBytes\":282,\"responsePayloadLengthInBytes\":null,\"channelTaskQueueSize\":1,\"pendingRequestsCount\":1,\"serviceEndpointStatistics\":{\"availableChannels\":1,\"acquiredChannels\":0,\"executorTaskQueueSize\":0,\"inflightRequests\":1,\"lastSuccessfulRequestTime\":\"2021-04-19T08:52:54.424Z\",\"lastRequestTime\":\"2021-04-19T08:52:54.211Z\",\"createdTime\":\"2021-04-19T08:34:55.741Z\",\"isClosed\":false}},\"requestResponseTimeUTC\":\"2021-04-19T09:01:34.141Z\",\"requestResourceType\":\"Document\",\"requestOperationType\":\"Create\"}],\"supplementalResponseStatisticsList\":[],\"addressResolutionStatistics\":{},\"regionsContacted\":[\"https://osdu-mvp-dp1dev-qs29-db-eastus2.documents.azure.com:443/\"],\"retryContext\":{\"retryCount\":0,\"statusAndSubStatusCodes\":null,\"retryLatency\":0},\"metadataDiagnosticsContext\":{\"metadataDiagnosticList\":null},\"serializationDiagnosticsContext\":{\"serializationDiagnosticsList\":[{\"serializationType\":\"ITEM_SERIALIZATION\",\"startTimeUTC\":\"2021-04-19T09:01:33.929Z\",\"endTimeUTC\":\"2021-04-19T09:01:33.929Z\",\"durationInMicroSec\":0}]},\"gatewayStatistics\":null,\"systemInformation\":{\"usedMemory\":\"237293 KB\",\"availableMemory\":\"3432723 KB\",\"systemCpuLoad\":\"(2021-04-19T09:01:08.919Z 5.1%), (2021-04-19T09:01:13.920Z 6.9%), (2021-04-19T09:01:18.921Z 4.0%), (2021-04-19T09:01:23.919Z 5.1%), (2021-04-19T09:01:28.920Z 5.3%), (2021-04-19T09:01:33.922Z 10.5%)\"},\"clientCfgs\":{\"id\":0,\"numberOfClients\":1,\"connCfg\":{\"rntbd\":\"(cto:PT5S, rto:PT5S, icto:PT0S, ieto:PT1H, mcpe:130, mrpc:30)\",\"gw\":\"(cps:1000, rto:PT5S, icto:null, p:false)\",\"other\":\"(ed: 
true, cs: false)\"},\"consistencyCfg\":\"(consistency: null, mm: true, prgns: [])\"}}"
}
```
**Expected Behaviour**
The service should throw an error saying `runId` cannot be empty or that an invalid `runId` was given, OR the `runId` should be generated (similar to what happens when the `runId` field is not present in the request body). We need confirmation on the expected behaviour.
It works fine when the request body does not have `runId` as a key:
```json
{
"executionContext": {
}
}
```
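For illustration only (the service itself is Java; this Python sketch merely shows the validation logic under discussion), the handler could normalize `runId` up front instead of letting an empty string reach CosmosDB:

```python
import uuid

def normalize_run_id(body: dict) -> str:
    """Hypothetical handling: absent key -> generate; blank value -> 400."""
    run_id = body.get("runId")
    if run_id is None:
        return str(uuid.uuid4())  # same as today when the key is absent
    if not run_id.strip():
        # surface a 400 with a clear message instead of a 500 from CosmosDB
        raise ValueError("runId must be a non-empty string")
    return run_id
```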
cc: @kibattul @vineethguna

---

# [ingestion-dags#64: \[Manifest-based\] Deferred integrity check (ELT) and/or Current implementation (ETL)](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/64)
*Kateryna Kurach (EPAM), last updated 2021-06-23*

There is a situation where the manifest contains entities that have links to other pieces of data; these links can refer either to entities inside the manifest or to ones already stored in OSDU. In the current implementation we check entities' integrity during manifest-based ingestion, and it can take a lot of time to check every entity's references to other ones. Also, there is a problem when the ingested entity doesn't have a unique id or has a surrogate key; this causes issues with identifying entities skipped due to inconsistency. The solution may be to store entities as they are, get unique OSDU ids, replace surrogate keys with real ids, and then start a background DAG that checks the data consistency of each record. Of course, the mechanism of setting the current status of records (consistent, not consistent, not verified) must be invented; see the sketch below.
This solution has to be discussed in more detail.
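A minimal sketch of such a status mechanism (purely illustrative; none of these names come from the issue):

```python
from enum import Enum

class IntegrityStatus(Enum):
    """Possible integrity states for a record stored before reference checks."""
    NOT_VERIFIED = "not_verified"      # stored as-is; background DAG not run yet
    CONSISTENT = "consistent"          # every reference resolved successfully
    NOT_CONSISTENT = "not_consistent"  # at least one reference failed to resolve

def check_record(record: dict, resolve) -> IntegrityStatus:
    # resolve(ref_id) -> bool would query Storage/Search for the referenced record.
    refs = record.get("references", [])
    return (IntegrityStatus.CONSISTENT
            if all(resolve(r) for r in refs)
            else IntegrityStatus.NOT_CONSISTENT)
```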
---

# [csv-parser#35: CSV Enhancement - Id generation strategy](https://community.opengroup.org/osdu/platform/data-flow/ingestion/csv-parser/csv-parser/-/issues/35)
*Smitha Manjunath, last updated 2021-07-08*

Currently, the id generation strategy in the CSV parser is:
- Get all the fields marked as an 'x-osdu-natural key', concatenate them, and get a base64 encoding of the result.
- If the schema doesn't have any 'natural key' fields, let the Storage service generate the id.
However, some CSV files can contain a column called 'id' which can be a unique identifier for a row in the file.
In such situations, it would be beneficial for the id generation strategy to incorporate the value in that column.
This would make searching for the record much easier, as the end user would already know the id of their record.
Another problem is that when we ingest the **same file multiple times**, records are created again with each ingestion (with a different, randomly generated id from the Storage service).
The proposed format for id generation could be as follows (see the sketch after this list):
1. Check if the schema has natural keys defined. If yes, store the record with id `tenant:type:location:{encodedId}`.
2. Else, check if the file has an 'id' column. If yes, use it and store the record with id `tenant:type:location:{id}`.
3. If neither condition holds, let the Storage service handle the id generation.
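A minimal sketch of the proposed three-step strategy (helper and parameter names are illustrative, not from the parser):

```python
import base64
from typing import Optional

def generate_record_id(tenant: str, entity_type: str, location: str,
                       row: dict, natural_keys: list) -> Optional[str]:
    """Sketch of the proposed id strategy; None means 'let Storage decide'."""
    if natural_keys:  # 1. schema defines x-osdu-natural-key fields
        raw = "".join(str(row[k]) for k in natural_keys)
        encoded = base64.b64encode(raw.encode("utf-8")).decode("ascii")
        return f"{tenant}:{entity_type}:{location}:{encoded}"
    if "id" in row:   # 2. fall back to the file's own 'id' column
        return f"{tenant}:{entity_type}:{location}:{row['id']}"
    return None       # 3. let the Storage service generate the id
```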
Example of a schema with no OSDU natural keys: https://community.opengroup.org/osdu/platform/system/schema-service/-/blob/master/deployments/shared-schemas/osdu/master-data/Wellbore.1.0.0.json

---

# [ingestion-workflow#127: Code refactoring - WorkflowEngineRequest](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/127)
*Aalekh Jain, last updated 2022-02-15*

There are too many constructors for [`WorkflowEngineRequest`](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/master/workflow-core/src/main/java/org/opengroup/osdu/workflow/model/WorkflowEngineRequest.java). It would be better to use a builder pattern instead, to keep the code cleaner.

---

# [home#51: ADR: Introduce Code Coverage metric in deployment pipelines](https://community.opengroup.org/osdu/platform/data-flow/ingestion/home/-/issues/51)
*Akansha Rajput [Microsoft], last updated 2022-05-10*

# Decision Title
Introduce Code Coverage metric in deployment pipelines
## Status
- [X] Proposed
- [X] Trialing
- [X] Under review
- [X] Approved
- [ ] Retired
## Context & Scope
Currently, the build pipeline normally consists of the following jobs:
* Build
* Azure Code Coverage
* Scan
* Containerize
* Deploy
* Integration
* Additional steps specific to services.
For code changes to be deployed, they depend on the build job for all cloud providers.
To ensure better code quality, we propose introducing an additional step in the build job.
This step will help us maintain the code coverage percentage for the core module; if at any time the code coverage goes below the threshold level, we can stop the build.
We will keep this as a continuous process: we can start with a comparatively low coverage threshold and keep raising the value as our code coverage improves over time.
## Decision
* Add a code coverage plugin to the core module.
* Add a coverage test step in the build job to check code coverage values against the specified threshold value.
* If the code coverage limit is not met, break the pipeline.
## Consequences
If the code coverage threshold is not met for any branch, the build will stop with an appropriate exit code and output message.
## Rationale
This will help us maintain consistent code coverage metrics across the services and, in turn, enhance overall code quality.
## Example
* I have added the JaCoCo plugin for measuring code coverage in the Azure module:
  [pom.xml](https://community.opengroup.org/osdu/platform/system/schema-service/-/blob/master/provider/schema-azure/pom.xml#L235)
* I have added a code coverage step for the Azure module:
  [azure_code_coverage](https://community.opengroup.org/osdu/platform/ci-cd-pipelines/-/blob/master/cloud-providers/azure.yml#L295)
* I have updated the deploy step for Azure to verify the code coverage metric before proceeding:
  [azure_deploy_step](https://community.opengroup.org/osdu/platform/ci-cd-pipelines/-/blob/master/cloud-providers/azure.yml#L201)
I have also updated the service's gitlab-ci.yml file with the threshold value:
[gitlab-ci.yml](https://community.opengroup.org/osdu/platform/system/schema-service/-/blob/master/.gitlab-ci.yml#L28)
The final pipeline consists of an azure_code_coverage step that measures the code coverage metric, and the deploy step checks that this metric is >= the threshold value passed in the gitlab-ci.yml file. If this condition is true, the build proceeds as expected; otherwise it fails and the deploy does not proceed.
[Sample Pipeline](https://community.opengroup.org/osdu/platform/system/schema-service/-/pipelines/98274)
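As a rough sketch of such a coverage gate (assuming a JaCoCo XML report; this is not the actual pipeline step):

```python
# Fail the job when overall line coverage in a JaCoCo XML report drops
# below the threshold passed from the pipeline configuration.
import sys
import xml.etree.ElementTree as ET

def line_coverage(report_path: str) -> float:
    root = ET.parse(report_path).getroot()
    # report-level counters are direct children of <report>
    counters = {c.get("type"): c for c in root.findall("counter")}
    covered = int(counters["LINE"].get("covered"))
    missed = int(counters["LINE"].get("missed"))
    return 100.0 * covered / (covered + missed)

if __name__ == "__main__":
    report, threshold = sys.argv[1], float(sys.argv[2])
    actual = line_coverage(report)
    print(f"line coverage {actual:.1f}% (threshold {threshold}%)")
    sys.exit(0 if actual >= threshold else 1)
```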
---

# [witsml-parser#58: WITSML Parser is failing with Tubular data](https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/issues/58)
*Debasis Chatterjee, last updated 2022-06-30*

Tested in the Azure R3M11 Preship environment.
I have experienced a failure.
Data file:
[Tubular__witsml-DC.xml](/uploads/9f7618f3cf1825573343cc7a39e6a2bd/Tubular__witsml-DC.xml)
Log:
[M11_Azure_WITSML-Tubular-Debasis.txt](/uploads/11ae3e249b9ddd2bc3b1b62ae902904c/M11_Azure_WITSML-Tubular-Debasis.txt)
cc @todaiks for information

---

# [witsml-parser#61: WITSML Parser - Well Trajectory Failure](https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/issues/61)
*Vadzim Kulyba, last updated 2022-08-31, milestone M14 - Release 0.17*

```
[2022-08-29, 20:47:17 UTC] {validate_schema.py:322} ERROR - Schema validation error. Data field.
[2022-08-29, 20:47:17 UTC] {validate_schema.py:323} ERROR - Manifest kind: opendes:wks:work-product-component--WellboreTrajectory:1.1.0
[2022-08-29, 20:47:17 UTC] {validate_schema.py:324} ERROR - Error: 'Azi' does not match '^[\\w\\-\\.]+:reference-data\\-\\-TrajectoryStationPropertyType:[\\w\\-\\.\\:\\%]+:[0-9]*$'
Failed validating 'pattern' in schema['properties']['data']['allOf'][3]['properties']['AvailableTrajectoryStationProperties']['items']['properties']['TrajectoryStationPropertyTypeID']:
{'description': 'The reference to a trajectory station property type - '
'of if interpreted as channels, the curve or channel '
'name type, identifying e.g. MD, Inclination, Azimuth. '
'This is a relationship to a '
'reference-data--TrajectoryStationPropertyType record '
'id.',
'example': 'partition-id:reference-data--TrajectoryStationPropertyType:AzimuthTN:',
'pattern': '^[\\w\\-\\.]+:reference-data\\-\\-TrajectoryStationPropertyType:[\\w\\-\\.\\:\\%]+:[0-9]*$',
'title': 'Trajectory Station Property Type ID',
'type': 'string',
'x-osdu-relationship': [{'EntityType': 'TrajectoryStationPropertyType',
'GroupType': 'reference-data'}]}
On instance['data']['AvailableTrajectoryStationProperties'][0]['TrajectoryStationPropertyTypeID']:
'Azi'
```
This is the error log from the Azure DEMO validate_manifest_schema_task, but it is a common-code issue, because it also reproduces on GCP (cc @Yan_Sushchynski).
I think the main issue inside the parser is in this line:
https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/blob/master/energistics/src/witsml_parser/energistics/libs/energistics_parsers/witsml_2_0/trajectory_parser.py#L117
because the raw `tagname` does not match the schema pattern (cc @epeysson).
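To illustrate our reading of the failure, with a hypothetical fix: the raw channel mnemonic fails the schema pattern, while a fully qualified reference-data record id passes:

```python
import re

# Pattern copied from the validation error above.
PATTERN = re.compile(
    r"^[\w\-\.]+:reference-data\-\-TrajectoryStationPropertyType:[\w\-\.\:\%]+:[0-9]*$"
)

def property_type_id(partition: str, mnemonic: str) -> str:
    """Hypothetical fix: qualify the raw tagname as a reference-data id."""
    return f"{partition}:reference-data--TrajectoryStationPropertyType:{mnemonic}:"

assert PATTERN.match("Azi") is None                       # raw tagname fails
assert PATTERN.match(property_type_id("opendes", "Azi"))  # qualified id passes
```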
---

# [witsml-parser#62: WITSML Parser - SchemaFormatType needs to be updated](https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/issues/62)
*Chad Leong, last updated 2023-05-02*

The reference data for the Energistics SchemaFormatType has been updated in the data definition https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/ReferenceValues/Manifests/reference-data/OPEN/SchemaFormatType.1.0.0.json to reflect the different WITSML versions.
Problem:
The WITSML parser creates a manifest after parsing using the hardcoded value:
https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/blob/master/energistics/src/witsml_parser/energistics/libs/energistics_parsers/parser.py#L446
It needs to be updated to reflect the changes in the data definition:
`osdu:reference-data--SchemaFormatType:EnergisticsWITSML`
to
`osdu:reference-data--SchemaFormatType:Energistics.WITSML.v1.4`
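Purely as an illustration (only the v1.4 value is confirmed above; deriving ids for other versions is an assumption), the parser could compute the id from the parsed WITSML version instead of hardcoding it:

```python
def schema_format_type_id(witsml_version: str) -> str:
    """Illustrative only: map a parsed WITSML version (e.g. "1.4.1.1") to a
    SchemaFormatType id such as
    osdu:reference-data--SchemaFormatType:Energistics.WITSML.v1.4"""
    major_minor = ".".join(witsml_version.split(".")[:2])
    return f"osdu:reference-data--SchemaFormatType:Energistics.WITSML.v{major_minor}"
```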
---

# [csv-parser#79: Error diagnostics - need to improve significantly](https://community.opengroup.org/osdu/platform/data-flow/ingestion/csv-parser/csv-parser/-/issues/79)
*Debasis Chatterjee, last updated 2022-12-13*

You may start by checking here:
https://community.opengroup.org/osdu/platform/pre-shipping/-/tree/main/R3-M14/AWS-M14/Ingestion%20DAG%20CSV
For each and every problem, I did not get suitable clue from error log.
1. problem in data. ...You may start of by checking here.
https://community.opengroup.org/osdu/platform/pre-shipping/-/tree/main/R3-M14/AWS-M14/Ingestion%20DAG%20CSV
For each of the problems below, I did not get a suitable clue from the error log:
1. Problem in data: ELEVATION has a non-numeric value.
2. Problem in schema: TVD, Latitude, and Longitude were missing `type=string`.
3. At times when the file is missing (incorrect sequence in the collection), it gives a fatal error instead of clearly saying "Unable to get the CSV file".
This caused a situation where the record gets created and we can see all its properties from the Storage service, but none from the Search service.
That is nearly impossible for an average data loader (user) to figure out.
Next, imagine we are ingesting 1000 rows from a source CSV and a problem occurs in row 253 and row 455.
The user's expectation is that the CSV ingestion program should pinpoint and clearly indicate the row number and the type of problem which caused the failure, e.g. as in the sketch below.
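A minimal sketch of what that row-level reporting could look like (column names and the helper are illustrative, not from the parser):

```python
import csv

def validate_csv(path: str, numeric_columns: list) -> list:
    """Collect row-level problems (e.g. a non-numeric ELEVATION) so the
    user sees the exact row and cause instead of a generic fatal error."""
    problems = []
    with open(path, newline="") as f:
        for row_no, row in enumerate(csv.DictReader(f), start=2):  # row 1 = header
            for col in numeric_columns:
                value = row.get(col)
                try:
                    float(value)
                except (TypeError, ValueError):
                    problems.append(
                        f"row {row_no}, column {col!r}: non-numeric value {value!r}"
                    )
    return problems

# e.g. validate_csv("wells.csv", ["ELEVATION", "TVD", "Latitude", "Longitude"])
```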
cc @chad, @tdixon

---

# [ingestion-workflow#150: Misleading log statements](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/150)
*Maksim Malkov, last updated 2022-12-12, milestone M16 - Release 0.19*
The Workflow service searches for a triggered workflow first in the provided data partition. A system workflow like CSV would not be available in the data partition; in such cases the service publishes "workflow not found" logs.
Next, the same workflow is searched for in the system DB, where it is found, and processing completes.
But these logs create the confusion that some workflow is not found by the Workflow service, when actually there is no such issue.

---

# [witsml-parser#64: Refactor DAG related code](https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/issues/64)
*Yan Sushchynski (EPAM), last updated 2023-04-04, milestone M17 - Release 0.20*

### Introduction
There is DAG-related code that is executed in the container during a DAG run. The code is [here](https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/blob/master/energistics/src/witsml_parser/main.py) and [here](https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/blob/master/energistics/src/witsml_parser/energistics/libs/create_energistics_manifest.py). This code looks messy and outdated and requires some refactoring.
### What should be done?
1. Update the code to make it work with the most recent `osdu-*` Python libs. The dependencies are here https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/blob/master/build/requirements.txt
2. Delete deprecated functionality of processing files by `preload_file_path` [here](https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/blob/master/energistics/src/witsml_parser/energistics/libs/create_energistics_manifest.py#L314).
3. Add the static-analysis step in the CI/CD.
4. Add the possibility to pass the user's access/id token to the DAG.
5. Common refactoring, because the code is messy now (a lot of "if"s and too many lines of code in single functions).

---

# [osdu-ingestion-lib#9: field schema-id replaced by {{data-partition-id}}](https://community.opengroup.org/osdu/platform/data-flow/ingestion/osdu-ingestion-lib/-/issues/9)
*li shuangqi, last updated 2023-05-22, milestone M18 - Release 0.21*

Using "{{data-partition-id}}" to replace part of the schema id "**{{data-partition-id}}:wks:AbstractWPCGroupType:1.0.0**" is a bug: the first section of a schema id is the authority (OSDU). An error is reported during schema validation when we use a different partition.
`field.replace("{{data-partition-id}}", self.context.data_partition_id)`
```python
SURROGATE_KEYS_PATHS = [
    ("definitions", "{{data-partition-id}}:wks:AbstractWPCGroupType:1.0.0",
     "properties", "Datasets", "items"),
    ("definitions", "{{data-partition-id}}:wks:AbstractWPCGroupType:1.0.0",
     "properties", "Artefacts", "items", "properties", "ResourceID"),
    ("properties", "data", "allOf", 1, "properties", "Components", "items"),
]
```
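A hedged sketch of the suggested direction (our assumption: resolve the placeholder against the schema authority, e.g. "osdu", rather than the caller's data partition id; names are illustrative):

```python
def resolve_schema_id(field: str, schema_authority: str) -> str:
    # The first section of a schema id is the authority, not the partition.
    return field.replace("{{data-partition-id}}", schema_authority)

assert (resolve_schema_id("{{data-partition-id}}:wks:AbstractWPCGroupType:1.0.0",
                          "osdu")
        == "osdu:wks:AbstractWPCGroupType:1.0.0")
```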
---

# [segy-to-vds-conversion#18: Capture OpenVDS library version number in DAG name or some such suitable place](https://community.opengroup.org/osdu/platform/data-flow/ingestion/segy-to-vds-conversion/-/issues/18)
*Debasis Chatterjee, last updated 2024-03-26*
Consider exposing this information prominently, over and above showing it in the Airflow log.
cc @chad, @Keith_Wall