# Data Ingestion issues
https://community.opengroup.org/groups/osdu/platform/data-flow/ingestion/-/issues · Updated: 2024-03-26

---

**Capture OpenVDS library version number in DAG name or some such suitable place**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/segy-to-vds-conversion/-/issues/18
Author: Debasis Chatterjee · Updated: 2024-03-26 · Assignee: Deepa Kumari

Consider exposing this information prominently, over and above showing it in the Airflow log.
cc @chad, @Keith_Wall

---

**Segy to VDS conversion creates non-compliant work product component**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/osdu-ingestion-lib/-/issues/12
Author: Ivan Medeiros Monteiro · Updated: 2024-01-08

When conversion from Segy to VDS is triggered, the file collection for the new VDS is created and associated with the work product component (SeismicTraceData) as an artefact. However, the role of the artefact is specified by the property "RoleId" instead of "RoleID", which makes the record fail schema validation.
Example of error message: "data.Artefacts.0.RoleId: Additional property not allowed"
The schema definition of this association can be seen here : https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/E-R/work-product-component/SeismicTraceData.1.5.1.md
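As a minimal sketch of a post-processing workaround, assuming the record layout shown in the error message above (the helper name is hypothetical, not part of the converter):

```python
def fix_artefact_role_key(record: dict) -> dict:
    # The SeismicTraceData schema expects "RoleID"; the converter
    # currently emits "RoleId", which trips the additional-property check.
    for artefact in record.get("data", {}).get("Artefacts", []):
        if "RoleId" in artefact and "RoleID" not in artefact:
            artefact["RoleID"] = artefact.pop("RoleId")
    return record
```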
And the implementation of this association can be seen here: https://community.opengroup.org/osdu/platform/data-flow/ingestion/osdu-ingestion-lib/-/blob/master/osdu_ingestion/libs/segy_conversion_metadata/base_metadata.py?ref_type=heads#L130
Milestone: M22 - Release 0.25

---

**Fetch-and-Ingest - authentication uses flow type value of "RefreshTokenKeyName" although valid value is "RefreshToken"**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/issues/56
Author: Debasis Chatterjee · Updated: 2024-01-08

When I used a value of RefreshToken, I got this error:
[2024-01-02, 13:41:17 UTC] {token_generator.py:43} INFO - OAuth Flow type : RefreshToken Not supported!
But this is valid per this site.
https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/E-R/reference-data/OAuth2FlowType.1.0.0.md
I had to work around by using a value of "RefreshTokenKeyName".
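A sketch of how the token generator could accept both spellings while the values converge (names are hypothetical; the actual `token_generator.py` logic may differ):

```python
# Map both the OAuth2FlowType reference value ("RefreshToken") and the
# schema key name currently accepted ("RefreshTokenKeyName") to one flow.
FLOW_TYPE_ALIASES = {
    "RefreshToken": "refresh_token",
    "RefreshTokenKeyName": "refresh_token",
}

def resolve_flow_type(value: str) -> str:
    try:
        return FLOW_TYPE_ALIASES[value]
    except KeyError:
        raise ValueError(f"OAuth Flow type : {value} Not supported!")
```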
cc @priyankabhongade
Milestone: M23 - Release 0.26

---

**ADR: Implement Airflow facade endpoint**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/159
Author: Riabokon Stanislav (EPAM) [GCP] · Updated: 2024-01-08

# Context
OSDU Platform uses Apache Airflow for orchestration of various data ingestion and processing jobs.
# Problem statement
Currently the OSDU Airflow component does not support data isolation for multi-tenant deployments. The Airflow administrative UI is available to all users and makes it possible to observe the processing data of all existing tenants, which may cause data leaks and security issues.
# Proposal of the solution
It is proposed to introduce a facade that replaces the Airflow admin UI and collects job execution information (namely the resulting XCom variables) in a tenant-specific way via the Airflow REST API. To do this we need to add a new endpoint in the Workflow service API, which will collect the details of the DAG run using the existing Airflow REST API v2.
The new API endpoint `/v1/workflow/{workflow_name}/workflowRun/{runId}/lastInfo` should implement the following business logic:
![image-2023-10-18_17-48-20](/uploads/44f53a3de410b8dff0276b127387f29a/image-2023-10-18_17-48-20.png)
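As a rough sketch of the task-selection and XCom-path parts of this logic (helper names are hypothetical; the real endpoint would call these Airflow REST API routes over HTTP):

```python
from datetime import datetime

def select_latest_task_instance(task_instances: list) -> dict:
    # Step 3: pick the task instance with the maximal end_date.
    return max(task_instances,
               key=lambda t: datetime.fromisoformat(t["end_date"]))

def xcom_entry_path(dag_id: str, dag_run_id: str, task_id: str, key: str) -> str:
    # Step 5: Airflow REST API path for a single XCom value,
    # where dag_id is workflow_name and dag_run_id is runId.
    return (f"/dags/{dag_id}/dagRuns/{dag_run_id}"
            f"/taskInstances/{task_id}/xcomEntries/{key}")
```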
- Get internal workflow entity with getWorkflowRunByName and check if submittedBy corresponds to the user submitted in the header, otherwise return 401 NOT_AUTHORIZED
- Get list of all task instances with /dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances where dag_id is workflow_name and dag_run_id is runId
- Select task instance with maximal end_date
- With task_id of the selected task instance get list of xcom entries keys /dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances/{task_id}/xcomEntries
- Obtain xcom values by their keys using /dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances/{task_id}/xcomEntries/{xcom_key}
- Return task instance details from step 3 combined with the xcom values map in a single JSON response

Milestone: M23 - Release 0.26 · Assignees: Rustam Lotsmanenko (EPAM), Riabokon Stanislav (EPAM) [GCP], Andrei Dalhikh [EPAM/GC]

---

**EDS Naturalization has circular import**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/osdu-airflow-lib/-/issues/11
Author: Bruce Jin · Updated: 2023-12-19
The file `osdu_airflow/eds/eds_naturalization/signed_url_details/abstract/environment_factory.py` and the file `osdu_airflow/eds/eds_naturalization/signed_url_details/concrete/operator_environment_factory.py` try to import each other.
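One conventional fix, sketched here with simplified stand-ins for the two modules, is to make the dependency one-way: the abstract module defines only the interface and never imports the concrete one.

```python
# Sketch only: in the real package these would be two separate modules,
# abstract/environment_factory.py and concrete/operator_environment_factory.py.

class EnvironmentFactory:
    """Abstract side: defines the interface and imports nothing concrete."""
    def create_environment(self):
        raise NotImplementedError

class OperatorEnvironmentFactory(EnvironmentFactory):
    """Concrete side: depends on (subclasses) the abstract side only."""
    def create_environment(self):
        return {"kind": "operator"}
```

If the abstract side genuinely needs to construct the concrete class, a deferred (function-local) import at call time also breaks the cycle.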
This circular import disables the component.
Milestone: M22 - Release 0.25

---

**Side effect to ingest configuration files of EDS DMS**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/eds-dms/-/issues/18
Author: Riabokon Stanislav (EPAM) [GCP] · Updated: 2023-11-23

The GC Team has identified an issue. According to the architectural design of this service, the procedure involves creating configuration files within the Storage Service. Subsequently, the new records are indexed by the Indexer and placed into Elasticsearch. As a result, these records become discoverable through the Search Service.
Current Arch:
![image](/uploads/82a2a551574d5772a2105b64fcf27950/image.png)
`https://community.gcp.gnrg-osdu.projects.epam.com/api/search/v2/query`
Request body:
```json
{
"kind": "osdu:wks:reference-data--SecuritySchemeType:1.0.0"
}
```
response:
```json
{
"results": [
{
"data": {
"AttributionPublication": null,
"InactiveIndicator": null,
"Description": "An open and industry-standard protocol for authorization",
"ResourceLifecycleStatus": null,
"ResourceCurationStatus": null,
"TechnicalAssuranceID": null,
"Code": "OAuth2",
"Source": "SecuritySchemeType.1.0.0.xlsx",
"Name": "OAuth 2.0",
"AttributionAuthority": "OSDU",
"ResourceHomeRegionID": null,
"VirtualProperties.DefaultName": "OAuth 2.0",
"AttributionRevision": null,
"ResourceSecurityClassification": null,
"ID": "OAuth2",
"ExistenceKind": null
},
"kind": "osdu:wks:reference-data--SecuritySchemeType:1.0.0",
"source": "wks",
"acl": {
"viewers": [
"data.default.viewers@osdu.group"
],
"owners": [
"data.default.owners@osdu.group"
]
},
"type": "reference-data--SecuritySchemeType",
"version": 1697963580525660,
"tags": {
"normalizedKind": "osdu:wks:reference-data--SecuritySchemeType:1"
},
"modifyUser": "osdu-community-sa-airflow@nice-etching-277309.iam.gserviceaccount.com",
"modifyTime": "2023-10-22T08:33:00.665Z",
"createTime": "2022-09-30T10:26:21.248Z",
"authority": "osdu",
"namespace": "osdu:wks",
"legal": {
"legaltags": [
"osdu-demo-legaltag"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"createUser": "osdu-community-sa-airflow@nice-etching-277309.iam.gserviceaccount.com",
"id": "osdu:reference-data--SecuritySchemeType:OAuth2"
},
{
"data": {
"AttributionPublication": null,
"InactiveIndicator": null,
"Description": "Requests are authenticated using an access key, such as a JSON Web Token, in the request header.",
"ResourceLifecycleStatus": null,
"ResourceCurationStatus": null,
"TechnicalAssuranceID": null,
"Code": "Bearer",
"Source": "SecuritySchemeType.1.0.0.xlsx",
"Name": "Bearer Token",
"AttributionAuthority": "OSDU",
"ResourceHomeRegionID": null,
"VirtualProperties.DefaultName": "Bearer Token",
"AttributionRevision": null,
"ResourceSecurityClassification": null,
"ID": "Bearer",
"ExistenceKind": null
},
"kind": "osdu:wks:reference-data--SecuritySchemeType:1.0.0",
"source": "wks",
"acl": {
"viewers": [
"data.default.viewers@osdu.group"
],
"owners": [
"data.default.owners@osdu.group"
]
},
"type": "reference-data--SecuritySchemeType",
"version": 1697963580525660,
"tags": {
"normalizedKind": "osdu:wks:reference-data--SecuritySchemeType:1"
},
"modifyUser": "osdu-community-sa-airflow@nice-etching-277309.iam.gserviceaccount.com",
"modifyTime": "2023-10-22T08:33:00.665Z",
"createTime": "2022-09-30T10:28:21.843Z",
"authority": "osdu",
"namespace": "osdu:wks",
"legal": {
"legaltags": [
"osdu-demo-legaltag"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"createUser": "osdu-community-sa-airflow@nice-etching-277309.iam.gserviceaccount.com",
"id": "osdu:reference-data--SecuritySchemeType:Bearer"
}
],
"aggregations": null,
"totalCount": 2
}
```
It appears there may be a potential security concern within the EDS Service architecture.

---

**Pass workflow user ID to the Airflow as part of payload**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/157
Author: Riabokon Stanislav (EPAM) [GCP] · Updated: 2023-11-08

This issue was discovered by the GC Team while the QA Team was testing the platform.
It revolves around triggering workflows and the addition of the User ID into the execution context through the 'x-user-id' header.
Upon further investigation, we came across the MR https://community.opengroup.org/osdu/platform/deployment-and-operations/helm-charts-azure/-/merge_requests/366, which appears to implement this logic with a dependency on the infrastructure level.
However, we have to add some kind of validation or additional logic to use a header 'user' in core logic. This adjustment is essential as we might want to use the service without a service mesh or similar infrastructure.
org.opengroup.osdu.workflow.service.WorkflowRunServiceImpl#addUserId
```java
private Map<String, Object> addUserId(String workflowName, TriggerWorkflowRequest request) {
final Map<String, Object> executionContext = request.getExecutionContext();
if (executionContext.get(KEY_USER_ID) != null) {
String errorMessage = String.format("Request to trigger workflow with name %s failed because execution context contains reserved key 'userId'", workflowName);
throw new AppException(400, "Failed to trigger workflow run", errorMessage);
}
String userId = dpsHeaders.getUserId();
log.debug("putting user id: " + userId + " in execution context");
executionContext.put(KEY_USER_ID, userId);
return executionContext;
}
```
Milestone: M21 - Release 0.24

---

**A custom header 'x-user-id' is used in core part**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/158
Author: Riabokon Stanislav (EPAM) [GCP] · Updated: 2023-11-08

I wanted to bring to your attention an issue that was identified by our GC Team while they were in the process of addressing https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/157.
org.opengroup.osdu.workflow.service.WorkflowRunServiceImpl#addUserId
```java
private Map<String, Object> addUserId(String workflowName, TriggerWorkflowRequest request) {
final Map<String, Object> executionContext = request.getExecutionContext();
if (executionContext.get(KEY_USER_ID) != null) {
String errorMessage = String.format("Request to trigger workflow with name %s failed because execution context contains reserved key 'userId'", workflowName);
throw new AppException(400, "Failed to trigger workflow run", errorMessage);
}
String userId = dpsHeaders.getUserId();
log.debug("putting user id: " + userId + " in execution context");
executionContext.put(KEY_USER_ID, userId);
return executionContext;
}
```
The current logic relies on a custom header that is primarily intended for use at an infrastructural level, as outlined in https://community.opengroup.org/osdu/platform/data-flow/ingestion/home/-/issues/52. The GC team approved an ADR with the understanding that this custom header would not be utilized within the core codebase.
However, as indicated in https://community.opengroup.org/osdu/platform/deployment-and-operations/helm-charts-azure/-/merge_requests/366, a header named 'x-user-id' is populated with data from 'x-on-behalf-of' using a specific rule. This mechanism aligns with the requirements of the CSP provider but may not be entirely suitable for the Core Part of the Workflow Service.
```lua
if (jwt_authn[msft_issuer]["appid"] == serviceAccountClientId and on_behalf_of_header ~= nil and on_behalf_of_header ~= '') then
request_handle:headers():add("x-user-id", request_handle:headers():get("x-on-behalf-of"))
else
request_handle:headers():add("x-user-id", jwt_authn[msft_issuer]["appid"])
end
```
This logic introduces **three key issues**:
- The core part of the Workflow service depends on a custom CSP header to build the execution context, which may not be in alignment with the intended architecture.
- The Workflow service may not operate correctly without Istio and the accompanying special rule, potentially limiting its usability.
- There is a security concern in that 'x-user-id' is not currently validated on the backend side, allowing any user to supply it for potentially malicious purposes.
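A sketch of the kind of backend check the third point calls for (function and parameter names are hypothetical; the real service is Java, this is only the shape of the rule):

```python
def resolve_user_id(headers: dict, authenticated_subject: str,
                    trusted_service_accounts: set) -> str:
    # Never trust a caller-supplied 'x-user-id' blindly: only a trusted
    # service account may act on behalf of another user; everyone else
    # is pinned to their authenticated identity.
    claimed = headers.get("x-user-id")
    if claimed and claimed != authenticated_subject:
        if authenticated_subject not in trusted_service_accounts:
            raise PermissionError("x-user-id does not match the authenticated user")
    return claimed or authenticated_subject
```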
_As for the third problem_, there is the test case:
1. A user was authorized within Workflow Service.
1. This user uses 'x-user-id' with the name of another user, resulting in the triggering of a workflow under the identity of a different user.

---

**Workflow Run API - requires dataPartitionId in body as well as header**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/154
Author: Surabhi Seth · Updated: 2023-10-26

API: Workflow Service API > Workflow Run `/workflow/{workflow_name}/workflowRun`
This service takes data-partition-id as part of the headers as well as the payload body: `{ "executionContext": { "id": "string", "dataPartitionId": "string" }, "runId": "string" }`
![MicrosoftTeams-image__5_](/uploads/5e8d61cdc1316019ab905597094525b9/MicrosoftTeams-image__5_.png)
Issue: Requesting dataPartitionId in the payload body is redundant and inconsistent with the implementation of all other OSDU APIs (where data-partition-id is taken from the header).
Ref: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/master/docs/api/openapi.workflow.yaml?plain=0
Assignees: Chad Leong, Deepa Kumari

---

**Versal Spatial Data Ingestion (While Ingesting the data, getting Spatial Coordinate block as Empty)**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/issues/32
Author: Selva Kumar Senathipathy · Updated: 2023-10-05

As part of the Versal OSDU integration, spatial coordinate blocks are inserted as empty blocks into the OSDU target system.
While going through the Airflow code, we found the following:
In FetchAndIngest, the record-cleaning process removes coordinates when they contain nested lists. The cleaning process seems to support only point-type geometry, but Versal has MultiLineString and MultiPolygon geometry with nested lists of coordinates.
For a nested list, e.g. `[[-0.7484, 61.4182], [-0.9396, 61.4893]]`, the method `_iterate_list` returns an empty list. Please find below snapshots of the methods where we think the coordinate values are removed when they are in the form of a nested list.
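A sketch of a cleaner that recurses into nested lists instead of dropping them (this is not the actual `_iterate_list` code from `clean_records.py`, just the behaviour the fix needs):

```python
def clean_value(value):
    # Recurse into lists so nested coordinate arrays (MultiLineString,
    # MultiPolygon) survive cleaning; only None values are dropped
    # from dictionaries.
    if isinstance(value, list):
        return [clean_value(item) for item in value]
    if isinstance(value, dict):
        return {k: clean_value(v) for k, v in value.items() if v is not None}
    return value
```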
**Repo Link**: https://community.opengroup.org/osdu/platform/data-flow/ingestion/osdu-airflow-lib/-/blob/master/osdu_airflow/eds/eds_ingest/clean_records.py
![Air_Flow_clean_process](/uploads/8f41d0527d37d578f1395d4af5d1993b/Air_Flow_clean_process.jpg)
![Air_Flow_List_Iterate](/uploads/aec8664b4a47290f61f7425025d8dab4/Air_Flow_List_Iterate.jpg)
Milestone: M20 - Release 0.23 · Assignee: Priyanka Bhongade

---

**Static analyzer fails in EDS operators**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/osdu-airflow-lib/-/issues/7
Author: Yan Sushchynski (EPAM) · Updated: 2023-09-20

Job [#2183134](https://community.opengroup.org/osdu/platform/data-flow/ingestion/osdu-airflow-lib/-/jobs/2183134) failed for b2fb09af463e7d884562ea6ae89535a11eaa552f.
Could I ask you to have a look at the failed job?
Milestone: M21 - Release 0.24 · Assignees: Ashish Saxena, Jeyakumar Devarajulu

---

**Upgrade json schema version to support Airflow constraint file**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/osdu-ingestion-lib/-/issues/11
Author: Guillaume Caillet · Updated: 2023-08-30

Airflow added Python "constraints" files a while ago: https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html#constraints-files
These files pin the `jsonschema` version, a library used in `osdu-ingestion-lib` (usually to 4.x; see for example this constraint file for the latest Airflow version: https://raw.githubusercontent.com/apache/airflow/constraints-2.6.3/constraints-3.10.txt).
But this creates an issue with the current `setup.py` file, which requires a very specific version (3.2.0), so the pip resolver can't find a compatible version:
```
osdu-ingestion 0.23.0rc479+c8d6c217 depends on jsonschema==3.2.0
The user requested (constraint) jsonschema==4.17.3
```https://community.opengroup.org/osdu/platform/data-flow/ingestion/home/-/issues/48Parsers/converters code organization porposal2023-07-05T10:09:40ZSiarhei Khaletski (EPAM)Parsers/converters code organization porposal## Rationale
Now the number of different parsers was presented to the OSDU. The parsers/converters were implemented with using of different technologies and programming languages(C++, Java, Python, etc.).
It can cause difficulties duri...## Rationale
A number of different parsers have now been contributed to OSDU. The parsers/converters were implemented using different technologies and programming languages (C++, Java, Python, etc.).
This can cause difficulties when onboarding such parsers: requirements, code organization, runtime environment setup.
## Objective
Approve or develop a unified approach to the representation of parsers/converters and their usage as Airflow DAG operators.
## Proposal
The intention to containerize DAG steps as much as possible, i.e. to use KubernetesPodOperator, was mentioned as one of the best practices for manifest-based ingestion pipelines.
This means a pipeline step can be implemented with an entirely different technology, with the executable part of the step running inside a Docker container.
The proposal is to deliver, along with the parser code, a properly configured base Dockerfile. This Dockerfile will contain only the dependencies required to run the parser, with the ability to extend or configure the executable invocation (parameters, environment variables, etc.).
Each CSP, if needed, should develop its own Dockerfile with additional requirements or environment variable setup.
![Parsers_dependencies](/uploads/d24ee18309b2471c94bab6023422807d/Parsers_dependencies.png)
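A hedged sketch of what such a base Dockerfile might look like (image tag, paths, and entrypoint are illustrative, not taken from any existing parser):

```dockerfile
# Base image with only the dependencies required to run the parser.
FROM python:3.10-slim

WORKDIR /parser
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/

# CSPs can extend this image or override the invocation
# (parameters, environment variables) in their own Dockerfile.
ENTRYPOINT ["python", "-m", "parser.main"]
```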
## Implementation
The proposal's implementation example - [WITSML parser](https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics-osdu-integration/-/tree/master/build)
### Note
For lightweight DAG dependencies (local dependencies) the [Packaged DAGs](https://community.opengroup.org/osdu/platform/data-flow/home/-/issues/47) approach can be used.
Assignee: Siarhei Khaletski (EPAM)

---

**field schema-id replaced by {{data-partition-id}}**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/osdu-ingestion-lib/-/issues/9
Author: li shuangqi · Updated: 2023-05-22

Using "{{data-partition-id}}" to replace the schema-id "{{data-partition-id}}:wks:AbstractWPCGroupType:1.0.0" is a bug: the first section of a schema-id is the authority (OSDU), so an error is reported during schema validation when we use a different partition.
`field.replace("{{data-partition-id}}", self.context.data_partition_id)`
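A sketch of a possible fix: substitute the authority, not the data partition id, when resolving these definition keys (the function name is hypothetical):

```python
def resolve_schema_id(template: str, authority: str = "osdu") -> str:
    # The first section of a schema-id is the authority, so the
    # placeholder must not be filled with the data partition id.
    return template.replace("{{data-partition-id}}", authority)
```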
```python
SURROGATE_KEYS_PATHS = [
    ("definitions", "{{data-partition-id}}:wks:AbstractWPCGroupType:1.0.0", "properties", "Datasets",
     "items"),
    ("definitions", "{{data-partition-id}}:wks:AbstractWPCGroupType:1.0.0", "properties", "Artefacts",
     "items", "properties", "ResourceID"),
    ("properties", "data", "allOf", 1, "properties", "Components", "items"),
]
```
Milestone: M18 - Release 0.21

---

**WITSML Parser - SchemaFormatType needs to be updated**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/issues/62
Author: Chad Leong · Updated: 2023-05-02

The reference data for Energistics SchemaFormatType has been updated in the data definition https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/ReferenceValues/Manifests/reference-data/OPEN/SchemaFormatType.1.0.0.json to reflect the different WITSML versions.
Problem:
The WITSML parser creates a manifest after parsing using the hardcoded value:
https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/blob/master/energistics/src/witsml_parser/energistics/libs/energistics_parsers/parser.py#L446
It needs to be updated to reflect the changes in the data definition.
`osdu:reference-data--SchemaFormatType:EnergisticsWITSML`
to
`osdu:reference-data--SchemaFormatType:Energistics.WITSML.v1.4`

---

**Manifest ingestion fails on non-file-based datasets**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/osdu-ingestion-lib/-/issues/10
Author: Laurent Deny · Updated: 2023-04-26

The validation script validate_file_source.py rejects datasets that do not match the hardcoded types:
* FILE = ":dataset--File."
* FILE_COLLECTION = ":dataset--FileCollection."
* EDS_FILE = ":dataset--ConnectedSource."
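A sketch of the kind check implied by these hardcoded markers (the function name is hypothetical, not from validate_file_source.py):

```python
FILE_BASED_MARKERS = (
    ":dataset--File.",
    ":dataset--FileCollection.",
    ":dataset--ConnectedSource.",
)

def requires_file_source_validation(dataset_kind: str) -> bool:
    # Only file-based dataset kinds should be checked for file
    # attributes such as FileSource; all other kinds are skipped.
    return any(marker in dataset_kind for marker in FILE_BASED_MARKERS)
```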
Other dataset types, which are not associated with files, should not be checked for file attributes such as `FileSource`. For example, the following manifest containing an ETP dataset should bypass the file-associated tests.
```json
{
"kind": "osdu:wks:Manifest:1.0.0",
"Data": {
"Datasets": [
{
"acl": {
"viewers": [
"data.default.viewers@opendes.contoso.com"
],
"owners": [
"data.default.owners@opendes.contoso.com"
]
},
"kind": "osdu:wks:dataset--ETPDataspace:1.0.0",
"legal": {
"legaltags": [
"opendes-ReservoirDDMS-Legal-Tag"
],
"otherRelevantDataCountries": [
"US",
"UK"
]
},
"createTime": "2023-03-21T16:33:19.651Z",
"modifyTime": "2023-03-21T16:33:19.651Z",
"id": "opendes:dataset--ETPDataspace:M16_Demo-Volve_Reservoir",
"version": 1,
"data": {
"ExistenceKind": "opendes:reference-data--ExistenceKind:Actual:",
"DatasetProperties": {
"URI": "eml:///dataspace('M16_Demo/Volve_Reservoir')"
},
"Name": "M16_Demo/Volve_Reservoir"
}
}
]
}
}
```
Milestone: M17 - Release 0.20 · Assignee: Chad Leong

---

**CSV Parser error message unclear - when trying frame of reference meta (for CRS)**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/csv-parser/csv-parser/-/issues/17
Author: Debasis Chatterjee · Updated: 2023-04-10

I wanted to use a non-standard CRS (e.g. NAD27) in the meta (frame of reference) portion.
Airflow log shows me some failure (reason unclear).
Request to make this more verbose for Data Loader to be able to troubleshoot.
See supporting information (enclosed).
source CSV
[wellbore-DC-AWS.csv](/uploads/93b8a21eaab4724497fbd58b2a42bc5a/wellbore-DC-AWS.csv)
json for dataset registry creation
[06A-CSV-Create-Dataset-Registry-custom-AWS.txt](/uploads/3b47cfbbc924ac54b84117ea892f48e6/06A-CSV-Create-Dataset-Registry-custom-AWS.txt)
airflow log
[2021_05_15-failure-CRS-value.txt](/uploads/a65ce3aae5aa55d52719c1de84a18e37/2021_05_15-failure-CRS-value.txt)

---

**Refactor DAG related code**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/issues/64
Author: Yan Sushchynski (EPAM) · Updated: 2023-04-04

### Introduction
There is DAG-related code that is executed in the container during a DAG run. The code is [here](https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/blob/master/energistics/src/witsml_parser/main.py) and [here](https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/blob/master/energistics/src/witsml_parser/energistics/libs/create_energistics_manifest.py). This code looks messy and outdated, and requires some refactoring.
### What should be done?
1. Update the code to make it work with the most recent `osdu-*` Python libs. The dependencies are here https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/blob/master/build/requirements.txt
2. Delete deprecated functionality of processing files by `preload_file_path` [here](https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/blob/master/energistics/src/witsml_parser/energistics/libs/create_energistics_manifest.py#L314).
3. Add the static-analysis step in the CI/CD.
4. Add possibility to pass the user's access/id token to the DAG
5. Common refactoring, because the code is messy now (a lot of "if"s and too many lines of code in a single function).

Milestone: M17 - Release 0.20 · Assignees: Vadzim Kulyba, harshit aggarwal, Walter Detienne peysson, Marc Burnie [AWS]

---

**EDS DMS - Schema changes**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/eds-dms/-/issues/5
Author: Jeyakumar Devarajulu · Updated: 2023-02-16

Some of the attributes in the ConnectedSourceRegistryEntry and ConnectedSourceDataJob schemas have been changed to adhere to the OSDU naming convention and standard before the M13 release.
EDS DMS uses SecuritySchemes to connect to external systems.
Below are a few changes to ConnectedSourceRegistrySchema attributes. If any of the attributes below are used in the EDS DMS data model, please change them.
The attribute ClientIdKeyName will hold the Azure key name; using that key, the value should be fetched from Azure Key Vault via the Secret Service. A Secret Service implementation is already part of the EDS DMS code.
Note: if an attribute name contains KeyName, its value is stored in Azure Key Vault and should be fetched using the Secret Service (e.g. ClientSecretKeyName).
| Old Name | New Name |
| ------ | -------- |
| Type | TypeID |
| FlowType | FlowTypeID |
| callbackUrl | CallbackUrl |
| authorizationUrl | AuthorizationUrl |
| ScopesKey | ScopesKeyName |
| ClientSecretKey | ClientSecretKeyName |
| ClientID | ClientIDKeyName |
| RefreshTokenKey | RefreshTokenKeyName |
| AccessTokenKey | AccessTokenKeyName |
| APIKeyKey | APIKeyKeyName |
| UsernameKey | UsernameKeyName |
| PasswordKey | PasswordKeyName |

Milestone: M15 - Release 0.18 · Assignee: Thulasi Dass Subramanian

---

**Adding headers to Put the file on Dataset Service**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/osdu-airflow-lib/-/issues/5
Author: Jayesh Bagul · Updated: 2023-02-13

**Context:**
The goal of this issue is to put the Azure manifest JSON files into the storage service. Currently the header for this request is not considered in [**osdu-airflow-lib**](https://community.opengroup.org/osdu/platform/data-flow/ingestion/osdu-airflow-lib). The call from [`_put_file_on_dataset_service`](https://community.opengroup.org/osdu/platform/data-flow/ingestion/osdu-airflow-lib/-/blob/master/osdu_airflow/operators/mixins/ReceivingContextMixin.py#L226), which calls the [**osdu_api**](https://community.opengroup.org/osdu/platform/system/sdks/common-python-sdk/-/blob/master/osdu_api/clients/base_client.py#L194), is making the ingestion process fail.
```xml
<Error>
<Code>MissingRequiredHeader</Code>
<Message>An HTTP header that's mandatory for this request is not specified.
RequestId:bdf6dae2-701e-004f-2529-367fcb000000
Time:2023-02-01T10:37:20.2256774Z</Message>
<HeaderName>x-ms-blob-type</HeaderName>
</Error>
```
The "x-ms-blob-type" header is used in the Azure Blob storage service to specify the type of blob that is being uploaded. It ensures that the correct type of blob is being uploaded.
The accepted value for the "x-ms-blob-type" header here is "BlockBlob".
Call from _osdu-airflow-lib_:
`put_result = dataset_dms_client.make_request(method=HttpMethod.PUT, url=signed_url, data=file_content, no_auth=True)`
**Proposal:**
• The dataset service should be able to pass the headers while calling OSDU_API
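A sketch of building the missing header for the signed-URL PUT (the helper name is hypothetical; whether `make_request` can forward extra headers depends on the osdu_api client):

```python
def blob_upload_headers() -> dict:
    # Azure Blob Storage rejects a raw PUT without x-ms-blob-type;
    # "BlockBlob" is the accepted value for this upload path.
    return {"x-ms-blob-type": "BlockBlob"}

# e.g. with plain requests against the signed URL:
# requests.put(signed_url, data=file_content, headers=blob_upload_headers())
```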
cc: @Srinivasan_Narayanan @chad @valentin.gauthier @Yan_Sushchynski @nursheikh
Milestone: M16 - Release 0.19 · Assignee: Jayesh Bagul