OSDU Software issues
https://community.opengroup.org/groups/osdu/-/issues
2023-05-11T05:56:21Z

EDS M17 Features and Fixes details
https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/issues/26
2023-05-11T05:56:21Z, Priyanka Bhongade

The significant features and fixes of EDS M17 are listed below:
Features:
1. PasswordCredentials OAuth Flow Type has been introduced, which allows EDS M17 to generate an access token for data providers using this flow type for authorization. To generate the access token, the parameters required are username, password, client ID, client secret, and scopes. The "FlowTypeID": "{{data_partition_id}}:reference-data--OAuth2FlowType:PasswordCredentials:" is added to the ConnectedSourceRegistryEntry record. https://gitlab.opengroup.org/osdu/subcommittees/ea/projects/extern-data/home/-/issues/267
2. EDS M17 now validates the expiry of the refresh token and auto-generates a new refresh token while updating the secret vault. If the refresh token value in the secret vault is expired, the eds_ingest fails to generate an access token, and the run fails. To handle this situation, eds_ingest verifies if the refresh token is expired and generates a new refresh token value following PasswordCredentials authentication grant type. The secret service then accesses the new refresh token value to update the old/expired value with the newly generated refresh token value. The "FlowTypeID": "{{data_partition_id}}:reference-data--OAuth2FlowType:RefreshTokenKeyName:" is added to the ConnectedSourceRegistryEntry (CSRE). The data provider for this feature is Katalyst. https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/issues/19
3. Parent data mapping is now handled in EDS M17, which includes keeping the source identifier ("id" of the parent data) in NameAlias of the parent record during ingestion into the operator environment. This helps the operator to find the source of each record and group them. When ingesting child data (e.g., Well log data) into the target environment, the child data is tagged to the right master data (e.g., Wellbore) in the target environment, and there is no name mismatch. This feature helps to identify a unique well using external rules between the external source and the target environment. https://gitlab.opengroup.org/osdu/subcommittees/ea/projects/extern-data/home/-/issues/268
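
As a rough illustration of the PasswordCredentials flow in item 1, the sketch below builds (without sending) a token request for the OAuth 2.0 resource owner password credentials grant defined in RFC 6749, section 4.3. The function name, the token URL, and all credential values are placeholders introduced here and are not taken from the EDS code base. Passing the client ID and secret in the form body is one accepted option; some providers may require HTTP Basic authentication instead.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_password_grant_request(token_url, username, password,
                                 client_id, client_secret, scopes):
    """Build (but do not send) a token request for the password grant."""
    form = {
        "grant_type": "password",    # fixed value defined by RFC 6749
        "username": username,
        "password": password,
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": " ".join(scopes),   # scopes are space-delimited
    }
    return Request(
        token_url,
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical values, for illustration only.
request = build_password_grant_request(
    "https://provider.example.com/oauth2/token",
    "eds-user", "s3cret", "eds-client", "client-s3cret", ["openid", "email"],
)
```

Sending the request (for example with `urllib.request.urlopen`) and reading `access_token` from the JSON response is left out so that the sketch has no network dependency.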
Fixes:
A logger has been added to detail the Osdu_ingest run id and the sample-fetched data record. The message displayed in eds_ingest Airflow Logs includes Osdu_ingest Run Id and one Sample data fetched from the data provider with the text "Displaying only one Sample Record." https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/issues/23
The conversion of ConnectedSourceDataPartitionID to OnIngestionDataPartitionID for the Array datatype has been fixed. During ingestion, ConnectedSourceDataPartitionID (the provider’s data partition id) is replaced with OnIngestionDataPartitionID (the operator’s data partition id) in all parameters of the record, whatever their datatype (strings, arrays, dicts). Each conversion is handled according to its datatype. For example, a string parameter is converted from 'ResourceHomeRegionID': 'osdu:reference-data--OSDURegion:AWSEastUSA:' to 'ResourceHomeRegionID': 'opendes:reference-data--OSDURegion:AWSEastUSA:', and the elements of an array are now converted in the same way. https://gitlab.opengroup.org/osdu/subcommittees/ea/projects/extern-data/home/-/issues/261
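
The conversion can be pictured as a recursive walk over the record. The following is a minimal sketch under stated assumptions, not the EDS implementation: the function name is hypothetical, and the rule "replace a leading `<partition>:` prefix" is a simplification. A real implementation would also have to avoid rewriting strings that merely start with the same prefix, such as a kind beginning with `osdu:wks:`.

```python
def replace_partition_id(value, source_id, target_id):
    """Recursively swap the provider's partition prefix for the operator's."""
    prefix = source_id + ":"
    if isinstance(value, str):
        if value.startswith(prefix):
            return target_id + ":" + value[len(prefix):]
        return value
    if isinstance(value, list):
        return [replace_partition_id(v, source_id, target_id) for v in value]
    if isinstance(value, dict):
        return {k: replace_partition_id(v, source_id, target_id)
                for k, v in value.items()}
    return value  # numbers, booleans and None pass through unchanged


# Illustrative record fragment with a string and an array parameter.
record = {
    "ResourceHomeRegionID": "osdu:reference-data--OSDURegion:AWSEastUSA:",
    "ResourceHostRegionIDs": ["osdu:reference-data--OSDURegion:AWSEastUSA:"],
}
converted = replace_partition_id(record, "osdu", "opendes")
```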
The Schema Authority in the Kind of CSRE, CSDJ, and ExternalReferenceValueMapping is now dynamic and read from an Airflow Variable. The constants file holds the Kind of a few EDS-dependent schemas, such as ConnectedSourceRegistryEntry, ConnectedSourceDataJob, and ExternalReferenceValueMapping. The Schema_Authority value used to be static in the Kind; it is now replaced with the Schema_authority value fetched from the Airflow Variable. https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/issues/22
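
To make the idea concrete, here is a small sketch that composes the kinds from a configurable authority. It assumes a plain mapping stands in for the Airflow Variables, and the key `schema_authority`, the function name, and the `1.0.0` version suffix are all placeholders chosen for this example. Raising `KeyError` for a missing or empty value is likewise this sketch's choice.

```python
# Group type per EDS-dependent entity; versions below are placeholders.
EDS_ENTITIES = {
    "ConnectedSourceRegistryEntry": "master-data",
    "ConnectedSourceDataJob": "master-data",
    "ExternalReferenceValueMapping": "reference-data",
}


def build_kinds(variables):
    """Compose EDS kinds from a schema authority held in a variable mapping."""
    authority = variables.get("schema_authority")  # hypothetical key name
    if not authority:
        raise KeyError("schema_authority is missing or empty")
    return {
        entity: f"{authority}:wks:{group}--{entity}:1.0.0"
        for entity, group in EDS_ENTITIES.items()
    }


kinds = build_kinds({"schema_authority": "osdu"})
```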
EDS now raises an exception when an Airflow Variable is not found or is None. eds_ingest fails with KeyError if any of the important Airflow variable values are missing. https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/issues/21
M17 - Release 0.20
Priyanka Bhongade

EDS - Include Password Credentials OAuth Flow Type
https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/issues/25
2023-05-04T08:33:42Z, Priyanka Bhongade

- [x] Identify the changes
- [x] create a function/POC to handle Password Credentials OAuth Flow Type
- [x] create a unit test case
- [x] Test the functionality
- [x] code review
M17 - Release 0.20
Priyanka Bhongade

EDS - Display Osdu_ingest run ID in eds_ingest Xcom Summary
https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/issues/24
2023-05-23T08:14:18Z, Priyanka Bhongade
M18 - Release 0.21
Priyanka Bhongade

EDS - M17 feature and fixes Testing on Azure Preship with detailed observation and findings
https://community.opengroup.org/osdu/platform/pre-shipping/-/issues/491
2023-06-13T10:59:51Z, Priyanka Bhongade

- [x] eds_scheduler run details after changes in constant file and including airflow utility
- eds_scheduler dag link : https://osdu-ship.msft-osdu-test.org/airflow2/graph?dag_id=eds_scheduler
- [x] Logger for eds_ingest - Display message for sample record - https://github.com/ExxonMobil/osdu-platform/issues/318
- [x] Logger for eds_ingest - Display Osdu_ingest run ID explicitly in logs - https://github.com/ExxonMobil/osdu-platform/issues/318
- eds_ingest dag link : https://osdu-ship.msft-osdu-test.org/airflow2/log?dag_id=eds_ingest&task_id=fetch_client&execution_date=2023-05-02T07%3A59%3A56%2B00%3A00
- ![image](/uploads/bcf7d5da196ebef814667114a3bc337d/image.png)
- [x] Password Credential OAuth flow type - EDS - Include Password Credentials OAuth Flow Type #303
- eds_ingest dag link : https://osdu-ship.msft-osdu-test.org/airflow2/xcom?dag_id=eds_ingest&task_id=fetch_client&execution_date=2023-05-02T11%3A45%3A47%2B00%3A00
CSRE : opendes:master-data--ConnectedSourceRegistryEntry:KatalystTestingM17
![image](/uploads/7178379b3df27dfc0b32f2e411503bcb/image.png)
![image](/uploads/f56a814c55a2c69300f916f1f46eba64/image.png)
- [x] Validate the expiry of refresh token and generate a new refresh token and update secret vault - https://github.com/ExxonMobil/osdu-platform/issues/324
- eds_ingest dag link :
- [x] EDS - Adding Dynamic Schema Authority for Kind of CSRE ,CSDJ and ExternalReferenceValueMapping from Airflow Variable - https://github.com/ExxonMobil/osdu-platform/issues/360
- [x] EDS - Raise exception when Airflow Variable not found or None - https://github.com/ExxonMobil/osdu-platform/issues/359
- eds_ingest dag link : https://osdu-ship.msft-osdu-test.org/airflow2/log?dag_id=eds_ingest&task_id=fetch_client&execution_date=2023-05-02T08%3A10%3A55%2B00%3A00
-
![image](/uploads/0e48aa06635e320ea51b9b130bd28fa6/image.png)
- [ ] EDS: Conversion of the ConnectedSourceDataPartitionID to OnIngestionDataPartitionID for Array Datatype - https://github.com/ExxonMobil/osdu-platform/issues/322
- OSDU Ingest Link:
- [x] Test on Non-OSDU compliant Provider for master and WPC data
- Master Data -- SeismicAcquisitionSurvey
- eds_ingest dag link : https://osdu-ship.msft-osdu-test.org/airflow2/log?dag_id=eds_ingest&task_id=fetch_client&execution_date=2023-05-02T11%3A54%3A02%2B00%3A00
- Osdu_ingest dag link : https://osdu-ship.msft-osdu-test.org/airflow2/xcom?dag_id=Osdu_ingest&task_id=provide_manifest_integrity_task&execution_date=2023-05-02T11%3A54%3A15.263543%2B00%3A00
- Work Product Component -- issue with Wrapper - unable to provide data as "()" in the query
- eds_ingest dag link : https://osdu-ship.msft-osdu-test.org/airflow2/log?dag_id=eds_ingest&task_id=fetch_client&execution_date=2023-05-02T12%3A00%3A21%2B00%3A00
- Osdu_ingest dag link :
- [x] Test on OSDU Compliant Provider for master and WPC data - AWS M16
- Master Data -- Well
- eds ingest dag link : https://osdu-ship.msft-osdu-test.org/airflow2/xcom?dag_id=eds_ingest&task_id=fetch_client&execution_date=2023-05-02T07%3A59%3A56%2B00%3A00
- Osdu_ingest dag link : https://osdu-ship.msft-osdu-test.org/airflow2/xcom?dag_id=Osdu_ingest&task_id=process_single_manifest_file_task&execution_date=2023-05-02T08%3A00%3A28.724429%2B00%3A00
- Work Product Component Data -- WellLog
- eds ingest dag link : https://osdu-ship.msft-osdu-test.org/airflow2/xcom?dag_id=eds_ingest&task_id=fetch_client&execution_date=2023-05-02T08%3A06%3A35%2B00%3A00
- Osdu_ingest dag link : https://osdu-ship.msft-osdu-test.org/airflow2/xcom?dag_id=Osdu_ingest&task_id=process_single_manifest_file_task&execution_date=2023-05-02T08%3A06%3A48.542915%2B00%3A00
- [ ] Ingest any record and it will be added with "NameAliases" like below which has the source record id - https://github.com/ExxonMobil/osdu-platform/issues/289
"NameAliases": [
{
"AliasName": "odesprod:master-data--Basin:Kam2_02",
"AliasNameTypeID": ":reference-data--AliasNameType:EDSConnectedSourceIdentifier:",
"DefinitionOrganisationID": null
}
]
*Removed data-partition-id to avoid osdu_ingestion error
- [ ] Create provider specific External Reference Value Mapping - https://github.com/ExxonMobil/osdu-platform/issues/376

M17 AWS - WITSML Parser - Trajectory data type - failure at schema validation stage
https://community.opengroup.org/osdu/platform/pre-shipping/-/issues/484
2023-05-02T21:04:01Z, Debasis Chatterjee

[M17-AWS-WITSML-Trajectory-steps-and-data-Debasis.zip](/uploads/59370f20510a3deae8f2dd1398e39e3d/M17-AWS-WITSML-Trajectory-steps-and-data-Debasis.zip)
Please note that this is a different problem from the one we experienced earlier with work-product schema 1.1.0.
This time, it fails at the schema validation stage for the Trajectory station type.

Update Policy service to support latest python
https://community.opengroup.org/osdu/platform/security-and-compliance/policy/-/issues/92
2024-03-15T15:40:16Z, Shane Hutchins

Policy Service requires Python 3.9.x.
Update Policy service to use a more recent version of Python.
Created from https://community.opengroup.org/osdu/platform/security-and-compliance/policy/-/issues/91
M24 - Release 0.27
Shane Hutchins

Add challenges response used when the client does not have a token
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/reservoir/open-etp-server/-/issues/54
2023-04-27T13:34:39Z, Fabiola Rivera

Need to add the AuthorizationDetails endpoint capability which has the information the client needs to find the authorization server and get a bearer token.
Also add the challenges field to AuthorizeResponse (which is the same information that is specified in an endpoint's AuthorizationDetails endpoint capability).
Can add a new environment variable with the bearer auth param.
See ETP specs for details.

EDS - Adding more description to logger in eds_ingest
https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/issues/23
2023-05-04T10:42:00Z, Priyanka Bhongade

1. Include status code in logger after POST and GET Request
2. include description to logger to understand flow of eds_ingest
3. Include CSRE and CSDJ IDs in logger
M18 - Release 0.21
Priyanka Bhongade

Add all released new schema versions to test data
https://community.opengroup.org/osdu/data/open-test-data/-/issues/90
2023-05-22T21:07:39Z, Michael

New schemas are released (for instance master-data--Wellbore:1.3.0), however, these new schemas are not added to the pre-shipping environments for testing.
There should at least be a few records that use the latest released schemas for each major data type (master-data--Well, master-data--Wellbore, wpc--SeismicTraceData, wpc--SeismicHorizon, wpc--WellLog, etc.) available in the pre-shipping environments by addition to the test dataset.

ADR: Configurable Index Extensions and De-Normalizations
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/81
2024-02-14T18:00:03Z, Thomas Gehrmann [slb]

<a name="TOC"></a>
[[_TOC_]]
Originally recorded during June 28-30, 2022 F2F as "Hints replacements, multiple index schemas (participation of indexer
& data definition needs to be in charge), content vs catalog, side-car", then renamed to ADR: User-friendly/App-friendly
Index Schemas
in [Enterprise Architecture ADR #66](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/adr-elaboration/-/issues/66)
<details>
<summary markdown="span">Preparation Material</summary>
OSDU Data Definitions conducted a number of sessions in the Core Concepts meetings, which contain supplementary
information:
**2022**
1. [Meeting Minutes 2022-07-05](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2022/2022-07-05-DataDefinitionsCoreConcepts_MeetingMinutes.md#42-user-friendly-schemas-de-normalizations)
2. [Meeting Minutes 2022-07-12](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2022/2022-07-12-DataDefinitionsCoreConcepts_MeetingMinutes.md#43-user-friendly-schemas-aka-index-schemas)
3. [Meeting Minutes 2022-07-19](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2022/2022-07-19-DataDefinitionsCoreConcepts_MeetingMinutes.md#43-user-friendly-schemas-aka-index-schemas)
4. [Meeting Minutes 2022-07-26](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2022/2022-07-26-DataDefinitionsCoreConcepts_MeetingMinutes.md#42-user-friendly-schemas-aka-index-schemas)
**2023**
1. [Meeting Minutes 2023-03-21](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2023/2023-03-21-DataDefinitionsCoreConcepts_MeetingMinutes.md#42-index-extensions-adr-66-configuration)
2. [Meeting Minutes 2023-03-28](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2023/2023-03-28-DataDefinitionsCoreConcepts_MeetingMinutes.md#42-index-extensions-configuration-mechanics-schema-review)
3. [Enterprise Architecture Advice Forum 2023-04-12](https://opensdu.slack.com/archives/C04TPV9CRUP/p1681291140407219?thread_ts=1681217870.084929&cid=C04TPV9CRUP)
</details>
# Status
- [x] Proposed
- [x] Trialing
- [x] Under review
- [x] Approved
- [ ] Retired
# Context & Scope
The entity type schemas delivered by the OSDU Data definitions subcommittee pose a number of challenges
for consumers. Most of them are due to the normalization of schemas and the friendliness to ingestors, which allows
storage of values as is and less standardized. The main problem is the usage of arrays of objects, which are difficult
when forming queries and cause costs for indexing. So far the issues have been mitigated by decorating arrays of objects
with `x-osdu-indexing` instructions. An umbrella issue has been recorded in
[community DD issue #30](https://community.opengroup.org/osdu/data/data-definitions/-/issues/30), which collects a
number of more detailed requests.
In previous OSDU prototypes, this was addressed by specific workarounds,
see [OSDU R1 Indexing Approach and Specification](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/adr-elaboration/-/wikis/uploads/46b4f84f0903cc385abd147a0175a00a/r1_indexing.pdf).
Here an attempt to classify the workarounds listed in the R1 document above:
1. Extraction of standardized values from arrays of objects using conditions (e.g., Well UWI, SpudDate).
2. Chasing relationships to parent or related objects in order to de-normalize parent/related object values on children.
3. Offering related object's Name/Code for presentations in applications.
4. Counting children of well-known kinds. (The priority of this is lower compared to 1 and 2. The current Search service
should be capable of querying a particular parent-child relationship.)
The current methods using `x-osdu-virtual-properties`, `x-osdu-is-derived` and `x-osdu-indexing` JSON schema decorations
fall short when the query conditions become dependent on platform operators usage of, e.g., reference values. In many
cases the reference value lists shipped by OSDU are incomplete or not clearly enough documented to guide global platform
standards.
[Back to TOC](#TOC)
---
## Requirements
* We need a configurable way to define rules for property extraction, either from nested arrays of objects or from
related objects.
* We need OSDU provided standard index schema extensions to extend the entity types schemas with extracted values. (
Governance for interoperability)
* We need to open the index schema extensions to applications and services to optimize frequently used query patterns.
One of them is the look-up of names or codes of related objects where the source record holds the target record id.
* We need a platform embedded service, which performs the extractions and de-normalizations on demand (data
creation/update events)
* We need platform support to refresh indexes if the indexing schemas change (both for OSDU and application indexing
schemas).
[Back to TOC](#TOC)
---
# Tradeoff Analysis
The original tradeoff analysis was performed and recorded
in [EA ADR #66](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/adr-elaboration/-/issues/66).
The need for performance required further simplification.
* Replicating derived/de-normalized property values in Storage records was discarded as this would create an enormous
stack of versions for each individual record as records would need to be updated if properties derived from parents or
children changed.
* Instead, de-normalization could happen exclusively in the indexer, simultaneously exploiting the already indexed
values of parent and children records. (Preferred option)
* Using configurable index extension rules was already proposed
in [EA ADR #66](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/adr-elaboration/-/issues/66). The
proposed additional index schemas with references to configurations were discarded. All required information can be
encoded in the configurations themselves. Any index extension schema fragments and documentation can be auto-generated
from the configurations.
* Interoperability is achieved by firm governance rules - the configurations are stored and customizable as OPEN
governance reference-data. However, additional governance rules have to be provided to keep interoperability
guaranteed across deployments and to prevent unwanted interference of index extensions with actual schema properties.
[Back to TOC](#TOC)
---
# Solution
## Index Extension, Data Definition
OSDU Standard index extensions are defined by OSDU Data Definition work-streams with the intent to provide
user/application friendly, derived properties. The standard set, together with the OSDU schemas, form the
interoperability foundation. They can contribute to deliver domain specific APIs according to the Domain Driven Design
principles.
The configurations are encoded in OSDU reference-data records, one per each major schema version. The proposed type name
is IndexPropertyPathConfiguration. The diagram below shows the decomposition into parts.
![IndexPropertyPathConfiguration](/uploads/7f1330dd7a41903a90174feb7fe2c9d9/IndexPropertyPathConfiguration.png)
* One IndexPropertyPathConfiguration record corresponds to one schema kind's major version, i.e., the
IndexPropertyPathConfiguration record id for all the `schema osdu:wks:master-data--Wellbore:1.*.*` kinds is set
to `partition-id:reference-data--IndexPropertyPathConfiguration:osdu:wks:master-data--Wellbore:1`. Code, Name and
Descriptions are filled with meaningful data as usual for all reference-data types.
* The additional index properties are added with one JSON object each in the `Configurations[]` array. The Name defines
the name of the index 'column', or the name of the property one can search for. The Policy decides, in the current
usage, whether the resulting value is a single value or an array containing the aggregated, derived values.
* Each `Configurations[]` element has at least one element defined in `Paths[]`.
* The `ValueExtraction` object has one mandatory property, `ValuePath`. The other optional two properties hold value
match conditions, i.e., the property containing the value to be matched and the value to match.
* If no `RelatedObjectsSpec` is present, the value is derived from the object being indexed.
* If `RelatedObjectsSpec` is provided, the value extraction is carried out in related objects: depending on
the `RelationshipDirection`, in the parent/related objects or in the children. The property holding the record id to
follow is specified in `RelatedObjectID`, the expected target kind in `RelatedObjectKind`. As in `ValueExtraction`, the
selection can be filtered by a match condition (`RelatedConditionProperty` and `RelatedConditionMatches`).
With this, the extension properties can be defined as if they were provided by a schema.
Most of the use cases deal with text (string) types. The definition of configurations is however not limited to string
types. As long as the property is known to the indexer, i.e., the source record schema is describing the types, the type
can be inferred by the indexer. This does not work for nested arrays of objects, which have not been indexed
with `"x-osdu-indexing": {"type":"nested"}`. In this case the types unknown to the Indexer Service are
string-serialized; the resulting index type is then of type `string`, still supporting text search.
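
The naming rule for configuration records (one record per major schema version) can be sketched as follows. The function name is hypothetical, and the sketch assumes a well-formed kind of the form `authority:source:entity:major.minor.patch`.

```python
def config_record_id(partition_id, kind):
    """Derive the IndexPropertyPathConfiguration record id for a schema kind."""
    authority, source, entity, version = kind.split(":")
    major = version.split(".")[0]  # only the major version is kept
    return (f"{partition_id}:reference-data--IndexPropertyPathConfiguration:"
            f"{authority}:{source}:{entity}:{major}")


record_id = config_record_id("opendes", "osdu:wks:master-data--Wellbore:1.3.0")
```

Every `1.*.*` version of the Wellbore kind maps to the same id, which is why a single configuration record covers the whole major version.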
[Back to TOC](#TOC)
---
### Use Case 1, WellUWI
_As a user I want to discover and match Wells by their UWI. I am aware that this is not globally reliable, however, I am
able to specify a prioritized AliasNameType list to look up value in the NameAliases array._
The configuration demonstrates extractions from the record being indexed itself. With Policy `ExtractFirstMatch`, the
first value is extracted for which the `RelatedConditionProperty` is equal to one of the `RelatedConditionMatches`.
<details><summary>Configuration for Well, extract WellUWI from NameAliases[]</summary>
```json
{
"data": {
"Configurations": [
{
"Name": "WellUWI",
"Policy": "ExtractFirstMatch",
"Paths": [
{
"ValueExtraction": {
"RelatedConditionMatches": [
"{{data-partition-id}}:reference-data--AliasNameType:UniqueIdentifier:",
"{{data-partition-id}}:reference-data--AliasNameType:RegulatoryName:",
"{{data-partition-id}}:reference-data--AliasNameType:PreferredName:",
"{{data-partition-id}}:reference-data--AliasNameType:CommonName:"
],
"RelatedConditionProperty": "data.NameAliases[].AliasNameTypeID",
"ValuePath": "data.NameAliases[].AliasName"
}
}
],
"UseCase": "As a user I want to discover and match Wells by their UWI. I am aware that this is not globally reliable, however, I am able to specify a prioritized AliasNameType list to look up value in the NameAliases array."
}
]
}
}
```
</details>
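
A minimal sketch of how `ExtractFirstMatch` could behave for this configuration is shown below. It assumes the order of `RelatedConditionMatches` expresses priority, as the use case suggests; whether the Indexer Service walks the priority list first or the array first is not stated in this ADR, so the loop order is an assumption. The function name and sample aliases are invented for illustration.

```python
def extract_first_match(aliases, priorities):
    """Return the AliasName of the highest-priority matching alias type."""
    for wanted in priorities:  # walk the priority list in order
        for alias in aliases:
            if alias.get("AliasNameTypeID") == wanted:
                return alias.get("AliasName")
    return None  # no alias of any listed type


priorities = [
    "opendes:reference-data--AliasNameType:UniqueIdentifier:",
    "opendes:reference-data--AliasNameType:RegulatoryName:",
]
aliases = [
    {"AliasName": "Well A",
     "AliasNameTypeID": "opendes:reference-data--AliasNameType:RegulatoryName:"},
    {"AliasName": "1234567890",
     "AliasNameTypeID": "opendes:reference-data--AliasNameType:UniqueIdentifier:"},
]
well_uwi = extract_first_match(aliases, priorities)
```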
[Back to TOC](#TOC)
---
### Use Case 2, CountryNames
_As a user I want to find objects by a country name, with the understanding that an object may extend over country
boundaries._
This configuration demonstrates the extraction from related index objects - here `RelatedObjectKind`
being `osdu:wks:master-data--GeoPoliticalEntity:1.`, which are found via `RelatedObjectID` as
in `data.GeoContexts[].GeoPoliticalEntityID`. The condition is constrained to be that GeoTypeID is
GeoPoliticalEntityType:Country.
<details><summary>Configuration for Well, extract CountryNames from GeoContexts[]</summary>
```json
{
"data": {
"Configurations": [
{
"Name": "CountryNames",
"Policy": "ExtractAllMatches",
"Paths": [
{
"RelatedObjectsSpec": {
"RelatedObjectID": "data.GeoContexts[].GeoPoliticalEntityID",
"RelatedObjectKind": "osdu:wks:master-data--GeoPoliticalEntity:1.",
"RelatedConditionMatches": [
"{{data-partition-id}}:reference-data--GeoPoliticalEntityType:Country:"
],
"RelatedConditionProperty": "data.GeoContexts[].GeoTypeID"
},
"ValueExtraction": {
"ValuePath": "data.GeoPoliticalEntityName"
}
}
],
"UseCase": "As a user I want to find objects by a country name, with the understanding that an object may extend over country boundaries."
}
]
}
}
```
</details>
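
The related-object lookup in this use case can be sketched as below. The in-memory dictionary `related_index` stands in for the already indexed GeoPoliticalEntity records, and the function name, the sample ids, and the de-duplication of names are assumptions made for this example.

```python
COUNTRY = "opendes:reference-data--GeoPoliticalEntityType:Country:"


def extract_country_names(record, related_index, country_type_id):
    """Collect names of all related entities whose GeoTypeID is a country."""
    names = []
    for ctx in record["data"].get("GeoContexts", []):
        if ctx.get("GeoTypeID") != country_type_id:
            continue  # condition on the record being indexed
        related = related_index.get(ctx.get("GeoPoliticalEntityID"))
        if related is None:
            continue  # related record not indexed (yet)
        name = related["data"].get("GeoPoliticalEntityName")
        if name and name not in names:
            names.append(name)
    return names


# Illustrative data only.
well = {"data": {"GeoContexts": [
    {"GeoPoliticalEntityID": "opendes:master-data--GeoPoliticalEntity:Norway:",
     "GeoTypeID": COUNTRY},
    {"GeoPoliticalEntityID": "opendes:master-data--GeoPoliticalEntity:UK:",
     "GeoTypeID": COUNTRY},
    {"GeoPoliticalEntityID": "opendes:master-data--GeoPoliticalEntity:Rogaland:",
     "GeoTypeID": "opendes:reference-data--GeoPoliticalEntityType:State:"},
]}}
related_index = {
    "opendes:master-data--GeoPoliticalEntity:Norway:":
        {"data": {"GeoPoliticalEntityName": "Norway"}},
    "opendes:master-data--GeoPoliticalEntity:UK:":
        {"data": {"GeoPoliticalEntityName": "United Kingdom"}},
    "opendes:master-data--GeoPoliticalEntity:Rogaland:":
        {"data": {"GeoPoliticalEntityName": "Rogaland"}},
}
country_names = extract_country_names(well, related_index, COUNTRY)
```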
[Back to TOC](#TOC)
---
### Use Case 3, Wellbore Name on WellLog Children
_As a user I want to discover WellLog instances by the wellbore's name value._
A variant of this can be WellUWI from parent Wellbore → Well; in that case the value would be derived from the
already extended index values.
This configuration demonstrates extractions from multiple `Paths[]`.
<details><summary>Configuration for WellLog, extract WellboreName from parent WellboreID</summary>
```json
{
"data": {
"Configurations": [
{
"Name": "WellboreName",
"Policy": "ExtractFirstMatch",
"Paths": [
{
"RelatedObjectsSpec": {
"RelatedObjectKind": "osdu:wks:master-data--Wellbore:1.",
"RelatedObjectID": "data.WellboreID"
},
"ValueExtraction": {
"ValuePath": "data.VirtualProperties.DefaultName"
}
},
{
"RelatedObjectsSpec": {
"RelatedObjectKind": "osdu:wks:master-data--Wellbore:1.",
"RelatedObjectID": "data.WellboreID"
},
"ValueExtraction": {
"ValuePath": "data.FacilityName"
}
}
],
"UseCase": "As a user I want to discover WellLog instances by the wellbore's name value."
}
]
}
}
```
</details>
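
One way to read the two `Paths[]` entries is as a fallback chain: the first path that yields a value wins. The sketch below illustrates that reading with a hypothetical helper; treating empty strings as "no value" is an assumption.

```python
def first_available(related_record, value_paths):
    """Return the first non-empty value found along the given dotted paths."""
    for path in value_paths:
        value = related_record
        for key in path.split("."):
            value = value.get(key) if isinstance(value, dict) else None
            if value is None:
                break  # path does not exist in this record
        if value not in (None, ""):
            return value
    return None


paths = ["data.VirtualProperties.DefaultName", "data.FacilityName"]
wellbore = {"data": {"FacilityName": "WB-01"}}  # no VirtualProperties present
wellbore_name = first_available(wellbore, paths)
```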
[Back to TOC](#TOC)
---
### Use Case 4, Wellbore index WellLogCurveMnemonics
_As a user I want to find Wellbores by well log mnemonics._
This configuration demonstrates the Policy `ExtractAllMatches` with related objects discovered by
RelationshipDirection `ParentToChildren`, i.e., related objects referring to the indexed record.
<details><summary>Configuration for Wellbore, extract WellLogCurveMnemonics from child WellLog records</summary>
```json
{
"data": {
"Configurations": [
{
"Name": "WellLogCurveMnemonics",
"Policy": "ExtractAllMatches",
"Paths": [
{
"RelatedObjectsSpec": {
"RelationshipDirection": "ParentToChildren",
"RelatedObjectID": "WellboreID",
"RelatedObjectKind": "osdu:wks:work-product-component--WellLog:1."
},
"ValueExtraction": {
"ValuePath": "Curves[].Mnemonic"
}
}
],
"UseCase": "As a user I want to find Wellbores by well log mnemonics."
}
]
}
}
```
</details>
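
For the `ParentToChildren` direction, the lookup runs the other way: children are selected because their `WellboreID` points at the record being indexed. The sketch below uses a plain list in place of the index; the function name, sample ids, and de-duplication of mnemonics are assumptions of this example.

```python
def collect_child_mnemonics(wellbore_id, well_logs):
    """Gather curve mnemonics from all WellLogs that reference the wellbore."""
    mnemonics = []
    for log in well_logs:
        data = log.get("data", {})
        if data.get("WellboreID") != wellbore_id:
            continue  # child of a different wellbore
        for curve in data.get("Curves", []):
            mnemonic = curve.get("Mnemonic")
            if mnemonic and mnemonic not in mnemonics:
                mnemonics.append(mnemonic)
    return mnemonics


# Illustrative data only.
logs = [
    {"data": {"WellboreID": "opendes:master-data--Wellbore:WB-01:",
              "Curves": [{"Mnemonic": "GR"}, {"Mnemonic": "DEPT"}]}},
    {"data": {"WellboreID": "opendes:master-data--Wellbore:WB-02:",
              "Curves": [{"Mnemonic": "RHOB"}]}},
    {"data": {"WellboreID": "opendes:master-data--Wellbore:WB-01:",
              "Curves": [{"Mnemonic": "DEPT"}, {"Mnemonic": "NPHI"}]}},
]
curve_mnemonics = collect_child_mnemonics(
    "opendes:master-data--Wellbore:WB-01:", logs)
```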
[Back to TOC](#TOC)
---
## Index Extension, Governance
OSDU Data Definition ships reference value list content for all reference-data group-type entities. The type
IndexPropertyPathConfiguration is classified as OPEN governance, which usually means that new records can be added by
platform operators. This rule must be adjusted for IndexPropertyPathConfiguration records.
### Permitted Changes to IndexPropertyPathConfiguration Records
It is permitted to
* customize the conditions for value extractions, notably the matching values in `RelatedConditionMatches`.
* add additional `Paths[]` elements to `Configurations[].Paths[]`
* add new index property configuration objects to the `Configurations[]` array. To avoid interference with future OSDU
updates it is strongly recommended to add a namespace prefix to the Configurations[].Name, e.g., "OperatorX.WellUWI".
### Prohibited Changes to IndexPropertyPathConfiguration Records
It is not permitted to
* change the target value type of existing, OSDU shipped index extensions. Example: the `ExtractionPath` to a string
property in the original OSDU `Configurations[].ValueExtraction.ValuePath` must not be altered to a number, integer,
or array.
* change the meaning of existing, OSDU shipped index extensions.
* remove OSDU shipped extension definitions in Configurations[].
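
Some of these rules lend themselves to an automated check. The sketch below compares a shipped `Configurations[]` array with an operator-customized one and reports removed definitions, changed policies (used here as a rough proxy for a changed value type, since the Policy decides between single value and array), and new names without a namespace prefix. The function name and message wording are hypothetical, and the prefix check reflects a recommendation rather than a prohibition.

```python
def check_customization(shipped, customized):
    """Report deviations of customized Configurations[] from shipped ones."""
    shipped_by_name = {c["Name"]: c for c in shipped}
    custom_by_name = {c["Name"]: c for c in customized}
    findings = []
    for name, original in shipped_by_name.items():
        current = custom_by_name.get(name)
        if current is None:
            findings.append(f"removed: {name}")
        elif current.get("Policy") != original.get("Policy"):
            findings.append(f"policy changed: {name}")
    for name in custom_by_name:
        if name not in shipped_by_name and "." not in name:
            findings.append(f"no namespace prefix: {name}")
    return findings


shipped = [
    {"Name": "WellUWI", "Policy": "ExtractFirstMatch"},
    {"Name": "CountryNames", "Policy": "ExtractAllMatches"},
]
customized = [
    {"Name": "WellUWI", "Policy": "ExtractAllMatches"},
    {"Name": "MyField", "Policy": "ExtractFirstMatch"},
    {"Name": "OperatorX.WellUWI", "Policy": "ExtractFirstMatch"},
]
findings = check_customization(shipped, customized)
```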
[Back to TOC](#TOC)
---
## Consumption by Indexer Service
### Recursive Index Updates
With the introduction of de-normalizations, record updates can cause infinite recursions. The implementation needs to
address this and avoid situations like in the following diagram:
![Recursions](/uploads/020675583cb7b65560f0d73ffe08fc3c/Recursions.png)
On the left-hand side, Storage records are updated to new versions, which trigger indexing. The update of the index triggers
the index update of related index records due to the derived property values (as defined in the `RelatedObjectsSpec`).
These updates may, in turn, cause a recursion. This must not happen.
The augmenter introduces a new attribute `ancestry_kinds` in the Attributes map of the message payload when sending
messages to update the index of parent/children records. The value of the `ancestry_kinds` attribute can include multiple
kinds separated by commas. This new attribute is used to prevent infinite loops in the index chasing. The indexer-queue
must pass the attribute back to the indexer when it receives indexing messages.
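
How such an attribute can break a cycle is sketched below. The rule used here, "stop when the current kind already appears in `ancestry_kinds`", is an assumption for illustration; the exact condition applied by the Indexer Service is not spelled out in this ADR, and the function name is hypothetical.

```python
def next_ancestry(attributes, current_kind):
    """Return updated attributes, or None if current_kind was already visited."""
    raw = attributes.get("ancestry_kinds", "")
    visited = [kind for kind in raw.split(",") if kind]
    if current_kind in visited:
        return None  # the chain has come full circle: stop propagating
    updated = dict(attributes)  # leave the incoming message untouched
    updated["ancestry_kinds"] = ",".join(visited + [current_kind])
    return updated


WELL = "osdu:wks:master-data--Well:1.0.0"
WELLBORE = "osdu:wks:master-data--Wellbore:1.0.0"

first_hop = next_ancestry({}, WELL)
second_hop = next_ancestry(first_hop, WELLBORE)
loop_back = next_ancestry(second_hop, WELL)
```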
### Pseudo-Code
1. For each record to be indexed (create/update event from Storage service):
* Does the record kind have an IndexPropertyPathConfiguration?
* Yes
* get or create the internal index schema that combines the schema of the record kind and schema of extended
properties
* create index document that combines the properties of original record and extended properties
* call ElasticSearch service to create or update the index of the record with extended properties
* No
* **_No action_** (=default for records without IndexPropertyPathConfiguration)
2. Re-Indexing (create/update event from Storage service for a IndexPropertyPathConfiguration record)<br>
To update the schema (or say template) of the kind in ElasticSearch when the kind is re-indexed:
* create the internal index schema derived from the kind (as registered in the Schema service)
* create the internal index schema derived from IndexPropertyPathConfiguration
* merge the internal index schemas
* convert the schema to ElasticSearch template
* call ElasticSearch service to update the index template (schema)
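The two steps above can be sketched in Python as follows. This is only an illustration: the function names, the flat `{property: type}` schema shape, and the choice that extension entries win on a name clash are all assumptions, and the calls to the Schema and ElasticSearch services are left out.

```python
from typing import Any, Dict, Optional


def merge_index_schemas(kind_schema: Dict[str, str],
                        extension_schema: Dict[str, str]) -> Dict[str, str]:
    """Step 2: merge the kind's schema with the schema of the extended properties.

    Assumption: on a name clash the extension entry replaces the original one.
    """
    merged = dict(kind_schema)
    merged.update(extension_schema)
    return merged


def build_index_document(record: Dict[str, Any],
                         extended: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Step 1: combine the record's own properties with the extended properties.

    When the kind has no IndexPropertyPathConfiguration (`extended` is None),
    the record is returned unchanged, which corresponds to "No action".
    """
    if extended is None:
        return record
    data = {**record.get("data", {}), **extended}
    return {**record, "data": data}
```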
[Back to TOC](#TOC)
---
## Accepted Limitations
* A change in the configurations requires re-indexing of all the records of a major schema version kind. It is the same
limitation as an in-place schema change for any kind.
* All the extensions defined in the IndexPropertyPathConfiguration records refer to properties in the `data` block,
including `ValuePath`, `RelatedObjectID`, `RelatedConditionProperty`.
* Only properties in the `data` block of records being indexed can be reached by the `ValuePath`; system properties are
out of reach. The prefix `data.` is therefore optional and can be omitted.
* The formats/values of the extended properties are extracted from the formats/values of the related index records. If
the formats of the original properties are unknown in the related index records, the indexer will set the value type
of the extended properties as string or string array. (With additional complexity and schema parsing, this limitation
can be overcome, but currently the added value seems to be marginal.)
* If the extended properties are extracted from arrays of objects indexed with
(`"x-osdu-indexing": {"type":"flattened"}`), the indexer cannot re-construct the object properties to the
nested objects when the policy `ExtractAllMatches` is applied. (The kind of indexing is already a deliberate choice.
With additional complexity, this limitation can be overcome, but currently the added value seems to
be marginal.)
* To simplify the solution, all the related kinds defined in the configuration are kinds with major version only. They
must end with a dot ".". For example: `"RelatedObjectKind": "osdu:wks:work-product-component--WellLog:1."`.
* Index updates may take time. Immediate consistency cannot be expected.
* When a kind derives extended properties from its parent(s), a new data property `data.AssociatedIdentities` is added
on demand by the indexer. The property name `AssociatedIdentities` is therefore reserved by the Indexer and shall not
be used in any OSDU schemas.
Currently, the property name `AssociatedIdentities` is not in use in any of the OSDU well-known schemas. Tests will be
implemented in the OSDU Data Definition pipeline to ensure that this reserved name does not appear as a property in
the `data` block.
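Three of the limitations above lend themselves to simple checks. The Python sketch below shows one possible form; the helper names and the regular expression are assumptions for illustration and are not taken from the indexer or the Data Definition pipeline.

```python
import re
from typing import Iterable

# A kind with major version only, ending with a dot,
# e.g. "osdu:wks:work-product-component--WellLog:1."
_MAJOR_ONLY_KIND = re.compile(r"^[^:]+:[^:]+:[^:]+:\d+\.$")

# Property name reserved by the Indexer.
RESERVED_PROPERTY = "AssociatedIdentities"


def is_major_version_kind(kind: str) -> bool:
    """True if the kind carries only a major version followed by a dot."""
    return _MAJOR_ONLY_KIND.match(kind) is not None


def normalize_value_path(value_path: str) -> str:
    """Drop the optional `data.` prefix from a ValuePath."""
    prefix = "data."
    return value_path[len(prefix):] if value_path.startswith(prefix) else value_path


def uses_reserved_name(data_properties: Iterable[str]) -> bool:
    """True if a schema's `data` block declares the reserved property name."""
    return RESERVED_PROPERTY in set(data_properties)
```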
[Back to TOC](#TOC)
---
# Change Management
1. Configurations are reference-data and need to be ingested/updated.
2. OSDU Data Definitions must take on the task of defining IndexPropertyPathConfiguration records.
3. Updates (extensions) of index extensions must be managed carefully as they cause re-indexing of the kinds involved.
# Decision
# Consequences
* The indexer code changes should have no impact on the system if no IndexPropertyPathConfiguration records are present.
[Back to TOC](#TOC)
---
# ADR Comments Below
M18 - Release 0.21

https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/261
objectId filed is not present
2023-04-26T13:48:46Z
Dmytro Komisar

Here [README](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/infra/templates/osdu-r3-mvp/central_resources/README.md?plain=1#L40) says
```bash
az ad sp list --display-name $NAME --query [].objectId -ojson
```
but the output JSON does not have an ".objectId" field. I assume just ".id" is what is needed, but this definitely needs to be corrected.
Also, [line 48](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/infra/templates/osdu-r3-mvp/central_resources/README.md?plain=1#L48) says:
```bash
az ad app permission admin-consent --id $appId
```
where $appId was not set. Again, I assume this should be the "appId" from line 22, but I am not sure about this.
Could these please be fixed?

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/99
Include aws region in dataset information for AWS Seismic DDMS data
2024-02-26T21:52:49Z
Michael

When using sdapi to retrieve Seismic DDMS data coming from AWS, a user needs to first set the AWS_REGION environment variable (see ticket https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-cpp-lib/-/issues/21).
To better handle this use case, the get dataset service `/dataset/tenant/{tenantid}/subproject/{subproject}/dataset/{datasetid}` should provide information regarding the AWS region if the dataset is stored in S3 storage.

https://community.opengroup.org/osdu/platform/deployment-and-operations/config-service/-/issues/1
Fix Pytype check for the service
2023-04-10T14:35:08Z
Yan Sushchynski (EPAM)

Job [#1863572](https://community.opengroup.org/osdu/platform/deployment-and-operations/config-service/-/jobs/1863572) failed for 041fa197aa9cf3276f4131894497fb8f53c7ffd1:
Yan Sushchynski (EPAM)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/segy-to-zgy-conversion/-/issues/24
Show Openzgy library version number - as part of DAG name or some such suitable place
2023-04-15T13:54:27Z
Debasis Chatterjee

Consider exposing this information prominently, over and above showing in Airflow log.
cc @chad , @Keith_Wall

https://community.opengroup.org/osdu/platform/data-flow/ingestion/segy-to-vds-conversion/-/issues/18
Capture OpenVDS library version number in DAG name or some such suitable place
2024-03-26T11:21:36Z
Debasis Chatterjee

Consider exposing this information prominently, over and above showing in Airflow log.
cc @chad , @Keith_Wall
Deepa Kumari

https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/eds-dms/-/issues/13
Input validation on the API
2023-04-06T20:25:50Z
Okoun-Ola Fabien Houeto

We need clear documentation of the approach for input validation. While there may not be documentation or a guideline at the forum level, EDS could document its approach to input validation. See https://community.opengroup.org/osdu/platform/system/storage/-/issues/51#note_39725 and https://community.opengroup.org/osdu/platform/security-and-compliance/home/-/issues/95#note_149265

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/98
Pagination not supported by IBM and AWS for DATASET LIST (POST) endpoint
2023-04-05T14:28:58Z
Pratiksha Shedge

A new API has been added as DATASET LIST (POST) endpoint which supports pagination. This API should return the list of datasets and nextPageCursor to get the next list of datasets. However, IBM and AWS do not support pagination for this endpoint, which causes the pagination tests to fail during pipeline runs.
Pipeline runs:
IBM-https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/jobs/1823012
AWS-https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/jobs/1842803

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/152
APIs to get the XCOM summary (Entries) are working in AWS environment, but are NOT working in other CSPs (Azure, GC and IBM) environments
2023-11-09T07:43:12Z
Kamlesh Todai

The APIs to get the xcomEntries using the runid and the taskinstance are working in the AWS environment. The endpoints/APIs are not implemented/deployed in the other CSPs' (Azure, GC, IBM) environments.
<details><summary>curl --location 'https://r3m16.forumtesting.osdu.aws/api/airflow/api/v1/dags/Osdu_ingest/dagRuns/45eb9f45-aada-4e2c-b618-818fb5dfcf28/taskInstances/process_single_manifest_file_task/**xcomEntries/record_ids**' \
--header 'data-partition-id: osdu' \
--header 'Authorization: Bearer eyJraWQiOi...fWbOUA3RcQ'</summary>
</details>
Response 200 OK
{
"dag_id": "Osdu_ingest",
"execution_date": "2023-04-04T21:19:27.327451+00:00",
"key": "record_ids",
"task_id": "process_single_manifest_file_task",
"timestamp": "2023-04-04T21:19:48.761929+00:00",
"value": "['osdu:reference-data--FacilityType:WELL_999259423605', 'osdu:master-data--Organisation:Auto_Test_999259423605', 'osdu:reference-data--FacilityEventType:SPUD_DATE_999259423605', 'osdu:reference-data--VerticalMeasurementPath:DEPTH_DATUM_ELEV_999259423605', 'osdu:reference-data--AliasNameType:WELL_NAME_999259423605', 'osdu:master-data--Well:999259423605']"
}
=========================================================================================================================
`curl --location 'https://r3m16.forumtesting.osdu.aws/api/airflow/api/v1/dags/Osdu_ingest/dagRuns/45eb9f45-aada-4e2c-b618-818fb5dfcf28/taskInstances/process_single_manifest_file_task/**xcomEntries/skipped_ids**' \
--header 'data-partition-id: osdu' \
--header 'Authorization: Bearer eyJraWQiOi...fWbOUA3RcQ'`
Response 200 OK
{
"dag_id": "Osdu_ingest",
"execution_date": "2023-04-04T21:19:27.327451+00:00",
"key": "skipped_ids",
"task_id": "process_single_manifest_file_task",
"timestamp": "2023-04-04T21:19:48.783236+00:00",
"value": "[]"
}
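The requests above follow the path layout `dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances/{task_id}/xcomEntries/{xcom_key}`, and the `value` field in both responses is a string that holds a Python-style list. The sketch below shows one way a client might build such a URL and decode the value. The helper names are hypothetical, and the HTTP call itself (with the `data-partition-id` and `Authorization` headers) is omitted.

```python
import ast
from typing import List
from urllib.parse import quote


def build_xcom_url(base_url: str, dag_id: str, dag_run_id: str,
                   task_id: str, xcom_key: str) -> str:
    """Build the xcomEntries URL; each path segment is percent-encoded."""
    dag, run, task, key = (quote(part, safe="")
                           for part in (dag_id, dag_run_id, task_id, xcom_key))
    return (f"{base_url.rstrip('/')}/dags/{dag}/dagRuns/{run}"
            f"/taskInstances/{task}/xcomEntries/{key}")


def parse_xcom_value(value: str) -> List[str]:
    """Decode the `value` string (a list literal) into a list of record IDs."""
    parsed = ast.literal_eval(value)  # safe evaluation of literals only
    if not isinstance(parsed, list):
        raise ValueError("expected the XCom value to be a list literal")
    return [str(item) for item in parsed]
```

`ast.literal_eval` is used rather than `json.loads` because the value shown in the response uses single quotes, which is not valid JSON.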
@chad @debasisc @Srinivasan_Narayanan @dzmitry_malkevich @anujgupta

https://community.opengroup.org/osdu/platform/system/storage/-/issues/171
Metadata only updates (via PATCH api) creates a mismatch in modifyUser and modifyTime fields between record metadata and record data
2023-07-05T09:50:37Z
Alok Joshi

[This ADR](https://community.opengroup.org/osdu/platform/system/storage/-/issues/148) introduces separate modifyTime and modifyUser fields for every version of an OSDU Storage record. This creates a mismatch between the modifyTime and modifyUser fields for metadata and data objects respectively.
Repro steps:
- Create a storage record
- Modify the metadata ACL with PATCH api
- Retrieve the record with Storage records:batch api or getRecord api
- modifyTime and modifyUser fields are not returned.
OR
- Create a storage record
- Update the same record with PUT api
- Modify the metadata ACL with PATCH api
- Retrieve the record
- modifyTime and modifyUser are returned but not correct
Expected: From a user's perspective, when they update a record (either metadata or data or both), they should get back the appropriate modifyUser and modifyTime values.

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-sdutil/-/issues/26
[Azure R3M16] sdutil : Segy file appears to be present in SD-STORE, but the file transmission was unsuccessful.
2023-04-04T11:19:24Z
kenneth liew

I failed to transfer my SEGY file to sdutil storage because I gave the wrong local file path, but the file was still created on sdutil storage.
I ran the commands "sdutil stat" and "sdutil cp" for your reference.
Below are my Python commands for your reference.
```
(sdutilenv) C:\Sdutil\AZURE_R3M16>python sdutil auth login
Successfully logged into Azure SDUTIL.
(sdutilenv) C:\Sdutil\AZURE_R3M16>python sdutil cp C:\Users\kuanl\Desktop\SegY\SampleSegy\UP000000001__UP123456__TST-SEGY-UPLOAD-TST__1000022.sgy sd://opendes/kennethv3/TestFailed2.sgy
Wrong Command: C:\Users\kuanl\Desktop\SegY\SampleSegy\UP000000001__UP123456__TST-SEGY-UPLOAD-TST__1000022.sgy is not a valid local file name or the local file does not exist.
For more information type "python sdutil cp" to open the command help menu.
(sdutilenv) C:\Sdutil\AZURE_R3M16>python sdutil ls sd://opendes/kennethv3
SegyTest.segy
Seismic_data.segy
TestFailed.sgy
TestFailed1.sgy
TestFailed2.sgy
UP000000001__UP123456__TST-SEGY-UPLOAD-TST__100001.sgy
UP000000001__UP123456__TST-SEGY-UPLOAD-TST__100001.sgy.sgy
UP000000002__UP123456__TST-SEGY-UPLOAD-TST__100002.sgy
UP000000002__UP123456__TST-SEGY-UPLOAD-TST__10001.sgy
(sdutilenv) C:\Sdutil\AZURE_R3M16>python sdutil stat sd://opendes/kennethv3/TestFailed2.sgy
- Name: sd://opendes/kennethv3/TestFailed2.sgy
- Created By: 97pQgJtRFH99Y1KViwFV4GaADxKsIeRG9ZPJ-4PnMb0
- Created Date: Tue Apr 04 2023 09:07:48 GMT+0000 (Coordinated Universal Time)
- ReadOnly: False
(sdutilenv) C:\Sdutil\AZURE_R3M16>python sdutil cp sd://opendes/kennethv3/TestFailed2.sgy C:\Users\kuanl\Desktop\SegY\SampleSegy\TestFailed2.sgy
[423] [seismic-store-service] opendes/kennethv3/TestFailed2.sgy is locked for write [RCODE:WL86400]
```
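The log above suggests, but does not confirm, that the dataset entry is created in the store before the local path is validated. Under that assumption, one way to avoid leaving an empty, write-locked dataset behind is to check the local file first. The Python sketch below illustrates the ordering only; `register` and `upload` are hypothetical callables standing in for whatever sdutil does internally and are not part of its API.

```python
import os
from typing import Callable


def copy_to_store(local_path: str, sd_path: str,
                  register: Callable[[str], None],
                  upload: Callable[[str, str], None]) -> None:
    """Validate the local file before touching the remote store."""
    if not os.path.isfile(local_path):
        # Fail early: nothing has been created remotely at this point.
        raise FileNotFoundError(
            f"{local_path} is not a valid local file name "
            "or the local file does not exist."
        )
    register(sd_path)            # hypothetical: create the dataset entry
    upload(local_path, sd_path)  # hypothetical: transfer the file contents
```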