# OSDU Software issues
https://community.opengroup.org/groups/osdu/-/issues

---
# EDS - Adding more description to logger in eds_ingest
Priyanka Bhongade · M18 - Release 0.21 · https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/issues/23

1. Include status code in logger after POST and GET requests
2. Include description in logger to understand the flow of eds_ingest
3. Include CSRE and CSDJ IDs in logger

---
# ADR: Configurable Index Extensions and De-Normalizations
Thomas Gehrmann [slb] · https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/81

<a name="TOC"></a>
[[_TOC_]]
Originally recorded during June 28-30, 2022 F2F as "Hints replacements, multiple index schemas (participation of indexer
& data definition needs to be in charge), content vs catalog, side-car", then renamed to ADR: User-friendly/App-friendly
Index Schemas
in [Enterprise Architecture ADR #66](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/adr-elaboration/-/issues/66)
<details>
<summary markdown="span">Preparation Material</summary>
OSDU Data Definitions conducted a number of sessions in the Core Concepts meetings, which contain supplementary
information:
**2022**
1. [Meeting Minutes 2022-07-05](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2022/2022-07-05-DataDefinitionsCoreConcepts_MeetingMinutes.md#42-user-friendly-schemas-de-normalizations)
2. [Meeting Minutes 2022-07-12](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2022/2022-07-12-DataDefinitionsCoreConcepts_MeetingMinutes.md#43-user-friendly-schemas-aka-index-schemas)
3. [Meeting Minutes 2022-07-19](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2022/2022-07-19-DataDefinitionsCoreConcepts_MeetingMinutes.md#43-user-friendly-schemas-aka-index-schemas)
4. [Meeting Minutes 2022-07-26](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2022/2022-07-26-DataDefinitionsCoreConcepts_MeetingMinutes.md#42-user-friendly-schemas-aka-index-schemas)
**2023**
1. [Meeting Minutes 2023-03-21](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2023/2023-03-21-DataDefinitionsCoreConcepts_MeetingMinutes.md#42-index-extensions-adr-66-configuration)
2. [Meeting Minutes 2023-03-28](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2023/2023-03-28-DataDefinitionsCoreConcepts_MeetingMinutes.md#42-index-extensions-configuration-mechanics-schema-review)
3. [Enterprise Architecture Advice Forum 2023-04-12](https://opensdu.slack.com/archives/C04TPV9CRUP/p1681291140407219?thread_ts=1681217870.084929&cid=C04TPV9CRUP)
</details>
# Status
- [x] Proposed
- [x] Trialing
- [x] Under review
- [x] Approved
- [ ] Retired
# Context & Scope
The entity type schemas delivered by the OSDU Data Definitions subcommittee pose a number of challenges
for consumers. Most of them are due to the normalization of the schemas and their friendliness to ingestors, which
allows values to be stored as-is and in less standardized form. The main problem is the usage of arrays of objects,
which are difficult to form queries against and costly to index. So far the issues have been mitigated by decorating arrays of objects
with `x-osdu-indexing` instructions. An umbrella issue has been recorded in
[community DD issue #30](https://community.opengroup.org/osdu/data/data-definitions/-/issues/30), which collects a
number of more detailed requests.
In previous OSDU prototypes, this was addressed by specific workarounds,
see [OSDU R1 Indexing Approach and Specification](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/adr-elaboration/-/wikis/uploads/46b4f84f0903cc385abd147a0175a00a/r1_indexing.pdf).
Here is an attempt to classify the workarounds listed in the R1 document above:
1. Extraction of standardized values from arrays of objects using conditions (e.g., Well UWI, SpudDate).
2. Chasing relationships to parent or related objects in order to de-normalize parent/related object values on children.
3. Offering related object's Name/Code for presentations in applications.
4. Counting children of well-known kinds. (The priority of this is lower compared to 1 and 2. The current Search service
should already be capable of querying a particular parent-child relationship.)
The current methods using `x-osdu-virtual-properties`, `x-osdu-is-derived` and `x-osdu-indexing` JSON schema decorations
fall short when the query conditions depend on a platform operator's usage of, e.g., reference values. In many
cases the reference value lists shipped by OSDU are incomplete or not documented clearly enough to guide global platform
standards.
[Back to TOC](#TOC)
---
## Requirements
* We need a configurable way to define rules for property extraction, either from nested arrays of objects or from
related objects.
* We need OSDU-provided standard index schema extensions to extend the entity type schemas with extracted values
  (governance for interoperability).
* We need to open the index schema extensions to applications and services to optimize frequently used query patterns.
One of them is the look-up of names or codes of related objects where the source record holds the target record id.
* We need a platform-embedded service which performs the extractions and de-normalizations on demand (on data
  creation/update events).
* We need platform support to refresh indexes if the indexing schemas change (both for OSDU and application indexing
  schemas).
[Back to TOC](#TOC)
---
# Tradeoff Analysis
The original tradeoff analysis was performed and recorded
in [EA ADR #66](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/adr-elaboration/-/issues/66).
The need for performance required further simplification.
* Replicating derived/de-normalized property values in Storage records was discarded as this would create an enormous
stack of versions for each individual record as records would need to be updated if properties derived from parents or
children changed.
* Instead, de-normalization could happen exclusively in the indexer, simultaneously exploiting the already indexed
values of parent and children records. (Preferred option)
* Using configurable index extension rules was already proposed
in [EA ADR #66](https://gitlab.opengroup.org/osdu/subcommittees/ea/work-products/adr-elaboration/-/issues/66). The
proposed additional index schemas with references to configurations were discarded. All required information can be
encoded in the configurations themselves. Any index extension schema fragments and documentation can be auto-generated
from the configurations.
* Interoperability is achieved by firm governance rules - the configurations are stored and customizable as OPEN
governance reference-data. However, additional governance rules have to be provided to keep interoperability
guaranteed across deployments and to prevent unwanted interference of index extensions with actual schema properties.
[Back to TOC](#TOC)
---
# Solution
## Index Extension, Data Definition
OSDU Standard index extensions are defined by OSDU Data Definition work-streams with the intent to provide
user/application-friendly, derived properties. The standard set, together with the OSDU schemas, forms the
interoperability foundation. It can contribute to delivering domain-specific APIs according to Domain-Driven Design
principles.
The configurations are encoded in OSDU reference-data records, one per major schema version. The proposed type name
is IndexPropertyPathConfiguration. The diagram below shows the decomposition into parts.
![IndexPropertyPathConfiguration](/uploads/7f1330dd7a41903a90174feb7fe2c9d9/IndexPropertyPathConfiguration.png)
* One IndexPropertyPathConfiguration record corresponds to one schema kind's major version, i.e., the
  IndexPropertyPathConfiguration record id for all `osdu:wks:master-data--Wellbore:1.*.*` kinds is set
  to `partition-id:reference-data--IndexPropertyPathConfiguration:osdu:wks:master-data--Wellbore:1`. Code, Name and
  Description are filled with meaningful data as usual for all reference-data types.
* The additional index properties are added as one JSON object each in the `Configurations[]` array. The Name defines
  the name of the index 'column', i.e., the name of the property one can search for. The Policy decides, in the current
  usage, whether the resulting value is a single value or an array containing the aggregated, derived values.
* Each `Configurations[]` element has at least one element defined in `Paths[]`.
* The `ValueExtraction` object has one mandatory property, `ValuePath`. The other two optional properties hold value
  match conditions, i.e., the property containing the value to be matched and the value to match.
* If no `RelatedObjectsSpec` is present, the value is derived from the object being indexed.
* If a `RelatedObjectsSpec` is provided, the value extraction is carried out in related objects; depending on
  the `RelationshipDirection`, these are either the parent/related objects or the children. The property holding the
  record id to follow is specified in `RelatedObjectID`, as is the expected target kind. As in `ValueExtraction`, the
  selection can be filtered by a match condition (`RelatedConditionProperty` and `RelatedConditionMatches`).
With this, the extension properties can be defined as if they were provided by a schema.
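As an illustration of how such a configuration might be evaluated, the sketch below resolves a dotted `ValuePath` with `[]` array markers against a record. This is a simplified, hypothetical helper (`resolve_path` and the sample record are illustrative, not the actual Indexer Service code):

```python
def resolve_path(node, path):
    """Return a flat list of values found at a dotted path.

    A trailing "[]" on a segment means the segment holds an array whose
    elements are each traversed further (e.g. "data.NameAliases[].AliasName").
    """
    values = [node]
    for segment in path.split("."):
        is_array = segment.endswith("[]")
        key = segment[:-2] if is_array else segment
        next_values = []
        for value in values:
            if not isinstance(value, dict) or key not in value:
                continue  # silently skip records that lack the property
            child = value[key]
            if is_array and isinstance(child, list):
                next_values.extend(child)
            else:
                next_values.append(child)
        values = next_values
    return values

record = {"data": {"NameAliases": [{"AliasName": "A-1"}, {"AliasName": "A-2"}]}}
uwis = resolve_path(record, "data.NameAliases[].AliasName")
# uwis == ["A-1", "A-2"]
```

A real implementation would also honor the rule (see Accepted Limitations) that the `data.` prefix is optional.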
Most of the use cases deal with text (string) types. The definition of configurations is, however, not limited to string
types. As long as the property is known to the indexer, i.e., the source record schema describes the types, the type
can be inferred by the indexer. This does not work for nested arrays of objects which have not been indexed
with `"x-osdu-indexing": {"type":"nested"}`. In this case types unknown to the Indexer Service are
string-serialized; the resulting index type is then `string`, still supporting text search.
[Back to TOC](#TOC)
---
### Use Case 1, WellUWI
_As a user I want to discover and match Wells by their UWI. I am aware that this is not globally reliable, however, I am
able to specify a prioritized AliasNameType list to look up value in the NameAliases array._
The configuration demonstrates extraction from the record being indexed itself. With Policy `ExtractFirstMatch`, the
first value whose `RelatedConditionProperty` value equals one of the `RelatedConditionMatches` entries is extracted.
<details><summary>Configuration for Well, extract WellUWI from NameAliases[]</summary>
```json
{
"data": {
"Configurations": [
{
"Name": "WellUWI",
"Policy": "ExtractFirstMatch",
"Paths": [
{
"ValueExtraction": {
"RelatedConditionMatches": [
"{{data-partition-id}}:reference-data--AliasNameType:UniqueIdentifier:",
"{{data-partition-id}}:reference-data--AliasNameType:RegulatoryName:",
"{{data-partition-id}}:reference-data--AliasNameType:PreferredName:",
"{{data-partition-id}}:reference-data--AliasNameType:CommonName:"
],
"RelatedConditionProperty": "data.NameAliases[].AliasNameTypeID",
"ValuePath": "data.NameAliases[].AliasName"
}
}
],
"UseCase": "As a user I want to discover and match Wells by their UWI. I am aware that this is not globally reliable, however, I am able to specify a prioritized AliasNameType list to look up value in the NameAliases array."
}
]
}
}
```
</details>
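One plausible reading of the `ExtractFirstMatch` policy for this configuration can be sketched as follows. The helper name, the sample data, and the assumption that the order of the `RelatedConditionMatches` list expresses priority are illustrative only:

```python
def extract_first_match(items, value_key, cond_key, matches):
    """Return the first value whose condition property equals a match entry,
    trying the match entries in their (assumed) priority order."""
    for wanted in matches:          # the matches list is read as a priority list
        for item in items:
            if item.get(cond_key) == wanted:
                return item.get(value_key)
    return None  # no alias of a requested type found

aliases = [
    {"AliasName": "WB-7",
     "AliasNameTypeID": "p:reference-data--AliasNameType:CommonName:"},
    {"AliasName": "42-123-45678",
     "AliasNameTypeID": "p:reference-data--AliasNameType:UniqueIdentifier:"},
]
well_uwi = extract_first_match(
    aliases, "AliasName", "AliasNameTypeID",
    ["p:reference-data--AliasNameType:UniqueIdentifier:",
     "p:reference-data--AliasNameType:CommonName:"])
# well_uwi == "42-123-45678" (UniqueIdentifier wins over CommonName)
```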
[Back to TOC](#TOC)
---
### Use Case 2, CountryNames
_As a user I want to find objects by a country name, with the understanding that an object may extend over country
boundaries._
This configuration demonstrates extraction from related index objects, here records of
kind `osdu:wks:master-data--GeoPoliticalEntity:1.`, which are found via the `RelatedObjectID`
in `data.GeoContexts[].GeoPoliticalEntityID`. The condition constrains the `GeoTypeID` to
`GeoPoliticalEntityType:Country`.
<details><summary>Configuration for Well, extract CountryNames from GeoContexts[]</summary>
```json
{
"data": {
"Configurations": [
{
"Name": "CountryNames",
"Policy": "ExtractAllMatches",
"Paths": [
{
"RelatedObjectsSpec": {
"RelatedObjectID": "data.GeoContexts[].GeoPoliticalEntityID",
"RelatedObjectKind": "osdu:wks:master-data--GeoPoliticalEntity:1.",
"RelatedConditionMatches": [
"{{data-partition-id}}:reference-data--GeoPoliticalEntityType:Country:"
],
"RelatedConditionProperty": "data.GeoContexts[].GeoTypeID"
},
"ValueExtraction": {
"ValuePath": "data.GeoPoliticalEntityName"
}
}
],
"UseCase": "As a user I want to find objects by a country name, with the understanding that an object may extend over country boundaries."
}
]
}
}
```
</details>
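The de-normalization step for this use case can be sketched as follows, assuming the related GeoPoliticalEntity records are already indexed and can be looked up by id (the dictionary stands in for the real index; all names and sample data are illustrative):

```python
def denormalize_country_names(geo_contexts, index_lookup, country_match):
    """ExtractAllMatches: for each GeoContext whose GeoTypeID marks a country,
    fetch the related, already indexed record and collect its name."""
    names = []
    for gc in geo_contexts:
        rid = gc.get("GeoPoliticalEntityID")
        if gc.get("GeoTypeID") == country_match and rid in index_lookup:
            names.append(index_lookup[rid]["GeoPoliticalEntityName"])
    return names  # Policy ExtractAllMatches yields an array

index = {  # stand-in for the already indexed related documents
    "p:master-data--GeoPoliticalEntity:NL": {"GeoPoliticalEntityName": "Netherlands"},
    "p:master-data--GeoPoliticalEntity:TX": {"GeoPoliticalEntityName": "Texas"},
}
geo_contexts = [
    {"GeoPoliticalEntityID": "p:master-data--GeoPoliticalEntity:NL",
     "GeoTypeID": "p:reference-data--GeoPoliticalEntityType:Country:"},
    {"GeoPoliticalEntityID": "p:master-data--GeoPoliticalEntity:TX",
     "GeoTypeID": "p:reference-data--GeoPoliticalEntityType:State:"},
]
country_names = denormalize_country_names(
    geo_contexts, index, "p:reference-data--GeoPoliticalEntityType:Country:")
# country_names == ["Netherlands"] (the State context is filtered out)
```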
[Back to TOC](#TOC)
---
### Use Case 3, Wellbore Name on WellLog Children
_As a user I want to discover WellLog instances by the wellbore's name value._
A variant of this can be WellUWI from parent Wellbore → Well; in that case the value would be derived from the
already extended index values.
This configuration demonstrates extractions from multiple `Paths[]`.
<details><summary>Configuration for WellLog, extract WellboreName from parent WellboreID</summary>
```json
{
"data": {
"Configurations": [
{
"Name": "WellboreName",
"Policy": "ExtractFirstMatch",
"Paths": [
{
"RelatedObjectsSpec": {
"RelatedObjectKind": "osdu:wks:master-data--Wellbore:1.",
"RelatedObjectID": "data.WellboreID"
},
"ValueExtraction": {
"ValuePath": "data.VirtualProperties.DefaultName"
}
},
{
"RelatedObjectsSpec": {
"RelatedObjectKind": "osdu:wks:master-data--Wellbore:1.",
"RelatedObjectID": "data.WellboreID"
},
"ValueExtraction": {
"ValuePath": "data.FacilityName"
}
}
],
"UseCase": "As a user I want to discover WellLog instances by the wellbore's name value."
}
]
}
}
```
</details>
[Back to TOC](#TOC)
---
### Use Case 4, Wellbore index WellLogCurveMnemonics
_As a user I want to find Wellbores by well log mnemonics._
This configuration demonstrates the Policy `ExtractAllMatches` with related objects discovered via
RelationshipDirection `ParentToChildren`, i.e., related objects referring to the indexed record.
<details><summary>Configuration for Wellbore, extract WellLogCurveMnemonics from WellLog children</summary>
```json
{
"data": {
"Configurations": [
{
"Name": "WellLogCurveMnemonics",
"Policy": "ExtractAllMatches",
"Paths": [
{
"RelatedObjectsSpec": {
"RelationshipDirection": "ParentToChildren",
"RelatedObjectID": "WellboreID",
"RelatedObjectKind": "osdu:wks:work-product-component--WellLog:1."
},
"ValueExtraction": {
"ValuePath": "Curves[].Mnemonic"
}
}
],
"UseCase": "As a user I want to find Wellbores by well log mnemonics."
}
]
}
}
```
</details>
[Back to TOC](#TOC)
---
## Index Extension, Governance
OSDU Data Definition ships reference value list content for all reference-data group-type entities. The type
IndexPropertyPathConfiguration is classified as OPEN governance, which usually means that new records can be added by
platform operators. This rule must be adjusted for IndexPropertyPathConfiguration records.
### Permitted Changes to IndexPropertyPathConfiguration Records
It is permitted to
* customize the conditions for value extractions, notably the matching values in `RelatedConditionMatches`;
* add additional `Paths[]` elements to `Configurations[].Paths[]`;
* add new index property configuration objects to the `Configurations[]` array. To avoid interference with future OSDU
  updates it is strongly recommended to add a namespace prefix to the `Configurations[].Name`, e.g., "OperatorX.WellUWI".
### Prohibited Changes to IndexPropertyPathConfiguration Records
It is not permitted to
* change the target value type of existing, OSDU-shipped index extensions. For example, the extraction path to a string
  property in the original OSDU `Configurations[].ValueExtraction.ValuePath` must not be altered to a number, integer,
  or array.
* change the meaning of existing, OSDU-shipped index extensions.
* remove OSDU-shipped extension definitions in `Configurations[]`.
[Back to TOC](#TOC)
---
## Consumption by Indexer Service
### Recursive Index Updates
With the introduction of de-normalizations, record updates can cause infinite recursions. The implementation needs to
address this and avoid situations like the one in the following diagram:
![Recursions](/uploads/020675583cb7b65560f0d73ffe08fc3c/Recursions.png)
On the left-hand side, Storage records are updated to new versions, which triggers indexing. The update of the index
triggers index updates of related index records due to the derived property values (as defined in
the `RelatedObjectsSpec`). These updates may, in turn, cause a recursion. This must not happen.
The augmenter introduces a new attribute, `ancestry_kinds`, in the Attributes map of the message payload when sending
messages to update the index of parent/child records. The value of the `ancestry_kinds` attribute can include multiple
kinds separated by commas. This attribute is used to prevent an infinite loop of index chasing. The indexer-queue
must pass the attribute back to the indexer when it receives indexing messages.
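A minimal sketch of how the comma-separated `ancestry_kinds` attribute could break the cycle; the helper name and the tuple return convention are hypothetical, not the actual indexer implementation:

```python
def should_propagate(current_kind, ancestry_kinds):
    """Decide whether to keep chasing related indexes.

    ancestry_kinds is the comma-separated attribute from the message payload.
    If the current kind already appears in the chain, the chase stops;
    otherwise the kind is appended and propagation continues.
    """
    chain = [k for k in (ancestry_kinds or "").split(",") if k]
    if current_kind in chain:
        return False, ancestry_kinds      # cycle detected: do not re-trigger
    chain.append(current_kind)
    return True, ",".join(chain)          # propagate with the extended chain

ok, header = should_propagate("osdu:wks:master-data--Wellbore:1.0.0", "")
blocked, _ = should_propagate("osdu:wks:master-data--Wellbore:1.0.0", header)
# ok is True on the first visit; blocked is False because the kind is
# already in the ancestry chain, so the recursion terminates
```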
### Pseudo-Code
1. For each record to be indexed (create/update event from the Storage service):
   * Does the record kind have an IndexPropertyPathConfiguration?
   * Yes
     * get or create the internal index schema that combines the schema of the record kind and the schema of the
       extended properties
     * create an index document that combines the properties of the original record and the extended properties
     * call the ElasticSearch service to create or update the index of the record with the extended properties
   * No
     * **_No action_** (the default for records without an IndexPropertyPathConfiguration)
2. Re-indexing (create/update event from the Storage service for an IndexPropertyPathConfiguration record)<br>
   To update the schema (or template) of the kind in ElasticSearch when the kind is re-indexed:
* create the internal index schema derived from the kind (as registered in the Schema service)
* create the internal index schema derived from IndexPropertyPathConfiguration
* merge the internal index schemas
* convert the schema to ElasticSearch template
* call ElasticSearch service to update the index template (schema)
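The schema-merge step above can be sketched as follows. `merge_index_schema` is a hypothetical helper; the collision check reflects the governance rule that extensions must not interfere with actual schema properties:

```python
def merge_index_schema(kind_schema, extension_schema):
    """Combine the kind's registered index schema with the auto-generated
    extension properties.  A collision with a genuine schema property is
    rejected rather than silently overwritten."""
    merged = dict(kind_schema)
    for name, spec in extension_schema.items():
        if name in merged:
            raise ValueError(f"extension '{name}' collides with a schema property")
        merged[name] = spec
    return merged

kind_schema = {"FacilityName": {"type": "string"}}   # from the Schema service
extensions = {"WellUWI": {"type": "string"}}         # from the configuration
merged = merge_index_schema(kind_schema, extensions)
# merged == {"FacilityName": {"type": "string"}, "WellUWI": {"type": "string"}}
```

The merged schema would then be converted to an ElasticSearch index template, as the pseudo-code describes.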
[Back to TOC](#TOC)
---
## Accepted Limitations
* A change in the configurations requires re-indexing of all the records of a major schema version kind. It is the same
limitation as an in-place schema change for any kind.
* All the extensions defined in the IndexPropertyPathConfiguration records refer to properties in the `data` block,
including `ValuePath`, `RelatedObjectID`, `RelatedConditionProperty`.
* Only properties in the `data` block of records being indexed can be reached by the `ValuePath`; system properties are
out of reach. The prefix `data.` is therefore optional and can be omitted.
* The formats/values of the extended properties are extracted from the formats/values of the related index records. If
the formats of the original properties are unknown in the related index records, the indexer will set the value type
of the extended properties as string or string array. (With additional complexity and schema parsing, this limitation
can be overcome, but currently the added value seems to be marginal.)
* If the extended properties are extracted from arrays of objects indexed
  with `"x-osdu-indexing": {"type":"flattened"}`, the indexer cannot re-construct the object properties into
  nested objects when the policy `ExtractAllMatches` is applied. (This kind of indexing is already a deliberate choice.
  With additional complexity, this limitation could be overcome, but currently the added value seems
  marginal.)
* To simplify the solution, all the related kinds defined in the configuration are kinds with the major version only.
  They must end with a dot ".". For example: `"RelatedObjectKind": "osdu:wks:work-product-component--WellLog:1."`.
* Index updates may take time. Immediate consistency cannot be expected.
* When a kind derives extended properties from its parent(s), a new data property `data.AssociatedIdentities` is added
on demand by the indexer. The property name `AssociatedIdentities` is therefore reserved by the Indexer and shall not
be used in any OSDU schemas.
Currently, the property name `AssociatedIdentities` is not in use in any of the OSDU well-known schemas. Tests will be
implemented in the OSDU Data Definition pipeline to ensure that this reserved name does not appear as a property in
the `data` block.
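The major-version-only kind matching described above reduces to a simple prefix test, sketched here with a hypothetical helper:

```python
def kind_matches(record_kind, related_object_kind):
    """RelatedObjectKind pins the major version only and must end with '.',
    so a plain prefix test covers all minor and patch versions of that major."""
    if not related_object_kind.endswith("."):
        raise ValueError("RelatedObjectKind must end with '.'")
    return record_kind.startswith(related_object_kind)

v1_match = kind_matches("osdu:wks:work-product-component--WellLog:1.2.0",
                        "osdu:wks:work-product-component--WellLog:1.")
v2_match = kind_matches("osdu:wks:work-product-component--WellLog:2.0.0",
                        "osdu:wks:work-product-component--WellLog:1.")
# v1_match is True; v2_match is False (different major version)
```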
[Back to TOC](#TOC)
---
# Change Management
1. Configurations are reference-data and need to be ingested/updated.
2. OSDU Data Definitions must take on the task of defining IndexPropertyPathConfiguration records.
3. Updates (extensions) of index extensions must be managed carefully as they cause re-indexing of the kinds involved.
# Decision
# Consequences
* The indexer code changes should have no impact on the system if no IndexPropertyPathConfiguration records are present.
[Back to TOC](#TOC)
---
# ADR Comments Below
M18 - Release 0.21

---
# Add all released new schema versions to test data
Michael · https://community.opengroup.org/osdu/data/open-test-data/-/issues/90

New schemas are released (for instance master-data--Wellbore:1.3.0); however, these new schemas are not added to the pre-shipping environments for testing.
There should be at least a few records that use the latest released schemas for each major data type (master-data--Well, master-data--Wellbore, wpc--SeismicTraceData, wpc--SeismicHorizon, wpc--WellLog, etc.) available in the pre-shipping environments by adding them to the test dataset.

---
# objectId field is not present
Dmytro Komisar · https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/261

The [README](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/infra/templates/osdu-r3-mvp/central_resources/README.md?plain=1#L40) says
```bash
az ad sp list --display-name $NAME --query [].objectId -ojson
```
but the output JSON does not have an ".objectId" field. I assume just ".id" is what is needed, but it definitely needs to be corrected.
Also, [line 48](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/infra/templates/osdu-r3-mvp/central_resources/README.md?plain=1#L48) says:
```bash
az ad app permission admin-consent --id $appId
```
where $appId was not set. Again, I assume this should be the "appId" from line 22, but I am not sure about this.
Could these please be fixed.

---
# Include aws region in dataset information for AWS Seismic DDMS data
Michael · https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/99

When using sdapi to retrieve Seismic DDMS data coming from AWS, a user needs to first set the AWS_REGION environment variable (see ticket https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-cpp-lib/-/issues/21).
To better handle this use case, the get dataset service `/dataset/tenant/{tenantid}/subproject/{subproject}/dataset/{datasetid}` should provide information regarding the AWS region if the dataset is stored in S3 storage.

---
# Fix Pytype check for the service
Yan Sushchynski (EPAM) · https://community.opengroup.org/osdu/platform/deployment-and-operations/config-service/-/issues/1

Job [#1863572](https://community.opengroup.org/osdu/platform/deployment-and-operations/config-service/-/jobs/1863572) failed for 041fa197aa9cf3276f4131894497fb8f53c7ffd1.

---
# Show Openzgy library version number - as part of DAG name or some such suitable place
Debasis Chatterjee · https://community.opengroup.org/osdu/platform/data-flow/ingestion/segy-to-zgy-conversion/-/issues/24

Consider exposing this information prominently, over and above showing it in the Airflow log.
cc @chad , @Keith_Wall

---
# Capture OpenVDS library version number in DAG name or some such suitable place
Debasis Chatterjee · https://community.opengroup.org/osdu/platform/data-flow/ingestion/segy-to-vds-conversion/-/issues/18

Consider exposing this information prominently, over and above showing it in the Airflow log.
cc @chad , @Keith_Wall

---
# Input validation on the API
Okoun-Ola Fabien Houeto · https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/eds-dms/-/issues/13

We need clear documentation of the approach for input validation. While there may not be documentation or a guideline at the forum level, EDS could document its approach to input validation. See https://community.opengroup.org/osdu/platform/system/storage/-/issues/51#note_39725 and https://community.opengroup.org/osdu/platform/security-and-compliance/home/-/issues/95#note_149265

---
# Pagination not supported by IBM and AWS for DATASET LIST (POST) endpoint
Pratiksha Shedge · https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/98

A new API has been added as the DATASET LIST (POST) endpoint, which supports pagination. This API should return the list of datasets and a nextPageCursor to get the next list of datasets. However, IBM and AWS do not support pagination for this endpoint, which causes the pagination tests to fail during pipeline runs.
Pipeline runs:
* IBM: https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/jobs/1823012
* AWS: https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/jobs/1842803

---
# APIs to get the XCOM summary (Entries) are working in AWS environment, but are NOT working in other CSPs (Azure, GC and IBM) environments
Kamlesh Todai · https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/issues/152

The APIs to get the xcomEntries using the run id and the task instance are working in the AWS environment. The endpoints/APIs are not implemented/deployed in the other CSPs' (Azure, GC, IBM) environments.
<details><summary>curl --location 'https://r3m16.forumtesting.osdu.aws/api/airflow/api/v1/dags/Osdu_ingest/dagRuns/45eb9f45-aada-4e2c-b618-818fb5dfcf28/taskInstances/process_single_manifest_file_task/**xcomEntries/record_ids**' \
--header 'data-partition-id: osdu' \
--header 'Authorization: Bearer eyJraWQiOi...fWbOUA3RcQ'</summary>
</details>
Response 200 OK
```json
{
  "dag_id": "Osdu_ingest",
  "execution_date": "2023-04-04T21:19:27.327451+00:00",
  "key": "record_ids",
  "task_id": "process_single_manifest_file_task",
  "timestamp": "2023-04-04T21:19:48.761929+00:00",
  "value": "['osdu:reference-data--FacilityType:WELL_999259423605', 'osdu:master-data--Organisation:Auto_Test_999259423605', 'osdu:reference-data--FacilityEventType:SPUD_DATE_999259423605', 'osdu:reference-data--VerticalMeasurementPath:DEPTH_DATUM_ELEV_999259423605', 'osdu:reference-data--AliasNameType:WELL_NAME_999259423605', 'osdu:master-data--Well:999259423605']"
}
```
---
```bash
curl --location 'https://r3m16.forumtesting.osdu.aws/api/airflow/api/v1/dags/Osdu_ingest/dagRuns/45eb9f45-aada-4e2c-b618-818fb5dfcf28/taskInstances/process_single_manifest_file_task/xcomEntries/skipped_ids' \
--header 'data-partition-id: osdu' \
--header 'Authorization: Bearer eyJraWQiOi...fWbOUA3RcQ'
```
Response 200 OK
```json
{
  "dag_id": "Osdu_ingest",
  "execution_date": "2023-04-04T21:19:27.327451+00:00",
  "key": "skipped_ids",
  "task_id": "process_single_manifest_file_task",
  "timestamp": "2023-04-04T21:19:48.783236+00:00",
  "value": "[]"
}
```
@chad @debasisc @Srinivasan_Narayanan @dzmitry_malkevich @anujgupta

---
# Metadata only updates (via PATCH api) creates a mismatch in modifyUser and modifyTime fields between record metadata and record data
Alok Joshi · https://community.opengroup.org/osdu/platform/system/storage/-/issues/171

[This ADR](https://community.opengroup.org/osdu/platform/system/storage/-/issues/148) introduces separate modifyTime and modifyUser fields for every version of an OSDU Storage record. This creates a mismatch between the modifyTime and modifyUser fields of the metadata and data objects respectively.
Repro steps:
- Create a storage record
- Modify the metadata ACL with PATCH api
- Retrieve the record with Storage records:batch api or getRecord api
- modifyTime and modifyUser fields are not returned.
OR
- Create a storage record
- Update the same record with PUT api
- Modify the metadata ACL with PATCH api
- Retrieve the record
- modifyTime and modifyUser are returned but not correct
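The invariant behind both repro paths can be expressed as a small model: whichever path updates the record (PUT for data, PATCH for metadata), the returned modifyUser/modifyTime should reflect the latest change. A toy sketch with illustrative names, not the actual Storage implementation:

```python
import time

class ToyRecord:
    """Toy model of a Storage record envelope (names are illustrative)."""
    def __init__(self, data, acl, user):
        self.data, self.acl = data, acl
        self.createUser, self.createTime = user, time.time()
        self.modifyUser, self.modifyTime = None, None

    def put(self, data, user):
        # Full update: data changes, so modify* must be refreshed.
        self.data = data
        self.modifyUser, self.modifyTime = user, time.time()

    def patch_acl(self, acl, user):
        # Expected per this issue: a metadata-only PATCH should
        # refresh modifyUser/modifyTime as well.
        self.acl = acl
        self.modifyUser, self.modifyTime = user, time.time()

rec = ToyRecord({"k": 1}, acl=["viewers@osdu"], user="alice")
rec.patch_acl(["viewers@other"], user="bob")
assert rec.modifyUser == "bob" and rec.modifyTime is not None
```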
Expected: From a user's perspective, when they update a record (metadata, data, or both), they should get back appropriate modifyUser and modifyTime values.
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-sdutil/-/issues/26
[Azure R3M16] sdutil : Segy file appears to be present in SD-STORE, but the file transmission was unsuccessful.
2023-04-04T11:19:24Z kenneth liew
I failed to transfer my SEGY file to sdutil storage because I supplied the wrong local file path, yet the file was still created in sdutil storage.
I ran the commands "sdutil stat" and "sdutil cp" for your reference.
Below is my Python command for your reference.
```
(sdutilenv) C:\Sdutil\AZURE_R3M16>python sdutil auth login
Successfully logged into Azure SDUTIL.
(sdutilenv) C:\Sdutil\AZURE_R3M16>python sdutil cp C:\Users\kuanl\Desktop\SegY\SampleSegy\UP000000001__UP123456__TST-SEGY-UPLOAD-TST__1000022.sgy sd://opendes/kennethv3/TestFailed2.sgy
Wrong Command: C:\Users\kuanl\Desktop\SegY\SampleSegy\UP000000001__UP123456__TST-SEGY-UPLOAD-TST__1000022.sgy is not a valid local file name or the local file does not exist.
For more information type "python sdutil cp" to open the command help menu.
(sdutilenv) C:\Sdutil\AZURE_R3M16>python sdutil ls sd://opendes/kennethv3
SegyTest.segy
Seismic_data.segy
TestFailed.sgy
TestFailed1.sgy
TestFailed2.sgy
UP000000001__UP123456__TST-SEGY-UPLOAD-TST__100001.sgy
UP000000001__UP123456__TST-SEGY-UPLOAD-TST__100001.sgy.sgy
UP000000002__UP123456__TST-SEGY-UPLOAD-TST__100002.sgy
UP000000002__UP123456__TST-SEGY-UPLOAD-TST__10001.sgy
(sdutilenv) C:\Sdutil\AZURE_R3M16>python sdutil stat sd://opendes/kennethv3/TestFailed2.sgy
- Name: sd://opendes/kennethv3/TestFailed2.sgy
- Created By: 97pQgJtRFH99Y1KViwFV4GaADxKsIeRG9ZPJ-4PnMb0
- Created Date: Tue Apr 04 2023 09:07:48 GMT+0000 (Coordinated Universal Time)
- ReadOnly: False
(sdutilenv) C:\Sdutil\AZURE_R3M16>python sdutil cp sd://opendes/kennethv3/TestFailed2.sgy C:\Users\kuanl\Desktop\SegY\SampleSegy\TestFailed2.sgy
[423] [seismic-store-service] opendes/kennethv3/TestFailed2.sgy is locked for write [RCODE:WL86400]
```
https://community.opengroup.org/osdu/platform/consumption/geospatial/-/issues/236
Deployment - Refactor and separate the code base into two: Transformer and Provider
2023-05-31T15:25:27Z Joel Romero
**Goal:** Meet the structural prerequisites so that OSDU's standardized pipelines for scanning, building, and deploying are more compatible with GCZ.
**Problem:** The GCZ repo currently houses two applications, [Transformer](https://community.opengroup.org/osdu/platform/consumption/geospatial/-/tree/master/gcz-transformer) (Java) and [Provider](https://community.opengroup.org/osdu/platform/consumption/geospatial/-/tree/master/gcz-provider) (JavaScript), but OSDU pipelines expect a repository to host a single application at the root level. Running the OSDU Maven build script against our repository fails because the Java application is nested below the root.
**Solution:** Break the repository in two:
- Geospatial (the current repository): Remains host to the core GCZ docs, test assets, and gcz-transformer code. This repository would also remain the hub for any issues or discussions related to the GCZ effort.
- Provider: Becomes new host for gcz-provider. Includes documentation specifically pertaining to Provider, but also links relevant information back to Geospatial repository.
This will not affect the code itself, or any functionality of the GCZ services.
_Note: Provider is wholly dependent on the Transformer, so these services should always be expected to run in tandem._
**Benefits:**
- Separate pipelines/branches per service
- Improved compatibility with standard-setup, scan, build pipelines from OSDU
- More familiar structure with clearer separation of concerns
**Disadvantages:**
- Issues that span both repositories will require twice the maintenance to close
- Increased onboarding complexity with two codebases
@chad @debasisc @divido @joelvromero - I will request approval from each CSP below, but in the meantime please reply below if any concerns come to mind that are not addressed above.
Levi Remington Ankita Srivastava Levi Remington
https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/issues/22
EDS - Adding Dynamic Schema Authority for Kind of CSRE, CSDJ and ExternalReferenceValueMapping from Airflow Variable
2023-05-04T10:42:00Z Priyanka Bhongade
M17 - Release 0.20
https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/issues/21
EDS - Raise exception when Airflow Variable not found or None
2023-05-04T10:42:00Z Priyanka Bhongade
Raise an exception when an Airflow Variable is not found or has a None value, for both eds_ingest and the EDS scheduler.
M17 - Release 0.20
https://community.opengroup.org/osdu/platform/system/sdks/common-python-sdk/-/issues/18
AWS : allow 'osdu_api.ini' file path configuration
2024-01-17T12:08:05Z Valentin Gauthier
For now, it does not seem possible to modify the default file path of the *osdu_api.ini* file when using AWS.
As this file is required for the AWS configuration, I suggest modifying this line: https://community.opengroup.org/osdu/platform/system/sdks/common-python-sdk/-/blob/master/osdu_api/providers/aws/service_principal_util.py#L63
The modification could be:
```python
config_file_name = os.environ.get("OSDU_API_CONFIG_INI") or "osdu_api.ini"
```
Thus, it will use the "OSDU_API_CONFIG_INI" environment variable, as the *DefaultConfigManager* class already does (see https://community.opengroup.org/osdu/platform/system/sdks/common-python-sdk/-/blob/master/osdu_api/configuration/config_manager.py#L73).
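The suggested fallback can be isolated in a small helper for illustration (the helper name `resolve_config_path` is hypothetical; the lookup mirrors the one-line snippet above):

```python
import os

def resolve_config_path(env=None):
    """Prefer the OSDU_API_CONFIG_INI environment variable,
    falling back to the default file name."""
    env = os.environ if env is None else env
    return env.get("OSDU_API_CONFIG_INI") or "osdu_api.ini"

# Fallback when the variable is unset:
assert resolve_config_path({}) == "osdu_api.ini"
# Explicit override:
assert resolve_config_path(
    {"OSDU_API_CONFIG_INI": "/etc/osdu/osdu_api.ini"}) == "/etc/osdu/osdu_api.ini"
```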
This modification should not break any existing configuration, unless a configuration uses two different *osdu_api.ini* files, one for the *DefaultConfigManager* class and one for the AWS configuration (which does not seem like good practice anyway).
https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-sdutil/-/issues/25
sdutil cp - to show checksum comparison after completion of copying the file
2023-05-18T09:40:40Z Debasis Chatterjee
Please consider adding this feature to ensure the integrity of data in Seismic Store:
Show the checksum of the source data file, and the same for the copied file.
Also add the same feature to "sdutil stat"; "stat" may also report the size in bytes.
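As an illustration of the requested feature, a streamed checksum can be computed with the standard library (a sketch only; `file_checksum` is a hypothetical name, not part of sdutil):

```python
import hashlib

def file_checksum(path, algo="md5", chunk_size=1 << 20):
    """Stream a file in chunks and return its hex digest, so large
    SEG-Y files are never loaded into memory at once."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

# After a copy, source and destination digests should match:
# assert file_checksum(src) == file_checksum(dst), "transfer corrupted"
```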
R3M16/Azure/Preship sdutil -
"**cp**" command (copying the file)
sdutil copy file
```
(sdutilenv) C:\seismic-store-sdutil-master>python sdutil cp C:\TEMP\osdu-volve.segy sd://opendes/debasis/volve.segy
Uploading [========================================] 1104999800/1104999800 [100%] in 12:36.1 (1461432.29/s)
Transfer completed
(sdutilenv) C:\seismic-store-sdutil-master>
```
Source data on local disk:
```
(sdutilenv) C:\seismic-store-sdutil-master>dir C:\TEMP\osdu-volve.segy
 Volume in drive C is OS
 Volume Serial Number is 62E2-67ED
 Directory of C:\TEMP
04/24/2021  04:39 AM     1,104,999,800 osdu-volve.segy
               1 File(s)  1,104,999,800 bytes
               0 Dir(s)  25,783,111,680 bytes free
(sdutilenv) C:\seismic-store-sdutil-master>
```
"**stat**" command
```
(sdutilenv) C:\seismic-store-sdutil-master>python sdutil ls sd://opendes/debasis
volve.segy
(sdutilenv) C:\seismic-store-sdutil-master>python sdutil stat sd://opendes/debasis/volve.segy
- Name: sd://opendes/debasis/volve.segy
- Created By: 97pQgJtRFH99Y1KViwFV4GaADxKsIeRG9ZPJ-4PnMb0
- Created Date: Wed Mar 29 2023 00:00:57 GMT+0000 (Coordinated Universal Time)
- Size: 1.0 GB
- ReadOnly: False
(sdutilenv) C:\seismic-store-sdutil-master>
```
M18 - Release 0.21
Debasis Chatterjee
https://community.opengroup.org/osdu/platform/consumption/geospatial/-/issues/235
GCZ doesn't have an Open API Spec
2023-05-09T17:14:30Z Morris Estepa
GCZ doesn't have an Open API Spec documenting all available APIs. See this for reference: https://community.opengroup.org/osdu/documentation/-/wikis/Core-Services-Overview#consumption-zone
M18 - Release 0.21
https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/issues/20
Remove StrEnum from the code
2023-06-07T16:37:09Z Yan Sushchynski (EPAM)
Hello,
I think it is possible to delete `StrEnum` from the dependencies and replace it with something like:
```
class YourStrEnum(str, Enum):
    pass
```
The enum above behaves the same as StrEnum, so it spares us from installing an extra dependency.
More details here:
https://docs.python.org/3.8/library/enum.html#others
M18 - Release 0.21
Ashish Saxena, Nisha Thakran, Jeyakumar Devarajulu, Priyanka Bhongade, Ashish Saxena
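The `(str, Enum)` suggestion above can be sanity-checked with a short, self-contained example (the `Status` enum is illustrative, not from the codebase):

```python
from enum import Enum

class Status(str, Enum):
    """Illustrative stdlib-only replacement for the StrEnum dependency."""
    ACTIVE = "active"
    DELETED = "deleted"

# Members compare equal to plain strings and are str instances,
# so they can be dropped into string formatting and comparisons:
assert Status.ACTIVE == "active"
assert isinstance(Status.DELETED, str)
assert "state=" + Status.ACTIVE.value == "state=active"
```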