Documentation issues
https://community.opengroup.org/osdu/documentation/-/issues
2020-05-19T12:49:28Z

Search indexing
https://community.opengroup.org/osdu/documentation/-/issues/49
2020-05-19T12:49:28Z
Ferris Argyle

## Status
- [ ] Proposed
- [ ] Trialing
- [ ] Under review
- [X] Approved
- [ ] Retired
## Context & Scope
Master data wasn’t indexed in initial denormalization, and therefore couldn’t be found.
## Decision
Index all elements and attributes of ingested metadata with schema
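One way to picture "index all elements and attributes" is a flattening pass that turns every leaf of a nested metadata record into a searchable (path, value) pair. The following is only a minimal sketch under stated assumptions: the field names (`ResourceTypeID`, `Data`, `FacilityName`, `Aliases`) and the in-memory dictionary index are illustrative and do not describe the actual OSDU or OpenDES indexer.

```python
from typing import Any, Dict, Iterator, Set, Tuple


def flatten(record: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield a (dotted_path, value) pair for every leaf of a nested record."""
    if isinstance(record, dict):
        for key, value in record.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(record, list):
        for item in record:
            # List items share the parent's path so each value is searchable under it.
            yield from flatten(item, prefix)
    else:
        yield prefix, record


def build_index(records: Dict[str, dict]) -> Dict[Tuple[str, Any], Set[str]]:
    """Map each (path, value) pair to the ids of the records that contain it."""
    index: Dict[Tuple[str, Any], Set[str]] = {}
    for record_id, record in records.items():
        for path, value in flatten(record):
            index.setdefault((path, value), set()).add(record_id)
    return index


# Hypothetical master-data record; the field names are assumptions.
records = {
    "well-1": {
        "ResourceTypeID": "master-data/Well",
        "Data": {"FacilityName": "A-1", "Aliases": ["A1", "Alpha"]},
    },
}
index = build_index(records)
```

Because no attribute is skipped, a lookup such as `index[("Data.FacilityName", "A-1")]` finds the record even though the field is nested.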
## Rationale
## Consequences
Implementation Tasks:
* Gap fit on this use case
* Test according to the definition of done (Write test cases)
* Add a user story to the project ADO
## When to revisit
---
# Tradeoff Analysis - Input to decision
## Alternatives and implications
## Decision criteria and tradeoffs
## Decision timeline

Release 2
ethiraj krishnamanaidu

Ingestion schema validation
https://community.opengroup.org/osdu/documentation/-/issues/53
2020-05-19T12:48:03Z
Ferris Argyle

## Status
- [ ] Proposed
- [ ] Trialing
- [X] Under review
- [ ] Approved
- [ ] Retired
### History
This was approved in [January OSDU R2 Decisions review](https://docs.google.com/presentation/d/13muxPb8gnZ8P_mwFvGu0K3rWaqoHD5z5uNx2-LbmN2k/edit#slide=id.g7628e71997_0_358) and has since been revisited.
We can do data validation (data quality checks – not schema compliance checks) in several different ways (Joe):
* Validate the data in the Manifest file BEFORE we load the manifest data
* Validate the data as it is parsed from a data file (not in scope for R2)
* Validate records as an enrichment (not in scope for R2, as this requires an event controller and a registry)
To add data validation during ingestion, the reference data must be loaded first; at runtime the system then retrieves the reference data and validates the Manifest against it.
The schema service ([os-schema](https://dev.azure.com/slb-des-ext-collaboration/open-data-ecosystem/_git/os-schema)) has basic JSON validation in the current implementation; full validation is planned for later.
Alan has requested validation of the Manifest data to make sure that the manifest does not contain junk values.
This is not in scope for R2.
* We need a basic ingestion workflow in place before we can add any validation, so this can be done post-R2; we need to communicate this to everyone.
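The first option listed above, validating the Manifest before its data is loaded, could look roughly like the sketch below. It assumes that reference data has already been loaded into a simple mapping from field name to allowed values; the field name `UnitOfMeasure`, the record shape, and the function name are hypothetical.

```python
from typing import Dict, List, Set


def validate_manifest(manifest: List[dict], reference_data: Dict[str, Set[str]]) -> List[str]:
    """Return one error message per junk value; an empty list means the manifest passed."""
    errors = []
    for position, record in enumerate(manifest):
        for field, allowed in reference_data.items():
            if field in record and record[field] not in allowed:
                errors.append(
                    f"record {position}: {field}={record[field]!r} is not a known reference value"
                )
    return errors


# Reference data must be available before any manifest can be checked.
reference_data = {"UnitOfMeasure": {"m", "ft"}}
manifest = [
    {"Name": "Well A", "UnitOfMeasure": "m"},
    {"Name": "Well B", "UnitOfMeasure": "junk"},
]

errors = validate_manifest(manifest, reference_data)
# Only load when the manifest is clean; otherwise report and stop.
ready_to_load = not errors
```

The ordering constraint described above shows up directly: without `reference_data` there is nothing to validate against.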
## Context & Scope
* OpenDES has a schema service which can be called for validation as a black box
* The OSDU manifest may include multiple schemas
* The compatibility layer does schema validation but not reference data validation, e.g., checking that data referenced by an SRN exists
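Since a manifest may mix several schemas and the schema service is treated as a black box, one plausible shape is a dispatcher that routes each record to the validator for its schema. This is a sketch only: the `kind` field, the schema names, and the stand-in validator functions are assumptions, and the real schema service interface is not shown here.

```python
from typing import Callable, Dict, List

# A black-box validator: takes a record and answers yes or no.
Validator = Callable[[dict], bool]


def validate_by_schema(manifest: List[dict], validators: Dict[str, Validator]) -> List[str]:
    """Route each record to the validator registered for its schema."""
    problems = []
    for position, record in enumerate(manifest):
        kind = record.get("kind")  # hypothetical field naming the record's schema
        validator = validators.get(kind)
        if validator is None:
            problems.append(f"record {position}: no validator for schema {kind!r}")
        elif not validator(record):
            problems.append(f"record {position}: failed validation for schema {kind!r}")
    return problems


# Stand-ins for calls to the schema service.
validators = {
    "well": lambda record: "name" in record,
    "wellbore": lambda record: "well_id" in record,
}
manifest = [
    {"kind": "well", "name": "A-1"},
    {"kind": "wellbore"},
    {"kind": "log"},
]
problems = validate_by_schema(manifest, validators)
```

Keeping the validators behind a plain callable means the dispatcher does not need to know how the schema service performs its checks.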
## Decision
## Rationale
## Consequences
Implementation Task:
* Create an ADO story
* Dependency on Schema Service developed by SLB Pune Team
* Ethiraj to update on readiness
* Cross-Cloud Testing and Validation
## When to revisit
---
# Tradeoff Analysis - Input to decision
## Alternatives and implications
Propose postponing post-R2; this implies that bad data can be loaded.
## Decision criteria and tradeoffs
## Decision timeline

Release 3
Joe

Enriching OSDU Objects to simplify search
https://community.opengroup.org/osdu/documentation/-/issues/39
2020-04-23T04:23:53Z
Stephen Whitley (Invited Expert)

# Enriching the OSDU R1 Metadata structure and data to simplify search for R2
## Status
- [X] Proposed
- [X] Trialing
- [ ] Under review
- [ ] Approved
- [ ] Retired
## Context & Scope
R1 relied heavily on denormalization of the index to improve the usability of search. Some of the denormalization (flattening hierarchical attributes) creates integrity risk in the index, while other changes simply removed ambiguity. However, even in these cases, the index no longer truly mirrored the metadata structure defined by the data definitions group.
We are looking for an approach that achieves the usability goal while ensuring that the index and metadata structure remain aligned.
## Decision
**Schema version is updated**; @dmitry-kniazev will update the version based on the scripts.
For R2 we will take advantage of schema and metadata versioning as well as enrichment.
- We will load the original OSDU schema (version 0.2.0) and metadata with minimal manipulation during ingestion.
- We will *enrich* the schema (version 0.2.1) and metadata by adding computed properties to improve the usability of search.
- We will evaluate the resulting search semantics for both approaches.
- Finally, we will revisit this decision with the EA Architecture and data definitions subcommittees once R2 is completed to evaluate the trade-offs of this implementation as input to how we approach this in the long term.
```mermaid
graph LR
style Storage fill:#0F0,stroke:#333,stroke-width:4px
subgraph Ingestion
S1[/OSDU Schema 0.2.1/] --Store--> Storage
S0 --Prepare-->S1
S0[/OSDU Schema 0.2.0/] --Prepare & Store--> Storage[(Storage)]
D0[/OSDU Metadata 0.2.0/] --Prepare & Store--> Storage
end
subgraph Enrichment
Storage --Extract---Enrich[Enrich to 0.2.1]
Enrich --Produce-->D1[/OSDU Metadata 0.2.1/]
D1--Store-->Storage
end
```
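The flow in the diagram can be sketched in a few lines: the original record is stored as version 0.2.0, an enrichment step derives a 0.2.1 copy with a computed property, and both stay in storage. The record layout, the `spud_date` and `spud_year` properties, and the dictionary used as storage are illustrative assumptions, not the platform's actual data model.

```python
import copy
from typing import Dict, Tuple

ORIGINAL_VERSION = "0.2.0"
ENRICHED_VERSION = "0.2.1"


def enrich(record: dict) -> dict:
    """Return an enriched copy; the original record is left untouched."""
    enriched = copy.deepcopy(record)
    enriched["version"] = ENRICHED_VERSION
    # Hypothetical computed property: expose the year so search needs no date parsing.
    enriched["data"]["spud_year"] = int(enriched["data"]["spud_date"][:4])
    return enriched


# Storage keyed by (record id, version) so both versions coexist.
storage: Dict[Tuple[str, str], dict] = {}

original = {
    "id": "well-1",
    "version": ORIGINAL_VERSION,
    "data": {"name": "A-1", "spud_date": "2019-06-30"},
}
storage[(original["id"], original["version"])] = original

# Enrichment runs inside the platform: extract, enrich, store.
enriched = enrich(storage[("well-1", ORIGINAL_VERSION)])
storage[(enriched["id"], enriched["version"])] = enriched
```

Because the enriched record is a separate version rather than an in-place edit, the original remains available for comparison and traceability.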
## Rationale
Versioning and enrichment are core architectural capabilities. By performing this "enrichment" inside the data platform, we maintain integrity and traceability.
## Consequences
For every OSDU record coming in, we will have two versions (original and enriched). This is normal practice.
## When to revisit
April 1st, 2020
---
# Tradeoff Analysis - Input to decision
## Alternatives and implications
- Use the original schema and metadata without enrichment: this has usability issues and relies heavily on training and on establishing documented conventions for interpretation
- Perform all the enrichment outside of the data platform: all traceability to the original data and transforms is lost
## Decision criteria and trade-offs
- Usability
- Integrity
- Maintainability
- Explainability
## Decision timeline
Initial decision Feb 14, 2020; revisit decision April 1, 2020

Release 2
ethiraj krishnamanaidu, Stephen Whitley (Invited Expert), Ferris Argyle, Dmitry Kniazev, Dania Kodeih (Microsoft), Joe
2020-04-01

Search - Return entities name
https://community.opengroup.org/osdu/documentation/-/issues/47
2020-03-23T18:39:10Z
Ferris Argyle

## Status
- [ ] Proposed
- [ ] Trialing
- [ ] Under review
- [X] Approved
- [ ] Retired
## Context & Scope
## Decision
Retain the name for all returned entities: names are present in R1 but were missing in R0 in some cases.
## Rationale
Supports ISVs' currently implemented functionality.
## Consequences
None.
Implementation Tasks:
* Test according to the definition of done (Write test cases)
* Add a user story to the project ADO
## When to revisit
---
# Tradeoff Analysis - Input to decision
## Alternatives and implications
## Decision criteria and tradeoffs
## Decision timeline

Release 2
ethiraj krishnamanaidu

OpenDES record/resource id and group semantics (SRN)
https://community.opengroup.org/osdu/documentation/-/issues/48
2020-03-23T18:36:07Z
Ferris Argyle

## Status
- [ ] Proposed
- [ ] Trialing
- [ ] Under review
- [X] Approved
- [ ] Retired
## Context & Scope
SRN has record/resource id and group encapsulated, whereas GUID doesn’t - Alan
## Decision
Add semantics to OpenDES for record/resource id and group
## Rationale
## Consequences
Implementation Tasks:
* Review and formalize the gap fit
* Add a service to the OpenDES ingestion workflow that maps the SRN to the identifier
* Add tests
* Define new services to fill gap
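As a rough illustration of the mapping service in the tasks above, the sketch below keeps a lookup from SRN to an OpenDES-side identifier and group. It deliberately treats the SRN as an opaque string, since its internal layout is not described here; the class names, the GUID-style identifier, the caller-supplied group, and the placeholder SRN strings are all assumptions.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RecordRef:
    """Hypothetical OpenDES-side handle: a generated identifier plus the group."""
    record_id: str
    group: str


class SrnMapper:
    """Maps each SRN to a stable record reference, creating one on first use."""

    def __init__(self) -> None:
        self._by_srn: Dict[str, RecordRef] = {}

    def resolve(self, srn: str, group: str) -> RecordRef:
        # The SRN is treated as an opaque key; no parsing of its parts is attempted.
        if srn not in self._by_srn:
            self._by_srn[srn] = RecordRef(record_id=str(uuid.uuid4()), group=group)
        return self._by_srn[srn]


mapper = SrnMapper()
first = mapper.resolve("SRN-EXAMPLE-1", group="wells")
again = mapper.resolve("SRN-EXAMPLE-1", group="wells")
other = mapper.resolve("SRN-EXAMPLE-2", group="wells")
```

Resolving the same SRN twice returns the same reference, which is what an ingestion workflow would need when several manifest records point at one resource.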
## When to revisit
---
# Tradeoff Analysis - Input to decision
## Alternatives and implications
## Decision criteria and tradeoffs
## Decision timeline

Release 2
ethiraj krishnamanaidu

Ingestion reference data validation
https://community.opengroup.org/osdu/documentation/-/issues/54
2020-03-23T18:35:20Z
Ferris Argyle

## Status
- [ ] Proposed
- [ ] Trialing
- [x] Under review
- [ ] Approved
- [ ] Retired
### History
This was approved in [January OSDU R2 Decisions review](https://docs.google.com/presentation/d/13muxPb8gnZ8P_mwFvGu0K3rWaqoHD5z5uNx2-LbmN2k/edit#slide=id.g7628e71997_0_358) and has since been revisited.
## Context & Scope
* Currently we validate only manifests, not reference targets; we should validate that the reference data to which an SRN refers exists, as well as other relationships
* How should this work? The OpenDES model is to load and then clean up afterwards based on batch scan checks; the OSDU philosophy tends to be to allow only valid data in. If validation is done as part of pre-processing, capture this as a long-term backlog issue.
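The two models contrasted above can be compared side by side in a small sketch: one function gates records at load time, the other scans already-stored records for dangling references. The `references` field, the record shape, and the placeholder SRN strings are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def missing_references(record: dict, known_srns: Set[str]) -> List[str]:
    """Return the referenced SRNs that do not exist in the known set."""
    return [srn for srn in record.get("references", []) if srn not in known_srns]


def load_valid_only(records: List[dict], known_srns: Set[str]) -> Tuple[List[dict], List[dict]]:
    """Gate at the door: accept a record only if all its references resolve."""
    accepted, rejected = [], []
    for record in records:
        (rejected if missing_references(record, known_srns) else accepted).append(record)
    return accepted, rejected


def batch_scan(stored: List[dict], known_srns: Set[str]) -> Dict[str, List[str]]:
    """Load first, check later: report dangling references per stored record."""
    report = {}
    for record in stored:
        missing = missing_references(record, known_srns)
        if missing:
            report[record["id"]] = missing
    return report


known_srns = {"SRN-FIELD-1"}
records = [
    {"id": "well-1", "references": ["SRN-FIELD-1"]},
    {"id": "well-2", "references": ["SRN-FIELD-9"]},
]

accepted, rejected = load_valid_only(records, known_srns)
report = batch_scan(records, known_srns)
```

Both paths share the same check; they differ only in whether a failure blocks the load or becomes a clean-up item.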
## Decision
## Rationale
## Consequences
Implementation Tasks:
* Gap Fit: Understand how validation is done today (e.g., is it well-formed JSON)
* Create an ADO story
* Dependency on Schema Service developed by SLB Pune Team
* Ethiraj to update on readiness
* Cross-Cloud Testing and Validation
## When to revisit
---
# Tradeoff Analysis - Input to decision
## Alternatives and implications
Propose postponing post-R2; this implies that bad data can be loaded.
## Decision criteria and tradeoffs
## Decision timeline

Release 2
Joe