[Validation] Ingestion schema validation
Status
- Proposed
- Trialing
- Under review
- Approved
- Retired
History
This was approved in the January OSDU R2 Decisions review and has since been revisited.
We can do data validation (data quality checks, as opposed to schema compliance checks) in several ways (Joe):
- Validate the data in the Manifest file BEFORE we load the manifest data
- Validate the data as it is parsed from a data file (not in scope for R2)
- Validate records as an enrichment (not in scope for R2, as this requires an event controller and a registry)
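The first option (validate the Manifest before loading) amounts to a gate in front of the load step: run a set of data quality checks over every manifest record, collect all failures, and load nothing if any check fails. The sketch below shows that shape only. The field names `FacilityName` and `TotalDepth`, the two checks, and the all-or-nothing policy are hypothetical choices for illustration, not agreed OSDU rules, and `load_manifest` is a stand-in for the real ingestion step.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# A manifest record is modelled as a plain dict (a simplification).
Record = Dict[str, Any]
# A check returns an error message, or None when the record passes.
Check = Callable[[Record], Optional[str]]


@dataclass
class ValidationError:
    record_index: int
    message: str


def no_blank_name(record: Record) -> Optional[str]:
    """Hypothetical check: FacilityName must be a non-blank string (catches junk values)."""
    name = record.get("FacilityName")
    if not isinstance(name, str) or not name.strip():
        return "FacilityName is missing or blank"
    return None


def non_negative_depth(record: Record) -> Optional[str]:
    """Hypothetical check: TotalDepth, when present, must be a non-negative number."""
    depth = record.get("TotalDepth")
    if depth is not None and (not isinstance(depth, (int, float)) or depth < 0):
        return f"TotalDepth {depth!r} is not a non-negative number"
    return None


def validate_before_load(records: List[Record], checks: List[Check]) -> List[ValidationError]:
    """Run every check against every record and collect all failures."""
    errors: List[ValidationError] = []
    for i, record in enumerate(records):
        for check in checks:
            message = check(record)
            if message is not None:
                errors.append(ValidationError(i, message))
    return errors


def load_manifest(records: List[Record], checks: List[Check]) -> List[Record]:
    """Reject the whole manifest if any record fails; otherwise 'load' it (stand-in)."""
    errors = validate_before_load(records, checks)
    if errors:
        raise ValueError(f"manifest rejected: {len(errors)} data quality error(s)")
    return records
```

Collecting every failure, rather than stopping at the first one, lets the submitter fix a manifest in one pass.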
To add data validation during ingestion, the reference data must be loaded first; the system then needs to retrieve that reference data at runtime and validate the Manifest against it.
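A minimal sketch of the reference-data step, assuming reference data can be retrieved as a set of allowed values per reference type. The reference types (`UnitOfMeasure`, `WellStatus`), their values, and the field-to-type mapping are hypothetical; `load_reference_data` is hard-coded here and stands in for whatever runtime retrieval the platform provides.

```python
from typing import Any, Dict, List, Set, Tuple

# Allowed values per reference type (hypothetical shape).
ReferenceData = Dict[str, Set[str]]

# Hypothetical mapping from manifest field to the reference type it must match.
FIELD_TO_REFERENCE_TYPE = {
    "UnitOfMeasure": "UnitOfMeasure",
    "WellStatus": "WellStatus",
}


def load_reference_data() -> ReferenceData:
    """Stand-in for retrieving previously loaded reference data at runtime."""
    return {
        "UnitOfMeasure": {"m", "ft"},
        "WellStatus": {"Active", "Abandoned", "Suspended"},
    }


def validate_against_reference(
    records: List[Dict[str, Any]], reference: ReferenceData
) -> List[Tuple[int, str, Any]]:
    """Return (record index, field, value) for each value not found in the reference data."""
    problems: List[Tuple[int, str, Any]] = []
    for i, record in enumerate(records):
        for field, ref_type in FIELD_TO_REFERENCE_TYPE.items():
            if field not in record:
                continue  # a missing field is a schema concern, not a reference-data one
            value = record[field]
            allowed = reference.get(ref_type, set())
            if not isinstance(value, str) or value not in allowed:
                problems.append((i, field, value))
    return problems
```

The ordering constraint in the text shows up directly: `validate_against_reference` cannot run until reference data exists to pass in.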
The schema service (os-schema) has basic JSON validation in the current implementation; full validation is planned for later.
Alan has requested validating the data (Manifest) to make sure that the manifest does not contain junk values.
This is not in the scope for R2.
- We need a basic ingestion workflow in place before we can add any validation, so this can be deferred until after R2, and we need to communicate that to everyone.
Context & Scope
- OpenDES has a schema service that can be called for validation as a black box
- The OSDU manifest may include multiple schemas
- The compatibility layer performs schema validation but not reference data validation, e.g. it does not check whether the data referenced by an SRN exists
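The two context points above suggest two separate checks: dispatching each record to the validator for its own schema (since one manifest may mix schemas), and checking that every SRN a record references actually resolves, which the compatibility layer does not do. The sketch assumes records carry `ResourceTypeID`, `ResourceID` and `Data` keys, loosely following the R2 manifest shape; treat those keys, the SRN strings, and the `existing` set (standing in for a lookup against already-ingested data) as illustrative assumptions.

```python
import re
from typing import Any, Callable, Dict, Iterator, List, Set

Record = Dict[str, Any]
# A per-schema validator returns a list of error messages (empty if valid).
Validator = Callable[[Record], List[str]]

# Illustrative SRN shape: any whitespace-free string starting with "srn:".
SRN_PATTERN = re.compile(r"^srn:\S+$")


def validate_mixed_manifest(records: List[Record], validators: Dict[str, Validator]) -> List[str]:
    """Validate each record with the validator registered for its ResourceTypeID."""
    errors: List[str] = []
    for i, record in enumerate(records):
        type_id = record.get("ResourceTypeID")
        validator = validators.get(type_id)
        if validator is None:
            errors.append(f"record {i}: no schema registered for {type_id!r}")
        else:
            errors.extend(f"record {i}: {msg}" for msg in validator(record))
    return errors


def iter_srns(value: Any) -> Iterator[str]:
    """Recursively yield every string inside a record that looks like an SRN."""
    if isinstance(value, str):
        if SRN_PATTERN.match(value):
            yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_srns(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_srns(item)


def dangling_references(records: List[Record], existing: Set[str]) -> Set[str]:
    """Return SRNs referenced under Data that exist neither in the manifest nor in `existing`."""
    # Records declared in the same manifest may satisfy each other's references.
    declared = {r["ResourceID"] for r in records if "ResourceID" in r}
    referenced: Set[str] = set()
    for record in records:
        referenced.update(iter_srns(record.get("Data", {})))
    return referenced - declared - existing
```

Only SRNs under `Data` are collected, so a record's own `ResourceTypeID` is never mistaken for a reference.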
Decision
Rationale
Consequences
Implementation Task:
- Create an ADO story
- Dependency on the Schema Service being developed by the SLB Pune team
- Ethiraj to update on readiness
- Cross-Cloud Testing and Validation
When to revisit
Tradeoff Analysis - Input to decision
Alternatives and implications
Proposal: postpone until after R2; this implies that bad data can be loaded in the meantime.