[LADR] Schema Service Adoption
Architectural principles tend to stay at a high, abstract level, which leaves teams to bridge the gap between those principles and the actual implementation details. This document attempts to bridge that gap with a set of Lightweight Architecture Decision Records (LADRs) that are simple to follow and that developers can implement within a given team or project.
Decision Title
Status
- Proposed
- Trialing
- Under review
- Approved
- Retired
Context & Scope
The current OSDU service suite has two services that provide capabilities to define a schema in the data platform. The capabilities of these services compare as follows:
Schema defined with Schema Service | Schema defined with Storage Schema Service |
---|---|
Rich JSON objects as per JSON Schema draft-07 | Flattened JSON containing an array of attributes |
Being a JSON object, a schema can define a real entity model | Schema definition does not support this |
Schema fragments can be referenced within the definition | Schema definition does not support this |
Intended for central governance of all schemas defined in the system | Intended primarily for discovery and search of data |
Semantic versioning support and validation | No validation |
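To illustrate what "semantic versioning support and validation" can mean in practice, here is a minimal Python sketch that parses a versioned kind and checks a new schema version against the previous one. The kind layout `authority:source:entityType:major.minor.patch` matches OSDU kinds such as `osdu:wks:master-data--Well:1.0.0`, but the compatibility rule, the function names (`parse_kind`, `version_violations`), and the flat `properties` comparison are hypothetical simplifications. This is not the Schema Service's actual validation logic.

```python
import re
from dataclasses import dataclass

# Assumed kind layout: authority:source:entityType:major.minor.patch
KIND_PATTERN = re.compile(r"^([^:]+):([^:]+):([^:]+):(\d+)\.(\d+)\.(\d+)$")


@dataclass(frozen=True)
class Kind:
    authority: str
    source: str
    entity_type: str
    version: tuple[int, int, int]


def parse_kind(kind: str) -> Kind:
    """Split a versioned kind string into its identity parts and version tuple."""
    match = KIND_PATTERN.match(kind)
    if match is None:
        raise ValueError(f"not a versioned kind: {kind!r}")
    authority, source, entity_type, major, minor, patch = match.groups()
    return Kind(authority, source, entity_type, (int(major), int(minor), int(patch)))


def version_violations(old_kind: str, old_props: dict, new_kind: str, new_props: dict) -> list[str]:
    """Check a new schema version against the previous one (empty list = acceptable).

    Hypothetical, simplified rule used only for illustration:
      - major bump: breaking changes allowed
      - minor bump: properties may be added, but not removed or changed
      - patch bump: no properties added, removed or changed
    """
    old, new = parse_kind(old_kind), parse_kind(new_kind)
    if (old.authority, old.source, old.entity_type) != (new.authority, new.source, new.entity_type):
        return ["different schema identity"]
    if new.version <= old.version:
        return ["version must increase"]
    if new.version[0] > old.version[0]:
        return []  # major bump: anything goes
    problems = [f"removed property {name!r}" for name in old_props if name not in new_props]
    problems += [f"changed property {name!r}" for name in old_props
                 if name in new_props and new_props[name] != old_props[name]]
    if new.version[1] == old.version[1] and set(new_props) - set(old_props):
        problems.append("patch versions may not add properties")
    return problems
```

For example, going from `osdu:wks:master-data--Well:1.1.0` to `1.2.0` while dropping a property yields `["removed property 'SpudDate'"]`. A flat attribute list in a storage schema carries no comparable version history to check against.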
This ADR highlights and encourages the adoption of the Schema Service sooner rather than later. The following sections explain what this means from the storage schema perspective.
Decision
Currently, despite the richness that the Schema Service offers for schema definition and management, a storage schema still has to be defined to complete workflows end to end, including data discovery. This means the data platform always needs a flattened version of the rich domain model defined with the Schema Service, so the same model has to be maintained in two places. This adds overhead to schema management.
The recommendation is to adopt the Schema Service for all end-to-end workflows. Earlier adoption would help plan the deprecation of the Storage Schema Service and would lead to fewer conflicts and issues when migrating existing storage schemas to the Schema Service.
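To make the duplicate-maintenance overhead concrete, here is a minimal Python sketch that derives a flat, storage-schema-style attribute list from a nested JSON Schema draft-07 definition. It also follows local `$ref` fragments. The `{"path", "kind"}` entry shape is modelled on the storage schema's attribute array. The type-to-kind mapping, the dotted-path convention and the function names are assumptions made for illustration. Arrays are skipped, and circular references are not handled.

```python
# Assumed mapping from JSON Schema types to storage schema attribute kinds.
TYPE_TO_KIND = {"string": "string", "integer": "int", "number": "double", "boolean": "boolean"}


def resolve_ref(root: dict, node: dict) -> dict:
    """Follow a local JSON Pointer reference such as '#/definitions/Location'."""
    ref = node.get("$ref")
    if ref is None:
        return node
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled here: {ref}")
    target = root
    for part in ref[2:].split("/"):
        target = target[part]
    return resolve_ref(root, target)


def flatten_to_storage_schema(root: dict) -> list[dict]:
    """Derive a flat [{'path', 'kind'}] attribute list from a nested JSON Schema.

    Assumes the schema contains no circular references.
    """
    attributes = []

    def walk(node: dict, prefix: str) -> None:
        for name, prop in node.get("properties", {}).items():
            prop = resolve_ref(root, prop)
            path = f"{prefix}.{name}" if prefix else name
            json_type = prop.get("type")
            if json_type == "object" or "properties" in prop:
                walk(prop, path)  # nested entity: recurse with a dotted path
            elif isinstance(json_type, str) and json_type in TYPE_TO_KIND:
                attributes.append({"path": path, "kind": TYPE_TO_KIND[json_type]})
            # Arrays and other constructs have no flat equivalent here and are skipped.

    walk(resolve_ref(root, root), "")
    return attributes
```

Applied to a well schema whose `SurfaceLocation` property references `#/definitions/Location`, the sketch emits paths such as `SurfaceLocation.Latitude`. Any later change to the rich model must be re-flattened and re-registered, which is exactly the overhead described above.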
Rationale
The following are some of the main factors behind this ADR's recommendation:
- The Schema Service has been adopted on most of the cloud platforms (GCP, AWS, Azure, IBM and Oracle).
- The Indexer service, one of the main consumers of storage schemas, has also adopted the Schema Service.
- The Schema Service supports richer definitions and more capabilities to manage schemas.
- The sooner it is adopted, the less complex it will be to deprecate the Storage Schema Service and migrate storage schemas.
Consequences
- What is expected from workflows that already create storage schemas using the Storage Schema Service?
  They should move to the Schema Service to create their schemas.
- What happens to the OSDU R3 storage schemas loaded in deployments?
  OSDU R3 schemas will be bootstrapped in the Schema Service and will therefore be readily available in the deployments.
- What happens to storage schemas that were created using the storage schema endpoints?
  These will have to be transformed and migrated to the Schema Service. Scripts can be developed for this purpose.
- Who is expected to execute the storage schema migration script?
  The recommendation is to run it as an operational process. This helps identify any conflicts that need to be resolved during migration.
- What is the impact on discovery of data that has already been indexed?
  The Schema Service exposes a superset of the options offered by the Storage Schema Service. As of today, the Indexer implements only the subset of features that storage schemas supported, so only that subset of a schema definition is actually discoverable.
- Will re-indexing be needed?
  Assuming the schema model has not changed and uses only the subset of schema definition features that the Indexer supports today, re-indexing will not be needed. Once this Indexer limitation is fixed, re-indexing might be needed, depending on discovery needs.
- What happens to the storage schema endpoints?
  If all the points above are acceptable, a viable approach could be to:
  - Come up with a deprecation strategy and timeline for the storage schema endpoints.
  - Move all workflows that consume the Storage Schema Service to the Schema Service.
  - Create a migration script to migrate storage schemas to the Schema Service.
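The migration script and the operational conflict check discussed in the questions above could take roughly the following shape. This is a minimal Python sketch that converts a storage schema into a Schema Service create payload and sorts a batch of schemas into buckets an operator can act on. The `schemaInfo`/`schemaIdentity`/`status` field names reflect our reading of the Schema Service create request and should be checked against your deployment's API documentation. The kind mapping, the function names (`to_schema_service_payload`, `triage`) and the strict equality test for conflicts are hypothetical. No HTTP calls are made.

```python
# Assumption: reverse mapping from storage schema attribute "kind" to JSON Schema type.
KIND_TO_TYPE = {"string": "string", "int": "integer", "long": "integer",
                "double": "number", "float": "number", "boolean": "boolean"}


def to_schema_service_payload(storage_schema: dict) -> tuple[dict, list[str]]:
    """Convert one storage schema into a Schema Service create payload.

    Returns the payload plus attribute paths whose kind has no mapping and
    therefore need a manual decision. Raises ValueError on conflicting paths.
    """
    authority, source, entity_type, version = storage_schema["kind"].split(":")
    major, minor, patch = (int(v) for v in version.split("."))
    root = {"type": "object", "properties": {}}
    unmapped = []
    for attribute in storage_schema["schema"]:
        json_type = KIND_TO_TYPE.get(attribute["kind"])
        if json_type is None:
            unmapped.append(attribute["path"])
            continue
        *parents, leaf = attribute["path"].split(".")
        node = root
        for parent in parents:
            # Dotted paths become nested objects.
            node = node["properties"].setdefault(parent, {"type": "object", "properties": {}})
            if "properties" not in node:
                raise ValueError(f"path conflict at {attribute['path']!r}")
        if leaf in node["properties"]:
            raise ValueError(f"duplicate or conflicting path {attribute['path']!r}")
        node["properties"][leaf] = {"type": json_type}
    payload = {
        # Assumption: field names modelled on the Schema Service create request.
        "schemaInfo": {
            "schemaIdentity": {
                "authority": authority,
                "source": source,
                "entityType": entity_type,
                "schemaVersionMajor": major,
                "schemaVersionMinor": minor,
                "schemaVersionPatch": patch,
            },
            "status": "DEVELOPMENT",
        },
        "schema": {"$schema": "http://json-schema.org/draft-07/schema#", **root},
    }
    return payload, unmapped


def triage(storage_schemas: list[dict], existing: dict[str, dict]) -> dict[str, list[str]]:
    """Sort storage schemas into buckets for an operator to act on.

    `existing` maps kind -> JSON Schema already registered in the Schema Service
    (for example, bootstrapped OSDU R3 schemas). Equality here is strict; a real
    process would likely compare normalized forms.
    """
    report = {"create": [], "skip_identical": [], "conflict": [], "needs_review": []}
    for storage_schema in storage_schemas:
        kind = storage_schema["kind"]
        try:
            payload, unmapped = to_schema_service_payload(storage_schema)
        except ValueError:
            report["needs_review"].append(kind)
            continue
        if unmapped:
            report["needs_review"].append(kind)
        elif kind not in existing:
            report["create"].append(kind)
        elif existing[kind] == payload["schema"]:
            report["skip_identical"].append(kind)
        else:
            report["conflict"].append(kind)
    return report
```

Producing a report first, rather than writing directly to the Schema Service, fits the recommendation to run the migration as an operational process. Operators resolve the `conflict` and `needs_review` buckets by hand before anything is created.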
Tradeoff Analysis - Input to decision
To agree upon:
- Moving all workflows to the Schema Service instead of the Storage Schema Service.
- A migration utility.
- A deprecation strategy for the storage schema endpoints (still to be worked out).