[Multi-Region Deployment] Support OSDU Multi-Region Deployment: Requirements
Background: Subsurface exploration and production is a global enterprise, hence most operating companies have a need for subsurface data availability across multiple geographic regions. Strict latency, data sovereignty, or local regulatory/legislative requirements may result in the need for numerous distributed, high-availability OSDU regions. Sharing and managing data across multiple distributed installations is a core capability for OSDU.
An OSDU Region is defined as an independent and highly available geographical deployment of the full OSDU design. It provides full functionality within a given region, so there is no need to deploy multiple OSDU environments within that region to meet high-availability or disaster-recovery requirements. Strict latency, data sovereignty, or local legislation requirements may result in the need for additional OSDU deployments. OSDU deployments are preferably public-cloud based. In situations where an in-country OSDU deployment is required and no public cloud region is available, physical edge capabilities from the public cloud provider will be used. In any case, the OSDU APIs exposed in the different OSDU regions will provide identical functionality.
Objective:
- This requirement is to ensure OSDU data platform can support highly available multi-region/geographical deployment.
Business Case:
- Adopting company has subsurface professionals interacting with OSDU around the world. All expect optimal performance and high availability when interacting with data and applications.
- Adopting company has stringent availability/up-time and fault tolerance demands on the data and applications hosted in OSDU.
- Adopting company has disaster recovery and business continuity requirements that need to be fulfilled. A multi-region deployment is essential to the goal of maximum redundancy and resilience to catastrophic failure.
- Regulatory restrictions governing the adopting company's activities mandate that regional data never travel beyond the region's geographic borders.
Summary of Requirements
OSDU must enable OSDU Using Companies to stand up and operate multiple concurrent, homogeneous instances of OSDU in different geographic regions. Each OSDU deployment instance is stand-alone/independent, but all are federated to provide global access to the underlying subsurface data hosted and delivered via the OSDU Data Platform in each region. OSDU shall deliver no more than approximately 40 milliseconds of latency between the end user and the OSDU co-resident applications and services; therefore, multiple regional deployments shall have globally replicated metadata and regionally originated and stored actual data files (datasets, etc.), including raw, processed, and interpreted data.
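The ~40 ms target implies that each user population should be served by a region whose round-trip latency falls within that budget. A minimal sketch, assuming measured round-trip times per region are available; the `select_region` helper and the region names are hypothetical illustrations, not part of OSDU.

```python
from typing import Mapping, Optional

# Approximate end-user latency target stated in the requirements (milliseconds).
LATENCY_BUDGET_MS = 40.0


def select_region(rtt_ms: Mapping[str, float],
                  budget_ms: float = LATENCY_BUDGET_MS) -> Optional[str]:
    """Return the lowest-latency region within the budget, or None if none qualifies.

    rtt_ms maps a (hypothetical) region name to a measured round-trip time in ms.
    """
    candidates = {region: rtt for region, rtt in rtt_ms.items() if rtt <= budget_ms}
    if not candidates:
        # No existing deployment meets the target for this user population,
        # which may indicate the need for an additional regional deployment.
        return None
    return min(candidates, key=lambda region: candidates[region])
```

For example, `select_region({"europe-west": 18.0, "us-east": 95.0})` returns `"europe-west"`, while a user population for which it returns `None` is not served within the target by any existing region.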
Requirements
General Requirements
- Using companies shall be able to deploy any number of independent, complete, and homogeneous instances of OSDU in different geographic regions.
- Each regional deployment includes identical functionalities in the main architectural components: platform services, loading/enrichment/consumption data services, search engine, graph and NoSQL databases, and object stores.
- Each deployment is 'stand-alone': applications, services and data can be independently deployed, accessed, and utilized by real-world users within the hosting region.
- Each regional deployment can access the shared 'System of Record' i.e., the data platform including all the globally replicated metadata and reference data.
- Standing up a new deployment may require administrative capabilities residing in another regional deployment.
Administration Requirements
- One region shall be designated as the 'Admin Region' which provides administrative functions allowing configuration, monitoring, and control of all regional instances.
- OSDU shall provide the ability to designate and configure one and only one deployment as an 'Admin Region'.
- Once an OSDU deployment is designated as the 'Admin Region', all administrative API calls throughout all OSDU regional deployments are rerouted to the administrative services hosted in the Admin Region.
- From the 'Admin Region', the health and availability of other OSDU regional deployments can be monitored and logged.
- Administrative entitlements shall be defined to govern which users/roles can access which administrative capabilities.
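The administration rules above (exactly one Admin Region, administrative calls from any region rerouted to it, access gated by entitlements) can be sketched as a small routing function. This is a minimal illustration under stated assumptions: the `Region` type, `route_admin_call`, the role names, and the entitlement map are hypothetical and do not correspond to any published OSDU API.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Region:
    name: str
    is_admin: bool = False


def admin_region(regions: List[Region]) -> Region:
    """Return the single Admin Region, enforcing 'one and only one'."""
    admins = [r for r in regions if r.is_admin]
    if len(admins) != 1:
        raise ValueError(f"expected exactly one Admin Region, found {len(admins)}")
    return admins[0]


def route_admin_call(caller_region: str, operation: str, user_roles: Set[str],
                     regions: List[Region],
                     entitlements: Dict[str, Set[str]]) -> Tuple[str, bool]:
    """Return (serving region, rerouted?) for an administrative API call.

    `entitlements` maps each administrative operation to the roles allowed
    to invoke it (a hypothetical model of administrative entitlements).
    """
    if not user_roles & entitlements.get(operation, set()):
        raise PermissionError(f"no entitlement for operation {operation!r}")
    target = admin_region(regions)
    # Calls received in any other region are rerouted to the Admin Region.
    return target.name, caller_region != target.name
```

Checking the entitlement before routing means an unauthorized call is refused locally and never crosses regions.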
Infrastructure Requirements
- One region (not necessarily the 'Admin Region') can be designated a 'Central Region' in order to host the development and deployment infrastructure (dev/test environments, CI/CD tooling, etc.).
- Using companies shall be able to install and configure development and deployment infrastructure in a regional instance for development, acceptance test, and CI/CD activities.
- Deployment infrastructure accommodates setup of the OSDU deployment instances (platform services, identity provider, data platform, search capabilities, etc.). It must also accommodate deployment of applications and services from the OSDU marketplace as well as Using Company proprietary works. All regional deployments are homogeneous, and the deployment infrastructure accommodates making new regional deployments appear similar to pre-existing deployments.
- Using companies shall be able to set up and configure connectivity between an OSDU instance and the operator's on-premises data center(s).
- Each OSDU deployment must allow connections to/from a co-hosted 'Transit VPC', which facilitates data connections between on-premises / in-country installations and OSDU.
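Because all regional deployments must be homogeneous and new deployments must appear similar to existing ones, deployment infrastructure could check a new region against a reference manifest. A minimal sketch, assuming each deployment can report a mapping of component name to deployed version; the `homogeneity_drift` helper and the component names are hypothetical.

```python
from typing import Dict, List

Manifest = Dict[str, str]  # component name -> deployed version


def homogeneity_drift(reference: Manifest, candidate: Manifest) -> List[str]:
    """List differences between a candidate deployment and the reference.

    An empty list means the candidate is homogeneous with the reference.
    """
    issues = []
    for component, version in sorted(reference.items()):
        if component not in candidate:
            issues.append(f"missing: {component}")
        elif candidate[component] != version:
            issues.append(f"version mismatch: {component} "
                          f"{candidate[component]} != {version}")
    # Components present only in the candidate also break homogeneity.
    for component in sorted(set(candidate) - set(reference)):
        issues.append(f"unexpected: {component}")
    return issues
```

A CI/CD pipeline in the Central Region could, for instance, refuse to promote a new regional deployment while this list is non-empty.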
Data Platform Requirements
- Replication of data and services is always subject to administratively defined replication policies.
- Replication policies are persisted and distributed as master reference data in the OSDU Data Platform.
- Global access to metadata and underlying data records is governed by the entitlements policies/rules of the OSDU Data Platform.
- All metadata is replicated across all nodes, so every node has access to all metadata for that multi-region OSDU implementation. A single cloud provider is assumed within an OSDU Using Company (i.e., replicating metadata between multiple cloud provider solutions is out of scope for R3).
- Transactional data is replicated by exception, not by default. It is expected that the 'home' for data is well-defined and mapped between its geographical region and its usage. By exception and on-request, transactional data can be replicated to host regions to achieve efficient consumption.
- Data is to be regionally bound and tagged. Data resides by default in the OSDU “home” region in which the loading/ingestion workflow occurred, which will be the region associated with the data from an earth location perspective. Since the describing metadata is replicated globally, any OSDU user can discern in which region the described data resides.
- The OSDU DP Search API can be used to locate search results based on globally replicated master-data, reference-data, and metadata. As stated elsewhere in the requirements, metadata are not protected and a user accessing any OSDU regional deployment can see all metadata. However, this is only true for metadata and of course not for the data content files / datasets / etc. (raw, processed, interpreted, etc.).
- Facilities shall be provided to enable on-demand replication of content data files from the 'home' region to a 'host' region: selected data can be transported on request from its home region to any other OSDU region. Subsequent changes, that is, new items of data that serve as successor versions or new derivations, are populated to the home region and, if requested, to the same 'host' region.
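The data platform rules combine three ideas: every record is tagged with its home region, content leaves the home region only on request and only where the replication policy permits, and successor versions flow to the home region plus any host region that requested a replica. A minimal sketch, assuming a policy expressed as a set of permitted destination regions per policy tag; the `Record` and `ReplicationPolicy` types, field names, and tags are hypothetical, not the OSDU schema.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class Record:
    record_id: str
    home_region: str                          # region where ingestion occurred
    policy_tag: str                           # selects the replication policy
    hosted_in: FrozenSet[str] = frozenset()   # host regions holding a replica


@dataclass
class ReplicationPolicy:
    # policy tag -> host regions to which content may be replicated
    allowed_destinations: Dict[str, Set[str]] = field(default_factory=dict)

    def permits(self, record: Record, destination: str) -> bool:
        if destination == record.home_region:
            return True
        return destination in self.allowed_destinations.get(record.policy_tag, set())


def request_replica(record: Record, destination: str,
                    policy: ReplicationPolicy) -> Record:
    """On-demand replication of content from the home region to a host region."""
    if destination == record.home_region:
        return record  # content already lives in its home region
    if not policy.permits(record, destination):
        raise PermissionError(f"{record.record_id} may not be replicated to {destination}")
    return replace(record, hosted_in=record.hosted_in | {destination})


def successor_targets(record: Record) -> Set[str]:
    """Regions that receive new versions or derivations: home plus requesting hosts."""
    return {record.home_region} | set(record.hosted_in)
```

Keeping `hosted_in` on the (globally replicated) record is one way to let any region discover where replicas exist; the requirements do not prescribe this.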
Standout Non-functional Requirements (TBD)
- High-availability: Are OSDU services and data always available to end users, despite failure conditions?
- Performance: Does the distributed solution meet the in-region performance goals for the data platform?
- Latency: How much latency is there between the creation of content and its availability across all regions?
- Security: How much does the solution increase the threat surface of the OSDU data platform?
- Ease of Use: How easy is it to configure and operate?
- Observability: How easy is it to test, identify, and resolve incorrect behavior?
- Maintainability: How much does the solution introduce complexity in development, deployment and operations of the data platform?
Additional Use Cases
- Automated replication of metadata across all regions (applies to: work-product-metadata, work-product-component-metadata, associated file metadata)
  - "Easy": leverage CSP storage technologies
  - Leads to 'eventual consistency' of search result indices, which crawl the metadata at different intervals
- Master and reference data also need to be automatically replicated across all regions (to enable context and validation for transactional data).
- On-demand request for a WP located in a different region:
  - Can be requested by a user via the search UI or by a microservice via the API
  - The WP is copied to the requestor region
  - Subject to the E&O contract and the replication policy. If the replication policy states the data cannot travel to the requestor region, the user is notified (to be confirmed).
  - Metadata has a 'locator' indicating the WP's home region
  - Lineage is preserved, but the references are not deep-copied
  - WPCs are deep-copied to the requestor region with read-only access
  - Any derivative WPs made in the requestor region are loaded/ingested back in the 'home' region, as identified by the metadata 'locator'
  - Lineage is updated as usual
- WPs are loaded/ingested in the 'home' region
- Repartitioning (example: growing from 2 regions to 3 regions): be able to reallocate home partition boundaries via 'staging'.
- Creating a new region provides instant access to the 'common' partition.
- Administrative functions to be able to move/copy a partition between regions. Some using companies may require the ability to replicate a single partition across several regions. Each region has a replica with only the data applicable to its users / assets (geographic area) and can transfer changes of this data to the 'home' region. This allows the 'home' region to perform analysis on data that is up to date across the entire extent.
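The automated metadata replication use case notes that search indices crawling at different intervals converge only eventually. A minimal sketch of that behaviour, assuming each metadata change carries a monotonically increasing version and each regional index keeps the highest version it has seen; this last-writer-wins rule is an illustrative choice, not something OSDU prescribes, and `RegionalIndex` is hypothetical.

```python
from typing import Dict, Iterable, Tuple

Change = Tuple[str, int, dict]  # (record id, version, metadata)


class RegionalIndex:
    """Search index in one region, fed by a replicated metadata change log."""

    def __init__(self, region: str) -> None:
        self.region = region
        self.entries: Dict[str, Tuple[int, dict]] = {}

    def crawl(self, changes: Iterable[Change]) -> None:
        # Apply changes in any order; a newer version always wins, so
        # replaying or reordering the log cannot regress an entry.
        for record_id, version, metadata in changes:
            current = self.entries.get(record_id)
            if current is None or version > current[0]:
                self.entries[record_id] = (version, metadata)
```

An index that has crawled only part of the log returns stale (older) metadata until its next crawl, after which it matches every other region's index.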
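The on-demand request flow in the use cases can be summarized as: check the replication policy, copy the WP's components read-only into the requestor region while keeping lineage as references, and home any derivative in the region named by the metadata locator. A minimal sketch; the `WorkProduct` and `LocalCopy` structures, their fields, and the `policy_allows` callback are hypothetical stand-ins for OSDU's actual schemas and services.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WorkProduct:
    wp_id: str
    home_region: str                                  # the metadata 'locator'
    components: List[str]                             # WPC identifiers
    lineage: List[str] = field(default_factory=list)  # parent WP ids


@dataclass
class LocalCopy:
    wp_id: str
    home_region: str        # copied from the locator so derivatives can be routed home
    region: str             # requestor (host) region
    components: List[str]   # WPCs copied into the requestor region (modeled as ids)
    lineage: List[str]      # references to parents; parents themselves are not copied
    read_only: bool = True


def request_wp(wp: WorkProduct, requestor_region: str,
               policy_allows: Callable[[WorkProduct, str], bool]) -> LocalCopy:
    """Copy a WP to the requestor region if the replication policy allows it."""
    if not policy_allows(wp, requestor_region):
        # Per the use case, the requesting user would be notified here.
        raise PermissionError(f"{wp.wp_id} cannot be copied to {requestor_region}")
    return LocalCopy(wp.wp_id, wp.home_region, requestor_region,
                     list(wp.components), list(wp.lineage))


def derive(source: LocalCopy, new_id: str, components: List[str]) -> WorkProduct:
    """Create a derivative WP homed (and ingested) in the source's home region."""
    return WorkProduct(new_id, source.home_region, components, lineage=[source.wp_id])
```

The derivative's `home_region` comes from the copy, not from the region where the work was done, which is what sends it back to the home region for ingestion.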
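Repartitioning (for example, growing from two regions to three) amounts to computing which partitions change home region under a new assignment, so that those partitions can be staged in their new region before the locator is switched. A minimal sketch, assuming partitions are keyed by geographic area and mapped to a home region; `staging_plan` and the partition and region names are hypothetical.

```python
from typing import Dict, List, Tuple

Assignment = Dict[str, str]  # partition -> home region


def staging_plan(current: Assignment, target: Assignment) -> List[Tuple[str, str, str]]:
    """Return (partition, from_region, to_region) moves, sorted by partition.

    Partitions whose home region is unchanged stay in place; each move would
    first be staged (copied) to the new region before its home locator changes.
    """
    missing = set(current) - set(target)
    if missing:
        raise ValueError(f"target assignment drops partitions: {sorted(missing)}")
    return [(p, current[p], target[p])
            for p in sorted(current)
            if current[p] != target[p]]
```

Only the partitions that actually move need staging, which keeps the cutover for unaffected regions a no-op.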
Architecture Principles / Guidelines
- Synchronous cross-regional calls should be avoided when possible. Applications should use regional resources.
- Embrace asynchronous systems and replication - high availability / deferred consistency
- Leverage the principles of Isolation and Redundancy: a failure of any kind in one Region should not affect services running in another.
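The principles of asynchronous replication and isolation suggest a per-region outbox: a local write commits immediately, and replication to each peer region proceeds independently, so an unavailable peer delays only its own queue. A minimal sketch using in-memory queues, assuming a `send` callback that reports delivery success; a real deployment would use a durable messaging service, and `ReplicationOutbox` is hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class ReplicationOutbox:
    """Buffers changes per destination region; one failing peer blocks no one else."""

    def __init__(self, peers: List[str]) -> None:
        self.queues: Dict[str, Deque[str]] = {p: deque() for p in peers}

    def publish(self, change: str) -> None:
        # Called after the local (in-region) write has committed;
        # no synchronous cross-region call is made here.
        for queue in self.queues.values():
            queue.append(change)

    def drain(self, send: Callable[[str, str], bool]) -> None:
        """Attempt delivery; `send(peer, change)` returns False if the peer is down."""
        for peer, queue in self.queues.items():
            while queue:
                if not send(peer, queue[0]):
                    break  # keep the change queued and retry on the next drain
                queue.popleft()
```

Delivery order per peer is preserved because a change is removed from its queue only after a successful send.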
OSDU Regions
An OSDU Region is an independent and complete deployment of all infrastructure and application services required to provide OSDU Data Platform (DP) services. Whilst every OSDU Region will have a homogeneous DP service deployment, a few types of OSDU Regions can be identified:
- “OSDU Standard Hosted Region”: This is the common type of OSDU Region and will host all data platform services, including security, ingest, logging and monitoring. Platform, application and workflow services will be deployed based on need in this standard OSDU Region.
- “OSDU Central Region”: One of the standard OSDU regions will be assigned as a central region and will host the development, integration test, and performance test environments and the Continuous Integration (CI)/Continuous Delivery (CD) services. If required, it may host a central logging service. It will also host global OSDU applications.
- “OSDU Admin Region”: One of the standard OSDU regions will be assigned as an admin region and this is a configured parameter in the OSDU metadata, namely the 'Admin Management' parameters. Subsurface Master Data System (SMDS) Master and Reference Data as well as OSDU management parameters stored in SMDS can only be updated in this configured OSDU Admin Region. All Admin APIs are available in all OSDU regions, though the call will be rejected (or redirected) if the API is not hosted in the OSDU Admin Region.
- “OSDU In-Country Region”: When public cloud infrastructure cannot be leveraged to deploy an OSDU Region for data sovereignty or latency reasons, an in-country OSDU deployment takes place, with the aim of incurring the least management and operating overhead. Public cloud edge capabilities will be leveraged to the greatest extent possible to increase consistency and lower overhead. The closest “OSDU Standard Region” will be configured as a proxy for the in-country region: its Subsurface Work Product Services (SWPS) metadata represent the files within the in-country OSDU Region, and it provides home-region replication. The list of OSDU Regions, and the identification of the Central and Admin OSDU Regions, will be part of the SMDS Master Data and will be administered as part of the Admin service.
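The region types imply a few invariants on the list of OSDU Regions kept in SMDS Master Data: one Central and one Admin region, each drawn from the standard regions (they may be the same region), and every in-country region proxied by a standard region. A minimal validation sketch; treating "exactly one" Central Region as mandatory is an assumption, and the `RegionEntry` fields and region names are hypothetical, not the SMDS schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RegionEntry:
    name: str
    in_country: bool = False
    central: bool = False
    admin: bool = False
    proxy: Optional[str] = None  # for in-country regions: the standard proxy region


def validate_regions(regions: List[RegionEntry]) -> List[str]:
    """Return invariant violations for a region list (empty means valid)."""
    errors = []
    by_name = {r.name: r for r in regions}
    for role in ("central", "admin"):
        holders = [r for r in regions if getattr(r, role)]
        if len(holders) != 1:
            errors.append(f"expected exactly one {role} region, found {len(holders)}")
        elif holders[0].in_country:
            errors.append(f"{role} region {holders[0].name} must be a standard region")
    for r in regions:
        if r.in_country:
            proxy = by_name.get(r.proxy) if r.proxy else None
            if proxy is None or proxy.in_country:
                errors.append(f"in-country region {r.name} needs a standard proxy region")
    return errors
```

The Admin service could run such a check before accepting an update to the region list.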