OSDU comprises Core Services, Domain Data Management Services, and a number of Mandatory and Helper Services. OSDU Core Services are used for data loading (ingestion and indexing) and data discovery. Mandatory Services enforce additional attributes reflected in the architectural principles. OSDU Helper Services implement cross-cutting, non-mandatory functionality for the various Domain Data Management Services. Domain Data Management Services themselves are formulated and managed by domains. Domains are broadly scoped business areas, such as well-related workflows, seismic-related workflows, and so on.
This service covers how data remains compliant at the different stages of the data life-cycle inside OSDU:
- When loading, ingesting, and indexing data
- While data inside OSDU is used
- When consuming data
SAuth Service Identity
The Service Identity feature of SAuth allows service-to-service authentication. SAuth will issue a Service Identity token only to services allowed to call OSDU services. This token can then be used to call any of the OSDU services. The following are the primary use cases of SAuth Service Identity:
- Service-to-service authentication
- Operations which do not involve original user (cron, or sync)
- Long-running operations executed on behalf of a user
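The token exchange can be sketched as follows. This is a minimal illustration, assuming an OAuth2-style client-credentials flow; the endpoint shapes and field names are assumptions, not the real SAuth contract.

```python
# Hypothetical sketch of obtaining and using a SAuth Service Identity token.
# Field names are illustrative assumptions.

def build_token_request(service_account: str, signed_assertion: str) -> dict:
    """Assemble a service-to-service token request (no end user involved)."""
    return {
        "grant_type": "client_credentials",    # assumed OAuth2-style flow
        "client_id": service_account,
        "client_assertion": signed_assertion,  # placeholder for a signed JWT
    }

def authorization_header(service_token: str) -> dict:
    """The issued token is sent as a bearer credential on OSDU calls."""
    return {"Authorization": f"Bearer {service_token}"}
```

Because no user identity is involved, the same pattern covers cron-style and sync operations.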
Entitlements
The Entitlements service enables authorization in OSDU. It relies on Google Groups as an infrastructure that ties into Google IAM, making native authorization possible. The service allows for the creation of Google Groups. A group name defines a permission, and users added to that group obtain that permission. The main motivation for the Entitlements service is data authorization, but the functionality enables three use cases:
- Data groups used for data authorization e.g. data.welldb.viewer, data.welldb.owner
- Service groups used for service authorization e.g. service.storage.user, service.storage.admin
- User groups used for hierarchical grouping of user and service identities e.g. users.datalake.viewers, users.datalake.editors
For each group, an identity can be added as either an OWNER or a MEMBER. The only difference is that an OWNER of a group can also manage its membership.
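The naming convention above can be made concrete with a small sketch that classifies a group by its prefix. The prefixes come from the examples in the text; the function itself is illustrative, not part of the Entitlements API.

```python
def group_category(group_name: str) -> str:
    """Classify an entitlements group by its naming-convention prefix."""
    prefix = group_name.split(".", 1)[0]
    return {
        "data": "data authorization",       # e.g. data.welldb.viewer
        "service": "service authorization", # e.g. service.storage.user
        "users": "user grouping",           # e.g. users.datalake.viewers
    }.get(prefix, "unknown")
```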
Frame Of Reference
For all the existing records in OSDU, we need a way for the user to retrieve the data so that it is rationalized to a common standard, allowing comparison across multiple data sources. For this, we currently support a few ways of supplying the frame of reference (FoR). The FoR provides the context in which the numbers are to be interpreted. With the FoR properly supplied, we can convert values to a standard format: units in SI, CRS in WGS 84, elevation in MSL, azimuth in true north, and dates in UTC.
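For the unit component of the FoR, normalization to SI amounts to applying a conversion factor. A minimal sketch, assuming only lengths and a small hand-picked factor table:

```python
FT_TO_M = 0.3048  # exact definition of the international foot

def to_si_length(value: float, unit: str) -> float:
    """Normalize a length to SI metres.

    The supported units here are illustrative; a real FoR block would
    carry the full unit definition with the record.
    """
    factors = {"m": 1.0, "ft": FT_TO_M}
    return value * factors[unit]
```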
The Indexer API provides a mechanism for indexing documents that contain structured or unstructured data. Documents and indices are saved in a separate persistent store optimized for search operations. The indexer API can index any number of documents.
The indexer indexes attributes defined in the schema. A schema can be created at the time of record ingestion into the Data Ecosystem via the Storage Service. The Indexer service also adds a number of Data Ecosystem meta attributes, such as id, kind, parent, acl, namespace, type, version, legaltags, and index, to each record at the time of indexing.
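The meta attributes can be pictured as a dictionary merged onto the record at indexing time. The attribute names below come from the text; the id scheme and values are assumptions for illustration only.

```python
def add_meta_attributes(record: dict, kind: str, acl: dict, legaltags: list) -> dict:
    """Return a copy of the record with a subset of the Data Ecosystem
    meta attributes attached (illustrative, not the real indexer logic)."""
    record_id = f"{kind}:{record['name']}"  # assumed id scheme
    meta = {
        "id": record_id,
        "kind": kind,
        "acl": acl,
        "legaltags": legaltags,
        "version": 1,  # first version at ingestion
    }
    return {**record, **meta}
```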
The Search API provides the ability to search the records loaded via the storage service and indexed with the indexer service (above). You can search an index, and organize and present search results. Documents and indexes are saved in a separate persistent store optimized for search operations.
The API supports full-text search on string fields, range queries on date, numeric, or string fields, and geo-spatial search.
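Combining a full-text term with a range query might look like the sketch below. The request shape ('kind' plus 'query') and the Lucene-like range syntax are assumptions for illustration.

```python
from typing import Optional

def build_search_query(kind: str, text: str,
                       created_after: Optional[str] = None) -> dict:
    """Assemble a search request body: full text plus an optional date range.
    Field names and range syntax are illustrative assumptions."""
    query = text
    if created_after:
        # Range-query sketch on a date field, Lucene-style syntax assumed.
        query += f" AND created:[{created_after} TO *]"
    return {"kind": kind, "query": query}
```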
After performing the basic user management procedures (creating users and groups, assigning users to groups, etc.) through the Entitlements service, a developer can use the Data Ecosystem Storage Service to ingest metadata generated by an application into the Data Ecosystem. The Storage Service provides a set of APIs to manage the entire metadata life-cycle: ingestion (persistence), modification, deletion, versioning, and data schema management.
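The life-cycle, in particular versioning, can be modeled with a toy in-memory store. This is a conceptual sketch of the behavior, assuming each write produces a new version; it is not the Storage Service API.

```python
class RecordStore:
    """Toy model of the Storage Service life-cycle: each put creates
    a new version; get returns the latest unless a version is named."""

    def __init__(self):
        self._records = {}  # record_id -> list of version bodies

    def put(self, record_id: str, body: dict) -> int:
        """Persist a record body; returns the new version number."""
        versions = self._records.setdefault(record_id, [])
        versions.append(body)
        return len(versions)

    def get(self, record_id: str, version: int = None) -> dict:
        versions = self._records[record_id]
        return versions[-1] if version is None else versions[version - 1]

    def delete(self, record_id: str) -> None:
        del self._records[record_id]
```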
The main goals of the Spatial Reference Catalog service are to:
- Offer Coordinate Reference Systems (CRSs) so that end users can make a CRS selection.
- Search for CRSs given a number of constraints.
- Download the entire catalog for local caching, and again when the cache has to be refreshed.
- Provide access to various subsets of the catalog.
- Once a CRS is found, produce a persistent reference that can be stored with data or metadata, as appropriate, and that fully describes the CRS. This persistent reference string is catalog independent: any consumer can understand the CRS definition even if a different catalog is used in a future context.
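One way to picture a catalog-independent reference is bundling the authority code with the full CRS definition, so a consumer never needs the original catalog to interpret it. The field names and the use of WKT here are assumptions for illustration.

```python
def persistent_crs_reference(authority: str, code: str, wkt: str) -> dict:
    """Sketch of a self-contained CRS reference: the embedded full
    definition (WKT assumed here) makes it catalog independent."""
    if not wkt:
        raise ValueError("a persistent reference must embed the full definition")
    return {"authority": authority, "code": code, "wkt": wkt}
```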
This service provides spatial reference conversions for coordinates. Coordinates are represented by an array of 3D points. The context, i.e. the measurement and unit associated with the axes, is given by the CRS definitions. In most cases, the CRS definition is 2D. For both geographic and projected CRS types, the Z-axis is passed through unchanged, and its unit is known only to the client.
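The Z pass-through behavior can be sketched directly: a 2D transform is applied to X and Y while Z is carried along untouched. The transform here is a stand-in shift, not a real CRS conversion.

```python
def convert_points(points, transform_xy):
    """Apply a 2D CRS transform to each 3D point; Z passes through
    unchanged, as described above (sketch only)."""
    return [(*transform_xy(x, y), z) for (x, y, z) in points]

# Stand-in for a real geographic/projected transform.
shift = lambda x, y: (x + 100.0, y - 50.0)
```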
The Unit service provides dimension/measurement and unit definitions. Given two unit definitions, the service also offers conversion parameters in two different parameterizations.
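As a sketch of what two parameterizations of the same conversion can look like, consider a four-parameter rational form and a scale-and-offset form; that these are the two the service uses is an assumption here. For feet to metres, both reduce to multiplication by 0.3048.

```python
def convert_abcd(x: float, a: float, b: float, c: float, d: float) -> float:
    """Four-parameter rational form: y = (a + b*x) / (c + d*x)."""
    return (a + b * x) / (c + d * x)

def convert_scale_offset(x: float, scale: float, offset: float) -> float:
    """Scale-and-offset form: y = scale * (x - offset)."""
    return scale * (x - offset)
```

For ft to m, the ABCD parameters are a=0, b=0.3048, c=1, d=0, and the scale/offset parameters are scale=0.3048, offset=0; the two agree.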
Data Catalog Service
The main goals of the Data Catalog Service are to create configuration files and records that are required in ingestion and enrichment workflow.
The main goals of this Ingestion Service are to:
- Ingest the data into the Data Ecosystem
- Enrich the data and create Well Known Entities (WKEs)
- Ingest the WKEs back into the Data Ecosystem
- Get the status of the submitted job
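The submit-then-poll pattern implied by the last goal can be modeled with a toy job tracker. Job ids and status names are assumptions for illustration, not the service's real contract.

```python
class IngestionJobs:
    """Toy model of submitting an ingestion job and polling its status."""

    def __init__(self):
        self._status = {}
        self._next = 1

    def submit(self, payload: dict) -> str:
        """Accept a job and return its id; status starts as RUNNING."""
        job_id = f"job-{self._next}"
        self._next += 1
        self._status[job_id] = "RUNNING"
        return job_id

    def complete(self, job_id: str) -> None:
        self._status[job_id] = "COMPLETED"

    def status(self, job_id: str) -> str:
        return self._status[job_id]
```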
Ingestion Service Status
Returns the status of an ingestion and enrichment job.
Document Indexing Service
This service indexes unstructured content that is currently present in the form of documents (PDF, TIFF, DOC, etc.). This makes it possible to search for keywords or phrases within documents.
Document Query Service
The Unstructured Query service is designed to query the contents of documents indexed by the Document Indexing process. It can be used to search across all indexed documents, as well as within a particular document to find the pages where the searched keywords appear. Search results are ranked by the relevance of the search keywords. The service can also be used to get the images, thumbnails, and annotation details of any page of the indexed documents.
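Within-document page search with relevance ranking can be sketched with a toy scoring model, where relevance is simply keyword frequency per page; the real service's ranking is certainly more sophisticated.

```python
def rank_pages(pages: dict, keyword: str) -> list:
    """Rank page numbers of one document by keyword frequency
    (a toy relevance model); pages without a hit are dropped."""
    kw = keyword.lower()
    scored = {page: text.lower().count(kw) for page, text in pages.items()}
    return sorted((p for p, s in scored.items() if s > 0),
                  key=lambda p: -scored[p])
```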
Enrichment Attribute Catalog Service
The attribute catalog contains models that define schemas for various sources and entity types, as well as mappings between sources and a common index schema. This mapping unlocks a vast array of valuable workflows, such as repeatable ingestion, creation of merged entities, relationships between entities, and quick, easy discovery and consumption across sources.
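A source-to-common-schema mapping can be pictured as a rename table applied to each source record. The field names below are invented for illustration; the real catalog's models are richer than a flat rename.

```python
def map_to_common(record: dict, mapping: dict) -> dict:
    """Project a source record onto the common index schema using a
    source-field -> common-field mapping (illustrative)."""
    return {common: record[src]
            for src, common in mapping.items() if src in record}

# Hypothetical mapping for one source.
WELLDB_MAPPING = {"WELL_NAME": "WellName", "TD": "TotalDepth"}
```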
Enrichment Fetch Service
A WKE, or Well Known Entity, is a digital representation of a physical, real-world object whose primary objective is to build an enriched set of attributes for that object, pooled from various sources. In DELFI, this digital object is called an entity (e.g. a well, a completion, a log curve, a well trajectory). Another function of the WKE is to serve as the anchor point for relationships to other related entities from one or more sources. The WKE for an entity is continually refreshed and should be the point of consumption for all workflows.
Enrichment Merge Service
Merge is the process whereby attributes from multiple raw sources of an entity type are matched and merged to create a Well Known Entity (WKE). This API triggers a merge job by providing the inputs needed for the merge process. There is also an API to get the status of merge jobs.
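The attribute-merging step can be sketched with a simple priority rule, assuming sources are supplied in descending trust order and the first source to provide an attribute wins; the real matching and merging logic is not specified here.

```python
def merge_entity(sources: list) -> dict:
    """Merge matched raw records into one WKE attribute set.
    Sources are assumed ordered by priority; earlier sources win
    per attribute (an illustrative rule, not the service's)."""
    merged = {}
    for record in sources:
        for key, value in record.items():
            merged.setdefault(key, value)  # keep the higher-priority value
    return merged
```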