Update OSDU API Quick start guide, authored by Chad Leong
- [Purpose of the document](#purpose-of-the-document)
- [GC vs baremetal deployment](#gc-vs-baremetal-deployment)
- [Configuring a Postman environment](#configuring-a-postman-environment)
- [Working with OSDU](#working-with-osdu)
- [Schema creation](#schema-creation)
- [Example: Schema creation](#example-schema-creation)
- [Example: Record creation with kind](#example-record-creation-with-kind)
- [Legal Tag creation](#legal-tag-creation)
- [Example: Legal Tag creation](#example-legal-tag-creation)
- [Access Control Lists (ACL) configuration](#access-control-lists-acl-configuration)
- [Example: A new data group creation](#example-a-new-data-group-creation)
- [Ingestion](#ingestion)
- [Data Ingestion](#data-ingestion)
- [Example: Data ingestion using Cloud Tools](#example-data-ingestion-using-cloud-tools)
- [Example: Data Ingestion using File Service API](#example-data-ingestion-using-file-service-api)
- [Metadata ingestion](#metadata-ingestion)
- [Example: Metadata Ingestion using File Service API](#example-metadata-ingestion-using-file-service-api)
- [Example: Metadata Ingestion using Dataset Service API](#example-metadata-ingestion-using-dataset-service-api)
- [Example: Metadata Ingestion using Manifest-based ingestion](#example-metadata-ingestion-using-manifest-based-ingestion)
- [Search and Metadata retrieval](#search-and-metadata-retrieval)
- [Example: Search for a metadata record by id using Storage Service API](#example-search-for-a-metadata-record-by-id-using-storage-service-api)
- [Example: Search for a metadata record by id using Search Service API](#example-search-for-a-metadata-record-by-id-using-search-service-api)
- [Download the original file](#download-the-original-file)
- [Example: Get file DownloadUrl using File Service API](#example-get-file-downloadurl-using-file-service-api)
- [Next Steps](#next-steps)
We are moving this documentation to a new location.
### Purpose of the document
The document goals are:
1. to introduce you to the basic OSDU functionality,
2. to help configure a Postman environment to do initial testing,
3. to explain the logic of included Postman requests.
This document is not intended to introduce you to all OSDU services. For the full OSDU documentation, please use [this link](https://osduforum.org/getting-started/osdu-documentation/).
### GC vs baremetal deployment
This guide is for GC deployment. If you are on baremetal, please refer to [this guide](https://community.opengroup.org/osdu/documentation/-/wikis/OSDU-baremetal-QSG) to configure Postman and manage users and their permissions. Then continue with the [Working with OSDU](#working-with-osdu) section.
### Configuring a Postman environment
OSDU uses the Postman tool for the majority of API testing. Here are the pre-requisites and the steps you need to perform to configure a Postman environment.
_Pre-requisites:_
1. OSDU deployed on GCP using the [examples](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-gcp-provisioning/-/tree/release/0.23/examples/simple_osdu) (for release **M20/v0.23**) or newer ones.
2. A valid Google account that was used during OSDU deployment (used as the "admin_user_email" Terraform variable).
_Steps:_
1. **Download Postman environment file for the OSDU installation:**
Since release M19/v0.22, the GC installation comes with a `config` service that lets you download the Postman environment file. Open `https://<your_domain>/api/config/v1/postman-environment` and save the page.
2. **Download Postman**
Download and install Postman, for example from the [Postman downloads page](https://www.postman.com/downloads).
3. **Import the environment variables file into Postman**
![Image2 – Postman. Import of environment variables](uploads/921b240be55455b4fca5010337962ce1/image003.png)
Import previously downloaded _gcp-.postman_environment.json_ to Postman Environments.
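The downloaded environment file can also be read outside Postman, for example to reuse the host names in scripts. The sketch below assumes the usual Postman environment export layout (a top-level `values` list of `key` / `value` / `enabled` entries). The function name and the sample content are illustrative and are not taken from a real OSDU installation.

```python
import json


def load_postman_environment(text):
    """Return the enabled variables of a Postman environment export as a dict."""
    env = json.loads(text)
    return {
        item["key"]: item.get("value", "")
        for item in env.get("values", [])
        if item.get("enabled", True)
    }


# Hypothetical, shortened environment file used only for illustration.
SAMPLE_ENVIRONMENT = """
{
  "name": "gcp-example",
  "values": [
    {"key": "data-partition-id", "value": "osdu", "enabled": true},
    {"key": "SCHEMA_HOST", "value": "example.com", "enabled": true},
    {"key": "refresh_token", "value": "", "enabled": false}
  ]
}
"""

if __name__ == "__main__":
    variables = load_postman_environment(SAMPLE_ENVIRONMENT)
    for key, value in variables.items():
        print(f"{key}={value}")
```

Disabled variables are skipped on purpose, which mirrors how Postman ignores unchecked entries.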
### Grant permissions for users
An OSDU admin (an email address that was specified as "admin_user_email" during Terraform deployment) grants Entitlements permissions for users who send requests using Postman. These users should be added into the following Entitlements groups:
- users
- users.datalake.viewers
The instruction for granting permissions is available by [the link](https://gitlab.opengroup.org/osdu/pmc/home/-/wikis/Releases/R3.0/GCP/GCP-Operation/User-Mng/User-Management).
### Set a value for **_refresh_token_**
You have several options to obtain a `refresh_token`, but in all cases you need a CLIENT_ID and a CLIENT_SECRET, and you need to set them in the Postman environment.
- For non-production purposes you can use the following values:
```text
CLIENT_ID=605457357143-6h6uqunq67f53m9jeibn38gupd27bsfb.apps.googleusercontent.com
CLIENT_SECRET=RZsmDE6ebJ7OGu5Csvz151JA
```
- For production environments, or where other restrictions apply, ask an “Owner” or “Editor” of the Google Cloud project where OSDU is deployed to create the OAuth client you plan to use (OAuth configuration is out of scope of this document).
- Copy and paste CLIENT_ID and CLIENT_SECRET into the environment variables in Postman. Make sure you update both `INITIAL VALUE` and `CURRENT VALUE`.
![Image 5. Updating CLIENT_ID and CLIENT_SECRET in the Postman environment variables](uploads/f984c539a3ade847147ad3872a171e7c/image009.png)
#### Option 1 (Simple): Obtaining `refresh_token` via Postman UI
Refer to [Postman authentication guide](https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M16/GCP-M16/Postman_Authentication_Guide.md).
Open [Quick start](https://community.opengroup.org/osdu/documentation/-/wikis/uploads/3c9820ad7dfe9ed873755c53500afe4e/OSDU_Quick_start.postman_collection.json) or other Postman collection and go to the Authorization tab:
![auth tab](./uploads/baremetal-qsg-img/kk-postman-auth-1.png)
Verify that the following variables are set in the Postman environment and that this environment is selected as active (use the dropdown in the top right corner):
```text
callback_url=https://developers.google.com/oauthplayground
auth_url=accounts.google.com/o/oauth2/auth
Token_Fetch_URL=https://oauth2.googleapis.com/token
Scope=email openid profile
```
Set `Type=OAuth 2.0`, scroll down to `Configure New Token` and press `Edit token configuration`.
Set the following values:
```text
Token Name=<any name you like>
Grant Type=Authorization Code
Callback URL={{callback_url}}
Auth URL=https://{{auth_url}}?access_type=offline&prompt=consent
Access Token URL={{Token_Fetch_URL}}
Client ID={{CLIENT_ID}}
Client Secret={{CLIENT_SECRET}}
Scope={{Scope}}
```
![auth](./uploads/qsg-img/postman-auth.png)
Press `Get New Access Token`.
In the opened window, enter your Google account email:
![login](./uploads/qsg-img/postman-auth-login.png)
Enter your password in the next step:
![password](./uploads/qsg-img/postman-auth-password.png)
Allow access in the next step:
![allow access](./uploads/qsg-img/postman-auth-access.png)
In the `Manage access tokens` window, scroll down to the `refresh_token` section, select the value and copy it. Then paste it into your Postman environment:
![tokens](./uploads/qsg-img/postman-tokens.png)
Press the `Use Token` button. The newly created token becomes active. Make sure the `Auto-refresh token` switch is `On`.
![current token](./uploads/qsg-img/postman-current_token.png)
Save the environment and collection.
Now you only need to ensure that `Authorization` for all requests and folders in the collection is set to `Inherit auth from parent`, and Postman will refresh the token for you when needed.
![inherit auth](./uploads/qsg-img/postman-auth-inherit.png)
You can still use the `Refresh Token` requests from the Postman collection to get new access tokens manually.
#### Option 2 (Legacy): Obtaining `refresh_token` via Google OAuth 2.0 Playground
- Open [OAuth 2.0 Playground](https://developers.google.com/oauthplayground)
- In Step 1 “Select and authorize APIs” expand the “Google OAuth2 API v2” section and select all 3 values (image 3):
- https://www.googleapis.com/auth/userinfo.email
- https://www.googleapis.com/auth/userinfo.profile
- openid
- Click on “OAuth 2.0 Configuration” icon and select “Use your own OAuth credentials” (image 3)
![image005](uploads/4c4e0f7a8f3d8af88fe949ec00d8be80/image005.png)
- Paste CLIENT_ID and CLIENT_SECRET from the step above into the OAuth 2.0 Playground screen and click “Authorize APIs”.
![Image 7.OAuth 2.0 Playground. Updating Client ID and Client Secret](uploads/a2d6f51dc38ee820b5701c1f27138528/image013.png)
- On the next screen, select a user account that can be authorized.
- Click “Exchange authorization code for tokens” (Image 8)
![Image 8. OAuth 2.0 Playground. Exchange authorization code for tokens](uploads/ea2fdfb596224d002bdfe460626595e7/image015.png)
- Copy the `refresh_token` value into the Postman environment variables (both `INITIAL VALUE` and `CURRENT VALUE`) (Image 9)
![Image 9. OAuth 2.0 Playground. Refresh token](uploads/9d8496589ec9cd47da4833ac61e72745/image017.png)
Remember that **the tokens should be refreshed in Postman every 30 minutes.**
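Outside Postman, the same refresh can be scripted against the `Token_Fetch_URL` listed earlier. The following is a minimal sketch of a standard OAuth 2.0 `refresh_token` grant request using only the Python standard library. The function name is hypothetical, the placeholders must be replaced with your own values, and the request is only built here, not sent.

```python
import urllib.parse
import urllib.request

# Same value as the Token_Fetch_URL environment variable in this guide.
TOKEN_URL = "https://oauth2.googleapis.com/token"


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build (but do not send) the form-encoded refresh_token grant request."""
    form = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_refresh_request("<CLIENT_ID>", "<CLIENT_SECRET>", "<refresh_token>")
    print(request.get_method(), request.full_url)
    # Sending requires network access and valid credentials:
    # import json
    # with urllib.request.urlopen(request) as response:
    #     tokens = json.load(response)  # JSON object holding the new token(s)
```

Keeping the request construction separate from the network call makes it easy to inspect what will be sent before using real credentials.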
The next chapters will briefly explain several simple and widely used scenarios. For full details on how different OSDU services work, please refer to documentation of a specific OSDU service.
### Working with OSDU
### Schema creation
Schema - A data model definition. In other words, the schema defines whether a given field in the record is a string, or an integer, or a float, or a geopoint, etc. Schema Service allows data models to be defined in rich JSON objects.
#### Example: Schema creation
```json
POST https://{{SCHEMA_HOST}}/api/schema-service/v1/schema
{
"schemaInfo": {
"schemaIdentity": {
"authority": "SchemaSanityTest",
"source": "testSource",
"entityType": "{{entityType}}",
"schemaVersionMajor": 1,
"schemaVersionMinor": 1,
"schemaVersionPatch": 0,
"id": "{{SchemaID}}"
},
"status": "PUBLISHED"
},
"schema": {
"$schema":"http://json-schema.org/draft-07/schema#",
"x-slb-lifecycle-state":"published",
"description":"The entity well.",
"title":"Well",
"type":"object",
"definitions":{
},
"properties":{
"locationOriginalCRS":{
"description":"The well's original location as AnyCrsFeatureCollection - a structure similar to but distinct from GeoJSON.",
"title":"OriginalCRSLocation"
},
"locationWGS84":{
"description":"The well's location as GeoJSON FeatureCollection.",
"title":"WGS84Location",
"$ref":"https://geojson.org/schema/FeatureCollection.json",
"example":{
"features":[
{
"geometry":{
"coordinates":[
-92.11569999999999,
29.8823,
153.4779442519685
],
"type":"Point"
},
"type":"Feature",
"properties":{
"name":"Newton2-32"
}
}
],
"type":"FeatureCollection"
}
}
}
}
}
```
_Response:_
```json
{
"schemaIdentity": {
"authority": "SchemaSanityTest",
"source": "testSource",
"entityType": "testEntity_59047",
"schemaVersionMajor": 1,
"schemaVersionMinor": 1,
"schemaVersionPatch": 0,
"id": "SchemaSanityTest:testSource:testEntity_59047:1.1.0"
},
"createdBy": "denis_karpenok@epam.com",
"dateCreated": "2022-09-09T15:31:56.000+00:00",
"status": "PUBLISHED",
"scope": "INTERNAL"
}
```
Schemas and records are tied together by the **kind** attribute, that is, the **kind** of data being ingested. The **kind** value is the schema id and must follow the naming convention: _{Schema-Authority}:{dataset-name}:{record-type}:{version}_. On top of that, a given kind can have zero or exactly one schema associated with it.
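To make the naming convention concrete, here is a small sketch that splits a kind string into its four colon-separated parts and rejects malformed values. The `Kind` type and `parse_kind` function are illustrative helpers, not part of any OSDU library.

```python
from typing import NamedTuple


class Kind(NamedTuple):
    authority: str
    source: str
    entity_type: str
    version: str


def parse_kind(kind):
    """Split a kind string into its four colon-separated parts."""
    parts = kind.split(":")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"kind must have four non-empty parts: {kind!r}")
    return Kind(*parts)


if __name__ == "__main__":
    # The schema id returned in the schema creation response above.
    print(parse_kind("SchemaSanityTest:testSource:testEntity_59047:1.1.0"))
```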
#### Example: Record creation with kind
```json
PUT https://{{STORAGE_HOST}}/api/storage/v2/records
[
{
"acl": {
"owners": [
"{{New_OwnerDataGroup}}@{{data-partition-id}}{{domain}}"
],
"viewers": [
"{{New_ViewerDataGroup}}@{{data-partition-id}}{{domain}}"
]
},
"data": {
"msg": "hello world from Data Lake"
},
"id": "{{data-partition-id}}:{{record_id}}:{{NewWellUWI}}",
"kind": "SchemaSanityTest:testSource:testEntity_59047:1.1.0",
"legal": {
"legaltags": [
"{{LegalTagNameExists}}"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"meta": [
{}
],
"version": 0
}
]
```
### Legal Tag creation
Legal Tags are needed for OSDU metadata access control (more information on the metadata ingestion is later in this document).
To create a Legal Tag, use the Legal service.
#### Example: Legal Tag creation
```json
POST https://{{LEGAL_HOST}}/legaltags
{
"name": "{{data-partition-id}}-demo-legaltag",
"description": "Legal Tag added for Well",
"properties": {
"contractId": "123456",
"countryOfOrigin": [
"US",
"CA"
],
"dataType": "Third Party Data",
"exportClassification": "EAR99",
"originator": "Schlumberger",
"personalData": "No Personal Data",
"securityClassification": "Private",
"expirationDate": "2025-12-25"
}
}
```
### Access Control Lists (ACL) configuration
ACLs are needed to manage user access to the specific metadata record. They define a data group that has “owners” or “viewers” permissions over this metadata record.
If the deployment process and the Entitlements service bootstrapping are performed properly, all users can be included in 2 groups:
- data.default.viewers
- data.default.owners
However, it is possible to create a new data group that allows only specific users to view or modify records.
#### Example: A new data group creation
```json
POST https://{{ENTITLEMENTS_HOST}}/groups
{
  "name": "data-group-test",
  "description": "Certification Test Data group with Viewer access"
}
```
_Response:_
```json
{
  "name": "data-group-test",
  "email": "data-group-test@odesprod.osdu-gcp.96888go3-nrg.projects.epam.com",
  "description": "Certification Test Data group with Viewer access"
}
```
Add a new user to the created group:
```json
POST https://{{ENTITLEMENTS_HOST}}/groups/{{group name}}/members
{
  "email": "{{user email}}",
  "role": "MEMBER"
}
```
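Group emails follow the `name@partition.domain` shape seen in the response above, and the same emails are what a record's `acl` block refers to. The sketch below builds such an `acl` block. The helper names are hypothetical, and the default group names are the ones used in the metadata examples of this guide.

```python
def group_email(group, partition, domain):
    """Compose a group email from its name, the data partition id and the domain."""
    return f"{group}@{partition}.{domain.lstrip('.')}"


def build_acl(partition, domain,
              owners=("data.default.owners",),
              viewers=("data.default.viewers",)):
    """Build the acl block of a record from group names."""
    return {
        "owners": [group_email(g, partition, domain) for g in owners],
        "viewers": [group_email(g, partition, domain) for g in viewers],
    }


if __name__ == "__main__":
    # Restrict viewing to the newly created group, keep the default owners.
    acl = build_acl("odesprod", "osdu-gcp.go3-nrg.projects.epam.com",
                    viewers=("data-group-test",))
    print(acl)
```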
### Ingestion
The easiest way to start explaining OSDU functionality is to show how data is ingested into OSDU.
There are 2 types of data ingestion in OSDU:
1. **Data (files) ingestion** - files of any type.
2. **Metadata ingestion** - data that describes the ingested file to make it searchable.
Metadata is described by the [JSON schemas](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/tree/master/Generated) created by the OSDU Data Definition team.
#### _Data Ingestion_
![image019](uploads/68cf061e56f57e4106feadba00db5175/image019.png)
<div>Image 10. Data (Files) ingestion \\ download</div>

A user can choose 1 of 4 ways to upload data into OSDU.
Let’s see how we can ingest a Raster Well Log file in .tif format into OSDU. Please note that you can ingest a file of any format.
Part of the file:
![image021](uploads/3aabfc7977d5277b9cf0cee769b595c3/image021.png)
<div>Image 11. An example of the image data</div>

#### Example: Data ingestion using Cloud Tools
Users can access the OSDU Storage Service using cloud tools and upload the needed files there directly. Here is an example of how to do this on Google Cloud (GC). This method is not supported in baremetal deployments.
**Step 1. Load file(s)**
The Storage Service has a specified bucket in the OSDU environment, and a user can directly upload a file into this bucket. **_TBD – how to get the name of the bucket (to delete?):_**
![image023](uploads/c96b9c2bb31aacf0d454d7272149024e/image023.png)
<div>Image 12. Loading files using Cloud Storage</div>
**Step 2. Obtain File DownloadURL**
After loading a file into the OSDU Storage Service bucket, the user should obtain the DownloadURL. It is the file location in the Storage Service:
![image025](uploads/186814c016e42c07201c58fc12041d12/image025.png)
<div>Image 13. Cloud Storage. Download URL</div>

This URL will be used later during metadata ingestion.
Please note that if you use cloud tools to ingest a file, no metadata record is created in OSDU. The file is not yet discoverable.
#### Example: Data Ingestion using File Service API
Files can be loaded using File Service API.
**Step 1. Get File upload SignedURL**
_Request:_
```json
GET {{FILE_HOST}}/files/uploadURL
```
_Response:_
```json
{
"FileID": "aa51c2dc6e7a48979d9fd29b6e421218",
"Location": {
"SignedURL": "https://storage.googleapis.com/osdu-data-prod-staging-area/260d83ed-9322-4f78-9a19-92c75bf23c36/aa51c2dc6e7a48979d9fd29b6e421218?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=osdu-gcp-sa%40osdu-service-prod.iam.gserviceaccount.com%2F20210415%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20210415T183926Z&X-Goog-Expires=604800&X-Goog-SignedHeaders=host&X-Goog-Signature=64f920b9d65d3365cfbf33ee75012874d2e476d235329327b9fa3ae8a431a4e112220deb9313bd2d870a02f878d88abaf4dc405585e9da9a0135ca9b0fdc1359ba3018f78c2691bcde25141a3c6961d24bd9727d7c1fbc3c4232ce763f57df5b631986e1c22adc73ac20c81ce4e1a91be0b2ef4fa10c8d6604b3817a3578d3c0176832eb8e5f9611e6b5b17b5c7014b67d12a8a0a9f4704411f968ceb6dde66d588fa17889cd9a9a02f62be44bd9b57d51a1a672a9a0643524628d5e8b6b12857a06e4fe3f9bd349076a4c8f83530e676399d890ca09114143f286d121432d2f4464135e1dc84a58932070b9b62fbbb97dc5d2a5e8eaa0183874c142f17e15c3",
"FileSource": "/260d83ed-9322-4f78-9a19-92c75bf23c36/aa51c2dc6e7a48979d9fd29b6e421218"
}
}
```
Parameter “SignedURL” is stored as {{upload_signed_url}} variable.
Parameter “FileSource” is stored as {{file_source}} variable.
**Step 2. Upload your file**
Request:
```http
PUT {{upload_signed_url}}
```
Response:
```http
Status: 200 OK
```
As a result of this step, the file is placed in the OSDU Staging zone (Compute area).
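The two steps above can be sketched in a script: read `SignedURL` and `FileSource` from the Step 1 response, then build the `PUT` request that carries the file bytes. This is an illustration using only the Python standard library. The function names and the sample response are made up, and the upload request is built but not sent.

```python
import json
import urllib.request


def parse_upload_location(response_text):
    """Extract the values the guide stores as upload_signed_url and file_source."""
    location = json.loads(response_text)["Location"]
    return location["SignedURL"], location["FileSource"]


def build_upload_request(signed_url, content):
    """Build (but do not send) the PUT request that uploads the file bytes."""
    return urllib.request.Request(signed_url, data=content, method="PUT")


# Shortened, made-up response in the shape shown in Step 1.
SAMPLE_RESPONSE = """
{
  "FileID": "example-file-id",
  "Location": {
    "SignedURL": "https://storage.example.com/staging/example-file-id?signature=abc",
    "FileSource": "/example-folder/example-file-id"
  }
}
"""

if __name__ == "__main__":
    upload_signed_url, file_source = parse_upload_location(SAMPLE_RESPONSE)
    request = build_upload_request(upload_signed_url, b"file bytes go here")
    print(request.get_method(), file_source)
    # Sending requires network access: urllib.request.urlopen(request)
```

Keep the `file_source` value: it is needed later when the metadata record for the file is created.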
### Metadata ingestion
To make data (ingested files) discoverable (searchable by the OSDU Search service), metadata must be stored in OSDU.
All metadata that can be stored into OSDU is described by [OSDU schemas](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/tree/master/Generated).
The OSDU metadata can be one of several types:
![image019](uploads/8247094e38410a0a95c6ee300d7243a4/image019.png)
<div>Image 14. OSDU Data Platform core concepts and Group Types</div>

In simple words:
- **Master data** – slow-changing oil and gas entities; this data is not associated with a specific file. The full list of Master data supported by OSDU can be found [here](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/tree/master/Generated/master-data).
- **Work Product Component (WPC)** – metadata associated with a specific file or file collection. This data changes over time, so a WPC may have different versions, each describing a snapshot of the data at a certain point in time. A WPC must reference one or more Dataset records that describe the file or file collection. The full list of supported WPCs can be found [here](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/tree/master/Generated/work-product-component).
- **Reference data** – think of it as the drop-down values that describe Master or WPC attribute values (e.g. units of measure, list of countries, etc.). By governance type, reference data can be:
  - fixed (values are provided by OSDU)
  - open (values are provided by OSDU, but an operating company can add their own values)
  - local (schema is provided by OSDU, but values should be provided by an operating company).

  See the full list of reference data schemas with their governance type [here](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/tree/master/E-R/reference-data).
- **Dataset** – schemas that describe properties of a physical file (they do not describe the logical entity!). They can describe files of a specific type (e.g. .tiff, .segy) or of any type (File.Generic kind). Each Dataset record must have a link to the physical file location that can be used to download the file from the OSDU Storage service. Schemas that describe file properties can be found [here](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/tree/master/E-R/dataset).
- **Work Product (WP)** – think of it as a project or a collection of related files and datasets. A Work Product is an abstract entity that allows you to organize different WPCs and datasets. You can have any number of WPCs and Datasets in your WP.
There are several ways of storing metadata into OSDU:
![image029](uploads/c526a4dd508bef50585cdbc5169c87f8/image029.png)
<div>Image 15. Metadata Ingestion</div>
#### Example: Metadata Ingestion using File Service API
**_Pre-requisites:_** The steps from "Example: Data Ingestion using File Service API" have been executed.
**_Request:_**
```json
POST {{file_api_url}}/v2/files/metadata
{
"data": {
"Endian": "BIG",
"Description": "Example for Data Loading guide",
"DatasetProperties": {
"FileSourceInfo": {
"FileSource": "{{file_source}}",
"Name": "SLB Well Raster log",
"PreLoadFilePath": "",
"PreloadFileCreateUser": "Data Loading Team",
"PreloadFileModifyDate": "Apr 15 2021",
"PreloadFileModifyUser": "Data Loading Team"
}
},
"TotalSize": "13245217273",
"Source": "Example Data Source",
"Name": "Dataset X221/15"
},
"kind": "osdu:wks:dataset--File.Generic:1.0.0",
"acl": {
"viewers": [
"data.default.viewers@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
],
"owners": [
"data.default.owners@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
]
},
"legal": {
"legaltags": [
"{{data-partition-id}}-demo-legaltag"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"createUser": "osdu-community-sa-airflow@nice-etching-277309.iam.gserviceaccount.com",
"createTime": "2021-02-22T18:50:47.498Z",
"modifyUser": "osdu-community-sa-airflow@nice-etching-277309.iam.gserviceaccount.com",
"modifyTime": "2021-02-22T21:13:10.587Z"
}
```
**_Make sure that correct values of the "acl" property are specified!_**
**_Make sure that correct values of the "legaltags" property are specified!_**
_The response:_
```http
Status 201 Created
{
"id": "odesprod:dataset--File.Generic:ce6fe9fd-ab46-4358-ae27-8631e6cf8ae4"
}
```
Please note that the id parameter is not specified in the request, and it was assigned automatically by the Storage Service. You may choose to specify the “id” parameter in case you want to assign a specific id.
As a result of this step, the file is moved from the Staging area to the Persistent area and a metadata record for the file is created.
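When this request is scripted, the body can be assembled from the `file_source` value obtained during upload. The sketch below copies only a subset of the fields from the request above. Which fields are mandatory is not stated in this guide, so treat the selection as an assumption; the function name and sample arguments are hypothetical.

```python
import json


def build_file_metadata(file_source, name, partition, domain, legal_tag):
    """Assemble a File.Generic metadata record body for an uploaded file."""
    return {
        "kind": "osdu:wks:dataset--File.Generic:1.0.0",
        "acl": {
            "viewers": [f"data.default.viewers@{partition}.{domain}"],
            "owners": [f"data.default.owners@{partition}.{domain}"],
        },
        "legal": {
            "legaltags": [legal_tag],
            "otherRelevantDataCountries": ["US"],
            "status": "compliant",
        },
        "data": {
            "Name": name,
            "DatasetProperties": {
                "FileSourceInfo": {"FileSource": file_source, "Name": name}
            },
        },
    }


if __name__ == "__main__":
    body = build_file_metadata(
        file_source="/example-folder/example-file-id",  # value returned at upload time
        name="SLB Well Raster log",
        partition="odesprod",
        domain="osdu-gcp.go3-nrg.projects.epam.com",
        legal_tag="odesprod-demo-legaltag",
    )
    print(json.dumps(body, indent=2))
```

The `id` field is left out deliberately so that it is assigned automatically, as described above.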
#### Example: Metadata Ingestion using Dataset Service API
Files and File Collections can be ingested into OSDU using Dataset Service.
1. [FILE](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/Generated/dataset/File.Generic.1.0.0.json)
```json
Request: PUT {{DATASET_HOST}}/api/dataset/v1/registerDataset
{
"datasetRegistries": [
{
"id": "{{data-partition-id}}:dataset--File.Generic:{{$guid}}",
"kind": "{{data-partition-id}}:wks:dataset--File.Generic:1.0.0",
"acl": {
"viewers": [
"data.default.viewers@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
],
"owners": [
"data.default.owners@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
]
},
"legal": {
"legaltags": [
"{{data-partition-id}}-demo-legaltag"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"data": {
"DatasetProperties": {
"FileSourceInfo": {
"FileSource": "{{file_source}}",
"PreLoadFilePath": ""
}
}
},
"meta": [],
"tags": {}
}
]
}
```
2. [The FileCollection](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/Generated/dataset/FileCollection.Generic.1.0.0.json)
```json
Request: PUT {{DATASET_HOST}}/api/dataset/v1/registerDataset
{
"datasetRegistries": [
{
"id": "{{data-partition-id}}:dataset--FileCollection.Generic:{{$guid}}",
"kind": "{{data-partition-id}}:wks:dataset--FileCollection.Generic:1.0.0",
"acl": {
"viewers": [
"data.default.viewers@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
],
"owners": [
"data.default.owners@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
]
},
"legal": {
"legaltags": [
"{{data-partition-id}}-demo-legaltag"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"data": {
"DatasetProperties": {
"FileCollectionPath": "{{file_collection_path}}"
}
},
"meta": [],
"tags": {}
}
]
}
```
#### Example: Metadata Ingestion using Manifest-based ingestion
Metadata can be ingested using the OSDU Manifest-based ingestion mechanism. Depending on the data type included in your file, the corresponding OSDU schema should be used.
All OSDU schemas can be found [here](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/tree/master/Generated).
In our example we are ingesting a Well Log file, so we will use [the Well Log schema](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/blob/master/Generated/work-product-component/WellLog.1.0.0.json).
**Step 1. Create a mapping between the file that contains metadata information and an OSDU schema**
As stated above, we are using the OSDU Well Log schema. Metadata to populate this schema should be taken from the .tif file itself. Please note that sometimes metadata about the file may be sourced from another file or a database.
A mapping document between the .tif file and the OSDU schema has to be created:
![image033](uploads/2e883e4c41c5fab00f1efcf136447524/image033.png)
<div>Image 16. A mapping document</div>

In this case we have an Excel spreadsheet with the mapping. Also, some of the schema attributes can be hardcoded:
![image035](uploads/fcada9d0bf68c088921f1d6bb0d9ef26/image035.png)
<div>Image 16. A mapping document</div>

**Step 2. Create a manifest file**
The structure of the manifest is described by [the OSDU Manifest schema](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/blob/master/Generated/manifest/Manifest.1.0.0.json).
The Well Log schema is embedded into the Manifest schema.
You can populate the manifest manually or using a custom script.
Here are some places in the manifest that you should pay attention to:
1. Master data, Reference data, Work Product (WP), Work Product Component (WPC), Dataset ids
Depending on your use case, your organization may want to specify ids for WP, WPC and Dataset or use system-generated ids.
- Assigned ids: Use the “id” parameter in the manifest, following the pattern defined in the Manifest schema:
_Example:_
```json
"Data": {
"WorkProduct": {
"id": "{{data-partition-id}}:work-product--WorkProduct:20210330",
```
- System-generated ids:
For Master and Reference data ingestion, you can simply omit the “id” field in the manifest; in this case the id value will be assigned automatically by the Storage Service.
For WP ingestion (e.g. Well Log) you need to use surrogate keys. Surrogate keys are defined as the “id” value for WP, WPC and Dataset according to the pattern defined in the Manifest schema. For example:
```json
"Data": {
"WorkProduct": {
"id": "surrogate-key:wp-1",
```
2. WPC and Dataset references
Please make sure that all WPCs are referenced in the corresponding WP block (all WPC ids are listed). Ids should end with the ":" symbol:
```json
"WorkProduct": {
"id": "surrogate-key:wp-1",
"kind": "osdu:wks:work-product--WorkProduct:1.0.0",
"acl": {
"owners": [
"data.default.owners@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
],
"viewers": [
"data.default.viewers@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
]
},
"legal": {
"legaltags": [
"osdu-Well-Legal-Tag-Test6002007"
],
"otherRelevantDataCountries": [
"US"
]
},
"data": {
"Name": "69_D_CH_11",
"Description": "Document",
"Components": [
"surrogate-key:wpc-1"
]
}
},
```
Please make sure that all Datasets are referenced in the corresponding WPC block (all Dataset ids are listed). Ids should end with the ":" symbol. For example:
```json
"WorkProductComponents": [{
"id": "osdu:work-product-component--WellboreMarkerSet:0d409294-effd-4103-bb89-b7fb7471c222",
"kind": "osdu:wks:work-product-component--WellboreMarkerSet:1.0.0",
"acl": {
"owners": [
"data.default.owners@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
],
"viewers": [
"data.default.viewers@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
]
},
"legal": {
"legaltags": [
"{{data-partition-id}}-demo-legaltag"
],
"otherRelevantDataCountries": [
"US"
]
},
"data": {
"Name": "7587.csv",
"Description": "Wellbore Marker",
"Datasets": [
"osdu:dataset--FileCollection.Generic:feb26:"
],
```
3. Make sure that data is referenced correctly
If Master data, Reference data or a WPC is referenced in the manifest, the id should end with the `:` symbol (to reference the latest version of the record) or with `:<version>` (to reference a specific version of the record). For example:
```json
"FacilityID": "10110909",
"FacilityTypeID": "osdu:reference-data--FacilityType:Well:1614798473718776",
"FacilityOperator": [{
"FacilityOperatorID": "410464"
}],
```
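The two reference forms can be sketched with a small helper: a trailing colon points at the latest version of a record, while a colon followed by a version number (as in the `FacilityTypeID` value above) pins one specific version. The function name is hypothetical.

```python
def reference_id(record_id, version=None):
    """Format a manifest reference to a record.

    With no version the result ends with ':' (latest version of the record);
    with a version the result ends with ':<version>' (that specific version).
    """
    base = record_id.rstrip(":")
    return f"{base}:" if version is None else f"{base}:{version}"


if __name__ == "__main__":
    print(reference_id("osdu:reference-data--FacilityType:Well"))
    print(reference_id("osdu:reference-data--FacilityType:Well", 1614798473718776))
```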
4. Make sure that dataset record is created / enriched correctly
Depending on the way you chose to ingest file(s) in Part 1 of this document, you may have a metadata record for a Dataset (if you used the File Service or Dataset Service APIs) or you may not have a metadata record (if you used Cloud Tools to load data).
- File metadata exists
If file metadata was ingested during file loading, the current metadata should enrich the existing record. To do so, you need to specify the record id created in Part 1 as the dataset id in the manifest.
Make sure that you also specify the correct FileSource parameter that references the file location in the OSDU Storage Service.
Dataset metadata in a manifest is described by [the following schema](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/blob/master/Generated/manifest/GenericDataset.1.0.0.json).
```json
"Datasets": [{
"id": "{{data-partition-id}}:dataset--File.Generic:20213000000",
"kind": "{{data-partition-id}}:wks:dataset--File.Generic:1.0.0",
"acl": {
"owners": [
"data.default.owners@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
],
"viewers": [
"data.default.viewers@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
]
},
"legal": {
"legaltags": [
"{{data-partition-id}}-demo-legaltag"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"tags": {
"NameOfKey": "String value"
},
"createTime": "2020-12-16T11:46:20.163Z",
"createUser": "some-user@some-company-cloud.com",
"modifyTime": "2020-12-16T11:52:24.477Z",
"modifyUser": "some-user@some-company-cloud.com",
"ancestry": {
"parents": []
},
"meta": [],
"data": {
"Endian": "BIG",
"Description": "As originally delivered by ACME.com.",
"DatasetProperties": {
"FileSourceInfo": {
"FileSource": "gs://osdu-cicd-epam-persistent-area/c3af38c1-654d-47a0-a3e6-9e94c32add84/b62b104f843142f49ee6d747e6bdd49d:",
"Name": "MODIFIED 1234",
"PreLoadFilePath": "",
"PreloadFileCreateUser": "user1",
"PreloadFileModifyDate": "mar 11",
"PreloadFileModifyUser": "mar 11"
}
},
"TotalSize": "13245217273",
"Source": "Example Data Source",
"Name": "Dataset X221/15"
}
}
]
```
- File metadata doesn’t exist
In this case, the minimal requirement for the Dataset block of the manifest is to put the correct DownloadURL obtained in Part 1 into the FileSource parameter.
Example of the full manifest:
```json
"manifest": {
"kind": "{{data-partition-id}}:wks:Manifest:1.0.0",
"ReferenceData": [],
"MasterData": [],
"Data": {
"WorkProduct": {
"kind": "{{data-partition-id}}:wks:work-product--WorkProduct:1.0.0",
"acl": {
"owners": [
"data.default.owners@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
],
"viewers": [
"data.default.viewers@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
]
},
"legal": {
"legaltags": [
"{{data-partition-id}}-demo-legaltag"
],
"otherRelevantDataCountries": [
"US"
]
},
"data": {
"Name": "Raster Well Log WP",
"Description": "Raster Well Log WP for ML POC",
"Components": [
"surrogate-key: wpc1"
]
}
},
"WorkProductComponents": [
{
"id": "surrogate-key: wpc1",
"kind": "{{data-partition-id}}:wks:work-product-component--WellLog:1.0.0",
"acl": {
"owners": [
"data.default.owners@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
],
"viewers": [
"data.default.viewers@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
]
},
"legal": {
"legaltags": [
"{{data-partition-id}}-demo-legaltag"
],
"otherRelevantDataCountries": [
"US"
]
},
"createUser": "GCP ML POC",
"data": {
"ResourceSecurityClassification": "{{data-partition-id}}:reference-data--ResourceSecurityClassification:Public:",
"Source": "Oklahoma public registry",
"Description": "Raster Well Log",
"SubmitterName": "GCP Ingestion team",
"WellLogTypeID": "{{data-partition-id}}:reference-data--LogType:Raw:",
"LogActivity": "Main Pass",
"LogRun": "1",
"WellboreId": "{{data-partition-id}}:master-data--Wellbore:3511023252:",
"BottomMeasuredDepth": 12660,
"LoggingService": "SLIM CEMENT MAP TOOL",
"LogVersion": "1",
"ActivityType": "Wireline",
"ServiceCompanyId": "{{data-partition-id}}:master-data--Organisation:Schlumberger:",
"Datasets": [
"surrogate-key: dataset1"
]
}
}
],
"Datasets": [
{
"id": "surrogate-key: dataset1",
"kind": "{{data-partition-id}}:wks:dataset--File.Generic:1.0.0",
"acl": {
"owners": [
"data.default.owners@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
],
"viewers": [
"data.default.viewers@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
]
},
"legal": {
"legaltags": [
"{{data-partition-id}}-demo-legaltag"
],
"otherRelevantDataCountries": [
"US"
]
},
"createUser": "ML POC",
"data": {
"Description": "Raster Well Log",
"DatasetProperties": {
"FileSourceInfo": {
"FileSource": "gs://alexd-test-dataset/Demo/Demo/OKI.NW.13.13.04.XX.6B287FFB.D-N01_SLB_3501123504.tif"
}
}
}
}
]
}
}
```
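The minimal Datasets entry described above can also be generated programmatically. The following is only a sketch: the helper name `minimal_dataset`, its parameters, and the example bucket path are hypothetical and not part of any OSDU library; the field layout is copied from the manifest above.

```python
def minimal_dataset(surrogate_key, file_source, partition, acl_domain, legal_tag):
    """Build a Datasets entry in which FileSource is the only data property."""
    return {
        "id": surrogate_key,
        "kind": f"{partition}:wks:dataset--File.Generic:1.0.0",
        "acl": {
            "owners": [f"data.default.owners@{partition}.{acl_domain}"],
            "viewers": [f"data.default.viewers@{partition}.{acl_domain}"],
        },
        "legal": {
            "legaltags": [legal_tag],
            "otherRelevantDataCountries": ["US"],
        },
        "data": {
            "DatasetProperties": {
                "FileSourceInfo": {"FileSource": file_source}
            }
        },
    }


# Hypothetical values, for illustration only.
dataset = minimal_dataset(
    surrogate_key="surrogate-key: dataset1",
    file_source="gs://example-bucket/example-file.tif",
    partition="odesprod",
    acl_domain="osdu-gcp.go3-nrg.projects.epam.com",
    legal_tag="odesprod-demo-legaltag",
)
```

The returned dictionary can be appended to the `Datasets` list of the manifest before the manifest is serialized to JSON.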
**Step 3. Start the workflow**
_Request:_
```json
POST https://{{WORKFLOW_HOST}}/v1/workflow/Osdu_ingest/workflowRun
{
"executionContext": {
"Payload": {
"AppKey": "test-app",
"data-partition-id": "{{data-partition-id}}"
},
"manifest": {
"kind": "{{data-partition-id}}:wks:Manifest:1.0.0",
"ReferenceData": [],
"MasterData": [],
"Data": {
"WorkProduct": {
"kind": "{{data-partition-id}}:wks:work-product--WorkProduct:1.0.0",
"acl": {
"owners": [
"data.default.owners@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
],
"viewers": [
"data.default.viewers@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
]
},
"legal": {
"legaltags": [
"{{data-partition-id}}-demo-legaltag"
],
"otherRelevantDataCountries": [
"US"
]
},
"data": {
"Name": "Raster Well Log WP",
"Description": "Raster Well Log WP for ML POC",
"Components": [
"surrogate-key: wpc1"
]
}
},
"WorkProductComponents": [
{
"id": "surrogate-key: wpc1",
"kind": "{{data-partition-id}}:wks:work-product-component--WellLog:1.0.0",
"acl": {
"owners": [
"data.default.owners@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
],
"viewers": [
"data.default.viewers@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
]
},
"legal": {
"legaltags": [
"{{data-partition-id}}-demo-legaltag"
],
"otherRelevantDataCountries": [
"US"
]
},
"createUser": "GCP ML POC",
"data": {
"ResourceSecurityClassification": "{{data-partition-id}}:reference-data--ResourceSecurityClassification:Public:",
"Source": "Oklahoma public registry",
"Description": "Raster Well Log",
"SubmitterName": "GCP Ingestion team",
"WellLogTypeID": "{{data-partition-id}}:reference-data--LogType:Raw:",
"LogActivity": "Main Pass",
"LogRun": "1",
"WellboreId": "{{data-partition-id}}:master-data--Wellbore:3511023252:",
"BottomMeasuredDepth": 12660,
"LoggingService": "SLIM CEMENT MAP TOOL",
"LogVersion": "1",
"ActivityType": "Wireline",
"ServiceCompanyId": "{{data-partition-id}}:master-data--Organisation:Schlumberger:",
"Datasets": [
"surrogate-key: dataset1"
]
}
}
],
"Datasets": [
{
"id": "surrogate-key: dataset1",
"kind": "{{data-partition-id}}:wks:dataset--File.Generic:1.0.0",
"acl": {
"owners": [
"data.default.owners@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
],
"viewers": [
"data.default.viewers@{{data-partition-id}}.osdu-gcp.go3-nrg.projects.epam.com"
]
},
"legal": {
"legaltags": [
"{{data-partition-id}}-demo-legaltag"
],
"otherRelevantDataCountries": [
"US"
]
},
"createUser": "ML POC",
"data": {
"Description": "Raster Well Log",
"DatasetProperties": {
"FileSourceInfo": {
"FileSource": "gs://alexd-test-dataset/Demo/Demo/OKI.NW.13.13.04.XX.6B287FFB.D-N01_SLB_3501123504.tif"
}
}
}
}
]
}
}
}
}
```
_Response:_
```json
{
"workflowId": "ef82cba0-0e45-4df3-91bf-4df1553102d3",
"runId": "b9a72077-1a2f-4998-83b5-406586e868fb",
"startTimeStamp": 1618601056388,
"status": "submitted",
"submittedBy": "kateryna_kurach@osdu-gcp.go3-nrg.projects.epam.com"
}
```
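For scripted ingestion, the same request can be assembled outside Postman. This is a minimal sketch using only the Python standard library. The function name and parameters are hypothetical, and the `Authorization` and `data-partition-id` headers are assumptions about how your deployment authenticates requests; adjust them to your environment.

```python
import json
import urllib.request


def build_workflow_run_request(workflow_host, partition, manifest, token, app_key="test-app"):
    """Wrap a manifest in the executionContext envelope shown above."""
    body = {
        "executionContext": {
            "Payload": {"AppKey": app_key, "data-partition-id": partition},
            "manifest": manifest,
        }
    }
    return urllib.request.Request(
        url=f"https://{workflow_host}/v1/workflow/Osdu_ingest/workflowRun",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed headers: not shown in the request above.
            "Authorization": f"Bearer {token}",
            "data-partition-id": partition,
        },
    )


def parse_run_id(response_body):
    """Extract the runId field from a response shaped like the one above."""
    return json.loads(response_body)["runId"]
```

The request object is not sent until it is passed to `urllib.request.urlopen`; the body of the response can then be handed to `parse_run_id` to keep the run id for later checks.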
**Step 4. Check Airflow log**
Check the Airflow log to see which record ids from the manifest were processed, which were rejected, and why. Keep in mind that the workflow status may be green (Successful) even though some or all of your record ids were rejected.
To review the log, open your Airflow console and select the osdu_ingest DAG:
![image037](uploads/2b9e6ec91a8f7b46a8cf549adfb867e3/image037.png)
<div>Image 18. Airflow. OSDU ingest DAG</div>
Then, for better usability, navigate to Graph view:
![image039](uploads/f424d195eb5bf0d089572e5b02a57558/image039.png)
<div>Image 19. Airflow. OSDU ingest DAG. Tree View</div>
Then select your DAG run from the drop-down, click Go, click the last step in the DAG execution flow, and click “View log”:
![image041](uploads/f6d97b203a38143916f3da54ce05307e/image041.png)
<div>Image 20. Airflow. OSDU ingest DAG. Graph View</div>
Then click XCom. This screen shows which ids were processed and which were skipped, along with the reason:
![image043](uploads/f6a4424fa9f242856d02a2f2bf1d96d6/image043.png)
<div>Image 21. Airflow. OSDU ingest DAG. XCom</div>
**Step 5. Check inserted data**
_Request:_
```http
GET {{storage_api_url}}/api/storage/v2/records/odesprod:work-product-component--WellLog:f8781f4e41b04b96b385d70a0f8a14bf
```
_Response:_
```json
{
"data": {
"Description": "Raster Well Log",
"WellboreId": "odesprod:master-data--Wellbore: 3511023252:",
"ActivityType": "Wireline",
"ServiceCompanyId": "{{data-partition-id}}:master-data--Organisation:Schlumberger:",
"Source": "Oklahoma public registry",
"Datasets": [
"surrogate-key: dataset1"
],
"LogVersion": "1",
"WellLogTypeID": "odesprod:reference-data--LogType:Raw:",
"LoggingService": "SLIM CEMENT MAP TOOL",
"ResourceSecurityClassification": "odesprod:reference-data--ResourceSecurityClassification:Public:",
"BottomMeasuredDepth": 12660,
"SubmitterName": "GCP Ingestion team",
"LogRun": "1",
"LogActivity": "Main Pass"
},
"id": "odesprod:work-product-component--WellLog:f8781f4e41b04b96b385d70a0f8a14bf",
"version": 1618601284459573,
"kind": "odesprod:wks:work-product-component--WellLog:1.0.0",
"acl": {
"viewers": [
"data.default.viewers@odesprod.osdu-gcp.go3-nrg.projects.epam.com"
],
"owners": [
"data.default.owners@odesprod.osdu-gcp.go3-nrg.projects.epam.com"
]
},
"legal": {
"legaltags": [
"odesprod-demo-legaltag"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"createUser": "osdu-sa-airflow-composer@osdu-service-prod.iam.gserviceaccount.com",
"createTime": "2021-04-16T19:28:04.556Z"
}
```
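A record returned as in the response above can be checked automatically. The sketch below is an illustration, not an official validation routine: the function name is hypothetical, and it only inspects fields visible in the sample response (`kind`, `legal.status`, `data`).

```python
def check_ingested_record(record, expected_kind):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if record.get("kind") != expected_kind:
        problems.append(f"unexpected kind: {record.get('kind')!r}")
    status = record.get("legal", {}).get("status")
    if status != "compliant":
        problems.append(f"legal status is {status!r}, expected 'compliant'")
    if not record.get("data"):
        problems.append("record has no data block")
    return problems
```

Pass the parsed JSON of the Storage response together with the kind used in the manifest, for example `odesprod:wks:work-product-component--WellLog:1.0.0`.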
### Search and Metadata retrieval
You can retrieve a metadata record from OSDU in two ways: the Storage service provides basic retrieval of a record by its id, and the Search service provides a flexible and powerful mechanism for searching metadata.
OSDU uses Elasticsearch to provide search capabilities. After ingestion, metadata is indexed, and you can search this index. Refer to the Lucene query syntax documentation to learn how to search on any property used in the OSDU schemas:
https://lucene.apache.org/core/2_9_4/queryparsersyntax.html
Note that, for the same record id, the record returned by the Storage service and the one returned by the Search service will look different. This is because the Storage service returns the original record, whereas the Search service returns the indexed record.
A couple of simple search queries are shown below.
#### Example: Search for a metadata record by id using Storage Service API
```http
GET https://{{STORAGE_HOST}}/records/{{record-id}}
```
#### Example: Search for a metadata record by id using Search Service API
```json
POST https://{{SEARCH_HOST}}/query
{
"kind": "*:*:*:*",
"limit": 300,
"query": "id: \"{{record-id}}\""
}
```
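Record ids contain colons, which are special characters in Lucene query syntax, so the id in the query above is wrapped in double quotes. A minimal sketch of building that request body in Python follows; the function name `build_id_query` is hypothetical, and the default `kind` and `limit` values are taken from the example above.

```python
import json


def build_id_query(record_id, kind="*:*:*:*", limit=300):
    """Build a Search request body that matches one record by its id."""
    # Escape backslashes and double quotes so the id stays a single quoted phrase.
    escaped = record_id.replace("\\", "\\\\").replace('"', '\\"')
    return {"kind": kind, "limit": limit, "query": f'id: "{escaped}"'}


body = json.dumps(
    build_id_query("odesprod:work-product-component--WellLog:f8781f4e41b04b96b385d70a0f8a14bf")
)
```

`body` is the JSON string to send in the POST request; narrowing `kind` to a specific schema instead of the wildcard limits the search to records of that kind.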
### Download the original file
To get the original file, you can use the same kinds of services that were used for file ingestion.
First, search for the metadata record of the dataset (see above). Then, using the dataset record id, request the file DownloadUrl from the File service, the Dataset service, or a DDMS.
#### Example: Get file DownloadUrl using File Service API
```http
GET {{FILE_HOST}}/files/{{record_id}}/downloadURL
```
After executing this request, download the file using the link provided in the response.
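The two steps (request the link, then read it from the response) can be sketched in Python as follows. Several details here are assumptions rather than facts from this guide: the function names are hypothetical, `file_host` is assumed to include the scheme, the `Authorization` and `data-partition-id` headers are assumed, and the response field name `SignedUrl` is a guess that you should verify against your File service response.

```python
import json
import urllib.request


def build_download_url_request(file_host, record_id, token, partition):
    """Build the GET request for a dataset record's download link."""
    return urllib.request.Request(
        url=f"{file_host}/files/{record_id}/downloadURL",
        method="GET",
        headers={
            # Assumed headers: not shown in the request above.
            "Authorization": f"Bearer {token}",
            "data-partition-id": partition,
        },
    )


def extract_download_link(response_body, key="SignedUrl"):
    """Read the link from the JSON response; `key` is an assumed field name."""
    payload = json.loads(response_body)
    if key not in payload:
        raise KeyError(f"{key!r} not found; response keys: {sorted(payload)}")
    return payload[key]
```

If your deployment returns the link under a different field, pass that name as `key`; the error message lists the keys actually present in the response.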
Postman collection file: [OSDU_Quick_start.postman_collection.json](uploads/3c9820ad7dfe9ed873755c53500afe4e/OSDU_Quick_start.postman_collection.json)
### Next steps
- [OSDU API Quick start demo](https://gitlab.opengroup.org/osdu/pmc/docs/-/blob/master/Google%20Cloud/Quick_Start_Guide_demo.mp4)
- [Create OSDU on-prem using a Helm chart](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-gcp-provisioning/-/blob/master/examples/simple_osdu_onprem/README.md)
- [Create an OSDU set of services within a single Google Cloud project](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-gcp-provisioning/-/blob/master/examples/simple_osdu/README.md)
- [Create an OSDU set of services within Azure](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/README.md)