---
title: OSDU® Data Platform Data Loading Quick Start Guide
---
# OSDU® Data Platform - Data Loading Quick Start Guide
## Contents
...
@@ -14,34 +17,34 @@ These are the important links/documents that, you should first try to read and b
1. [OSDU Schema Usage Guide](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/Guides/README.md) - This is the latest schema usage guide that you should be familiar with to better understand the data schema structure.
2. [OSDU Data Definitions](https://community.opengroup.org/osdu/data/data-definitions) - This page contains the latest data definitions schema and reference values.
3. [OSDU Core Services](https://community.opengroup.org/groups/osdu/platform/-/wikis/Core-Services-Overview) - This contains the latest core services API and documentation that the OSDU® platform supports.
## Terms and Acronyms
| Term | Description |
|------|-------------|
| Airflow | Airflow is the designated workflow engine for OSDU®. Airflow is used to schedule and orchestrate the different workflows in OSDU® for data flow. Best practices can be found [here](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/wikis/Ingestion-DAG-Best-Practices). |
| Manifest | A manifest is a container specifically designed for loading metadata into the OSDU® platform. As of this writing, a Manifest has structures to hold metadata records of the following types: Reference Data, Master Data, Work Product, Work Product Component, and Datasets. |
| Source Data | Source Data might be an Excel file, LAS/DLIS files, Seismic data, text files, data streams, databases, etc. One of the goals of OSDU® is to support storing source data in its original format to preserve lineage. Metadata is created to allow this source data to remain in its original format yet remain searchable and discoverable within the platform. |
| DDMS | Domain Data Management Services. This provides a single consistent set of APIs and methods to access the data objects regardless of the domain workflow. Here's a [list of the DDMS](https://community.opengroup.org/osdu/platform/domain-data-mgmt-services) currently being developed. |
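
To make the Manifest entry in the table above more concrete, here is a minimal Python sketch of an empty manifest with one slot per record type listed there. The `kind` value and the property names (`ReferenceData`, `MasterData`, `Data.WorkProduct`, `Data.WorkProductComponents`, `Data.Datasets`) are assumptions based on the layout commonly seen in OSDU manifest schemas; check them against the manifest schema deployed in your environment.

```python
import json


def empty_manifest(kind="osdu:wks:Manifest:1.0.0"):
    """Return an empty manifest skeleton.

    The kind and the property names are assumptions, not taken from this guide.
    """
    return {
        "kind": kind,
        "ReferenceData": [],      # reference data records
        "MasterData": [],         # master data records
        "Data": {
            "WorkProduct": {},            # a single work product record
            "WorkProductComponents": [],  # work product component records
            "Datasets": [],               # dataset (file) records
        },
    }


def count_records(manifest):
    """Count how many records the manifest holds per section."""
    data = manifest.get("Data", {})
    return {
        "ReferenceData": len(manifest.get("ReferenceData", [])),
        "MasterData": len(manifest.get("MasterData", [])),
        "WorkProduct": 1 if data.get("WorkProduct") else 0,
        "WorkProductComponents": len(data.get("WorkProductComponents", [])),
        "Datasets": len(data.get("Datasets", [])),
    }


if __name__ == "__main__":
    print(json.dumps(empty_manifest(), indent=2))
```

A helper like `count_records` can serve as a quick sanity check before a manifest is submitted, for example to confirm that a master data manifest is not accidentally empty.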
## Overview
The OSDU® Data Platform is versatile and designed to support multiple data loading use cases. The approaches recommended in this document are meant to offer a perspective for data ingestion. This document does not intend to prescribe the only path to data ingestion, and the approach provided is illustrative of some of the platform capabilities. We encourage you to engage with the OSDU® member community with questions and feedback.

This data loading guide attempts to describe the latest practices for ingesting data into the OSDU® Data Platform. The contents are expected to change quickly because the data loading capabilities of the platform are constantly being updated. Once the workflows mature, they will be reflected in the official documentation.

This document addresses end-to-end data loading from the perspective of the end-user, which in most cases is a member of the information management or data platform capabilities team. Hence, this guide assumes that this end-user has some basic technical knowledge of HTTP web APIs, JSON data structures, and Python.
## Introduction
The Data Flow workstream covers the full end-to-end process of bringing data into the OSDU® platform. The Data Ingestion workstream is part of the Data Loading workstream.
* Loading – this workstream captures all the work necessary to ready the data for ingestion. Activities might include:
  * Data fetching and organization
  * Data massaging and formatting
  * Manifest/Metadata creation
  * Loading source data to a landing zone or staging area
* Ingestion – this workstream brings data into the OSDU® Data Platform. There are several data ingestion services available.
### Data Platform Overview (Simplified)
...
@@ -53,24 +56,24 @@ Below is a simplified overview of the data platform. There are more interactions
### Data Types
These are the main OSDU® data types:
1. [Reference Data](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/ReferenceValues/Manifests/reference-data) - These are the standard names for data values. For example, the reference value for measured depth is MD and for elevation is ELEV. Before these values can be used, the reference data must first be loaded into the OSDU® platform. There are three governance levels for the reference data:
   * [Fixed](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/ReferenceValues/Manifests/reference-data/FIXED) - Pre-determined by agreement in the OSDU® Forum and shall not be changed. This allows interoperability between companies.
   * [Open](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/ReferenceValues/Manifests/reference-data/OPEN) - Agreed by the OSDU® Forum, but companies may extend the list with custom values. Custom values shall not conflict with Forum values. This allows some level of interoperability between companies.
   * [Local](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/ReferenceValues/Manifests/reference-data/LOCAL) - The OSDU® Forum makes no declaration about the values, and companies need to create their own list. This list does not benefit much from interoperability, and agreed-upon values are hard to come by.
2. [Master Data](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/Generated/master-data) - A record of the information about business objects that we manage in the OSDU® record catalog. For example, a list of field names with well names and their associated wellbore names.
3. [Work Product](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/Generated/work-product) - A record that ties together a set of work product components, such as a group of well logs inside a wellbore.
4. [Work Product Components](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/Generated/work-product-component) - A record that describes the business content of a single item such as a well log, for example the log data information and the top and bottom depths of the well log.
   * Here is the [list](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/E-R/work-product-component#supported-bulk-standards) of the supported bulk standards in OSDU®.
5. [File](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/Generated/dataset) - A record that holds metadata about a digital file, such as the file size and checksum of a well log, but does not describe the business content of the file.
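
The data types listed above typically show up in the `kind` of each record. As an illustration, the sketch below splits a kind string into its parts and reports which group it belongs to. It assumes the `authority:source:entity-type:version` layout and the `group--Entity` prefix convention (for example `master-data--Wellbore`); the sample kinds are illustrative and are not taken from this guide.

```python
from typing import NamedTuple

# Group prefixes assumed to match the data types listed above.
GROUP_TYPES = (
    "reference-data",
    "master-data",
    "work-product",
    "work-product-component",
    "dataset",
)


class Kind(NamedTuple):
    authority: str
    source: str
    group: str
    entity: str
    version: str


def parse_kind(kind: str) -> Kind:
    """Split 'authority:source:group--Entity:version' into its parts."""
    parts = kind.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated parts, got {len(parts)}: {kind!r}")
    authority, source, entity_type, version = parts
    # partition() splits on the first '--', so hyphenated group names stay intact.
    group, separator, entity = entity_type.partition("--")
    if not separator or group not in GROUP_TYPES or not entity:
        raise ValueError(f"unrecognized entity type: {entity_type!r}")
    return Kind(authority, source, group, entity, version)


if __name__ == "__main__":
    for sample in (
        "osdu:wks:master-data--Wellbore:1.0.0",
        "osdu:wks:work-product-component--WellLog:1.0.0",
    ):
        print(parse_kind(sample))
```

Grouping records by `parse_kind(...).group` is one way to decide load order, since reference data has to be present before the records that refer to it.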
### Getting the data into the OSDU® Data Platform
Once you are familiar with the OSDU® data types, note that there are several [data flow services](https://community.opengroup.org/groups/osdu/platform/-/wikis/Core-Services-Overview#data-flow-services-and-dags) for bringing data into the OSDU® Data Platform. Each approach has pros and cons, as detailed in the linked sections below.
* [CSV Parser Ingestion](https://community.opengroup.org/osdu/platform/data-flow/ingestion/csv-parser/csv-parser) – Schlumberger has developed an ingestion workflow capable of parsing a CSV file into a schema and loading each entry as a record into OSDU®. Future work is planned to enrich the flattened schema structure created by the CSV parser into an R3-style schema.
* [Manifest-based Ingestion](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags) – The manifest ingestion workflow uses a manifest schema defined by the Data Definitions team to load data into the OSDU® Data Platform.
* [WITSML Parser Ingestion](https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser) – Energistics has created an ingestion workflow capable of parsing WITSML into R3 schema formats.

The fundamental idea in each of these ingestion methods is to call the [storage service API](https://community.opengroup.org/osdu/platform/system/storage) to create the records. Alternatively, you can call the storage service API directly to create the records, but note that this approach is very forgiving about what it accepts and could lead to unexpected behavior.
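
As a rough illustration of calling the storage service directly, the sketch below builds (but does not send) a request that would create records. The endpoint path `/api/storage/v2/records`, the `PUT` method, the `data-partition-id` header, and the required record keys are assumptions about a typical deployment and should be checked against the storage service documentation for your environment. Because the direct route is forgiving, the helper adds a small client-side check of its own.

```python
import json
import urllib.request

# Top-level keys a record is assumed to need; adjust to your schema.
REQUIRED_KEYS = ("kind", "acl", "legal", "data")


def build_storage_request(base_url, partition, token, records):
    """Build a PUT request that would create records via the storage service.

    The path, method, and headers are assumptions, not taken from this guide.
    """
    for index, record in enumerate(records):
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"record {index} is missing keys: {missing}")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/storage/v2/records",
        data=json.dumps(records).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "data-partition-id": partition,
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    example = {
        "kind": "osdu:wks:master-data--Well:1.0.0",
        "acl": {"viewers": ["viewers@example.com"], "owners": ["owners@example.com"]},
        "legal": {"legaltags": ["example-tag"], "otherRelevantDataCountries": ["US"]},
        "data": {"FacilityName": "Example Well"},
    }
    request = build_storage_request("https://osdu.example.com", "opendes", "TOKEN", [example])
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping request construction separate makes it easy to inspect the payload first.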
...
@@ -82,22 +85,22 @@ Below is a high level overview of the ingestion services available:
### Pre-requisite
This guide assumes you have access to a working OSDU® environment. Please contact your cloud service provider for access.
### Steps
In this quickstart guide, we will use the [open-test-data](https://community.opengroup.org/osdu/data/open-test-data) to demonstrate the steps above. This example covers one of the three methods described above: Manifest-based Ingestion.
* **Manifest-based Ingestion**
1. Load reference data in the OSDU® Data Platform
   * For the TNO example, the reference data [manifests](https://community.opengroup.org/osdu/data/open-test-data/-/tree/master/rc--3.0.0/4-instances/TNO/reference-data) should first be loaded into the OSDU® platform.
   * Any other missing reference data can be found in the OSDU® community [reference data](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/ReferenceValues/Manifests/reference-data). This repository is maintained by the Data Definitions team.
2. Prepare the master/WPC data manifests JSON
   * Here is a set of [Python data preparation scripts](https://community.opengroup.org/osdu/data/open-test-data/-/tree/master/rc--3.0.0/2-scripts) to help with the manifest generation.
   * You can either [learn](https://community.opengroup.org/groups/osdu/platform/data-flow/data-loading/-/wikis/How-to-generate-manifests-using-scripts) to generate them from scratch with the [scripts](https://community.opengroup.org/osdu/data/open-test-data/-/tree/master/rc--3.0.0/2-scripts) or use the ones that have already been [generated](https://community.opengroup.org/osdu/data/open-test-data/-/tree/master/rc--3.0.0/4-instances) from the scripts.
3. Load master/WP/WPC data in the OSDU® Data Platform
   * Send a POST request to `{OSDU_BASE_URL}/api/workflow/v1/workflow/Osdu_ingest/workflowRun` with the manifest JSON in the request body to trigger the workflow ingestion service as shown in an example below:
```json
...
@@ -229,7 +232,7 @@ This section runs through the common tasks in data loading and ingestions. Refer
6. [How to load a seismic data via Seismic DDMS - SEGY to oZGY](https://community.opengroup.org/osdu/platform/data-flow/ingestion/segy-to-zgy-conversion/-/blob/master/doc/gcp/QUICKSTART.md) - by Yan Sushchynski [EPAM]
7. [How to load a seismic data via Seismic DDMS - SEGY to oVDS (Storage service API)](https://community.opengroup.org/osdu/platform/data-flow/ingestion/segy-to-vds-conversion/-/blob/master/docs/gcp/QUICKSTART.md) - by Yan Sushchynski [EPAM]
8. [How to load a WITSML data (WITSML Parser)](https://community.opengroup.org/groups/osdu/platform/data-flow/data-loading/-/wikis/How-to-load-a-WITSML-data-(WITSML-Parser)) - by Kateryna Kurach [EPAM]
9. [How to load a generic file in OSDU® using File/Dataset service](https://community.opengroup.org/groups/osdu/platform/data-flow/data-loading/-/wikis/How-to-load-a-generic-file-in-OSDU-using-File-or-Dataset-Service) - by Kateryna Kurach [EPAM]
10. [How to check for error in Airflow DAG](https://community.opengroup.org/groups/osdu/platform/data-flow/data-loading/-/wikis/How-to-check-for-error-in-Airflow-DAG) - by Chad Leong [SLB]
11. [How to search for ingested record](https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/docs/tutorial/SearchService.md)
12. [Troubleshooting Index Status of Data Ingested](https://community.opengroup.org/groups/osdu/platform/data-flow/data-loading/-/wikis/Troubleshooting-Index-Status-of-Data-Ingested) - by Samiullah Ghousudeen [BP]
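
Two of the tasks covered in this guide, triggering the manifest ingestion workflow and searching for the ingested records, come down to plain HTTP POST requests. The Python sketch below builds both requests without sending them. The workflow URL is the one given in the steps of this guide; the `executionContext` body layout, the `AppKey` value, the `data-partition-id` header, and the search path `/api/search/v2/query` with its `kind`/`query`/`limit` fields are assumptions to verify against your environment.

```python
import json
import urllib.request


def _post_json(url, partition, token, body):
    """Build a JSON POST request with the headers assumed for OSDU services."""
    return urllib.request.Request(
        url=url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "data-partition-id": partition,
            "Authorization": f"Bearer {token}",
        },
    )


def build_workflow_run_request(base_url, partition, token, manifest, app_key="example-app"):
    """Request that would trigger the Osdu_ingest workflow with a manifest."""
    # Body layout is an assumption; the guide's own JSON example is authoritative.
    body = {
        "executionContext": {
            "Payload": {"AppKey": app_key, "data-partition-id": partition},
            "manifest": manifest,
        }
    }
    url = f"{base_url.rstrip('/')}/api/workflow/v1/workflow/Osdu_ingest/workflowRun"
    return _post_json(url, partition, token, body)


def build_search_request(base_url, partition, token, kind, query="*", limit=10):
    """Request that would query the search service for records of one kind."""
    body = {"kind": kind, "query": query, "limit": limit}
    url = f"{base_url.rstrip('/')}/api/search/v2/query"  # assumed path
    return _post_json(url, partition, token, body)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    run = build_workflow_run_request(
        "https://osdu.example.com", "opendes", "TOKEN", {"kind": "osdu:wks:Manifest:1.0.0"}
    )
    print(run.get_method(), run.full_url)
```

Either request can be sent with `urllib.request.urlopen`. Indexing is not necessarily immediate, so a search issued right after ingestion may return nothing until the workflow run has finished and the records have been indexed.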