title:OSDU® Data Platform Data Loading Quick Start Guide
---
# OSDU® Data Platform - Data Loading Quick Start Guide
## Contents
...
@@ -21,8 +20,9 @@ These are the important links/documents that, you should first try to read and b
3. [OSDU Core Services](https://community.opengroup.org/groups/osdu/platform/-/wikis/Core-Services-Overview) - This contains the latest core services API and documentation that the OSDU® platform supports.
| Airflow | Airflow is the designated workflow engine for OSDU®. Airflow is used to schedule and orchestrate the different workflows in OSDU® for data flow. Best practices can be found [here](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/wikis/Ingestion-DAG-Best-Practices). |
| Manifest | A manifest is a container specifically designed for facilitating metadata into the OSDU® platform. As of this writing, a Manifest has structures to hold metadata records of the following types: Reference Data, Master Data, Work Product, Work Product Component, and Datasets. |
| Source Data | Source Data might be an Excel file, LAS/DLIS files, Seismic data, text files, data streams, databases, etc. One of the goals of OSDU® is to support storing source data in its original format to preserve lineage. Metadata is created to allow this source data to remain in its original format yet remain searchable and discoverable within the platform. |
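
To make the Manifest entry in the table above more concrete, here is a minimal sketch of an empty manifest with one slot per record type. The key names (`ReferenceData`, `MasterData`, `Data`, `WorkProduct`, `WorkProductComponents`, `Datasets`) and the kind `osdu:wks:Manifest:1.0.0` are assumptions based on the published Manifest schema, so check them against the schema deployed in your environment. The appended record is a placeholder, not real data.

```python
import json


def empty_manifest(kind="osdu:wks:Manifest:1.0.0"):
    """Return an empty manifest with one slot per supported record type."""
    return {
        "kind": kind,  # assumed manifest schema kind
        "ReferenceData": [],  # reference data records
        "MasterData": [],  # master data records
        "Data": {
            "WorkProduct": {},  # a single work product record
            "WorkProductComponents": [],  # work product component records
            "Datasets": [],  # dataset (file) records
        },
    }


manifest = empty_manifest()
# Placeholder record: a real one carries an id, kind, acl, legal and data block.
manifest["MasterData"].append({"id": "<record id>", "kind": "<record kind>", "data": {}})
print(json.dumps(manifest, indent=2))
```

Keeping every record type in one container lets a single ingestion run load related records together.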
...
@@ -40,12 +40,12 @@ This document addresses end-to-end data loading from the perspective of the end-
The Data Flow workstream covers the full end-to-end process of facilitating data into the OSDU® platform. The Data Ingestion workstream is part of the Data Loading workstream.
- Loading – this workstream captures all the work necessary to ready the data for ingestion. Activities might include:
  - Data fetching and organization
  - Data massaging and formatting
  - Manifest/Metadata creation
  - Loading source data to a landing zone or staging area
- Ingestion – this workstream facilitates data into the OSDU® Data Platform. There are several data ingestion services available.
### Data Platform Overview (Simplified)
...
@@ -60,22 +60,22 @@ Below is a simplified overview of the data platform. There are more interactions
These are the main OSDU® data types:
1. [Reference Data](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/ReferenceValues/Manifests/reference-data) - These are the standard names for data values. For example, the reference value for measured depth is MD and for elevation is ELEV. Whenever these values are used, the reference data must first be loaded into the OSDU® platform. There are three governance levels for reference data:
   - [Fixed](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/ReferenceValues/Manifests/reference-data/FIXED) - Pre-determined by agreement in the OSDU® Forum and shall not be changed. This allows interoperability between companies.
   - [Open](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/ReferenceValues/Manifests/reference-data/OPEN) - Agreed by the OSDU® Forum, but companies may extend the list with custom values. Custom values shall not conflict with Forum values. This allows some level of interoperability between companies.
   - [Local](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/ReferenceValues/Manifests/reference-data/LOCAL) - The OSDU® Forum makes no declaration about the values, and companies need to create their own list. This list does not benefit much from interoperability, and agreed-upon values are hard to come by.
2. [Master Data](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/Generated/master-data) - A record of the information about business objects that we manage in the OSDU® record catalog. For example, a list of field names with well names and their associated wellbore names.
3. [Work Product](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/Generated/work-product) - A record that ties together a set of work product components, such as a group of well logs inside a wellbore.
4. [Work Product Components](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/Generated/work-product-component) - A record that describes the business content of a single well log, such as the log data information and the top and bottom depths of the well log.
   - Here is the [list](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/E-R/work-product-component#supported-bulk-standards) of the supported bulk standards in OSDU®.
5. [File](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/Generated/dataset) - A record that describes metadata about a digital file, such as the file size or checksum of a well log, but not the business content of the file.
### Getting the data into the OSDU® Data Platform
Once you are familiar with the OSDU® data types, note that there are several [data flow services](https://community.opengroup.org/groups/osdu/platform/-/wikis/Core-Services-Overview#data-flow-services-and-dags) for bringing data into the OSDU® Data Platform. Each approach has pros and cons, as detailed in the section linked for each one below.
- [CSV Parser Ingestion](https://community.opengroup.org/osdu/platform/data-flow/ingestion/csv-parser/csv-parser) – Schlumberger has developed an ingestion workflow capable of parsing a CSV file into a schema and loading each entry as a record into OSDU®. There is future work to enrich the flattened schema structure created by the CSV parser into an R3-style schema.
- [Manifest-based Ingestion](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags) – The manifest ingestion workflow leverages a manifest schema definition defined by the Data Definitions team to facilitate data into the OSDU® Data Platform.
- [WITSML Parser Ingestion](https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser) – Energistics have created an ingestion workflow capable of parsing WITSML into R3 schema formats.
The fundamental idea in each of these ingestion methods is to call the [Storage service API](https://community.opengroup.org/osdu/platform/system/storage) to create the records. Alternatively, you can call the Storage service API directly to create the records, but note that this approach is very forgiving and could lead to unexpected behavior.
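
As a rough illustration of the direct route, the sketch below builds (but does not send) a request that creates one record through the Storage service. The `PUT /api/storage/v2/records` endpoint, the header names, and the record fields (`kind`, `acl`, `legal`, `data`) are assumptions about a typical deployment and should be checked against your environment. The host, token, partition, group names, legal tag, and data values are placeholders.

```python
import json
import urllib.request


def build_put_records_request(base_url, token, partition_id, records):
    """Build (without sending) a Storage service request that creates records."""
    url = f"{base_url.rstrip('/')}/api/storage/v2/records"  # assumed endpoint path
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "data-partition-id": partition_id,
    }
    payload = json.dumps(records).encode("utf-8")  # the body is a JSON array of records
    return urllib.request.Request(url, data=payload, headers=headers, method="PUT")


# Placeholder record: every value below is hypothetical.
record = {
    "kind": "osdu:wks:master-data--Well:1.0.0",
    "acl": {
        "viewers": ["data.default.viewers@opendes.example.com"],
        "owners": ["data.default.owners@opendes.example.com"],
    },
    "legal": {
        "legaltags": ["opendes-example-legal-tag"],
        "otherRelevantDataCountries": ["US"],
    },
    "data": {"FacilityName": "Example Well"},
}

storage_request = build_put_records_request(
    "https://osdu.example.com", "<access-token>", "opendes", [record]
)
print(storage_request.get_method(), storage_request.full_url)
# To send it, pass the request to urllib.request.urlopen(storage_request).
```

Because nothing here checks the record before it is sent, mistakes in `kind` or `data` can surface only later, which is one reason to prefer the ingestion workflows.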
...
@@ -92,20 +92,25 @@ This guide assumes you have access to a working OSDU® environment, please conta
In this quickstart guide, we use the [open-test-data](https://community.opengroup.org/osdu/data/open-test-data) to demonstrate the steps above. The example follows one of the three methods described above: Manifest-based Ingestion.
- **Manifest-based Ingestion**
1. Load reference data in the OSDU® Data Platform
   - For the TNO example, the reference data [manifests](https://community.opengroup.org/osdu/data/open-test-data/-/tree/master/rc--3.0.0/4-instances/TNO/reference-data) should first be loaded into the OSDU® platform.
   - Any other missing reference data can be found in the OSDU® community [reference data](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/ReferenceValues/Manifests/reference-data). This repository is maintained by the Data Definitions team.
2. Prepare the master/WPC data manifests JSON
   - Here is a set of [Python data preparation scripts](https://community.opengroup.org/osdu/data/open-test-data/-/tree/master/rc--3.0.0/2-scripts) to help with the manifest generation.
   - You can either [learn](https://community.opengroup.org/groups/osdu/platform/-/wikis/How-to-generate-manifests-using-scripts) to generate them from scratch with the [scripts](https://community.opengroup.org/osdu/data/open-test-data/-/tree/master/rc--3.0.0/2-scripts) or use the ones that have been [generated](https://community.opengroup.org/osdu/data/open-test-data/-/tree/master/rc--3.0.0/4-instances) from the scripts.
3. Load master/WP/WPC data in the OSDU® Data Platform
   - Send a POST request to `{OSDU_BASE_URL}/api/workflow/v1/workflow/Osdu_ingest/workflowRun` with the manifest JSON in the request body to trigger the workflow ingestion service, as shown in the example below:
```json
{
  "executionContext": {
    "Payload": {
      "AppKey": "test-app",
      "data-partition-id": "{{data-partition-id}}"
...
```
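
The workflow run request described in step 3 could be assembled along the following lines. This is a sketch under stated assumptions, not the reference client: the endpoint path, `AppKey`, and `data-partition-id` come from this guide, while the `manifest` key next to `Payload`, the header names, the host, the token, and the partition value are assumptions or placeholders. The request is built but not sent.

```python
import json
import urllib.request


def build_workflow_run_request(base_url, token, partition_id, manifest, app_key="test-app"):
    """Build (without sending) the POST request that starts an Osdu_ingest run."""
    body = {
        "executionContext": {
            "Payload": {"AppKey": app_key, "data-partition-id": partition_id},
            # Assumption: the manifest sits next to "Payload" in executionContext.
            "manifest": manifest,
        }
    }
    url = f"{base_url.rstrip('/')}/api/workflow/v1/workflow/Osdu_ingest/workflowRun"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "data-partition-id": partition_id,
    }
    payload = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


# Placeholder manifest: a real one carries the records to ingest.
example_manifest = {"kind": "osdu:wks:Manifest:1.0.0", "MasterData": []}
workflow_request = build_workflow_run_request(
    "https://osdu.example.com", "<access-token>", "opendes", example_manifest
)
print(workflow_request.get_method(), workflow_request.full_url)
# To send it, pass the request to urllib.request.urlopen(workflow_request).
```

Building the request separately from sending it makes the body easy to inspect before an ingestion run is triggered.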
@@ -117,20 +122,12 @@ In this quickstart guide, we will use the [open-test-data](https://community.ope
@@ -225,19 +217,19 @@ Here’s an example of a batch mode input format for the workflow run API (simpl
This section runs through the common tasks in data loading and ingestion. Refer to the links in each section to dive deeper.
1. [How to generate manifests using scripts](https://community.opengroup.org/groups/osdu/platform/-/wikis/How-to-generate-manifests-using-scripts) - by Yanbin Zhang [Chevron]
2. [How to load a LAS data (Manifest-based Ingestion, Metadata only, without Wellbore DDMS)](<https://community.opengroup.org/groups/osdu/platform/-/wikis/How-to-load-a-LAS-data-(Manifest-based-Ingestion)>) - by Ivar Sørheim [Equinor]
3. [How to perform a CSV Ingestion with Dataset service](https://community.opengroup.org/groups/osdu/platform/-/wikis/CSV-Ingestion) - by Chad Leong [SLB]
4. [How to perform a CSV Ingestion with File service](https://community.opengroup.org/groups/osdu/platform/-/wikis/CSV-Ingestion-File-Service) - by Samiullah Ghousudeen [BP]
5. [How to set up sdutil for uploading seismic data in Windows](https://community.opengroup.org/groups/osdu/platform/-/wikis/Step-by-Step-of-Setting-up-Python-Environment-for-sdutil-in-Windows) - by Chad Leong [SLB]
6. [How to load a seismic data via Seismic DDMS - SEGY to oZGY (Storage service API)](https://community.opengroup.org/osdu/platform/data-flow/ingestion/segy-to-zgy-conversion/-/blob/master/doc/testing.md) - by Andras Szalai [EPAM]
7. [How to load a seismic data via Seismic DDMS - SEGY to oZGY](https://community.opengroup.org/osdu/platform/data-flow/ingestion/segy-to-zgy-conversion/-/blob/master/doc/gcp/QUICKSTART.md) - by Yan Sushchynski [EPAM]
8. [How to load a seismic data via Seismic DDMS - SEGY to oVDS (Storage service API)](https://community.opengroup.org/osdu/platform/data-flow/ingestion/segy-to-vds-conversion/-/blob/master/docs/gcp/QUICKSTART.md) - by Yan Sushchynski [EPAM]
9. [How to load a WITSML data (WITSML Parser)](<https://community.opengroup.org/groups/osdu/platform/-/wikis/How-to-load-a-WITSML-data-(WITSML-Parser)>) - by Kateryna Kurach [EPAM]
10. [How to load a generic file in OSDU® using File/Dataset service](https://community.opengroup.org/groups/osdu/platform/-/wikis/How-to-load-a-generic-file-in-OSDU-using-File-or-Dataset-Service) - by Kateryna Kurach [EPAM]
11. [How to check for error in Airflow DAG](https://community.opengroup.org/groups/osdu/platform/-/wikis/How-to-check-for-error-in-Airflow-DAG) - by Chad Leong [SLB]
12. [How to search for ingested record](https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/docs/tutorial/SearchService.md)
13. [Troubleshooting Index Status of Data Ingested](https://community.opengroup.org/groups/osdu/platform/-/wikis/Troubleshooting-Index-Status-of-Data-Ingested) - by Samiullah Ghousudeen [BP]
14. [Wellbore DDMS Data Loader Utility Quickstart guide](https://community.opengroup.org/osdu/ui/data-loading/wellbore-ddms-data-loader/-/wikis/Wellbore-DDMS-Data-Loader-Utility-Quickstart-Guide) - by Samiullah Ghousudeen [BP]
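
For the "search for ingested record" task in the list above, a quick check after ingestion might look like the sketch below. It assumes the Search service exposes `POST /api/search/v2/query` and accepts a body with `kind`, `query`, and `limit`; the linked Search service tutorial is the authoritative description. The host, token, partition, kind, and query text are placeholders, and the request is built but not sent.

```python
import json
import urllib.request


def build_search_request(base_url, token, partition_id, kind, query="*", limit=10):
    """Build (without sending) a Search service query for records of one kind."""
    body = {"kind": kind, "query": query, "limit": limit}  # assumed body fields
    url = f"{base_url.rstrip('/')}/api/search/v2/query"  # assumed endpoint path
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "data-partition-id": partition_id,
    }
    payload = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


search_request = build_search_request(
    "https://osdu.example.com",
    "<access-token>",
    "opendes",
    kind="osdu:wks:master-data--Wellbore:1.0.0",  # placeholder kind
    query='data.FacilityName:"Example*"',  # placeholder query text
)
print(search_request.get_method(), search_request.full_url)
# To send it, pass the request to urllib.request.urlopen(search_request).
```

If a record was ingested but does not appear in the results, the index troubleshooting guide in the list is the next place to look.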
## Bulk loading
...
@@ -256,35 +248,35 @@ Here are some [worked examples](https://community.opengroup.org/osdu/data/data-d