## Storage Service

## Table of Contents <a name="TOC"></a>

- [Introduction](#Introduction)
- [Record structure](#Record-structure)
- [Schema structure](#Schema)
- [Ingestion workflow](#Ingestion-workflow)
  * [Becoming a Data Ecosystem user](#Becoming-a-Data-Ecosystem-user)
  * [Choosing a partition](#Choosing-a-partition)
  * [Creating data groups](#Creating-data-groups)
  * [Creating the schema](#Creating-the-schema)
  * [Creating the legal tag](#Creating-the-legal-tag)
  * [Creating records](#Creating-records)
  * [Ingesting records](#Ingesting-records)
- [Storage service APIs](#Storage-APIs)
  * [Schemas](#schemas)
    + [Create Schema](#Create-schema)
    + [Get Schema](#Get-schema)
  * [Query](#query)
    + [Query all kinds](#Query-kinds)
    + [Fetch records](#Fetch-records)
  * [Records](#record)
    + [Create or Update records](#Creating-records)
    + [Get record version](#Retrieve-specific-version)
    + [Get all record versions](#Retrieve-all-record-versions)
    + [Get record](#Retrieve-latest-record-version)
    + [Delete record](#Delete-record)
- [Using service accounts to access Storage APIs](#Service-accounts)
- [Using skipdupes](#skipdupes)

## Introduction <a name="Introduction"></a>

After performing the basic user management procedures (creating users and groups, assigning users to groups, and so on) through the [Entitlements Service](/solutions/dataecosystem/tutorials/entitlementsservice), a DELFI developer can use the Data Ecosystem Storage Service to ingest metadata generated by DELFI applications into the Data Ecosystem. The Storage Service provides a set of APIs to manage the entire metadata life cycle: ingestion (persistence), modification, deletion, versioning, and data schemas.

[Back to table of contents](#TOC)

## Record structure <a name="Record-structure"></a>

From the Storage Service perspective, the metadata to be ingested is called a __record__. Below is a basic example of a Data Ecosystem record with a brief explanation of each field:

```
{
  "id": "common:hello:123456",
  "kind": "common:test:hello:1.0.0",
  "acl": {
    "viewers": ["data.default.viewers@common.delfi.slb.com"],
    "owners": ["data.default.owners@common.delfi.slb.com"]
  },
  "legal": {
    "legaltags": ["common-sample-legaltag"],
    "otherRelevantDataCountries": ["FR","US","CA"]
  },
  "data": {
    "msg": "Hello World, Data Ecosystem!"
  }
}
```

* __id__: _(optional)_ Unique identifier of the record in the Data Ecosystem. When not provided, the service creates and assigns an id to the record. Must follow the naming convention ``{Slb-Data-Partition-Id}:{object-type}:{uuid}``.
* __kind__: _(mandatory)_ Kind of data being ingested. Must follow the naming convention ``{Slb-Data-Partition-Id}:{dataset-name}:{record-type}:{version}``.
* __acl__: _(mandatory)_ Groups of users who have access to the record.
  * __acl.viewers__: List of valid groups with view/read privileges over the record. By naming convention, data groups begin with ``data.``.
  * __acl.owners__: List of valid groups with write privileges over the record. By naming convention, data groups begin with ``data.``.
* __legal__: _(mandatory)_ Attributes that represent the legal constraints associated with the record.
  * __legal.legaltags__: List of legal tag names associated with the record.
  * __legal.otherRelevantDataCountries__: List of other relevant data countries. Must have at least two values: the country the data was ingested from and the country where the Data Ecosystem stores the data.
* __data__: _(mandatory)_ Record payload, represented as a set of key-value pairs.
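
The naming conventions above can be checked mechanically before submitting a record. Below is a minimal client-side sketch in Python; the regular expressions and the ``validate_record`` helper are our own illustration, not part of the service:

```python
import re

# id:   {Slb-Data-Partition-Id}:{object-type}:{uuid}
ID_PATTERN = re.compile(r"^[\w\-\.]+:[\w\-\.]+:[\w\-\.]+$")
# kind: {Slb-Data-Partition-Id}:{dataset-name}:{record-type}:{version}
KIND_PATTERN = re.compile(r"^[\w\-\.]+:[\w\-\.]+:[\w\-\.]+:\d+\.\d+\.\d+$")

def validate_record(record):
    """Return a list of convention violations (an empty list means OK)."""
    problems = []
    rid = record.get("id")
    if rid is not None and not ID_PATTERN.match(rid):
        problems.append(f"id '{rid}' does not match partition:type:uuid")
    if not KIND_PATTERN.match(record.get("kind", "")):
        problems.append("kind must match partition:dataset:type:major.minor.patch")
    for group_list in record.get("acl", {}).values():
        for group in group_list:
            if not group.startswith("data."):
                problems.append(f"ACL group '{group}' should start with 'data.'")
    countries = record.get("legal", {}).get("otherRelevantDataCountries", [])
    if len(countries) < 2:
        problems.append("otherRelevantDataCountries needs at least 2 entries")
    return problems
```

Calling ``validate_record`` on the sample record above returns an empty list; a record with a malformed ``kind`` or a short country list yields one message per violation.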

[Back to table of contents](#TOC)

## Schema structure <a name="Schema"></a>

Another important concept in the Data Ecosystem Storage Service is the __schema__. A schema is a structure, also defined in JSON, that provides data type information for the record fields. In other words, the schema defines whether a given field in the record is a ``string``, an ``integer``, a ``float``, a ``geopoint``, etc.

> It is important to note that __only__ fields with associated schema information are indexed by the [Search Service](/solutions/dataecosystem/tutorials/searchservice). For this reason, the DELFI developer __must__ create the schema for a record kind __before__ starting to ingest records of that kind into the Data Ecosystem.

Schemas and records are tied together by the __kind__ attribute. Moreover, a given __kind__ can have either zero or exactly one schema associated with it. With that concept in mind, the DELFI developer can use the two schema management APIs provided by the Data Ecosystem Storage Service:

```
POST /api/storage/v2/schemas
GET /api/storage/v2/schemas/{kind}
```

[Back to table of contents](#TOC)

## Ingestion workflow <a name="Ingestion-workflow"></a>

To demonstrate the schema and record concepts, as well as their respective APIs, let's consider the following use case:

> The DELFI developer wants to ingest metadata related to their well dataset. The metadata contains the following pieces of information: the name of the well, the company name, the year it was drilled, the total depth, and the well location.

In summary, to execute the above workflow, the DELFI developer needs to perform the following tasks:

1. Be a valid Data Ecosystem user;
2. Define which partition to use;
3. Create and/or assign users to an existing partition data group;
4. Agree on the __kind__ attribute that will represent the developer's wells. Let's assume it to be ``common:welldb:wellbore:1.0.0``;
5. Create the __legal tag__ that represents the legal constraints for the metadata to be ingested;
6. Create a schema for the kind ``common:welldb:wellbore:1.0.0`` via the ``POST /api/storage/v2/schemas`` API;
7. Create and ingest records via the ``PUT /api/storage/v2/records`` API.

### Becoming a Data Ecosystem user <a name="Becoming-a-Data-Ecosystem-user"></a>

Please refer to [Entitlements Service](/solutions/dataecosystem/tutorials/entitlementsservice) to learn how to become a valid Data Ecosystem user.

### Choosing a partition <a name="Choosing-a-partition"></a>

The Data Ecosystem stores data in different tenants, depending on the different accounts in the DELFI system. A user may belong to many accounts in DELFI; for example, an SLB user may belong to both the SLB account and a customer's account. When a user logs into the DELFI portal, they choose which account is active.

When using the Storage Service APIs, specify the active account as the ``Slb-Data-Partition-Id``. The correct values can be obtained from the CFS services. In our P4D environment you can choose between ``slb``, ``customer`` and ``common``.

### Creating data groups <a name="Creating-data-groups"></a>

Please refer to [Entitlements Service](/solutions/dataecosystem/tutorials/entitlementsservice) to learn how to create data groups (the ones that start with ``data.``) and assign users to them. For data access authorization purposes in this example, let's assume the groups ``data.default.viewers@common.delfi.slb.com`` and ``data.default.owners@common.delfi.slb.com`` were previously created via the Entitlements Service.

### Creating the schema <a name="Creating-the-schema"></a>

Schema creation is done via the ``POST /api/storage/v2/schemas`` API. For the sample workflow in question, the schema could be created as follows:

<details><summary>curl</summary>

```
curl --request POST \
  --url '/api/storage/v2/schemas' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'Slb-Data-Partition-Id: common' \
  --data '{
    "kind": "common:welldb:wellbore:1.0.0",
    "schema": [
      { "path": "name", "kind": "string" },
      { "path": "company", "kind": "string" },
      { "path": "drillingYear", "kind": "int" },
      { "path": "depth", "kind": "float" },
      { "path": "location", "kind": "core:dl:geopoint:1.0.0" }
    ]
  }'
```

</details>

The schema is basically composed of a list of path/kind pairs that map the record fields to their data types. For more information about the supported schema data types, please refer to the [Schema API documentation](/solutions/dataecosystem/apis/p4d-data-ecosystem-storage-service).
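
Because the schema body is just a list of path/kind pairs, it can be generated from a plain mapping of field names to types. A small sketch (the ``build_schema`` helper is our own, not part of the service):

```python
import json

def build_schema(kind, fields):
    """Build a Storage schema payload from a {field: type} mapping."""
    return {
        "kind": kind,
        "schema": [{"path": path, "kind": field_kind}
                   for path, field_kind in fields.items()],
    }

wellbore_schema = build_schema(
    "common:welldb:wellbore:1.0.0",
    {
        "name": "string",
        "company": "string",
        "drillingYear": "int",
        "depth": "float",
        "location": "core:dl:geopoint:1.0.0",
    },
)

# The serialized payload matches the --data body of the curl example above.
print(json.dumps(wellbore_schema, indent=2))
```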

### Creating the legal tag <a name="Creating-the-legal-tag"></a>

Please refer to [Compliance Service](/solutions/dataecosystem/tutorials/complianceservice) for legal tag creation. For this example, let's assume a legal tag called ``common-sample-legaltag`` has already been created; it is the tag referenced by the sample records below.

### Creating records <a name="Creating-records"></a>

After the legal tag creation and schema definition, records of the kind ``common:welldb:wellbore:1.0.0`` can be created. They need to follow the same structure and field naming convention defined in the schema. A sample record would look as follows:

<details><summary>curl</summary>

```
{
  "kind": "common:welldb:wellbore:1.0.0",
  "acl": {
    "viewers": ["data.default.viewers@common.delfi.slb.com"],
    "owners": ["data.default.owners@common.delfi.slb.com"]
  },
  "legal": {
    "legaltags": ["common-sample-legaltag"],
    "otherRelevantDataCountries": ["FR","US","CA"]
  },
  "data": {
    "name": "well1",
    "company": "slb",
    "drillingYear": 1983,
    "depth": 1208.84,
    "location": {
      "latitude": 29.7512026,
      "longitude": -95.4812934
    }
  }
}
```

</details>
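
Since only fields covered by the schema are indexed, it can be useful to check that a record's ``data`` keys line up with the schema paths before ingesting. A sketch of such a client-side check (our own illustration, not a service feature):

```python
def unindexed_fields(record, schema):
    """Return record data fields that have no corresponding schema path."""
    schema_paths = {entry["path"] for entry in schema["schema"]}
    return sorted(set(record["data"]) - schema_paths)

# The wellbore schema from the "Creating the schema" section.
schema = {"kind": "common:welldb:wellbore:1.0.0",
          "schema": [{"path": p, "kind": k} for p, k in [
              ("name", "string"), ("company", "string"),
              ("drillingYear", "int"), ("depth", "float"),
              ("location", "core:dl:geopoint:1.0.0")]]}

# "operatorNotes" is a made-up field that the schema does not cover.
record_data = {"name": "well1", "company": "slb", "drillingYear": 1983,
               "depth": 1208.84,
               "location": {"latitude": 29.7512026, "longitude": -95.4812934},
               "operatorNotes": "not in the schema"}

print(unindexed_fields({"data": record_data}, schema))  # ['operatorNotes']
```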

### Ingesting records <a name="Ingesting-records"></a>

Having the record structure defined, the DELFI developer must use the ``PUT /api/storage/v2/records`` API to ingest the records, as follows:

<details><summary>curl</summary>

```
curl --request PUT \
  --url '/api/storage/v2/records' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'Slb-Data-Partition-Id: common' \
  --data '[
    {
      "kind": "common:welldb:wellbore:1.0.0",
      "acl": {
        "viewers": ["data.default.viewers@common.delfi.slb.com"],
        "owners": ["data.default.owners@common.delfi.slb.com"]
      },
      "legal": {
        "legaltags": ["common-sample-legaltag"],
        "otherRelevantDataCountries": ["FR","US","CA"]
      },
      "data": {
        "name": "well1",
        "company": "slb",
        "drillingYear": 1983,
        "depth": 1208.84,
        "location": {
          "latitude": 29.7512026,
          "longitude": -95.4812934
        }
      }
    },
    {
      "kind": "common:welldb:wellbore:1.0.0",
      "acl": {
        "viewers": ["data.default.viewers@common.delfi.slb.com"],
        "owners": ["data.default.owners@common.delfi.slb.com"]
      },
      "legal": {
        "legaltags": ["common-sample-legaltag"],
        "otherRelevantDataCountries": ["IN","BR","CA"]
      },
      "data": {
        "name": "well12312",
        "company": "shell",
        "drillingYear": 2001,
        "depth": 208.84,
        "location": {
          "latitude": 49.7512026,
          "longitude": -65.4812934
        }
      }
    },
  ...]'
```

</details>

[Back to table of contents](#TOC)

## Storage service APIs <a name="Storage-APIs"></a>

The Data Ecosystem Storage Service provides three categories of APIs for schema and record management: schemas, records, and query.

## Schemas <a name="schemas"></a>

### Create Schema <a name="Create-schema"></a>

Schema creation is explained in the [Creating the schema](#Creating-the-schema) section.

### Get Schema <a name="Get-schema"></a>

The schema for a given ``kind`` can be retrieved using the Get Schema API.

```
GET /api/storage/v2/schemas/{kind}
```

<details><summary>curl</summary>

```
curl --request GET \
  --url '/api/storage/v2/schemas/{kind}' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'Slb-Data-Partition-Id: common'
```

</details>

## Query <a name="query"></a>

### Query all kinds <a name="Query-kinds"></a>

The API returns a list of all kinds in the specified ``Slb-Data-Partition-Id``.

```
GET /api/storage/v2/query/kinds
```

#### Parameters <a name="parameters"></a>

| Parameter | Description |
| :--- | :--- |
| limit | The maximum number of results to return from the given offset. If no limit is provided, then __10__ items are returned. The maximum number of items that can be fetched by the query is __100__.|
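
The request can also be assembled with nothing but the Python standard library. In this sketch we pass ``limit`` as a URL query parameter, which is a simplifying assumption (the curl example below sends it in the request body); the base URL and helper name are hypothetical:

```python
from urllib.parse import urlencode
from urllib.request import Request

def query_kinds_request(base_url, token, partition, limit=10):
    """Build (but do not send) the GET /query/kinds request."""
    url = f"{base_url}/api/storage/v2/query/kinds?{urlencode({'limit': limit})}"
    return Request(url, method="GET", headers={
        "accept": "application/json",
        "authorization": f"Bearer {token}",
        "Slb-Data-Partition-Id": partition,
    })

req = query_kinds_request("https://example.invalid", "<JWT>", "common", limit=25)
print(req.full_url)
```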

<details><summary>curl</summary>

```
curl --request GET \
  --url '/api/storage/v2/query/kinds' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'Slb-Data-Partition-Id: common' \
  --data '{
    "limit": 10
  }'
```

</details>

### Fetch Records <a name="Fetch-records"></a>

The API fetches multiple records (maximum 20) from the Storage Service at once. It allows the user to request that the data be converted to a common standard by using the custom header {slb-frame-of-reference}. The common standard is units in SI, crs in wgs84, elevation in msl, azimuth in true north, and dates in utc.

Currently only "none" and "units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;" are valid values for the header {slb-frame-of-reference}.

As of now, only conversion of units and crs is supported; conversion of dates, elevation, and azimuth will be available later. Returned records contain either the original values or the converted (units=SI;crs=wgs84) values, depending on the user's request and the conversion status: original values are returned when the user does not request the conversion, or when the conversion is requested but fails.

In addition to the requested records, if conversion is requested, the response includes a list with the conversion status of each record, indicating whether the conversion was successful and, if not, which errors occurred.
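
The batch request body and the frame-of-reference header can be assembled as follows. This is a minimal sketch; the ``batch_fetch_payload`` helper is our own, and the record ids are illustrative:

```python
import json

# The only non-"none" frame-of-reference value the service currently accepts.
SI_FRAME = "units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;"

def batch_fetch_payload(record_ids):
    """Build the body for POST /query/records:batch (max 20 ids per call)."""
    if len(record_ids) > 20:
        raise ValueError("the batch API accepts at most 20 records per call")
    return {"records": list(record_ids)}

headers = {
    "Content-Type": "application/json",
    "Slb-Data-Partition-Id": "common",
    "slb-frame-of-reference": SI_FRAME,  # or "none" for original values
}
body = json.dumps(batch_fetch_payload(
    ["common:well:123456789", "common:wellTop:abc789456"]))
```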

```
POST /api/storage/v2/query/records:batch
```

<details><summary>curl</summary>

```
curl --request POST \
  --url '/api/storage/v2/query/records:batch' \
  --header 'Authorization: Bearer <JWT>' \
  --header 'Content-Type: application/json' \
  --header 'Slb-Data-Partition-Id: common' \
  --header 'slb-frame-of-reference: units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;' \
  --data '{
    "records": [
      "common:well:123456789",
      "common:wellTop:abc789456",
      "common:wellLog:4531wega22"
    ]
  }'
```

</details>

[Back to table of contents](#TOC)

## Records <a name="record"></a>

### Create or Update records <a name="Creating-records"></a>

The API represents the main ingestion mechanism into the Data Ecosystem. It allows record creation and/or update. When no record id is provided, or when the provided id is not already present in the Data Ecosystem, a new record is created. If the id is related to an existing record in the Data Ecosystem, an update operation takes place and a new version of the record is created.

More details are available in the [Creating records](#Creating-records) and [Ingesting records](#Ingesting-records) sections.
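
The create-versus-update behaviour can be pictured with a tiny in-memory model. This is purely illustrative; the real service persists records and versions server-side:

```python
import uuid

class RecordStore:
    """Toy model of PUT /records create/update semantics."""

    def __init__(self):
        self._versions = {}  # record id -> list of payload versions

    def put(self, record):
        # No id (or an unknown id) creates a record; a known id appends a version.
        rid = record.get("id") or f"common:auto:{uuid.uuid4()}"
        self._versions.setdefault(rid, []).append(record)
        return rid, len(self._versions[rid])  # (record id, version number)

store = RecordStore()
rid, v1 = store.put({"id": "common:hello:123456", "data": {"msg": "v1"}})
_, v2 = store.put({"id": "common:hello:123456", "data": {"msg": "v2"}})
new_id, v = store.put({"data": {"msg": "no id, so one is assigned"}})
print(v1, v2, v)  # 1 2 1
```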

### Get record version <a name="Retrieve-specific-version"></a>

The API retrieves the specified version of the given record.

```
GET /api/storage/v2/records/{id}/{version}
```

#### Parameters <a name="parameters"></a>

| Parameter | Description |
| :--- | :--- |
| attribute | Filter attributes to restrict the returned fields of the record. Usage: data.{record-data-field-name}.|

<details><summary>curl</summary>

```
curl --request GET \
  --url '/api/storage/v2/records/{id}/{version}' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'Slb-Data-Partition-Id: common' \
  --data '{
    "attributes": [
      "data.msg"
    ]
  }'
```

</details>

### Get all record versions <a name="Retrieve-all-record-versions"></a>

The API returns a list containing all versions for the given record id.

```
GET /api/storage/v2/records/versions/{id}
```

<details><summary>curl</summary>

```
curl --request GET \
  --url '/api/storage/v2/records/versions/{id}' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'Slb-Data-Partition-Id: common'
```

</details>

### Get record <a name="Retrieve-latest-record-version"></a>

This API returns the latest version of the given record.

```
GET /api/storage/v2/records/{id}
```

#### Parameters <a name="parameters"></a>

| Parameter | Description |
| :--- | :--- |
| attribute | Filter attributes to restrict the returned fields of the record. Usage: data.{record-data-field-name}.|

<details><summary>curl</summary>

```
curl --request GET \
  --url '/api/storage/v2/records/{id}' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'Slb-Data-Partition-Id: common' \
  --data '{
    "attributes": [
      "data.msg"
    ]
  }'
```

</details>

### Delete record <a name="Delete-record"></a>

The API performs a logical deletion of the given record. This operation can be reverted later.

```
POST /api/storage/v2/records/{id}:delete
```

<details><summary>curl</summary>

```
curl --request POST \
  --url '/api/storage/v2/records/{id}:delete' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'Slb-Data-Partition-Id: common'
```

</details>
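
Because the deletion is logical, it behaves like a reversible flag rather than a destructive removal. A toy model of that semantics (ours, for illustration only):

```python
class LogicalDeleteStore:
    """Toy model of POST /records/{id}:delete as a reversible soft delete."""

    def __init__(self, records):
        self._records = dict(records)
        self._deleted = set()

    def delete(self, rid):
        self._deleted.add(rid)       # record data is kept, just hidden

    def undelete(self, rid):
        self._deleted.discard(rid)   # the operation can be reverted

    def get(self, rid):
        if rid in self._deleted:
            return None              # deleted records are not returned
        return self._records.get(rid)

store = LogicalDeleteStore({"common:hello:123456": {"msg": "hi"}})
store.delete("common:hello:123456")
assert store.get("common:hello:123456") is None
store.undelete("common:hello:123456")
assert store.get("common:hello:123456") == {"msg": "hi"}
```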

## Using service accounts to access Storage APIs <a name="Service-accounts"></a>

The Storage Service relies on the Google native data access authorization mechanisms to provide access control on the records.

By design, when the Storage Service caller is a federated user, no additional configuration is necessary. However, if the API caller is a service account, the following configuration is mandatory:

- Navigate to the GCP project to which the caller service account belongs;
- Go to IAM & admin > Service accounts;
- Select the caller service account;
- In the right-hand side Permissions panel, click the "Add member" button;
- In the member text box, add the following email: ``{DATA_ECOSYSTEM_PROJECT}@appspot.gserviceaccount.com``. For instance, in the P4D environment the member email is ``p4d-ddl-eu-services@appspot.gserviceaccount.com``;
- Select the role ``Service Accounts`` > ``Service Account Token Creator``.

[Back to table of contents](#TOC)

## Using skipdupes <a name="skipdupes"></a>

The skipdupes parameter only affects update operations, i.e. calls to the API with record ids that are already present in the Data Ecosystem. If skipdupes == true, the service will not update the record if the payload is the same (a duplicate); if there is a difference in the payload, a new version of the record is created.

On the other hand, if skipdupes == false, the service will not check whether the payload is the same, and an update operation will always create a new version, even if it is identical to a previous one.

On the response side, skippedRecordIds are the record ids that were not updated (skipped) due to skipdupes == true and an identical payload. In the PUT response there is no duplication between the lists: each record id appears in either recordIds or skippedRecordIds.
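
The decision table above can be sketched as a small function: compare the incoming payload with the latest stored version, and skip only when ``skipdupes`` is true and the payloads match. This is a simplification of the service's behaviour for illustration:

```python
def apply_put(latest_payload, incoming_payload, skipdupes):
    """Return 'skipped' or 'new_version' for an update of an existing record."""
    if skipdupes and incoming_payload == latest_payload:
        return "skipped"        # the id lands in skippedRecordIds
    return "new_version"        # the id lands in recordIds

payload = {"msg": "hello"}
assert apply_put(payload, {"msg": "hello"}, skipdupes=True) == "skipped"
assert apply_put(payload, {"msg": "changed"}, skipdupes=True) == "new_version"
# With skipdupes false, an identical payload still creates a new version.
assert apply_put(payload, {"msg": "hello"}, skipdupes=False) == "new_version"
```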

[Back to table of contents](#TOC)