# OSDU R3 Ingestion Service

## Contents

* [Introduction](#introduction)
* [System interactions](#system-interactions)
    * [Default ingestion workflow](#default-ingestion-workflow)
    * [OSDU ingestion workflow](#osdu-ingestion-workflow)
    * [OSDU Ingestion workflow (v2 API)](#osdu-ingestion-workflow-v2-api)
* [Ingestion API](#ingestion-api)
    * [POST /submit](#post-submit)
    * [POST /submitWithManifest](#post-submitwithmanifest)
    * [POST /v2/submit/manifest](#post-v2submitmanifest)
* [GCP implementation](#gcp-implementation)
* [Firestore](#firestore-collections)

## Introduction

The OSDU R3 Ingestion service starts ingestion of OSDU documents, such as the OSDU Work Products,
Work Product Components, and Files. The Ingestion service is basically a wrapper around the [OSDU R3
Workflow service] and performs preliminary work before starting actual ingestion. The preliminary
work can include fetching file location data or validating the manifest.

## System interactions

The Ingestion service in the OSDU R3 Prototype provides two ingestion workflows.

The _Default (Opaque) Ingestion_ workflow is designed to ingest files without metadata. Per request,
only one file is ingested.

The _OSDU (Manifest) Ingestion_ workflow is designed to ingest multiple files with metadata
associated with them. The metadata is passed as an OSDU manifest, whose structure depends on the
version of the API. In the first version of the API, the manifest must contain an OSDU Work Product
and the associated Work Product Components. In the second version of the API, the manifest must
also contain Kind, Reference Data, and Master Data.

### Default Ingestion workflow

The Default Ingestion workflow is designed to ingest one file per request. Before submitting a file
for ingestion, the user needs to upload the file to the system. To do so, the user obtains a signed
URL from the OSDU R2 Delivery service, and then uploads the file to that URL.

For more information on uploading files to the system, consult the [OSDU R2 Delivery service
documentation].

The Default Ingestion workflow starts upon a call to the `/submit` endpoint. The following diagram
shows this workflow.

![OSDU R3 Ingestion Service submit](img/77780782-357ee700-705d-11ea-8388-a1671d06ee22.png)

Upon a `/submit` request:

1. Validate the incoming request.
    * Verify the authentication token. If the token is missing or invalid, respond with the `401
    Unauthorized` status.
    * Verify the partition ID. If the partition ID is missing, invalid or doesn't have assigned user
    groups, respond with the `400 Bad Request` status.
    * Verify `DataType`. Respond with the `400 Bad Request` status if the `DataType` is an empty
    string or consists only of whitespace.
    > `DataType` can contain any string. The `DataType` value is passed as a request parameter.
2. Query the Workflow service's `/v1/workflow/{workflow_name}/workflowRun` API endpoint. The value
   of the `workflow_name` path variable is taken from `DataType`. Pass the file location in the
   context.
3. Receive the workflow run ID from the Workflow service, and then return the run ID to the user or
   app that started ingestion.
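
The `DataType` check and the Workflow service path from the steps above can be sketched as follows. This is a minimal Python illustration, not the service's code: `is_valid_data_type` and `workflow_run_path` are hypothetical names, and percent-encoding the workflow name is an assumption made here so that an arbitrary `DataType` string stays inside one path segment.

```python
from urllib.parse import quote


def is_valid_data_type(data_type):
    """Return False for the values that would be rejected with 400 Bad Request."""
    # An empty or whitespace-only DataType is not accepted; any other string is.
    return isinstance(data_type, str) and bool(data_type.strip())


def workflow_run_path(data_type):
    """Build the Workflow service path, using DataType as the workflow name."""
    # Assumption: the name is percent-encoded to keep it within a single path segment.
    return "/v1/workflow/{}/workflowRun".format(quote(data_type, safe=""))


if is_valid_data_type("opaque"):
    print(workflow_run_path("opaque"))
```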

### OSDU Ingestion workflow

The OSDU Ingestion workflow is designed to ingest well log .las files with the manifest.

The OSDU Ingestion workflow starts upon a call to the Ingestion service's `/submitWithManifest`
endpoint. The following diagram shows the workflow.

![OSDU R3 Ingestion Service submitWithManifest](img/77780782-357ee700-705d-11ea-8388-a1671d06ee22.png)

The workflow is the following:

1. Validate the incoming request.
    * Verify the authentication token. If the token is missing or invalid, respond with the `401
    Unauthorized` status.
    * Verify the partition ID. If the partition ID is missing or invalid or doesn't have assigned
    user groups, respond with the `400 Bad Request` status.
    * Validate the manifest. If the manifest doesn't correspond to the OSDU
    `WorkProductLoadManifestStagedFiles` schema stored in the project's database, fail ingestion,
    and then respond with an HTTP error.
2. Query the Workflow service's `/v1/workflow/{workflow_name}/workflowRun` API endpoint with the
   `workflow_name` path variable set to `manifest_ingestion`, and pass the manifest in the
   request's `Context` property.
3. Return the workflow run ID received from the Workflow service.
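
As a rough illustration of step 2, the call to the Workflow service might be assembled as shown below. The helper name `build_manifest_workflow_call` is hypothetical, and the request body is assumed to hold only the `Context` property; the real Workflow service may expect additional fields.

```python
import json

MANIFEST_WORKFLOW_NAME = "manifest_ingestion"


def build_manifest_workflow_call(manifest):
    """Return the (path, JSON body) pair for starting the manifest workflow."""
    path = "/v1/workflow/{}/workflowRun".format(MANIFEST_WORKFLOW_NAME)
    # Assumption: the manifest is passed unchanged as the value of "Context".
    body = json.dumps({"Context": manifest})
    return path, body


path, body = build_manifest_workflow_call(
    {"WorkProduct": {}, "WorkProductComponents": [], "Files": []}
)
print(path)
```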

### OSDU Ingestion workflow (v2 API)

Like the first version, this version of the API uses a manifest to upload files. It differs from
the first version in the structure of the manifest and in the endpoint path.

The workflow is the following:

1. Validate the incoming request.
    * Verify the authentication token. If the token is missing or invalid, respond with the `401
      Unauthorized` status.
    * Verify the partition ID. If the partition ID is missing or invalid or doesn't have assigned
      user groups, respond with the `400 Bad Request` status.    
2. Query the Workflow service's `/v1/workflow/{workflow_name}/workflowRun` API endpoint. The value
   of the `workflow_name` path variable is stored in the service properties; in this case it is
   `Osdu_ingest`. Pass the manifest in the request's `Context` property.
3. Return the workflow run ID received from the Workflow service.
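
The three workflows differ mainly in how the workflow name is chosen. The sketch below summarizes that choice under stated assumptions: `resolve_workflow_name` is a hypothetical helper, and the `workflow_name` key used to look up the configured v2 name is invented for illustration (the text only says the value is stored in properties).

```python
DEFAULT_V2_WORKFLOW_NAME = "Osdu_ingest"


def resolve_workflow_name(endpoint, data_type=None, properties=None):
    """Pick the workflow name for an Ingestion endpoint (hypothetical helper)."""
    if endpoint == "/submit":
        # Default workflow: the name comes from the request's DataType.
        if not data_type or not data_type.strip():
            raise ValueError("DataType must not be empty or whitespace")
        return data_type
    if endpoint == "/submitWithManifest":
        # First manifest API: the name is fixed.
        return "manifest_ingestion"
    if endpoint == "/v2/submit/manifest":
        # Assumption: the configured name lives under a "workflow_name" key.
        return (properties or {}).get("workflow_name", DEFAULT_V2_WORKFLOW_NAME)
    raise ValueError("unknown endpoint: " + endpoint)


print(resolve_workflow_name("/v2/submit/manifest"))
```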

## Ingestion API

The Ingestion service's API includes the following endpoints in the OSDU R3 Prototype:

* `/submit`, external
* `/submitWithManifest`, external
* `/v2/submit/manifest`, external

General considerations related to querying the Ingestion API:

* Each endpoint must receive the authentication bearer token in the "Authorization" header. Example:
`"Authorization": "Bearer {token}"`
* Each endpoint must receive the partition ID in the "Partition-Id" header. Example:
`"Partition-Id": "assigned_partition"`
* The request and response content type is always "application/json".
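
The header requirements above can be captured in a small helper. This is a sketch only: `build_headers` is a hypothetical name, and the header spelling `Partition-Id` follows the curl examples in this document.

```python
def build_headers(token, partition_id):
    """Return the headers that every Ingestion API call is expected to carry."""
    if not token:
        raise ValueError("authentication token is required")
    if not partition_id:
        raise ValueError("partition ID is required")
    return {
        "Authorization": "Bearer " + token,
        "Partition-Id": partition_id,
        "Content-Type": "application/json",
    }


print(build_headers("example-token", "assigned_partition"))
```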

### POST /submit

The `/submit` API endpoint starts a new ingestion process and carries out necessary operations
depending on the file type. The current implementation of the endpoint supports ingestion of any
file type.

#### Incoming request body

| Property   | Type     | Description                                                |
| ---------- | -------- | ---------------------------------------------------------- |
| `FileID`   | `String` | Unique ID of the file                                      |
| `DataType` | `String` | Type of file. Supported data types: any                    |
| `Context`  | `Array`  | Key-value structure that contains other useful information |

> **Note**: `DataType` cannot be empty or consist only of whitespace.

**Example**:

```sh
curl --location --request POST 'https://{path}/submit' \
    --header 'Authorization: Bearer {token}' \
    --header 'Partition-Id: {assigned DELFI partition ID}' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "FileID": "c26c7656-8c50-4147-b51f-c7a449af33f3",
        "DataType": "opaque",
        "Context": {
            "key1": "value 1",
            "key2": "value 2"
        }
    }'
```
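
For callers that prefer a script over curl, the same request can be assembled with the Python standard library. The sketch below only builds the request object and does not send it; `build_submit_request` is a hypothetical helper, and the host name is a placeholder rather than a real deployment.

```python
import json
from urllib.request import Request


def build_submit_request(base_url, token, partition_id, file_id, data_type, context=None):
    """Build (but do not send) a POST /submit request."""
    body = {"FileID": file_id, "DataType": data_type, "Context": context or {}}
    return Request(
        base_url.rstrip("/") + "/submit",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Partition-Id": partition_id,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_submit_request(
    "https://ingest.example.invalid",  # placeholder host, not a real deployment
    "token",
    "partition",
    "c26c7656-8c50-4147-b51f-c7a449af33f3",
    "opaque",
    {"key1": "value 1"},
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it.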

#### Response body

| Property        | Type     | Description                                                            |
| --------------- | -------- | ---------------------------------------------------------------------- |
| `WorkflowRunID` | `String` | Unique ID of the workflow run that was started by the Workflow service |
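
A client might read the run ID from the response as sketched below. The helper name `extract_workflow_run_id` is hypothetical, and the sample payload is made up for illustration; only the `WorkflowRunID` property name comes from the table above.

```python
import json


def extract_workflow_run_id(response_text):
    """Return the WorkflowRunID from a JSON response body, or raise ValueError."""
    payload = json.loads(response_text)
    run_id = payload.get("WorkflowRunID") if isinstance(payload, dict) else None
    if not isinstance(run_id, str) or not run_id:
        raise ValueError("response does not contain a WorkflowRunID")
    return run_id


# Made-up response body, used only to exercise the helper.
print(extract_workflow_run_id('{"WorkflowRunID": "example-run-id"}'))
```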

### POST /submitWithManifest

The `/submitWithManifest` API endpoint starts the OSDU ingestion process for the OSDU Work Product,
Work Product Components, and Files passed in the OSDU `WorkProductLoadManifestStagedFiles` manifest.

Unlike the `/submit` endpoint, the request body for `/submitWithManifest` doesn't need to
contain `FileID`, `DataType`, or `Context`.

The list of file IDs must be added to the manifest's `Files` property. The `DataType` property
defaults to "manifest_ingestion" for all files.

#### Incoming request body

| Property                | Type     | Description                                                                                                                        |
| ----------------------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| `WorkProduct`           | `Object` | OSDU Work Product with **ResourceTypeID**, **ResourceSecurityClassification**, **Data**, and **ComponentsAssociativeIDs** properties               |
| `WorkProductComponents` | `Array`  | List of OSDU Work Product Components. Each WPC contains at least **ResourceTypeID**, **ResourceSecurityClassification**, **AssociativeID**, **FileAssociativeIDs**, and **Data** properties |
| `Files`                 | `Array`  | List of OSDU Files. Each File contains at least **ResourceTypeID**, **ResourceSecurityClassification**, **AssociativeID**, and **Data** properties |
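
Before sending a manifest, a client could run a lightweight pre-check of the properties listed in the table. The sketch below is not the service's schema validation: `find_manifest_problems` is a hypothetical helper, `ComponentsAssociativeIDs` is spelled as in the request example, and the cross-check of `FileAssociativeIDs` against the `Files` list is an assumption about what a consistent manifest looks like.

```python
REQUIRED = {
    "WorkProduct": ("ResourceTypeID", "ResourceSecurityClassification", "Data",
                    "ComponentsAssociativeIDs"),
    "WorkProductComponents": ("ResourceTypeID", "ResourceSecurityClassification",
                              "AssociativeID", "FileAssociativeIDs", "Data"),
    "Files": ("ResourceTypeID", "ResourceSecurityClassification", "AssociativeID", "Data"),
}


def find_manifest_problems(manifest):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for section, required in REQUIRED.items():
        if section not in manifest:
            problems.append("missing section: " + section)
            continue
        value = manifest[section]
        # WorkProduct is a single object; the other sections are lists of objects.
        items = [value] if section == "WorkProduct" else value
        for index, item in enumerate(items):
            for name in required:
                if name not in item:
                    problems.append("{}[{}] lacks {}".format(section, index, name))
    # Assumed consistency rule: every file referenced by a WPC is listed in Files.
    file_ids = {f.get("AssociativeID") for f in manifest.get("Files", [])}
    for wpc in manifest.get("WorkProductComponents", []):
        for file_id in wpc.get("FileAssociativeIDs", []):
            if file_id not in file_ids:
                problems.append("unknown file reference: " + str(file_id))
    return problems
```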

Request example:

```sh
curl -X POST \
  https://{Apigee URI}/submitWithManifest \
  -H 'Authorization: Bearer {token}' \
  -H 'Partition-Id: {assigned DELFI partition ID}' \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{
  "WorkProduct": {
    "ResourceTypeID": "srn:type:work-product/WellLog:",
    "ResourceSecurityClassification": "srn:reference-data/ResourceSecurityClassification:RESTRICTED:",
    "Data": {
      "GroupTypeProperties": {
        "Components": []
      },
      "IndividualTypeProperties": {
        "Name": "AKM-11 LOG",
        "Description": "Well Log"
      },
      "ExtensionProperties": {}
    },
    "ComponentsAssociativeIDs": [
      "wpc-1"
    ]
  },
  "WorkProductComponents": [
    {
      "ResourceTypeID": "srn:type:work-product-component/WellLog:",
      "ResourceSecurityClassification": "srn:reference-data/ResourceSecurityClassification:RESTRICTED:",
      "Data": {
        "GroupTypeProperties": {
          "Files": [],
          "Artefacts": []
        },
        "IndividualTypeProperties": {
          "Name": "AKM-11 LOG",
          "Description": "Well Log",
          "WellboreID": "srn:master-data/Wellbore:1013:",
          "TopMeasuredDepth": {
            "Depth": 2182.0004,
            "UnitOfMeasure": "srn:reference-data/UnitOfMeasure:M:"
          },
          "BottomMeasuredDepth": {
            "Depth": 2481.0,
            "UnitOfMeasure": "srn:reference-data/UnitOfMeasure:M:"
          },
          "Curves": [
            {
              "Mnemonic": "DEPT",
              "TopDepth": 2182.0,
              "BaseDepth": 2481.0,
              "DepthUnit": "srn:reference-data/UnitOfMeasure:M:",
              "CurveUnit": "srn:reference-data/UnitOfMeasure:M:"
            },
            {
              "Mnemonic": "GR",
              "TopDepth": 2182.0,
              "BaseDepth": 2481.0,
              "DepthUnit": "srn:reference-data/UnitOfMeasure:M:",
              "CurveUnit": "srn:reference-data/UnitOfMeasure:GAPI:"
            },
            {
              "Mnemonic": "DT",
              "TopDepth": 2182.0,
              "BaseDepth": 2481.0,
              "DepthUnit": "srn:reference-data/UnitOfMeasure:M:",
              "CurveUnit": "srn:reference-data/UnitOfMeasure:US/F:"
            },
            {
              "Mnemonic": "RHOB",
              "TopDepth": 2182.0,
              "BaseDepth": 2481.0,
              "DepthUnit": "srn:reference-data/UnitOfMeasure:M:",
              "CurveUnit": "srn:reference-data/UnitOfMeasure:G/C3:"
            },
            {
              "Mnemonic": "DRHO",
              "TopDepth": 2182.0,
              "BaseDepth": 2481.0,
              "DepthUnit": "srn:reference-data/UnitOfMeasure:M:",
              "CurveUnit": "srn:reference-data/UnitOfMeasure:G/C3:"
            },
            {
              "Mnemonic": "NPHI",
              "TopDepth": 2182.0,
              "BaseDepth": 2481.0,
              "DepthUnit": "srn:reference-data/UnitOfMeasure:M:",
              "CurveUnit": "srn:reference-data/UnitOfMeasure:V/V:"
            }
          ]
        },
        "ExtensionProperties": {}
      },
      "AssociativeID": "wpc-1",
      "FileAssociativeIDs": [
        "f-1"
      ]
    }
  ],
  "Files": [
    {
      "ResourceTypeID": "srn:type:file/las2:",
      "ResourceSecurityClassification": "srn:reference-data/ResourceSecurityClassification:RESTRICTED:",
      "Data": {
        "GroupTypeProperties": {
          "FileSource": "",
          "PreLoadFilePath": "{Path to File}"
        },
        "IndividualTypeProperties": {},
        "ExtensionProperties": {}
      },
      "AssociativeID": "f-1"
    }
  ]
}
'
```

#### Response body

| Property        | Type     | Description                                                            |
| --------------- | -------- | ---------------------------------------------------------------------- |
| `WorkflowRunID` | `String` | Unique ID of the workflow run that was started by the Workflow service |

### POST /v2/submit/manifest

The `/v2/submit/manifest` API endpoint starts the OSDU ingestion process for the OSDU Reference Data,
Master Data, Kind, and Data.

Unlike the `/submit` endpoint, the request body for `/v2/submit/manifest` doesn't need to
contain `FileID`, `DataType`, or `Context`.

The list of file IDs must be added to the manifest's `Files` property in `Data`. The `DataType`
property takes the same default value for all files; the default is stored in the service properties.

#### Incoming request body

| Property                | Type     | Description                                                                                                                        |
| ----------------------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| `WorkProduct`           | `Object` | OSDU Work Product with **ResourceTypeID**, **ResourceSecurityClassification**, **Data**, and **ComponentsAssociativeIDs** properties               |
| `WorkProductComponents` | `Array`  | List of OSDU Work Product Components. Each WPC contains at least **ResourceTypeID**, **ResourceSecurityClassification**, **AssociativeID**, **FileAssociativeIDs**, and **Data** properties |
| `Files`                 | `Array`  | List of OSDU Files. Each File contains at least **ResourceTypeID**, **ResourceSecurityClassification**, **AssociativeID**, and **Data** properties |
| `kind`                  | `String` | Manifest kind. |
| `ReferenceData`         | `Array`  | Reference data is submitted as an array of records. |
| `MasterData`            | `Array`  | Master data is submitted as an array of records. |
| `Data`                  | `Object` | Manifest schema for work-product, work-product-component, and file ensembles. The items in `Files` are processed first because they are referenced by `WorkProductComponents` (`data.Files[]` and `data.Artefacts[].ResourceID`). The `WorkProduct` is processed last, collecting the `WorkProductComponents`. |

Request example:

```sh
curl -X POST \
  https://{Apigee URI}/v2/submit/manifest \
  -H 'Authorization: Bearer {token}' \
  -H 'Partition-Id: {assigned DELFI partition ID}' \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{
  "kind": "osdu:wks:Manifest:1.0.0",
  "ReferenceData": [
    {}
  ],
  "MasterData": [
    {}
  ],
  "Data": {
    "WorkProduct": {},
    "WorkProductComponents": [
      {}
    ],
    "Files": [
      {}
    ]
  }
}'
```
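
The processing order described for `Data` (files first, then work product components, then the work product) can be illustrated with a short sketch. It assumes the manifest layout of the request example above, where `Data` is an object; `processing_order` is a hypothetical helper and does not reflect how the workflow is actually implemented.

```python
def processing_order(manifest):
    """List the entries of manifest["Data"] in the order they would be processed."""
    data = manifest.get("Data", {})
    ordered = []
    # Files come first because work product components reference them.
    for item in data.get("Files", []):
        ordered.append(("File", item))
    for item in data.get("WorkProductComponents", []):
        ordered.append(("WorkProductComponent", item))
    # The work product comes last, once its components are known.
    if "WorkProduct" in data:
        ordered.append(("WorkProduct", data["WorkProduct"]))
    return ordered


manifest = {
    "kind": "osdu:wks:Manifest:1.0.0",
    "Data": {"WorkProduct": {}, "WorkProductComponents": [{}], "Files": [{}]},
}
print([kind for kind, _ in processing_order(manifest)])
```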

#### Response body

| Property     | Type     | Description                                                       |
| ------------ | -------- | ------------------------------------------------------------------ |
| `WorkflowRunID` | `String` | Unique ID of the workflow run that was started by the Workflow service |

## Validation

The Ingestion service's current implementation performs a general check of the validity of the
incoming authentication token and partition ID. Also, the service checks if the `FileID` property is
provided. For the OSDU Ingestion workflow, the service also validates the manifest.

In the OSDU R3 Prototype, the service doesn't verify whether the file upload actually happened.


## GCP implementation

For development purposes, it's recommended to create a separate GCP Identity and Access Management
service account. It's enough to grant the **Service Account Token Creator** role to the development
service account.

Obtaining user credentials for Application Default Credentials isn't suitable in this case because
signing a blob is only available with the service account credentials. Remember to set the
`GOOGLE_APPLICATION_CREDENTIALS` environment variable. Follow the [instructions on the Google
developer's portal][application-default-credentials].
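
A quick local check that the variable points at an existing key file can save a failed start-up. The sketch below is an optional convenience, not part of the service; `find_credentials_file` is a hypothetical helper, and it only tests that the path exists, not that the key is valid or has the required role.

```python
import os


def find_credentials_file(environ=None):
    """Return the path from GOOGLE_APPLICATION_CREDENTIALS, or raise RuntimeError."""
    env = os.environ if environ is None else environ
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError("service account key file not found: " + path)
    return path


try:
    print(find_credentials_file())
except RuntimeError as error:
    print("Credentials check failed:", error)
```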

The GCP implementation contains two mutually exclusive modules to work with the persistence layer.
Presently, OSDU R3 connects to legacy Cloud Datastore for compatibility with the current OpenDES
implementation. In future OSDU releases, Cloud Datastore will be replaced by a Cloud Firestore
implementation that's already available in the project.

* Documentation for the GCP Cloud Datastore implementation is located [here](./provider/ingest-gcp-datastore/README.md)
* Documentation for the GCP Cloud Firestore implementation is located [here](./provider/ingest-gcp/README.md)