Commit 638f0a76 authored by Riabokon Stanislav(EPAM)[GCP], committed by Dmitriy Rudko

Updated Ingestion Service to support R3 Schemas and a new Workflow service API

- GONRG-1301 - Finalize refactoring of Ingestion Service
- GONRG-1744 - Added R3 Schema support to ingestion service as a part of ADR !30
- GONRG-1548: Implement audit events for Ingestion Service
- GONRG-1551 - Added API specification for Ingestion Service
- GONRG-1551 - Update documentation for Ingestion Service
- GONRG-1744 - fix integration tests for submitWithManifest
parent ecd02a67
# OSDU R2 Ingestion Service
# OSDU R3 Ingestion Service
## Contents
......@@ -6,29 +6,33 @@
* [System interactions](#system-interactions)
* [Default ingestion workflow](#default-ingestion-workflow)
* [OSDU ingestion workflow](#osdu-ingestion-workflow)
* [OSDU Ingestion workflow (v2 API)](#osdu-ingestion-workflow-v2-api)
* [Ingestion API](#ingestion-api)
* [POST /submit](#post-submit)
* [POST /submitWithManifest](#post-submitwithmanifest)
* [POST /v2/submit/manifest](#post-v2submitmanifest)
* [GCP implementation](#gcp-implementation)
* [Firestore](#firestore-collections)
## Introduction
The OSDU R2 Ingestion service starts ingestion of OSDU documents, such as the OSDU Work Products,
Work Product Components, and Files. The Ingestion service is basically a wrapper around the [OSDU R2
The OSDU R3 Ingestion service starts ingestion of OSDU documents, such as the OSDU Work Products,
Work Product Components, and Files. The Ingestion service is basically a wrapper around the [OSDU R3
Workflow service] and performs preliminary work before starting actual ingestion. The preliminary
work can include fetching file location data or validating the manifest.
## System interactions
The Ingestion service in the OSDU R2 Prototype provides two ingestion workflows.
The Ingestion service in the OSDU R3 Prototype provides two ingestion workflows.
The _Default (Opaque) Ingestion_ workflow is designed to ingest files without metadata. Per request,
only one file is ingested.
The _OSDU (Manifest) Ingestion_ workflow is designed to ingest multiple files with metadata
associated with them. The metadata is passed as an OSDU manifest, which must contain an OSDU Work
Product and associated Work Product Components.
associated with them. The metadata is passed as an OSDU manifest whose structure depends on the
version of the API. In the first version of the API, the manifest must contain an OSDU Work
Product and associated Work Product Components. In the second version of the API, it must also
contain Kind, Reference Data, and Master Data.
### Default Ingestion workflow
......@@ -43,7 +47,7 @@ documentation].
The Default Ingestion workflow starts upon a call to the `/submit` endpoint. The following diagram
shows this workflow.
![OSDU R2 Ingestion Service submit](img/77780782-357ee700-705d-11ea-8388-a1671d06ee22.png)
![OSDU R3 Ingestion Service submit](img/77780782-357ee700-705d-11ea-8388-a1671d06ee22.png)
Upon a `/submit` request:
......@@ -51,23 +55,14 @@ Upon a `/submit` request:
* Verify the authentication token. If the token is missing or invalid, respond with the `401
Unauthorized` status.
* Verify the partition ID. If the partition ID is missing, invalid or doesn't have assigned user
groups, respond with the `400 Bad Request` status.
* Verify `FileID`. Respond with the `400 Bad request` status and the `Missing required field
FileID` message if a `FileID` isn't provided.
groups, respond with the `400 Bad Request` status.
* Verify `DataType`. Respond with the `400 Bad request` status if the `DataType` is an empty
string or consists of whitespaces.
> `DataType` can contain any string. If the string is not "well_log", then the data type is
> treated as "opaque". During the next steps in the ingestion flow, the Opaque Ingestion DAG
> will run for any `DataType` but "well_log".
2. Query the Delivery service's `/getFileLocation` API endpoint to obtain a direct link to the file
by `FileID`. The Delivery service will verify whether the `FileID` field exists in the database and
will fetch the file location data. The following flows are possible for the Delivery service:
* Respond with the `400 Bad request` status and the `Missing required field FileID` message if
an ID wasn't provided.
* Respond with the `Driver` and `Location` for the requested `FileID`.
3. Query the Workflow service's `/startWorkflow` API endpoint with the workflow type "ingest". Pass
the file location in the context.
4. Receive the workflow ID from the Workflow service, and then return the ID to the user or app that
> `DataType` can contain any string. The `DataType` value is passed as a request parameter.
2. Query the Workflow service's `/v1/workflow/{workflow_name}/workflowRun` API endpoint, setting
the `workflow_name` path variable to the value of `DataType`. Pass the file
location in the context.
3. Receive the workflow run ID from the Workflow service, and then return the run ID to the user or app that
started ingestion.
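The `DataType` check and the path construction from the steps above can be sketched as follows. This is a minimal illustration, not the service's actual code: the `WorkflowRunPath` class and its `forDataType` method are hypothetical names, and only the path template is taken from this document.

```java
public class WorkflowRunPath {

    // Path template of the Workflow service endpoint named in the steps above.
    private static final String TEMPLATE = "/v1/workflow/%s/workflowRun";

    /**
     * Hypothetical helper: rejects an empty or whitespace-only DataType
     * (the case answered with 400 Bad Request) and otherwise uses the
     * value as the workflow_name path variable.
     */
    public static String forDataType(String dataType) {
        if (dataType == null || dataType.trim().isEmpty()) {
            throw new IllegalArgumentException("DataType must not be empty or whitespace");
        }
        return String.format(TEMPLATE, dataType);
    }

    public static void main(String[] args) {
        System.out.println(forDataType("opaque"));
    }
}
```

A real implementation would map the exception to a `400 Bad Request` response rather than letting it propagate.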
### OSDU Ingestion workflow
......@@ -77,7 +72,7 @@ The OSDU Ingestion workflow is designed to ingest well log .las files with the m
The OSDU Ingestion workflow starts upon a call to the Ingestion service's `/submitWithManifest`
endpoint. The following diagram shows the workflow.
![OSDU R2 Ingestion Service submitWithManifest](img/77781014-84c51780-705d-11ea-8846-ea08163afcf7.png)
![OSDU R3 Ingestion Service submitWithManifest](img/77780782-357ee700-705d-11ea-8388-a1671d06ee22.png)
The workflow is the following:
......@@ -89,16 +84,35 @@ The workflow is the following:
* Validate the manifest. If the manifest doesn't correspond to the OSDU
`WorkProductLoadManifestStagedFiles` schema stored in the project's database, fail ingestion,
and then respond with an HTTP error.
2. Query the Workflow service's `/startWorkflow` API endpoint with the "osdu" workflow type and the
manifest added in the request's `Context` property.
3. Return the workflow ID received from the Workflow service.
2. Query the Workflow service's `/v1/workflow/{workflow_name}/workflowRun` API endpoint with the
`workflow_name` path variable set to `manifest_ingestion` and the manifest added in the
request's `Context` property.
3. Return the workflow run ID received from the Workflow service.
### OSDU Ingestion workflow (v2 API)
Like the first version, this version of the API uses a manifest to upload files. It differs
from the first version in the structure of the manifest and in the endpoint path.
The workflow is the following:
1. Validate the incoming request.
* Verify the authentication token. If the token is missing or invalid, respond with the `401
Unauthorized` status.
* Verify the partition ID. If the partition ID is missing or invalid or doesn't have assigned
user groups, respond with the `400 Bad Request` status.
2. Query the Workflow service's `/v1/workflow/{workflow_name}/workflowRun` API endpoint. The value
of the `workflow_name` path variable is stored in the service properties; in this case it is
`Osdu_ingest`. The manifest is added in the request's `Context` property.
3. Return the workflow run ID received from the Workflow service.
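The order of the request checks in step 1 can be sketched as below. This is an illustration under assumptions: `RequestPrecheck` and `statusFor` are hypothetical names, and the sketch only covers the "missing" cases, not token validity or the user-group lookup that the real service performs.

```java
public class RequestPrecheck {

    /**
     * Hypothetical pre-check mirroring the order of step 1:
     * the token is examined first (401), then the partition ID (400).
     * Returns 200 when both values are present.
     */
    public static int statusFor(String authToken, String partitionId) {
        if (authToken == null || authToken.trim().isEmpty()) {
            return 401; // Unauthorized: token is missing
        }
        if (partitionId == null || partitionId.trim().isEmpty()) {
            return 400; // Bad Request: partition ID is missing
        }
        return 200;
    }

    public static void main(String[] args) {
        System.out.println(statusFor(null, "partition"));
        System.out.println(statusFor("token", ""));
        System.out.println(statusFor("token", "partition"));
    }
}
```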
## Ingestion API
The Ingest service's API includes the following endpoints in the OSDU R2 Prototype:
The Ingest service's API includes the following endpoints in the OSDU R3 Prototype:
* `/submit`, external
* `/submitWithManifest`, external
* `/v2/submit/manifest`, external
General considerations related to querying the Ingestion API:
......@@ -111,18 +125,17 @@ General considerations related to querying the Ingestion API:
### POST /submit
The `/submit` API endpoint starts a new ingestion process and carries out necessary operations
depending on the file type. The operations include obtaining file location data from the OSDU R2
Delivery service. The current implementation of the endpoint supports ingestion of any file types.
depending on the file type. The current implementation of the endpoint supports ingestion of any file types.
#### Incoming request body
| Property | Type | Description |
| ---------- | -------- | ----------------------------------------------------------- |
| `FileID` | `String` | Unique ID of the file |
| `DataType` | `String` | Type of file. Supported data types: "well_log" and "opaque" |
| `DataType` | `String` | Type of file. Supported data types: any |
| `Context`  | `Object` | Key-value structure that contains other useful information  |
> **Note**: `DataType` can be any string. If the `DataType` value is not "well_log", then it's
> treated as the "opaque" data type. `DataType` cannot contain only whitespaces.
> **Note**: `DataType` cannot contain only whitespaces.
**Example**:
......@@ -134,6 +147,10 @@ curl --location --request POST 'https://{path}/submit' \
--data-raw '{
"FileID": "c26c7656-8c50-4147-b51f-c7a449af33f3",
"DataType": "opaque",
"Context": {
"key1": "value 1",
"key2": "value 2"
}
}'
```
......@@ -141,13 +158,7 @@ curl --location --request POST 'https://{path}/submit' \
| Property | Type | Description |
| ------------ | -------- | ------------------------------------------------------------------ |
| `WorkflowID` | `String` | Unique ID of the workflow that was started by the Workflow service |
#### Internal requests
During the `/submit` workflow, the Ingestion service queries the Delivery service's
`/getFileLocation` API endpoint. The information retrieved from the Delivery API will be added to
the request body's Context and passed to the Workflow service.
| `WorkflowRunID` | `String` | Unique ID of the workflow run that was started by the Workflow service |
### POST /submitWithManifest
......@@ -155,10 +166,10 @@ The `/submitWithManifest` API endpoint starts the OSDU ingestion process for the
Work Product Components, and Files passed in the OSDU `WorkProductLoadManifestStagedFiles` manifest.
Differently from the `/submit` endpoint, the request body for `/submitWithManifest` doesn't need to
contain a `FileID` and `DataType`.
contain a `FileID`, `DataType` and `Context`.
The list of file IDs must be added to the manifest's `Files` property. The `DataType` property
defaults to "well_log" for all files.
defaults to "manifest_ingestion" for all files.
#### Incoming request body
......@@ -292,7 +303,73 @@ curl -X POST \
| Property | Type | Description |
| ------------ | -------- | ------------------------------------------------------------------ |
| `WorkflowID` | `String` | Unique ID of the workflow that was started by the Workflow service |
| `WorkflowRunID` | `String` | Unique ID of the workflow run that was started by the Workflow service |
## Validation
The Ingestion service's current implementation performs a general check of the validity of the
incoming authentication token and partition ID. Also, the service checks if the `FileID` property is
provided. For OSDU Ingestion workflow, the service also validates the manifest.
In OSDU R3 Prototype, the service doesn't perform any verification whether a file upload happened.
### POST /v2/submit/manifest
The `/v2/submit/manifest` API endpoint starts the OSDU ingestion process for the OSDU Reference Data,
MasterData, Kind and Data.
Differently from the `/submit` endpoint, the request body for `/v2/submit/manifest` doesn't need to
contain a `FileID`, `DataType` and `Context`.
The list of file IDs must be added to the manifest's `Files` property in `Data`. The `DataType` property
isn't passed in the request: a default value stored in the service properties applies to all files.
#### Incoming request body
| Property | Type | Description |
| ----------------------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| `WorkProduct` | `Object` | OSDU Work Product with **ResourceTypeID**, **ResourceSecurityClassification**, **Data**, and **ComponentsAssociativeID** properties |
| `WorkProductComponents` | `Array` | List of OSDU Work Product Components. Each WPC contains at least **ResourceTypeID**, **ResourceSecurityClassification**, **AssociativeID**, **FileAssociativeIDs**, and **Data** properties |
| `Files` | `Array` | List of OSDU Files. Each File contains at least **ResourceTypeID**, **ResourceSecurityClassification**, **AssociativeID**, and **Data** properties |
| `kind` | `String` | Manifest kind. |
| `ReferenceData`         | `Array`  | Reference data is submitted as an array of records. |
| `MasterData`            | `Array`  | Master data is submitted as an array of records. |
| `Data`                  | `Object` | Manifest schema for work-product, work-product-component, file ensembles. The items in 'Files' are processed first since they are referenced by 'WorkProductComponents' ('data.Files[]' and 'data.Artefacts[].ResourceID'). The WorkProduct is processed last collecting the WorkProductComponents. |
Request example:
```sh
curl -X POST \
https://{Apigee URI}/v2/submit/manifest \
-H 'Authorization: Bearer {token}' \
-H 'Partition-Id: {assigned DELFI partition ID}' \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{
"kind": "osdu:wks:Manifest:1.0.0",
"ReferenceData": [
{}
],
"MasterData": [
{}
],
"Data": {
"WorkProduct": {},
"WorkProductComponents": [
{}
],
"Files": [
{}
]
}
}'
```
#### Response body
| Property | Type | Description |
| ------------ | -------- | ------------------------------------------------------------------ |
| `WorkflowRunID` | `String` | Unique ID of the workflow run that was started by the Workflow service |
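For callers on the JVM, the request could be assembled with the standard `java.net.http` API (Java 11+), as in the sketch below. The header names and the endpoint path come from this document; the `SubmitManifestV2Request` class, its `build` method, and the `https://example.invalid` base URL are placeholders chosen for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class SubmitManifestV2Request {

    /**
     * Hypothetical helper that builds (but does not send) a POST request
     * to the /v2/submit/manifest endpoint.
     */
    public static HttpRequest build(String baseUrl, String token,
                                    String partitionId, String manifestJson) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/v2/submit/manifest"))
                .header("Authorization", "Bearer " + token)
                .header("Partition-Id", partitionId)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(manifestJson))
                .build();
    }

    public static void main(String[] args) {
        // Minimal manifest skeleton; real manifests carry actual records.
        String manifest = "{\"kind\":\"osdu:wks:Manifest:1.0.0\","
                + "\"ReferenceData\":[],\"MasterData\":[],\"Data\":{}}";
        HttpRequest request = build("https://example.invalid", "token", "partition", manifest);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request with `HttpClient.send` and reading `WorkflowRunID` from the JSON response is left out, since the response parsing depends on the JSON library in use.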
## Validation
......@@ -300,7 +377,8 @@ The Ingestion service's current implementation performs a general check of the v
incoming authentication token and partition ID. Also, the service checks if the `FileID` property is
provided. For OSDU Ingestion workflow, the service also validates the manifest.
In OSDU R2 Prototype, the service doesn't perform any verification whether a file upload happened.
In OSDU R3 Prototype, the service doesn't perform any verification whether a file upload happened.
## GCP implementation
......@@ -314,7 +392,7 @@ signing a blob is only available with the service account credentials. Remember
developer's portal][application-default-credentials].
The GCP implementation contains two mutually exclusive modules to work with the persistence layer.
Presently, OSDU R2 connects to legacy Cloud Datastore for compatibility with the current OpenDES
Presently, OSDU R3 connects to legacy Cloud Datastore for compatibility with the current OpenDES
implementation. In the future OSDU releases, Cloud Datastore will be replaced by a Cloud Firestore
implementation that's already available in the project.
......
openapi: 3.0.0
info:
description: |
Ingestion Service provides high-level API on top of Workflow service that serves as an entry point for OSDU specific Workflows
version: 1.0.0
title: Ingestion Service API
contact:
email: SupportGO3-NRG@epam.com
license:
name: Apache 2.0
url: 'http://www.apache.org/licenses/LICENSE-2.0.html'
security:
- bearer: []
servers:
- url: 'os-ingest-attcrcktoa-uc.a.run.app'
paths:
/v1/submit:
post:
tags:
- Submit
summary: Submit parser-based ingestion workflow
operationId: submit
description: This API is used to trigger a new Workflow Run for a parser-based ingestion Workflow.
responses:
'200':
description: Workflow submitted successfully
content:
application/json:
schema:
$ref: '#/components/schemas/submitResponse'
'400':
description: Bad Request
content:
application/json:
schema:
$ref: '#/components/schemas/error'
'401':
description: Invalid/Expired Credential
content:
application/json:
schema:
$ref: '#/components/schemas/error'
'403':
description: Forbidden
content:
application/json:
schema:
$ref: '#/components/schemas/error'
'404':
description: Not Found
content:
application/json:
schema:
$ref: '#/components/schemas/error'
'500':
description: Internal Server Error
content:
application/json:
schema:
$ref: '#/components/schemas/error'
requestBody:
content:
application/json:
schema:
$ref: '#/components/schemas/parserRequest'
description: Request payload for parser ingestion workflow
/v1/submitWithManifest:
post:
tags:
- Submit
summary: Submit manifest ingestion workflow
operationId: submitManifest
description: This API is used to trigger a new Workflow Run for a manifest ingestion Workflow.
responses:
'200':
description: Workflow submitted successfully
content:
application/json:
schema:
$ref: '#/components/schemas/submitResponse'
'400':
description: Bad Request
content:
application/json:
schema:
$ref: '#/components/schemas/error'
'401':
description: Invalid/Expired Credential
content:
application/json:
schema:
$ref: '#/components/schemas/error'
'403':
description: Forbidden
content:
application/json:
schema:
$ref: '#/components/schemas/error'
'404':
description: Not Found
content:
application/json:
schema:
$ref: '#/components/schemas/error'
'500':
description: Internal Server Error
content:
application/json:
schema:
$ref: '#/components/schemas/error'
requestBody:
content:
application/json:
schema:
$ref: '#/components/schemas/manifestRequest'
description: Request payload for manifest ingestion workflow
/v2/submit/manifest:
post:
tags:
- Submit V2
summary: Submit manifest ingestion workflow
operationId: submitManifestV2
description: This API is used to trigger a new Workflow Run for a manifest ingestion Workflow.
responses:
'200':
description: Workflow submitted successfully
content:
application/json:
schema:
$ref: '#/components/schemas/submitResponse'
'400':
description: Bad Request
content:
application/json:
schema:
$ref: '#/components/schemas/error'
'401':
description: Invalid/Expired Credential
content:
application/json:
schema:
$ref: '#/components/schemas/error'
'403':
description: Forbidden
content:
application/json:
schema:
$ref: '#/components/schemas/error'
'404':
description: Not Found
content:
application/json:
schema:
$ref: '#/components/schemas/error'
'500':
description: Internal Server Error
content:
application/json:
schema:
$ref: '#/components/schemas/error'
requestBody:
content:
application/json:
schema:
$ref: '#/components/schemas/manifestRequestV2'
description: Request payload for manifest ingestion workflow
components:
securitySchemes:
bearer:
type: apiKey
name: Authorization
in: header
schemas:
submitResponse:
title: Submit Request Response
type: object
properties:
WorkflowRunID:
type: string
description: ID of triggered Workflow Run
parserRequest:
title: Parser Ingestion Request
type: object
properties:
FileID:
type: string
example: osdu:file:eea57de21ce242ca9e0860148cf5522f
description: File ID to process in workflow run associated with DataType
DataType:
type: string
description: Name of the workflow registered in Workflow service
Context:
type: object
description: Execution Context that should be passed to Workflow Run
properties:
key1:
type: string
example: value1
key2:
type: string
example: value
manifestRequest:
title: Manifest Ingestion Request
type: object
properties:
Files:
type: array
items:
type: object
description: OSDU File Record
WorkProductComponents:
type: array
items:
type: object
description: OSDU WPC Record
WorkProduct:
type: object
description: OSDU WP Record
manifestRequestV2:
title: Manifest Ingestion Request
type: object
properties:
kind:
type: string
description: Reflected in Manifest R3 schema. Obsolete parameter.
example: osdu:wks:Manifest:1.0.0
ReferenceData:
type: array
items:
type: object
description: Reference-data are submitted as an array of records.
MasterData:
type: array
items:
type: object
description: Master-data is submitted as an array of records.
Data:
type: object
description: Manifest schema for work-product, work-product-component, file ensembles. The items in 'Files' are processed first since they are referenced by 'WorkProductComponents' ('data.Files[]' and 'data.Artefacts[].ResourceID'). The WorkProduct is processed last collecting the WorkProductComponents.
properties:
WorkProduct:
type: object
description: The work-product component capturing the work-product-component records belonging to this loading/ingestion transaction.
WorkProductComponents:
description: The list of work-product-components records. The record ids are internal surrogate keys enabling the association of work-product-component records with the work-product records.
type: array
items:
type: object
Files:
description: The list of 'Files' or data containers holding the actual data. The record ids are usually internal surrogate keys enabling the association of file records with work-product-component records, namely 'Files' and 'Artefacts' (both referring to file group-type entity types).
type: array
items:
type: object
error:
title: Error
type: object
properties:
code:
type: integer
format: int32
errors:
type: array
items:
$ref: '#/components/schemas/errorDetails'
message:
type: string
errorDetails:
title: Error Details
type: object
properties:
message:
type: string
reason:
type: string
/*
Copyright 2021 Google LLC
Copyright 2021 EPAM Systems, Inc
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package org.opengroup.osdu.ingest.api;

import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.opengroup.osdu.core.common.model.http.DpsHeaders;
import org.opengroup.osdu.core.common.model.storage.StorageRole;
import org.opengroup.osdu.ingest.model.SubmitResponse;
import org.opengroup.osdu.ingest.model.v3.SubmitManifestRequest;
import org.opengroup.osdu.ingest.provider.interfaces.v3.ISubmitManifestService;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.validation.annotation.Validated;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.context.annotation.RequestScope;

@Slf4j
@RequestScope
@Validated
@RequiredArgsConstructor
@RestController
public class SubmitManifestApi {

  private final DpsHeaders headers;
  private final ISubmitManifestService manifestService;

  @PostMapping("/v2/submit/manifest")
  @PreAuthorize("@authorizationFilter.hasPermission('" + StorageRole.CREATOR + "')")
  public SubmitResponse submit(@RequestBody SubmitManifestRequest manifest) {
    log.info("Submit manifest received : {}", manifest);
    SubmitResponse submitResponse = this.manifestService.submit(manifest, this.headers);
    log.info("Submit manifest result ready : {}", submitResponse);
    return submitResponse;
  }
}