Commit a16d4f0c authored by Oleksandr Kosse (EPAM)

Add ingestion strategy scripts

- Add scripts for ingestion service
- Update Readme
parent c73670fc
@@ -67,6 +67,7 @@ iam-policy-patch(){
gcloud projects add-iam-policy-binding ${GCLOUD_PROJECT} --member serviceAccount:$service_account_name --role $2 1> /dev/null
}
create-service-account "osdu-gcp-sa"
create-service-account "runtime-config-account"
create-service-account "entitlements-bucket-access"
create-service-account "users-group-admin"
@@ -86,6 +87,7 @@ iam-policy-patch "entitlements-bucket-access" "roles/storage.objectViewer"
iam-policy-patch "dataflow" "roles/storage.objectAdmin"
iam-policy-patch "dataflow" "roles/storage.admin"
iam-policy-patch "dataflow" "roles/dataflow.worker"
iam-policy-patch "osdu-gcp-sa" "roles/storage.admin"
GOOGLE_PROJECT_NUMBER=`gcloud projects list | grep -E '(^| )'${GCLOUD_PROJECT}'( |$)' | awk -F ' ' '{print $3}'`
# Add Deployment Manager Service Account to Create and Manage Users/Service Account Permissions
@@ -107,4 +109,4 @@ gcloud iam service-accounts add-iam-policy-binding sli-core-tester@${GCLOUD_PROJ
gcloud iam service-accounts add-iam-policy-binding ingestion-consumption-sli@${GCLOUD_PROJECT}.iam.gserviceaccount.com --member=serviceAccount:${GCLOUD_PROJECT}@appspot.gserviceaccount.com --role=roles/iam.serviceAccountTokenCreator --project ${GCLOUD_PROJECT} 1> /dev/null
# Not using Deployment Manager for this: the service account DM template uses cloudresourcemanager.v1.project, which would require granting the build agent the Project Creator role at the folder level.
#service-accounts
\ No newline at end of file
#service-accounts
@@ -13,7 +13,6 @@
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
@@ -91,34 +90,34 @@
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
@@ -186,7 +185,9 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2020 Open Subsurface Data Universe Software / Platform / Deployment and Operations
Copyright 2020 Google LLC
Copyright 2017-2019, Schlumberger
Copyright 2020 EPAM
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
......
@@ -14,11 +14,12 @@ From security perspective, it is recommended to have a jump host with appropriat
### Prerequisites
Before running the automated scripts, perform the following steps manually:
1. Configure [GSuite](https://gsuite.google.com/) with a primary domain name
2. Enable billing
3. Create a GCP project
4. Install [Google Cloud SDK](https://cloud.google.com/sdk/) (version 280 or above), [Golang](https://golang.org/), `jq`, `python3-dev` and `python3-venv`
5. ECE/ESS Cluster Deployment for Elasticsearch. Currently this is a manual procedure, but it is automated internally with the indexer service
1. Define your primary domain name. You need sufficient permissions to add new records in order to verify domain ownership and to create a subdomain for OSDU
2. Configure [Cloud Identity](https://cloud.google.com/identity/docs/overview) on top of it. Another option is to use [GSuite](https://gsuite.google.com/) from your organization, but a free/trial GSuite cannot be used because of its limit on the number of accounts.
3. Create a [GCP project](https://cloud.google.com/resource-manager/docs/creating-managing-projects)
4. Enable [billing](https://cloud.google.com/billing/docs)
5. Install [Google Cloud SDK](https://cloud.google.com/sdk/) (version 280 or above), [Golang](https://golang.org/), `jq`, `python3-dev`, and `python3-venv`
6. ECE/ESS Cluster Deployment for Elasticsearch. Currently this is a manual procedure, but it is automated internally with the indexer service
For data indexing and searching, we will use Elasticsearch. We can use either the Elastic Cloud Enterprise (ECE, preferred) solution or the hosted Elasticsearch Service (ESS) solution. This is an independent resource that must be created and billed separately.
@@ -224,6 +225,13 @@ cd datalake-groups-init && ./create-default-groups.sh && cd -
```shell
./datafier-permissions.sh
```
23. Create gcp.datastore rows for ingestion-strategy (refer to the docs inside the directory):
```shell
cd ingestion-strategy
python ingest-strategy.py --namespace=opendes-namespace --dagname=Osdu_ingest --datatype=osdu --workflowtype=osdu
python ingest-strategy.py --namespace=opendes-namespace --dagname=Well_log_ingest --datatype=well_log --workflowtype=ingest
python ingest-strategy.py --namespace=opendes-namespace --dagname=Default_ingest --workflowtype=ingest
```
---
## Environment Variables
......
# Create gcp.datastore with kind ingestion-strategy
## About
The ingestion service in Datastore mode requires a gcp.datastore kind, `ingest-strategy`, from which it determines which DAG to run.
### Usage
Provide authentication credentials to your application by setting the environment variable `GOOGLE_APPLICATION_CREDENTIALS` to the path of the JSON file that contains your service account key.
Be aware that the `--namespace` option is project specific and should be changed accordingly.
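If you prefer not to rely on the environment variable, the key file can also be passed to the client directly. A minimal sketch, assuming a placeholder key path and the example namespace used below (not part of this repository):

```python
# Minimal sketch: build a Datastore client from an explicit service account
# key file instead of GOOGLE_APPLICATION_CREDENTIALS. The key path and
# namespace are placeholders and must be adjusted for your project.
from google.cloud import datastore

client = datastore.Client.from_service_account_json(
    "/path/to/service-account-key.json",
    namespace="opendes-namespace",
)
```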
At least three rows should be added:
```
$ python ingest-strategy.py --namespace=opendes-namespace --dagname=Osdu_ingest --datatype=osdu --workflowtype=osdu
$ python ingest-strategy.py --namespace=opendes-namespace --dagname=Well_log_ingest --datatype=well_log --workflowtype=ingest
$ python ingest-strategy.py --namespace=opendes-namespace --dagname=Default_ingest --workflowtype=ingest
```
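To confirm that the rows were created, everything stored under the kind can be listed. A minimal sketch (not part of this repository; the namespace is a placeholder):

```python
# Minimal sketch: print all ingest-strategy rows in the given namespace
# so the entries created above can be verified.
from google.cloud import datastore

client = datastore.Client(namespace="opendes-namespace")
for entity in client.query(kind="ingest-strategy").fetch():
    print(entity.key.id, dict(entity))
```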
# Imports the Google Cloud client library
from google.cloud import datastore
import argparse
# Parse command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument('--namespace', help='Your cloud project namespace.')
parser.add_argument('--dagname', help='Your DagName field.', default=None, type=str)
parser.add_argument('--datatype', help='Your DataType field.', default=None, type=str)
parser.add_argument('--userid', help='Your UserID field.', default=None, type=str)
parser.add_argument('--workflowtype', help='Your WorkflowType field.', default=None, type=str)
args = parser.parse_args()
# Instantiates a client
namespace = args.namespace
datastore_client = datastore.Client(namespace=namespace)
# The kind for the new entity
kind = 'ingest-strategy'
# A name/ID for the new entity could be set with datastore_client.key(kind, name);
# in this case Datastore auto-generates the ID instead.
# The Cloud Datastore key for the new entity
task_key = datastore_client.key(kind)
# Prepare values for the new entity.
# Note: under Python 2 a plain byte string (e.g. task['DAGName'] = 'Default_dag')
# was stored in Datastore as a blob, so values had to be converted with
# unicode(value, "utf-8"). Under Python 3, argparse already returns Unicode
# strings, or None when an option is omitted, so the values can be used directly.
DAGName = args.dagname
DataType = args.datatype
UserID = args.userid
WorkflowType = args.workflowtype
# Initialize the entity and set its fields
task = datastore.Entity(key=task_key)
task['DAGName'] = DAGName
task['DataType'] = DataType
task['UserID'] = UserID
task['WorkflowType'] = WorkflowType
# Saves the entity to datastore
datastore_client.put(task)
How to use:
If a field should be null, simply omit the corresponding option when running the script, as with --userid in the example below:
$ python ingest-strategy.py --namespace=opendes --dagname=Ingest_dag --datatype=osdu --workflowtype=ingest-str
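If a row was created with wrong values, it can be removed by its auto-generated ID. A minimal sketch (the numeric ID and namespace are placeholders):

```python
# Minimal sketch: delete a single ingest-strategy row by its
# auto-generated numeric ID (1234567890 is a placeholder).
from google.cloud import datastore

client = datastore.Client(namespace="opendes-namespace")
client.delete(client.key("ingest-strategy", 1234567890))
```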