Commit 466d0b09 authored by Oleksandr Kosse (EPAM)

September update repo

parent 86ae4777
......@@ -88,6 +88,9 @@ iam-policy-patch "dataflow" "roles/storage.objectAdmin"
iam-policy-patch "dataflow" "roles/storage.admin"
iam-policy-patch "dataflow" "roles/dataflow.worker"
iam-policy-patch "osdu-gcp-sa" "roles/storage.admin"
iam-policy-patch "osdu-gcp-sa" "roles/composer.user"
iam-policy-patch "osdu-gcp-sa" "roles/datastore.user"
iam-policy-patch "osdu-gcp-sa" "roles/iam.serviceAccountTokenCreator"
GOOGLE_PROJECT_NUMBER=$(gcloud projects list | grep -E '(^| )'${GCLOUD_PROJECT}'( |$)' | awk -F ' ' '{print $3}')
# Add Deployment Manager Service Account to Create and Manage Users/Service Account Permissions
......
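The `iam-policy-patch` helper is defined outside this hunk; a minimal sketch of what it plausibly does, assuming each name maps to a service account in the current project:
```shell
# Hypothetical sketch of the iam-policy-patch helper used above.
iam-policy-patch() {
  local sa_name=$1 role=$2
  gcloud projects add-iam-policy-binding "${GCLOUD_PROJECT}" \
    --member="serviceAccount:${sa_name}@${GCLOUD_PROJECT}.iam.gserviceaccount.com" \
    --role="${role}"
}
```
Note that the project-number lookup could equivalently use `gcloud projects describe "${GCLOUD_PROJECT}" --format='value(projectNumber)'`, which avoids parsing `gcloud projects list` with grep and awk.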
......@@ -80,8 +80,8 @@ copy-keys-to-storage_bucket(){
}
export temp_dir=temp
export config_name=${ENVIRONMENT_NAME}-keys
generate-service-account-key
get-keys-from-runtime-config
copy-keys-to-storage_bucket
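The helpers invoked here are defined earlier in the script; as a hedged sketch, the bucket-copy step might look like the following (the bucket name is an assumption):
```shell
# Hypothetical sketch of copy-keys-to-storage_bucket: push the generated
# key files from the temp dir into a project-scoped bucket.
copy-keys-to-storage_bucket() {
  gsutil cp "${temp_dir}"/*.json "gs://${GCLOUD_PROJECT}-keys/${config_name}/"
}
```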
# Creating Release Definition
## Introduction
This document contains instructions to create a specific release definition in ADO to configure a Google Cloud (GCP) project for OSDU. It assumes that you know basic ADO release management features like artifacts, pre-deployment conditions, branch filters, release variables, and variable groups.
## Prerequisites
To be able to create a release definition, you require a build definition artifact (from a CI instance; it does not have to be in ADO) to be available at release time. For this documentation's purpose, we will have an ADO CI instance build our artifact and make it available to release.
For our release, [this](https://dev.azure.com/slb-des-ext-collaboration/open-data-ecosystem/_apps/hub/ms.vss-ciworkflow.build-ci-hub?_a=edit-build-definition&id=145) is the build definition we will download our artifact from.
## Setting Up the Release Definition
Create a new release definition and add the build definition as an artifact with proper policies and triggers. The idea here is to run all setup scripts. It is preferred that they run sequentially, because there are dependencies between some of them, but if necessary some parts can run in parallel. For example, the `enable_apis` and `service_accounts` steps should run before most other steps, but `create-buckets` and `create-topic` can run in parallel.
We recommend using Task groups to group tasks, as they are reusable if more environments are added to the release, but this is not mandatory. The tasks should be executed in the following order. The items marked with \* are optional at the moment, since they are required only for ingestion services. A sketch for chaining the scripted steps outside ADO follows the list.
1. 1_enable_apis: enables GCP APIs so they can be used
2. 2_service_accounts: creates all required service accounts
3. create-redis-instance.sh: creates Redis deployments for caching
4. 3_create_cryptographic_key: creates KMS entries
5. 4_generate_service_account_key: generates service account keys, puts them in buckets, and changes IAM permissions
6. 5_bq_sink_dataset: creates the BigQuery sink and dataset
7. validate-tenant-name.sh
8. create-datastore-entities/create-datastore-entities.sh
9. create-buckets.sh
10. \* create-tenant-buckets.sh
11. read-datastore-table.sh
12. create-indexer-queue/create-queue.sh
13. create-pub-sub/create-topic.sh
14. create-pub-sub/create-subscriptions.sh
15. create-definitions/create-index-definitions.sh
16. \* create_topics_subscriptions.sh
17. add-master-service-account.sh
18. elastic-search-settings/create-search-service-user.sh
19. \* create-logstore-datastore-entities/create-logstore-datastore-entities.sh
20. \* create-datastore-ingestion-hookup/create-datastore-ingestion-hookup.sh
21. MANUAL STEP:
    1. Create an application at the Cloud Console APIs & Services OAuth consent screen
    2. Enable DwD on the service accounts `datafier`, `domain-creator`, `entitlements-gsuite`, `entitlements-gsuite-init`, `entitlements-gsuite-0`, `entitlements-gsuite-1`, `entitlements-gsuite-2`. This will generate a Client ID for each one
    3. Get the `datafier` service account Client ID from the GCP console and give it the <https://www.googleapis.com/auth/devstorage.full_control> OAuth scope in G Suite
    4. Get the `domain-creator` service account Client ID from the GCP console and give it the <https://www.googleapis.com/auth/admin.directory.domain> and <https://www.googleapis.com/auth/siteverification> OAuth scopes in G Suite
    5. Give the Client IDs of the service accounts `entitlements-gsuite`, `entitlements-gsuite-init`, `entitlements-gsuite-0`, `entitlements-gsuite-1`, `entitlements-gsuite-2` the <https://www.googleapis.com/auth/admin.directory.domain.readonly>, <https://www.googleapis.com/auth/admin.directory.group>, <https://www.googleapis.com/auth/admin.directory.group.member>, <https://www.googleapis.com/auth/admin.directory.group.member.readonly>, <https://www.googleapis.com/auth/admin.directory.group.readonly>, and <https://www.googleapis.com/auth/admin.directory.user> scopes in G Suite
22. Create users opendes_verifier and opendes_admin in G Suite and grant them admin permissions
23. create-subdomain/create-subdomain.sh.
    Sometimes this step fails and requires a restart. In this case, visit the G Suite DwD page (Security -> Advanced settings)
24. datalake-groups-init/create-default-groups.sh
25. datafier-permissions.sh
26. MANUAL STEP:
    1. (if not deployed) Deploy a `default` service to GAE (App Engine requires a `default` service before any other service can be deployed)
    2. (if not deployed) Deploy the Entitlements service and the Entitlements cache sync service (you need to set up the pipeline with environment variables and update the config toml files before that)
    3. After the next job is run (Deploy crons), run the `entitlementscachesync` cron job manually from the App Engine GCP console
27. post_deployment_appengine_cron/appengine_cron.sh
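If you want to smoke-test the scripted sequence outside ADO, a minimal sketch like the following could chain the non-manual steps, assuming the scripts sit in the repository root with the directory layout used above (the list of steps is abbreviated):
```shell
#!/bin/bash
# Hypothetical sequential runner for the scripted steps above; the manual
# steps (21 and 26) still have to be performed in the consoles by hand.
set -euo pipefail

scripts=(
  create-redis-instance.sh
  validate-tenant-name.sh
  create-datastore-entities/create-datastore-entities.sh
  create-buckets.sh
  read-datastore-table.sh
  create-indexer-queue/create-queue.sh
  create-pub-sub/create-topic.sh
  create-pub-sub/create-subscriptions.sh
  create-definitions/create-index-definitions.sh
  add-master-service-account.sh
  elastic-search-settings/create-search-service-user.sh
)

for script in "${scripts[@]}"; do
  echo ">>> running ${script}"
  # each script expects to run from its own directory (they source ../*.sh)
  (cd "$(dirname "${script}")" && bash "./$(basename "${script}")")
done
```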
The release definition must be completed with the proper variables. A full list of variables can be found in the [ENV-variables.md](./ENV-variables.md) document.
## Executing the Release Definition
Create a new release and enter the values for the variables configured to be set at release time.
# Environment Variables
## Introduction
This document contains the environment variables required to run [this](https://dev.azure.com/slb-des-ext-collaboration/open-data-ecosystem/_releaseDefinition?definitionId=45&_a=environments-editor-preview) release definition, which configures a newly created GCP project to run OSDU microservices.
## Preconfigured Variables
These variables have a set value and will rarely change.
|ENV_VAR_NAME |ENV_VAR_VALUE |ENV_VAR_SCOPE|
......@@ -15,39 +18,45 @@ These variables have a set value and will rarely change
|6. COMPLIANCE_RULE_SET | shared | Release |
## G Suite Domain Specific Environment Variables
The G Suite domain setup is configured separately from GCP, so the same domain can be used by multiple OSDU ecosystems. Therefore, it is better to store these variables in a variable group for sharing purposes.
1. GOOGLE_CLOUD_IDENTITY_ADMIN_EMAIL: admin@example.com
2. GOOGLE_DOMAIN: osdu.example.com
3. GOOGLE_CLOUD_IDENTITY_VERIFIER_EMAIL: dns-verify@example.com
## Other Sharable Variables
Other variables that can be shared between microservice pipelines can also be stored in variable groups. This is optional and is done only to avoid duplicating variables across different release pipelines.
1. GCLOUD_PROJECT: Google Project ID
2. CLOUDSDK_COMPUTE_ZONE: GCP compute zone (for example, `us-central1-a`)
3. ENVIRONMENT: Specifies whether this GCP project is for a prod or a non-prod OSDU environment. Its value can be `evd`, `evt`, or `p4d` for non-prod environments and `demo`, `prod-us`, or `prod-eu` for prod environments
4. GCLOUD_REGION: App Engine region (for example, `us-central`)
5. GOOGLE_AUDIENCES: Follow the steps below to generate this (a token-minting sketch follows the steps)
* On the GCP, go to `APIs & Services` -> `Credentials`
* Click on `Create credentials` -> `OAuth client ID`
* Select `Web application` as application type and enter a name (for example, `osdu-service-audience`)
* Under `Authorized redirect URIs`, add the following value:
  <https://developers.google.com/oauthplayground>
* Click `Create`
* The value of the Client ID that gets generated is `GOOGLE_AUDIENCES`
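The resulting Client ID is what the services accept as the token audience. As a hedged illustration of minting such a token, assuming the caller is allowed to impersonate a service account (the account name below is an assumption):
```shell
# Mint an ID token whose audience is the OAuth client created above; this
# requires roles/iam.serviceAccountTokenCreator on the impersonated account.
gcloud auth print-identity-token \
  --impersonate-service-account="osdu-gcp-sa@${GCLOUD_PROJECT}.iam.gserviceaccount.com" \
  --audiences="${GOOGLE_AUDIENCES}"
```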
## Release-Time Variables
Insert these variables when creating a new release for the release definition.
1. DATA_PARTITION_ID: In multitenancy terms, this is a separate partition which has a 1-1 mapping to a tenant GCP project. For all practical purposes, for our use case, this can be the same as the GCP Project ID
2. ELASTIC_ADMIN_PASSWORD: Default password returned by Elastic right after creating a cluster
3. ELASTIC_HOST: Host endpoint of the Elastic cluster. Remove `https://` from the beginning of this URL and the port `:9243` from the end (see the trimming sketch below)
4. TENANT_PROJECT: This is the same as the GCP Project ID
All these variables should have `Settable at release time` ticked, since their values are set at release time.
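For example, trimming `ELASTIC_HOST` out of the full endpoint URL can be scripted; the `ELASTIC_ENDPOINT` variable below is hypothetical:
```shell
# e.g. ELASTIC_ENDPOINT=https://my-cluster.es.us-central1.gcp.cloud.es.io:9243
ELASTIC_HOST=$(echo "${ELASTIC_ENDPOINT}" | sed -e 's|^https://||' -e 's|:9243$||')
echo "${ELASTIC_HOST}"
```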
## Other Misc Variables
Any other variables that are specific to this pipeline should be configured in the Pipeline variables section with the proper scope.
1. DOMAIN_CREATOR_SERVICE_ACCOUNT_KEY: This is the base64-encoded value of the JSON key file of the service account `domain-creator` in GCP. This value must be kept secret (see the encoding sketch below).
2. ENVIRONMENT_NAME: Can have the value `dev`, `testing`, `p4d`, `demo`, `prod-eu`, or `prod-us`
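A minimal sketch of producing that base64 value, assuming the key was downloaded as `domain-creator.json` (a hypothetical filename):
```shell
# -w0 disables line wrapping (GNU coreutils); on macOS use `base64 -i domain-creator.json`
base64 -w0 domain-creator.json
```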
......@@ -37,6 +37,8 @@ create-bucket-helper() {
STORAGE_CLASS="Multi_Regional"
BUCKET_NAME=$TENANT_PROJECT"-records"
LOGSTORE_BUCKET_NAME=$TENANT_PROJECT"-logstore"
SCHEMA_BUCKET_NAME=$TENANT_PROJECT"-schema"
UNIT_BUCKET_NAME=$TENANT_PROJECT"-unit-catalog-bucket"
GOOGLE_PROJECT_REGION=$(echo ${GCLOUD_REGION} | cut -d- -f1)
if [ "${GOOGLE_PROJECT_REGION}" == "europe" ]
......@@ -46,6 +48,8 @@ create-bucket-helper() {
create-bucket $BUCKET_NAME
create-bucket $LOGSTORE_BUCKET_NAME
create-bucket $SCHEMA_BUCKET_NAME
create-bucket $UNIT_BUCKET_NAME
}
if [ "${NEW_TENANT}" == "true" ]
......@@ -60,4 +64,4 @@ else
export TENANT_PROJECT=$projectId
create-bucket-helper
done <./datastore-table-info.txt
fi
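The `create-bucket` function itself sits outside the visible hunk; a minimal sketch, assuming it wraps `gsutil mb` with the storage class and location computed earlier in the script:
```shell
# Hypothetical sketch of the create-bucket helper used above.
create-bucket() {
  local bucket_name=$1
  gsutil mb -p "${TENANT_PROJECT}" \
    -c "${STORAGE_CLASS}" \
    -l "${GOOGLE_PROJECT_REGION}" \
    "gs://${bucket_name}/"
}
```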
......@@ -22,11 +22,15 @@ source ../validate-env.sh $DATA_PARTITION_ID
# TENANT_NAME is evaluated from DATA_PARTITION_ID
source ../set-tenant-name.sh
echo "activate virtual env"
python3 -m venv env
source env/bin/activate
#echo "activate virtual env"
#python3 -m venv env
# Add permanent env directory ../env on step 0 script 0_pyenv.sh. Comments depricated lines
#source ../env/bin/activate
pip install google-cloud-datastore
#source env/bin/activate
#pip3 install --upgrade setuptools
#pip3 install --upgrade pip
#pip install google-cloud-datastore
echo "call script to create datastore entities"
python create-entity.py --tenant_project "${TENANT_PROJECT}" --master_project "${GCLOUD_PROJECT}" --compliance_rule_set "${COMPLIANCE_RULE_SET}" --crm_account_ids "${CRM_ACCOUNT_IDS}" --tenant_name "${TENANT_NAME}" --data_partition_id "${DATA_PARTITION_ID}"
......@@ -34,7 +38,7 @@ ret=$?
if [ $ret -ne 0 ]; then
echo "Error while creating tenant mapping"
fi
echo "deactivate virtual env"
deactivate
rm -rf env
\ No newline at end of file
#echo "deactivate virtual env"
#deactivate
#rm -rf env
......@@ -17,11 +17,12 @@
source ../validate-env.sh $TENANT_PROJECT
echo "activate virtual env"
python3 -m venv env
source env/bin/activate
#echo "activate virtual env"
#python3 -m venv env
#source ../env/bin/activate
pip install google-cloud-datastore
#source env/bin/activate
#pip install google-cloud-datastore
echo "Creating/Updating Hookup Topics on $TENANT_PROJECT"
python create-datastore-ingestion-hookup.py $TENANT_PROJECT &>/dev/null
......@@ -29,7 +30,7 @@ ret=$?
if [ $ret -ne 0 ]; then
echo "Error while creating tenant mapping"
fi
echo "deactivate virtual env"
deactivate
rm -rf env
\ No newline at end of file
#echo "deactivate virtual env"
#deactivate
#rm -rf env
\ No newline at end of file
......@@ -33,9 +33,11 @@ fi
echo "activate virtual env"
python3 -m venv env
source env/bin/activate
# Add permanent env directory ../env on step 0 script 0_pyenv.sh. Comments depricated lines
source ../env/bin/activate
pip install google-cloud-datastore
#source env/bin/activate
#pip install google-cloud-datastore
echo "call script to create datastore entities"
echo ${LOGSTORE_HOST}
......
......@@ -19,9 +19,10 @@ source ../validate-env.sh $DATA_PARTITION_ID
source ../validate-env.sh $GOOGLE_DOMAIN
source ../validate-env.sh $GOOGLE_CLOUD_IDENTITY_VERIFIER_EMAIL
source ../validate-env.sh $GOOGLE_CLOUD_IDENTITY_ADMIN_EMAIL
source ../validate-env.sh $DOMAIN_CREATOR_SERVICE_ACCOUNT_KEY
source ../set-tenant-name.sh
#DOMAIN_CREATOR_SERVICE_ACCOUNT_KEY=$1
if [ "$DOMAIN_CREATOR_SERVICE_ACCOUNT_KEY" = "" ]
then
echo "Domain creator service account key can not be empty"
......@@ -30,9 +31,11 @@ fi
echo "Activating virtual environment"
python3 -m venv env
source env/bin/activate
# Add permanent env directory ../env on step 0 script 0_pyenv.sh. Comments depricated lines
source ../env/bin/activate
pip install --upgrade google-api-python-client google-auth
#source env/bin/activate
#pip install --upgrade google-api-python-client google-auth
echo "calling script to create and verify subdomain"
python create-subdomain.py --service_account_key "${DOMAIN_CREATOR_SERVICE_ACCOUNT_KEY}" --tenant_name "${TENANT_NAME}" --parent_domain "${GOOGLE_DOMAIN}" --verifier "${GOOGLE_CLOUD_IDENTITY_VERIFIER_EMAIL}" --admin "${GOOGLE_CLOUD_IDENTITY_ADMIN_EMAIL}"
......
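These scripts repeatedly `source ../validate-env.sh <value>` before doing any work. The helper is not shown in the diff; a minimal sketch, assuming it only verifies that the passed value is non-empty:
```shell
# Hypothetical sketch of validate-env.sh. Because the file is sourced,
# the exit aborts the calling script as well.
if [ -z "$1" ]; then
  echo "A required environment variable is empty"
  exit 1
fi
```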
This script creates the Composer environment in your project.
The script uses jq. Please install jq before starting the script.
You have to pass at least one parameter to the script, the name for the Composer environment:
```shell
./create_composer my-env
```
You can also specify parameters for the Composer environment according to the [official documentation](https://cloud.google.com/sdk/gcloud/reference/composer/environments/create):
```shell
./create_composer my-env --location europe-west3 --node-count=5 --machine-type=n1-standard-2 --env-variables=ENV=test,TEST=true
```
After completion, the script prints the Airflow URL and the bucket name, and creates the file `vars.json`. Upload this file to Airflow; these variables are used by the ingestion DAGs.
Open Airflow, go to `Admin->Variables`, select `vars.json`, and import the variables:
![Admin->Variables](img/vars.png "Admin->Variables")
![Import Variables](img/import.png "Import Variables")
![Result](img/result.png "Result")
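Alternatively, the import can be scripted; a hedged sketch using the Airflow CLI through gcloud, assuming a Composer 1.x (Airflow 1.10) environment named `my-env` in `us-central1`:
```shell
# Copy vars.json into the environment's data/ folder (mounted at
# /home/airflow/gcs/data inside Airflow), then import it.
gcloud composer environments storage data import \
  --environment my-env --location us-central1 --source vars.json
gcloud composer environments run my-env --location us-central1 \
  variables -- --import /home/airflow/gcs/data/vars.json
```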
These variables are enough for the environment creation. After all services are deployed, you need to add the following variables (an illustrative snippet follows the list):
- workflow_url - Workflow service URL
- file_api_url - File service URL
- search_url - Search service URL
- storage_url - Storage service URL with the API suffix /api/storage/v2/records
- legal - legal tag used for ingested records
- record_kind - kind of the record, for example, opendes:osdu:file:0.1.0
- schema_version - version of the schema, for example, 0.1.0
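As an illustration only, the imported variables might look like this; the hosts and the legal tag below are placeholders:
```json
{
  "workflow_url": "https://workflow.example.com",
  "file_api_url": "https://file.example.com",
  "search_url": "https://search.example.com",
  "storage_url": "https://storage.example.com/api/storage/v2/records",
  "legal": "opendes-demo-legaltag",
  "record_kind": "opendes:osdu:file:0.1.0",
  "schema_version": "0.1.0"
}
```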
#!/bin/bash
if [ -z "$1" ]
then
echo "You have to specify at least a name for your environment!"
exit 1
fi
# apt-cache prints "Installed: (none)" even when jq is absent, so grepping for
# "Installed" always matched; check the PATH instead (also works off Debian).
if command -v jq >/dev/null 2>&1
then
  echo "jq is installed. Resume"
else
  echo "Please install jq. Abort"
  exit 1
fi
# clone ingestion dags repo
echo "Cloning DAGs repository"
git clone https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags.git
git clone https://community.opengroup.org/osdu/platform/system/sdks/common-python-sdk.git
# enable composer api
gcloud services enable composer.googleapis.com
# create composer env
gcloud config set composer/location $CLOUDSDK_COMPUTE_ZONE
gcloud composer environments create "$@"
# get bucket name and airflow uri
BUCKET=$(gcloud composer environments describe $1 --format json | jq .config.dagGcsPrefix | tr -d '"' | sed -e 's|/dags||')
AIRFLOW=$(gcloud composer environments describe $1 --format json | jq .config.airflowUri | tr -d '"')
# upload dags
gsutil -m rsync -r ./ingestion-dags/src $BUCKET
gsutil -m cp -r ./common-python-sdk/osdu_api $BUCKET/dags
rm -rf ingestion-dags common-python-sdk
# prepare file with vars
sed 's|PARTITION|'$DATA_PARTITION_ID'|g' vars.template > vars.json
sed -i 's|DOMAIN|'$GOOGLE_DOMAIN'|g' vars.json
echo "Composer env created."
echo "Airflow is avilable at $AIRFLOW"
echo "Dags are at $BUCKET/dags"
{
"acl":"{'viewers': ['data.default.viewers@PARTITION.DOMAIN'],'owners': ['data.default.viewers@PARTITION.DOMAIN']}",
"dataload_config_path":"/home/airflow/gcs/dags/configs/dataload.ini",
"data_partition_id":"PARTITION",
"entitlements_module_name":"entitlements_client",
"path_to_third_party":"/home/airflow/gcs/libs",
"provider":"gcp",
"search_query_ep":"api/search/v2/query",
"update_status_ep":"updateStatus"
}
......@@ -22,12 +22,13 @@ source ./validate-env.sh $GCLOUD_PROJECT
source ./validate-env.sh $ENVIRONMENT_NAME
source ./validate-env.sh $NEW_TENANT
echo "activate virtual env"
python3 -m venv env
source env/bin/activate
#echo "activate virtual env"
#python3 -m venv env
#source ../env/bin/activate
source env/bin/activate
pip3 install --upgrade pip setuptools
python3 -m pip install google-cloud google-cloud-pubsub google-cloud-runtimeconfig
pip install google-cloud google-cloud-pubsub google-cloud-runtimeconfig
if [ "${NEW_TENANT}" == "true" ]
then
......@@ -60,6 +61,6 @@ else
done <./datastore-table-info.txt
fi
echo "deactivate virtual env"
deactivate
rm -rf env
\ No newline at end of file
#echo "deactivate virtual env"
#deactivate
#rm -rf env
......@@ -89,4 +89,6 @@ then
fi
else
echo "Variable GSUITE_GROUPS_BOOTSTRAP is not set to true"
fi
rm -rf bin/ && rm -rf pkg/
......@@ -23,7 +23,13 @@
{
"name": "users",
"description": "Datalake users"
"description": "Datalake users",
"members":
[
{
"name": "osdu-gcp-sa"
}
]
},
{
......
......@@ -168,6 +168,30 @@
"name": "users.datalake.admins"
}
]
},
{
"name": "service.schema-service.editors",
"description": "Schema service editors",
"members": [
{
"name": "users.datalake.ops"
},
{
"name": "users.datalake.editors"
},
{
"name": "users.datalake.admins"
}
]
},
{
"name": "service.schema-service.viewers",
"description": "Schema service viewers",
"members": [
{
"name": "users.datalake.viewers"
}
]
}
]
}
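After hand-editing a bootstrap file like this, a quick syntax check helps; the filename below is hypothetical:
```shell
# jq parses the file and prints nothing on success; exits non-zero on bad JSON
jq empty datalake-groups.json && echo "datalake-groups.json is valid JSON"
```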