The purpose of this repository is to provide a docker image featuring **Apache Airflow** on which you can rely to launch the whole stack (see [docker compose](compose/docker-compose.yml)).
There is also a readme covering the **Kubernetes environment** set up, as an alternative to the docker compose services. Kubernetes offers you the ability to test the Witsml parser as well as the CSV parser:
[KUBERNETES SET UP](KUBE_README.md)
You will also find a few details about the **Airflow Stable API** here:
[Airflow Stable API for OSDU](AIRFLOW_README.md)
Main audience:
- DAG developers
For more details about the stack, please refer to [Airflow docker stack](https://airflow.apache.org/docs/docker-stack/build.html).
**Important note :**
The running Airflow instance is linked to the CSP APIs. You may test your DAGs against any CSP.
# Content
## Mandatory libraries
If you take a look at the [Dockerfile](Dockerfile) you will notice the installation of the following libraries :
- [Airflow lib](https://community.opengroup.org/osdu/platform/data-flow/ingestion/osdu-airflow-lib)
- [Osdu Api](https://community.opengroup.org/osdu/platform/system/sdks/common-python-sdk)
Those libraries are mandatory.
## DAGs
Then come the ingestion DAGs, which are simply copied to the [dags](compose/dags) folder (the folder should be writable - chmod -Rf 777 dags).
This is the place where you can add your own dags for development.
For the sake of the example, only one DAG and its dependencies (Osdu_ingest) are included here. You can get a more up-to-date version from the [ingestion dags repository](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/tree/master/src/osdu_dags).
## Plugins
In case you need to test some Airflow plugins, you can make the folder writable (chmod -Rf 777 plugins).
## Data folder
Under the [data folder](data/) you may add/alter payload json files for testing purposes.
Note that you will need to trigger the DAGs from the Airflow container directly (see below).
Make sure you provide read/write access to the data folder (eg: chmod -Rf 777 data).
## Logs folder
Logs of the workflows are written to the logs folder; you should also make it writable (eg: chmod -Rf 777 logs).
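Putting the permission notes above together, a minimal sketch (assuming the dags, plugins, data and logs folders all live next to the docker-compose.yml; 777 is what the notes above suggest, so consider something tighter than world-writable beyond local development):

```bash
# Make the folders mounted into the Airflow containers writable (local development only)
chmod -Rf 777 dags plugins data logs
```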
# Build the image
docker build . --tag osdu-airflow:0.0.1
# Docker Compose
The compose set up is based on the [original docker compose](https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml) file, which we've modified to use our own image (*eg: osdu-airflow:0.0.1*) plus a few environment variables, including:
- CSP specific environment variables
Again, for the sake of the example, we've included a sample [docker-compose.yml](compose/docker-compose.yml) file.
You will need to alter it to suit your needs.
Usually only the part at the top is modified:
```yaml
version: "3.7"
x-airflow-common:
  &airflow-common
  image: osdu-airflow:0.0.1
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    CI_COMMIT_TAG: v1.0
    CLOUD_PROVIDER: ###CSP NAME###
    client_id: #####
    client_secret: #####
    username: #####
    password: #####
    # Place here additional required variables - see below
  volumes:
    - ./airflow/airflow.cfg:/opt/airflow/airflow.cfg
    - ./data:/opt/airflow/data
    - ./logs:/opt/airflow/logs
```
- For instance, on IBM you will need these variables:
```yaml
CLOUD_PROVIDER: ibm
KEYCLOACK_URI: KEYCLOAK_AUTH_URL
REALM_NAME: KEYCLOAK REALM
COS_URL: URL_OF_STORAGE
COS_ACCESS_KEY: STORAGE_ACCESS_KEY
COS_SECRET_KEY: STORAGE_SECRET_KEY
COS_REGION: STORAGE_REGION
client_id: IBM_API_CLIENT_ID
client_secret: IBM_API_CLIENT_SECRET
username: IBM_GENERIC_USERNAME
password: IBM_GENERIC_PASSWORD
```
- On GCP :
```yaml
CLOUD_PROVIDER: gcp
client_id: GCP_API_CLIENT_ID
client_secret: GCP_API_CLIENT_SECRET
username: GCP_GENERIC_USERNAME
password: GCP_GENERIC_PASSWORD
# more to come...
```
- On AWS :
```yaml
CLOUD_PROVIDER: aws
client_id: AWS_API_CLIENT_ID
client_secret: AWS_API_CLIENT_SECRET
username: AWS_GENERIC_USERNAME
password: AWS_GENERIC_PASSWORD
# more to come...
```
- On Azure/Microsoft :
```yaml
CLOUD_PROVIDER: ms
client_id: MS_API_CLIENT_ID
client_secret: MS_API_CLIENT_SECRET
username: MS_GENERIC_USERNAME
password: MS_GENERIC_PASSWORD
# more to come...
```
You may find these details in the **Airflow variables** (refer to the Airflow instance running on the preship environments), and also in the [preshipping team repository](https://gitlab.opengroup.org/osdu/subcommittees/ea/projects/pre-shipping/home/-/tree/master/).
## Airflow Configuration file
Using the [default Airflow config](https://github.com/apache/airflow/blob/main/airflow/config_templates/default_airflow.cfg), we can tweak the configuration with ease (ex: launch Airflow in debug mode - *logging_level = DEBUG*).
Make sure to fill in the proper Airflow paths (eg: */opt/airflow*).
In case of any Airflow upgrade, just replace with the latest default configuration.
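For instance, a minimal sketch switching the mounted configuration to debug logging (assumes GNU sed and that logging_level still carries its default INFO value; the key sits under [logging] in recent Airflow versions, [core] in older ones):

```bash
# From the compose folder: flip the mounted airflow.cfg to debug logging
sed -i 's/^logging_level = .*/logging_level = DEBUG/' airflow/airflow.cfg
```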
## Run the stack
From the compose folder:
Have the data and logs folders owned by user:root (chown -Rf required).
Make sure the logs, data and dags folders are writable:
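A minimal sketch of these preparation steps followed by starting the stack (the user in the chown call and the compose command spelling are assumptions; adjust them to your machine):

```bash
# From the compose folder: hand the mounted folders to the current user and the root group
chown -Rf "$USER":root data logs
chmod -Rf 777 dags logs data

# Start the stack in the background (newer Docker versions use `docker compose up -d`)
docker-compose up -d
```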
- Export CSP specific Airflow variables:
From the CSP Airflow instance of your choice, head to the configuration variables listing (Admin > Variables in the Airflow UI) and click on Export to download your configuration in JSON format.
- Import the variables into your instance (eg: *variables_CSP.json*), as sketched after this list.
- Add other variables depending on your needs:
core__config__show_skipped_ids=true
The above one is a nice-to-have for debugging purposes.
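A minimal sketch of the import and of the extra variable, run from inside an Airflow container (the file path is an assumption; the data folder mount shown earlier is one way to get the exported file into the container):

```bash
# Import the variables exported from the CSP Airflow instance
airflow variables import /opt/airflow/data/variables_CSP.json

# Optional, but handy for debugging (see the note above)
airflow variables set core__config__show_skipped_ids true
```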
If you are using Windows, there is a procedure to follow in order to share volumes from docker-compose:
you need to add the volumes' paths (logs, dags, data) under *File Sharing* in the Docker settings.
# Reporting an issue
Please use the current repository issues board for any question/issue you may encounter.
You may also ask the community on the Ingestion DAGs Slack channel.
------------------
DAGs' specifics
------------------
# Preparation
When your stack is running, you may check the containers :
docker ps
Retrieve the id of the Airflow worker and get inside it:
docker exec -it #### bash
On Kubernetes, exec into the worker pod instead:
kubectl exec -it airflow-worker-0 bash -n airflow
Note : remove the -n airflow if you are not using a namespace.
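A minimal sketch for the docker compose case (the name filter is an assumption; the actual worker container name depends on your compose project name):

```bash
# List the running containers and keep the Airflow worker
docker ps --filter "name=worker"

# Open a shell inside it, using the container id from the previous command
docker exec -it <worker_container_id> bash
```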
# Manifest ingestion
## Trigger a Workflow
You can issue the following command to trigger the DAG (make sure to copy the payload into a new file payload.json within the container):
airflow dags trigger -c "$json" Osdu_ingest -r manual_manifest_1
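Putting it together, a minimal sketch run from inside the worker container (assumes the payload was saved as payload.json in the current directory):

```bash
# Load the manifest payload into the shell variable used by the trigger command
json=$(cat payload.json)

# Trigger the manifest ingestion DAG; the run id must satisfy the note below
airflow dags trigger -c "$json" Osdu_ingest -r manual_manifest_1
```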
**IMPORTANT NOTE** : the run id you pass (eg: *manual_1* or *witsml_1*) should match the RUN ID you have in the payload.json sample,
but **also** match an existing run ID in the Workflow service of the CSP you are testing on.
In order to have a matching RUN ID in the Workflow API, you should first trigger the workflow from the Workflow API with a value for the run id.
This step must happen just before the above command.
Example :
curl --location --request POST '.../api/workflow/v1/workflow/Osdu_ingest/workflowRun' \
--header 'data-partition-id: ###' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer ###' \
--data-raw '{
"runId": "manual_1",
"executionContext" : {
"acl" : {
...
---------------
Troubleshooting
---------------
# osdu_api.ini missing / config_file_path not found
The code involved is the config_manager under the common_python_sdk project.
Environment variables are read from the osdu_api.ini file,
and the path to this file must be provided through the OSDU_API_CONFIG_INI variable.
- Symptom: the first task of the DAG shows the following in its log:
configparser.Error(f"Could not find the config file in '{config_file_path}'.")
configparser.Error: Could not find the config file in 'osdu_api.ini'.
- Solution:
Make sure the config_file_path is properly set up under the environment fields of the Airflow chart values.
Note that we fill in the variables there as they are common to all CSPs in our custom set up (we rather rely on the variables defined in the overridden chart values).
ex :
```yaml
- name: OSDU_API_CONFIG_INI
value: /opt/airflow/data/osdu_api.ini
```
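A quick verification sketch from inside a running Airflow container (not part of the official chart; it only checks that the variable is set and points at an existing file):

```bash
# Check that the variable is set and that the ini file it points to exists
echo "$OSDU_API_CONFIG_INI"
ls -l "$OSDU_API_CONFIG_INI"
```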
**Note** : the Dockerfile provided by the witsml repository might not be up to date - please check with the CSP team. As an alternative, post an issue in the current repository (using the customized CSPWitsmlParserDockerfile).
Windows install :
The hostPath must be modified to match your machine and the location of the dags.