This repository provides a Docker image featuring **Apache Airflow** that you can rely on to launch the whole stack (see [docker compose](compose/docker-compose.yml)).

There is also a readme covering the set up of a **Kubernetes environment** as an alternative to the docker compose services. Kubernetes lets you test the Witsml parser as well as the csv parser.


You can find a few details about the **Airflow Stable API** here:

[Airflow Stable API for OSDU](

Intended audience:

- DAG developers

For more details about the stack, please refer to [Airflow docker stack](

**Important note:**

The running Airflow instance is linked to the CSP (cloud service provider) APIs. You may test your DAGs against any CSP.

# Content

## Mandatory libraries

If you take a look at the [Dockerfile](Dockerfile) you will notice the installation of the following libraries :

- [Airflow lib](
- [Osdu Api](

Those libraries are mandatory.

## DAGs

Next come the ingestion DAGs, which are simply copied to the [dags](compose/dags) folder (the folder should be writable - chmod -Rf 777 dags).

This is the place where you can add your own dags for development.

As an example, only one DAG (Osdu_ingest) and its dependencies are included here. You can get a more up-to-date version from the [ingestion dags repository](

## Plugins

In case you need to test some Airflow plugins, you can make the folder writable (chmod -Rf 777 plugins).

## Data folder
Under the [data folder](data/) you may add/alter some payload json files for testing purposes. 

Note that you will need to trigger the dags from the airflow container directly (see below).

Make sure you provide read/write access to the data folder (eg: chmod -Rf 777 data)

## Logs folder

Workflow logs are written to the logs folder, which should also be writable (eg: chmod -Rf 777 logs).
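The permission steps for the dags, plugins, data and logs folders can be grouped into one small helper. This is a minimal sketch: the function name `make_airflow_folders_writable` is hypothetical, and it assumes all four folders sit under one base directory, which may not match your checkout. Mode 777 is what the sections above suggest and is only appropriate for a local development setup.

```shell
# Sketch: make the folders mounted into the Airflow containers writable.
# Assumption: dags, plugins, data and logs all live under one base directory
# (defaults to the current directory).
make_airflow_folders_writable() {
  base_dir="${1:-.}"
  for folder in dags plugins data logs; do
    # Create the folder if it is missing, then open it up for local development.
    mkdir -p "$base_dir/$folder" || return 1
    chmod -Rf 777 "$base_dir/$folder" || return 1
  done
}
```

For example, `make_airflow_folders_writable compose` would prepare the folders under the compose folder.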

# Build the image

    docker build . --tag osdu-airflow:0.0.1

# Docker Compose    

Starting from the [Original docker compose]( set up, we replaced the image with our own (*eg: osdu-airflow:0.0.1*) and added a few environment variables, including:

- common variables (ex: CLOUD_PROVIDER...)
- CSP specific environment variables

Again as an example, we've included a sample [docker-compose.yml](compose/docker-compose.yml) file.
You will need to alter it to suit your needs.

Only the part at the top is usually modified:

    version: "3.7"
    x-airflow-common:
      image: osdu-airflow:0.0.1
      environment:
        AIRFLOW__CORE__EXECUTOR: CeleryExecutor
        AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
        AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
        AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
        AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
        CI_COMMIT_TAG: v1.0
        client_id: #####
        client_secret: #####
        username: #####
        password: #####
        # Place here additional required variables - see below
      volumes:
        - ./airflow/airflow.cfg:/opt/airflow/airflow.cfg
        - ./data:/opt/airflow/data
        - ./logs:/opt/airflow/logs

- For instance, on IBM you will need these variables:

      client_id: IBM_API_CLIENT_ID
      client_secret: IBM_API_CLIENT_SECRET

- On GCP:

      client_id: GCP_API_CLIENT_ID
      client_secret: GCP_API_CLIENT_SECRET
      # more to come...

- On AWS:

      client_id: AWS_API_CLIENT_ID
      client_secret: AWS_API_CLIENT_SECRET
      # more to come...

- On Azure/Microsoft:

      client_id: MS_API_CLIENT_ID
      client_secret: MS_API_CLIENT_SECRET
      # more to come...

You can find these details in the **Airflow variables** (refer to the Airflow instance running on the preship environments) and also in the [preshipping team repository](

## Airflow Configuration file

Starting from the [default Airflow config](, you can easily adjust the configuration (ex: launch Airflow in debug mode - *logging_level = DEBUG*).

Make sure the Airflow path is filled in correctly (eg: */opt/airflow*).

When upgrading Airflow, simply replace the file with the latest default configuration.
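Switching to debug mode comes down to editing one line of airflow.cfg. The helper below is a minimal sketch: the function name `set_airflow_logging_level` is hypothetical, and it assumes the file already contains a line starting with logging_level, as the default Airflow configuration does.

```shell
# Sketch: set the logging_level entry of an airflow.cfg file.
# Only lines that START with "logging_level" are rewritten, so entries such as
# fab_logging_level are left untouched.
set_airflow_logging_level() {
  cfg_file="$1"
  level="${2:-DEBUG}"
  if [ ! -f "$cfg_file" ]; then
    echo "Config file not found: $cfg_file" >&2
    return 1
  fi
  tmp_file="$(mktemp)" || return 1
  if sed "s/^logging_level[[:space:]]*=.*/logging_level = ${level}/" "$cfg_file" > "$tmp_file"; then
    # Write back through cat so the original file keeps its permissions.
    cat "$tmp_file" > "$cfg_file"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp_file"
  return "$status"
}
```

For example, `set_airflow_logging_level airflow/airflow.cfg DEBUG` run from the compose folder, followed by a restart of the stack so the new level is picked up.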

## Run the stack

From the compose folder:

- Make sure the data and logs folders are owned by user:root (chown -Rf required).

- Make sure the logs, data and dags folders are writable:

      sudo chmod -Rf 777 logs data dags

- Start the stack:

      docker-compose up

- Export CSP specific Airflow variables:

From the CSP Airflow instance of your choice, go to the following configuration variables listing:

![list of variables](airflow-vars.png)

Then select all variables as follows:

![checkbox - all variables](var-select.png)

And click on export to download your configuration in json format:

![export variables](var-export.png)

- Import variables into your instance:

ex: *variables_CSP.json*
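The import can be done from inside the Airflow container with the Airflow 2 CLI command `airflow variables import`. A minimal sketch, assuming the exported file was copied into the container under the example name variables_CSP.json; the helper name `import_airflow_variables` is hypothetical.

```shell
# Sketch: import exported Airflow variables from a JSON file.
# Meant to be run inside the Airflow container, where the airflow CLI exists.
import_airflow_variables() {
  vars_file="${1:-variables_CSP.json}"
  if [ ! -f "$vars_file" ]; then
    echo "Variables file not found: $vars_file" >&2
    return 1
  fi
  if ! command -v airflow >/dev/null 2>&1; then
    echo "airflow CLI not found; run this inside the Airflow container" >&2
    return 2
  fi
  airflow variables import "$vars_file"
}
```

The variables can also be imported from the same variables page of the Airflow UI that was used for the export.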

- Add some other variables depending on your needs:


The above one is nice to have for debugging purposes.

### Windows users
If you are using Windows, there is a procedure to follow in order to share volumes from docker-compose:

Add the paths of the volumes (logs, dags, data) to the *File Sharing* settings of Docker.

# Reporting issues

Please use the issues board of the current repository for any question or issue you encounter.
You may also ask the community on the Ingestion DAGs Slack channel.

DAGs' specifics

# Preparation
When your stack is running, you can check the containers:

    docker ps

or (Kubernetes):

    kubectl get pods -n airflow

Retrieve the id of the Airflow worker and open a shell inside it:

    docker exec -it #### bash

or (Kubernetes):

    kubectl exec -it airflow-worker-0 -n airflow -- bash

Note: remove the -n airflow if you are not using a namespace.

# Manifest ingestion

## Trigger a Workflow

You can issue the following commands to trigger the dag (make sure to copy the payload into a new file payload.json within the container):

    json=$(cat payload.json)
    airflow dags trigger -c "$json" Osdu_ingest -r manual_manifest_1
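The trigger command above can be wrapped in a small helper that fails early when the run id or the payload file is missing. This is a sketch only: the helper name `trigger_osdu_ingest` is hypothetical, and the DAG name, run id and payload file are the ones used in the example above.

```shell
# Sketch: trigger the Osdu_ingest DAG with a given run id and payload file.
# Meant to be run inside the Airflow worker container.
trigger_osdu_ingest() {
  run_id="$1"
  payload_file="${2:-payload.json}"
  if [ -z "$run_id" ]; then
    echo "usage: trigger_osdu_ingest RUN_ID [PAYLOAD_FILE]" >&2
    return 1
  fi
  # -s is true only when the file exists and is not empty.
  if [ ! -s "$payload_file" ]; then
    echo "Payload file missing or empty: $payload_file" >&2
    return 1
  fi
  if ! command -v airflow >/dev/null 2>&1; then
    echo "airflow CLI not found; run this inside the Airflow worker" >&2
    return 2
  fi
  # Same command as above, with the run id and payload passed as arguments.
  airflow dags trigger -c "$(cat "$payload_file")" -r "$run_id" Osdu_ingest
}
```

For example, `trigger_osdu_ingest manual_manifest_1 payload.json` reproduces the command shown above.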

The run id must **also** match an existing ID from the Workflow service of the CSP you are testing on.
To have a matching run id on the Workflow API, trigger the workflow from the Workflow API with a value for the run id.
You need to complete this step just before running the above command.

Example:

    curl --location --request POST '.../api/workflow/v1/workflow/Osdu_ingest/workflowRun' \
    --header 'data-partition-id: ###' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer ###' \
    --data-raw '{
        "runId": "manual_1",
        "executionContext" : {
        "acl" : {

# osdu_api.ini missing / config_file_path not found

The relevant code is config_manager in the common_python_sdk project.
Environment variables must be filled in from the osdu_api.ini file, and the path to this file must be set using the OSDU_API_CONFIG_INI variable.

- The first task of the DAG shows the following in its log:

    configparser.Error(f"Could not find the config file in '{config_file_path}'.")
    configparser.Error: Could not find the config file in 'osdu_api.ini'.

- Solution:

Make sure the config_file_path is properly set under the environment fields of the airflow chart values.
In our custom set up, however, we fill in the variables there because they are common to all CSPs (we prefer to use the variables defined in the overridden chart values).

ex:

    value: /opt/airflow/data/osdu_api.ini
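Before triggering a DAG, the setting can be checked from a shell inside the worker. A minimal sketch, assuming OSDU_API_CONFIG_INI holds the path to osdu_api.ini as described above; the helper name `check_osdu_api_config` is hypothetical.

```shell
# Sketch: verify that OSDU_API_CONFIG_INI points to a readable file.
check_osdu_api_config() {
  if [ -z "${OSDU_API_CONFIG_INI:-}" ]; then
    echo "OSDU_API_CONFIG_INI is not set" >&2
    return 1
  fi
  if [ ! -f "$OSDU_API_CONFIG_INI" ] || [ ! -r "$OSDU_API_CONFIG_INI" ]; then
    echo "Config file not found or not readable: $OSDU_API_CONFIG_INI" >&2
    return 1
  fi
  echo "Using config file: $OSDU_API_CONFIG_INI"
}
```

A failure here means the first task of the DAG would most likely hit the configparser error shown above.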

**Note**: the Dockerfile provided by the witsml repository might not be up to date - please check with the CSP team. Alternatively, post an issue in the current repository (using the customized CSPWitsmlParserDockerfile).

Windows install:
the hostPath must be modified to match your machine and the location of the dags.